Python re.findall() Method Explained

The re.findall() method returns all non-overlapping matches of a pattern in a string as a list of strings or tuples.

Syntax

python

re.findall(pattern, string, flags=0)

Key Characteristics:

  • Returns all matches as a list
  • Returns empty list if no matches found
  • For patterns with one group, returns a list of that group's strings; with several groups, returns a list of tuples
  • Finds all non-overlapping matches
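
The return shapes listed above can be checked with a short sketch. The sample string and the patterns are made up for illustration only.

```python
import re

# Hypothetical sample string, chosen only to illustrate the return shapes
sample = "x=1, y=22, z=333"

# No groups: each item is the full match
no_groups = re.findall(r"[a-z]=\d+", sample)

# One group: each item is that group's text only
one_group = re.findall(r"[a-z]=(\d+)", sample)

# Two groups: each item is a tuple with one entry per group
two_groups = re.findall(r"([a-z])=(\d+)", sample)

# No match at all: an empty list, never None
no_match = re.findall(r"\d{5}", sample)

print(no_groups)   # ['x=1', 'y=22', 'z=333']
print(one_group)   # ['1', '22', '333']
print(two_groups)  # [('x', '1'), ('y', '22'), ('z', '333')]
print(no_match)    # []
```

Because the result is always a list, it can be looped over or length-checked without first testing for None.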

Example 1: Extracting All Numbers from Text

python

import re

text = "I bought 5 apples for $3.50, 2 bananas for $1.25, and 10 oranges for $7.80."

# Find all numbers (integers and decimals)
numbers = re.findall(r'\d+\.?\d*', text)
print("All numbers found:", numbers)

# Find all prices (numbers with $ sign)
prices = re.findall(r'\$\d+\.?\d*', text)
print("Prices found:", prices)

# Find only whole numbers (the lookarounds skip digits that belong to a decimal such as 3.50)
whole_numbers = re.findall(r'(?<![\d.])\d+(?!\.?\d)', text)
print("Whole numbers:", whole_numbers)

Here is the first pattern explained in simple terms:

re.findall(r'\d+\.?\d*', text)
This pattern finds all numbers in a text, including:

Whole numbers (like 5, 100, 42)

Decimal numbers (like 3.14, 0.5, 99.99)

Breaking it down:
\d+
\d = any digit (0-9)

+ = one or more times

Meaning: "Find one or more digits" (the whole number part)

\.?
\. = a literal dot (the decimal point)

? = zero or one time (optional)

Meaning: "Maybe find a decimal point, if it exists"

\d*
\d = any digit (0-9)

* = zero or more times

Meaning: "Find zero or more digits" (the decimal part)

What it matches:
✅ Whole numbers: 123, 7, 0
✅ Decimal numbers: 3.14, 0.5, 99.99
✅ Numbers with decimal point but no decimals: 100. (though unusual)

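What it doesn't match as a whole:
❌ Negative numbers: -5 (the minus sign is dropped, leaving 5)
❌ Numbers with commas: 1,000 (split into 1 and 000)
❌ Scientific notation: 1.5e10 (split into 1.5 and 10)
❌ Currency symbols: $50 (the $ is dropped, leaving 50)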

Example:

python

import re

text = "I have 5 apples, 3.14 pi, temperature 98.6°, and 1000 points."
numbers = re.findall(r'\d+\.?\d*', text)

print(numbers)  # Output: ['5', '3.14', '98.6', '1000']

Simple analogy:
Think of it as a pattern that finds:

Some digits + maybe a dot + maybe some more digits

So it catches both:

123 (digits + no dot + no digits)

123.45 (digits + dot + digits)

It's like a net that catches all the numbers floating in your text! 🎣
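
To see where the net has holes, here is a small sketch that runs the same pattern on a few tricky inputs. The inputs are hypothetical, and the `signed` pattern at the end is only one possible extension, not part of the original pattern.

```python
import re

simple = r'\d+\.?\d*'

# Hypothetical inputs the simple pattern only partly handles
cases = ["-5", "1,000", "1.5e10", "$50"]
results = {s: re.findall(simple, s) for s in cases}
for s, found in results.items():
    print(f"{s!r} -> {found}")
# '-5' -> ['5']
# '1,000' -> ['1', '000']
# '1.5e10' -> ['1.5', '10']
# '$50' -> ['50']

# One possible extension (an assumption): optional minus sign, and a dot
# that only counts when digits follow it. The group is non-capturing
# (?:...) so findall still returns the full match.
signed = r'-?\d+(?:\.\d+)?'
signed_found = re.findall(signed, "-5 and 3.14 and 100.")
print(signed_found)  # ['-5', '3.14', '100']
```

Note that the extended pattern would also read "10-5" as 10 and -5, so whether it fits depends on the text being searched.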

Output of Example 1:

text

All numbers found: ['5', '3.50', '2', '1.25', '10', '7.80']
Prices found: ['$3.50', '$1.25', '$7.80']
Whole numbers: ['5', '2', '10']

Example 2: Extracting Email Addresses

python

import re

text = """
Contact us at: support@company.com, sales@example.org 
or info@sub.domain.co.uk. For emergencies: emergency@company.com.
Invalid emails: user@com, @domain.com, user@.com
"""

# Extract email addresses with a simplified pattern (not a full RFC 5322 validator)
emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
print("Email addresses found:")
for email in emails:
    print(f"  - {email}")

# Extract usernames and domains separately using groups
email_parts = re.findall(r'([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})', text)
print("\nEmail parts (username@domain):")
for username, domain in email_parts:
    print(f"  - Username: {username}, Domain: {domain}")

Output:

text

Email addresses found:
  - support@company.com
  - sales@example.org
  - info@sub.domain.co.uk
  - emergency@company.com

Email parts (username@domain):
  - Username: support, Domain: company.com
  - Username: sales, Domain: example.org
  - Username: info, Domain: sub.domain.co.uk
  - Username: emergency, Domain: company.com
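
Once a pattern has groups, re.findall() returns only the groups, so the full address is no longer in the result. A minimal sketch of one way around this is to wrap the whole pattern in an extra outer group. The sample text here is a shortened, hypothetical version of the one above.

```python
import re

# Hypothetical sample text for illustration
text = "Contact support@company.com or sales@example.org"

# Same simplified pattern, with an outer group that captures the full address
pattern = r'(([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,}))'

rows = re.findall(pattern, text)
for full, username, domain in rows:
    print(f"{full} -> user={username}, domain={domain}")
# support@company.com -> user=support, domain=company.com
# sales@example.org -> user=sales, domain=example.org
```

If match positions are needed as well, re.finditer() yields match objects instead of plain strings.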

Example 3: Parsing HTML-like Content

python

import re

html_content = """
<div class="product">
    <h3>Laptop</h3>
    <p class="price">$999.99</p>
    <p class="rating">4.5 stars</p>
</div>
<div class="product">
    <h3>Smartphone</h3>
    <p class="price">$599.50</p>
    <p class="rating">4.2 stars</p>
</div>
<div class="product">
    <h3>Tablet</h3>
    <p class="price">$399.00</p>
    <p class="rating">4.7 stars</p>
</div>
"""

# Extract all product names (text between <h3> tags)
product_names = re.findall(r'<h3>(.*?)</h3>', html_content)
print("Product names:", product_names)

# Extract all prices
prices = re.findall(r'<p class="price">\$(.*?)</p>', html_content)
print("Prices: $", prices)

# Extract all ratings
ratings = re.findall(r'<p class="rating">(.*?) stars</p>', html_content)
print("Ratings:", ratings)

# Extract complete product info using multiple groups
product_info = re.findall(r'<h3>(.*?)</h3>.*?<p class="price">\$(.*?)</p>.*?<p class="rating">(.*?) stars</p>', 
                         html_content, re.DOTALL)
print("\nComplete product info:")
for name, price, rating in product_info:
    print(f"  - {name}: ${price}, Rating: {rating}/5")

Output:

text

Product names: ['Laptop', 'Smartphone', 'Tablet']
Prices: $ ['999.99', '599.50', '399.00']
Ratings: ['4.5', '4.2', '4.7']

Complete product info:
  - Laptop: $999.99, Rating: 4.5/5
  - Smartphone: $599.50, Rating: 4.2/5
  - Tablet: $399.00, Rating: 4.7/5
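
The combined pattern relies on re.DOTALL, because by default the dot does not match a newline. The sketch below uses a hypothetical two-line snippet to show the difference, and then converts the captured strings into typed values, since re.findall() always returns strings.

```python
import re

# Hypothetical two-line snippet in the same shape as the example above
snippet = '<h3>Laptop</h3>\n<p class="price">$999.99</p>'
pattern = r'<h3>(.*?)</h3>.*?<p class="price">\$(.*?)</p>'

# Without DOTALL, .*? cannot cross the newline between the two tags
without_flag = re.findall(pattern, snippet)
with_flag = re.findall(pattern, snippet, re.DOTALL)

print(without_flag)  # []
print(with_flag)     # [('Laptop', '999.99')]

# Convert the captured price text into a float explicitly
products = [{"name": name, "price": float(price)} for name, price in with_flag]
print(products)      # [{'name': 'Laptop', 'price': 999.99}]
```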

Key Points to Remember:

  1. Returns list: Always returns a list (empty if no matches)
  2. Groups behavior:
    • No groups → list of strings
    • One group → list of strings (the group's text, not the full match)
    • Multiple groups → list of tuples
  3. Non-overlapping: Scans left to right, and each new match starts where the previous one ended
  4. Case sensitivity: Use re.IGNORECASE flag for case-insensitive matching
  5. Multiline matching: Use re.MULTILINE flag for ^ and $ to match line boundaries

python

import re

# Example with flags
text = "Apple apple APPLE"
matches = re.findall(r'apple', text, re.IGNORECASE)
print(matches)  # Output: ['Apple', 'apple', 'APPLE']
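
Two of the other points, non-overlapping matches and re.MULTILINE, can be sketched the same way. The strings are hypothetical, and the lookahead trick is a common workaround rather than a built-in feature of re.findall().

```python
import re

# Non-overlapping: after a match, scanning resumes at the end of that match
plain = re.findall(r'aa', 'aaaa')
print(plain)        # ['aa', 'aa']

# Workaround: a zero-width lookahead with a group reports every start position
overlapping = re.findall(r'(?=(aa))', 'aaaa')
print(overlapping)  # ['aa', 'aa', 'aa']

# Hypothetical multi-line text for the MULTILINE flag
lines = "alpha 1\nbeta 2\ngamma 3"

# Without the flag, ^ matches only at the very start of the string
first_only = re.findall(r'^\w+', lines)
# With the flag, ^ also matches right after each newline
every_line = re.findall(r'^\w+', lines, re.MULTILINE)

print(first_only)   # ['alpha']
print(every_line)   # ['alpha', 'beta', 'gamma']
```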
