Python re.findall() Method Explained
The re.findall() method returns all non-overlapping matches of a pattern in a string as a list of strings or tuples.
Syntax
python
re.findall(pattern, string, flags=0)
Key Characteristics:
- Returns all matches as a list
- Returns empty list if no matches found
- For patterns with one capturing group, returns a list of that group's strings; with several groups, returns a list of tuples
- Finds all non-overlapping matches
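The characteristics above can be checked with a short sketch. The sample string and variable names here are illustrative only and are not part of the examples that follow.

```python
import re

sample = "cat bat rat"

# No groups: each item is the full matched string.
no_groups = re.findall(r"[cbr]at", sample)

# One group: each item is the text captured by that group.
one_group = re.findall(r"([cbr])at", sample)

# Two groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"([cbr])(at)", sample)

# No match: the result is an empty list, not None.
no_match = re.findall(r"dog", sample)

# Non-overlapping: scanning resumes after the end of each match.
non_overlapping = re.findall(r"aa", "aaaa")

print(no_groups)        # ['cat', 'bat', 'rat']
print(one_group)        # ['c', 'b', 'r']
print(two_groups)       # [('c', 'at'), ('b', 'at'), ('r', 'at')]
print(no_match)         # []
print(non_overlapping)  # ['aa', 'aa']
```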
Example 1: Extracting All Numbers from Text
python
import re
text = "I bought 5 apples for $3.50, 2 bananas for $1.25, and 10 oranges for $7.80."
# Find all numbers (integers and decimals)
numbers = re.findall(r'\d+\.?\d*', text)
print("All numbers found:", numbers)
# Find all prices (numbers with $ sign)
prices = re.findall(r'\$\d+\.?\d*', text)
print("Prices found:", prices)
# Find only whole numbers (digits that are not part of a decimal such as 3.50)
whole_numbers = re.findall(r'(?<![\d.])\d+(?!\.?\d)', text)
print("Whole numbers:", whole_numbers)
Here is what this regex pattern does, in simple terms:
re.findall(r'\d+\.?\d*', text)
This pattern finds all numbers in a text, including:
Whole numbers (like 5, 100, 42)
Decimal numbers (like 3.14, 0.5, 99.99)
Breaking it down:
\d+
\d = any digit (0-9)
+ = one or more times
Meaning: "Find one or more digits" (the whole number part)
\.?
\. = a literal dot (the decimal point)
? = zero or one time (optional)
Meaning: "Maybe find a decimal point, if it exists"
\d*
\d = any digit (0-9)
* = zero or more times
Meaning: "Find zero or more digits" (the decimal part)
What it matches:
✅ Whole numbers: 123, 7, 0
✅ Decimal numbers: 3.14, 0.5, 99.99
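✅ Numbers with decimal point but no decimals: 100. (unusual, but it also means a number directly followed by a sentence-ending period keeps the dot)
What it doesn't match as a whole:
❌ Negative numbers: -5 (the minus sign is dropped, leaving 5)
❌ Numbers with commas: 1,000 (split into 1 and 000)
❌ Scientific notation: 1.5e10 (split into 1.5 and 10)
❌ Currency symbols: $50 (the $ is dropped, leaving 50)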
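To see how the pattern behaves on these unsupported inputs, here is a small sketch. The `signed_pattern` variant is only one possible extension and is an assumption of this sketch, not a complete number parser.

```python
import re

pattern = r"\d+\.?\d*"

# The pattern still finds the digit runs, but not the full value.
negative = re.findall(pattern, "-5")
with_comma = re.findall(pattern, "1,000")
scientific = re.findall(pattern, "1.5e10")
currency = re.findall(pattern, "$50")

print(negative)    # ['5']
print(with_comma)  # ['1', '000']
print(scientific)  # ['1.5', '10']
print(currency)    # ['50']

# Hypothetical variant: optional minus sign, and a dot only when digits follow it.
signed_pattern = r"-?\d+(?:\.\d+)?"
signed = re.findall(signed_pattern, "-5 and 3.14 and 100.")
print(signed)      # ['-5', '3.14', '100']
```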
Examples:
python
import re
text = "I have 5 apples, 3.14 pi, temperature 98.6°, and 1000 points."
numbers = re.findall(r'\d+\.?\d*', text)
print(numbers) # Output: ['5', '3.14', '98.6', '1000']
Simple analogy:
Think of it as a pattern that finds:
Some digits + maybe a dot + maybe some more digits
So it catches both:
123 (digits + no dot + no digits)
123.45 (digits + dot + digits)
It's like a net that catches all the numbers floating in your text! 🎣
Output:
text
All numbers found: ['5', '3.50', '2', '1.25', '10', '7.80']
Prices found: ['$3.50', '$1.25', '$7.80']
Whole numbers: ['5', '2', '10']
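Because re.findall() returns strings, any arithmetic needs a conversion step first. A minimal sketch, reusing the text from Example 1; the names `price_strings`, `price_values`, and `total` are illustrative.

```python
import re

text = "I bought 5 apples for $3.50, 2 bananas for $1.25, and 10 oranges for $7.80."

# Put the numeric part in a group so the $ sign is left out of the result.
price_strings = re.findall(r"\$(\d+\.\d+)", text)

# findall returns strings, so convert them before adding.
price_values = [float(p) for p in price_strings]
total = round(sum(price_values), 2)

print(price_strings)  # ['3.50', '1.25', '7.80']
print(total)          # 12.55
```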
Example 2: Extracting Email Addresses
python
import re
text = """
Contact us at: support@company.com, sales@example.org
or info@sub.domain.co.uk. For emergencies: emergency@company.com.
Invalid emails: user@com, @domain.com, user@.com
"""
# Extract all valid email addresses
emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
print("Email addresses found:")
for email in emails:
    print(f" - {email}")
# Extract usernames and domains separately using groups
email_parts = re.findall(r'([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})', text)
print("\nEmail parts (username@domain):")
for username, domain in email_parts:
    print(f" - Username: {username}, Domain: {domain}")
Output:
text
Email addresses found:
 - support@company.com
 - sales@example.org
 - info@sub.domain.co.uk
 - emergency@company.com

Email parts (username@domain):
 - Username: support, Domain: company.com
 - Username: sales, Domain: example.org
 - Username: info, Domain: sub.domain.co.uk
 - Username: emergency, Domain: company.com
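One side effect of groups is worth a quick sketch: a group added only for alternation changes what re.findall() returns. The simplified patterns below are assumptions for illustration and are looser than the email pattern above.

```python
import re

text = "Contact us at: support@company.com, sales@example.org"

# Capturing group used only for alternation: findall returns just the group.
tlds_only = re.findall(r"[\w.+-]+@[\w-]+\.(com|org)", text)

# Non-capturing group (?:...): findall returns the whole match again.
full_matches = re.findall(r"[\w.+-]+@[\w-]+\.(?:com|org)", text)

print(tlds_only)     # ['com', 'org']
print(full_matches)  # ['support@company.com', 'sales@example.org']
```

Using `(?:...)` keeps the grouping for the alternation without changing the shape of the result.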
Example 3: Parsing HTML-like Content
python
import re
html_content = """
<div class="product">
<h3>Laptop</h3>
<p class="price">$999.99</p>
<p class="rating">4.5 stars</p>
</div>
<div class="product">
<h3>Smartphone</h3>
<p class="price">$599.50</p>
<p class="rating">4.2 stars</p>
</div>
<div class="product">
<h3>Tablet</h3>
<p class="price">$399.00</p>
<p class="rating">4.7 stars</p>
</div>
"""
# Extract all product names (text between <h3> tags)
product_names = re.findall(r'<h3>(.*?)</h3>', html_content)
print("Product names:", product_names)
# Extract all prices
prices = re.findall(r'<p class="price">\$(.*?)</p>', html_content)
print("Prices: $", prices)
# Extract all ratings
ratings = re.findall(r'<p class="rating">(.*?) stars</p>', html_content)
print("Ratings:", ratings)
# Extract complete product info using multiple groups
product_info = re.findall(r'<h3>(.*?)</h3>.*?<p class="price">\$(.*?)</p>.*?<p class="rating">(.*?) stars</p>',
html_content, re.DOTALL)
print("\nComplete product info:")
for name, price, rating in product_info:
    print(f" - {name}: ${price}, Rating: {rating}/5")
Output:
text
Product names: ['Laptop', 'Smartphone', 'Tablet']
Prices: $ ['999.99', '599.50', '399.00']
Ratings: ['4.5', '4.2', '4.7']

Complete product info:
 - Laptop: $999.99, Rating: 4.5/5
 - Smartphone: $599.50, Rating: 4.2/5
 - Tablet: $399.00, Rating: 4.7/5
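The tuples from a multi-group pattern are often easier to work with once converted into dictionaries with typed values. A minimal sketch on a shortened version of the HTML above; the `products` structure and its keys are illustrative choices, not something re.findall() produces itself.

```python
import re

html_content = """
<div class="product"><h3>Laptop</h3><p class="price">$999.99</p><p class="rating">4.5 stars</p></div>
<div class="product"><h3>Tablet</h3><p class="price">$399.00</p><p class="rating">4.7 stars</p></div>
"""

# Same three-group pattern as above, split across lines for readability.
pattern = (
    r'<h3>(.*?)</h3>.*?'
    r'<p class="price">\$(.*?)</p>.*?'
    r'<p class="rating">(.*?) stars</p>'
)

# Each match is a (name, price, rating) tuple of strings; convert the numbers.
products = [
    {"name": name, "price": float(price), "rating": float(rating)}
    for name, price, rating in re.findall(pattern, html_content, re.DOTALL)
]

print(products)
# [{'name': 'Laptop', 'price': 999.99, 'rating': 4.5}, {'name': 'Tablet', 'price': 399.0, 'rating': 4.7}]
```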
Key Points to Remember:
- Returns list: Always returns a list (empty if no matches)
- Groups behavior:
  - No groups → list of strings
  - One group → list of strings
  - Multiple groups → list of tuples
- Non-overlapping: Scans the string left to right and resumes after the end of each match, so the returned matches never overlap
- Case sensitivity: Use the re.IGNORECASE flag for case-insensitive matching
- Multiline matching: Use the re.MULTILINE flag for ^ and $ to match line boundaries
python
# Example with flags
import re
text = "Apple apple APPLE"
matches = re.findall(r'apple', text, re.IGNORECASE)
print(matches)  # Output: ['Apple', 'apple', 'APPLE']
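The re.MULTILINE flag mentioned in the key points can be sketched the same way. The `log` string here is made-up sample data.

```python
import re

log = "ERROR disk full\nINFO started\nERROR timeout"

# Without re.MULTILINE, ^ matches only at the very start of the string.
first_only = re.findall(r"^ERROR (.*)", log)

# With re.MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r"^ERROR (.*)", log, re.MULTILINE)

print(first_only)  # ['disk full']
print(every_line)  # ['disk full', 'timeout']
```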