Python re.findall() Method Explained

The re.findall() method returns all non-overlapping matches of a pattern in a string as a list of strings or tuples.

Syntax

python

re.findall(pattern, string, flags=0)

Key Characteristics:

  • Returns all matches as a list
  • Returns empty list if no matches found
  • For patterns with one capturing group, returns a list of that group's text; with two or more groups, returns a list of tuples
  • Finds all non-overlapping matches
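
The characteristics above can be checked with a small sketch. The sample string and variable names below are made up for illustration; only re.findall() itself comes from the standard library.

```python
import re

# Hypothetical sample text used only for this illustration.
sample = "cat=3, dog=12, bird=7"

# No capturing group: each item is the full matched text.
whole = re.findall(r"\w+=\d+", sample)

# One capturing group: each item is just that group's text.
counts = re.findall(r"\w+=(\d+)", sample)

# Two capturing groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)=(\d+)", sample)

# No match at all: the result is an empty list, not None.
missing = re.findall(r"fish=\d+", sample)

print(whole)    # ['cat=3', 'dog=12', 'bird=7']
print(counts)   # ['3', '12', '7']
print(pairs)    # [('cat', '3'), ('dog', '12'), ('bird', '7')]
print(missing)  # []
```

Because the result is always a list, it can be looped over or length-checked without first testing for None.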

Example 1: Extracting All Numbers from Text

python

import re

text = "I bought 5 apples for $3.50, 2 bananas for $1.25, and 10 oranges for $7.80."


# Find all numbers (integers and decimals)
numbers = re.findall(r'\d+\.?\d*', text)
print("All numbers found:", numbers)

# Find all prices (numbers with $ sign)
prices = re.findall(r'\$\d+\.?\d*', text)
print("Prices found:", prices)

# Find only whole numbers (skip digits that belong to a decimal like 3.50)
whole_numbers = re.findall(r'(?<![\d.])\d+\b(?!\.\d)', text)
print("Whole numbers:", whole_numbers)

The first pattern in simple terms:

re.findall(r'\d+\.?\d*', text)

This pattern finds the numbers in a text, including:

Whole numbers (like 5, 100, 42)

Decimal numbers (like 3.14, 0.5, 99.99)

Breaking it down:
\d+
\d = any digit (0-9)

+ = one or more times

Meaning: "Find one or more digits" (the whole number part)

\.?
\. = a literal dot (the decimal point)

? = zero or one time (optional)

Meaning: "Maybe find a decimal point, if it exists"

\d*
\d = any digit (0-9)

* = zero or more times

Meaning: "Find zero or more digits" (the decimal part)

What it matches:
✅ Whole numbers: 123, 7, 0
✅ Decimal numbers: 3.14, 0.5, 99.99
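✅ Digits followed by a dot with no decimals: 100. (this also picks up a sentence-ending period, so "I have 5." yields '5.')

What it doesn't fully match:
❌ Negative numbers: -5 (only 5 is matched; the minus sign is dropped)
❌ Numbers with commas: 1,000 (matched as two separate numbers, 1 and 000)
❌ Scientific notation: 1.5e10 (matched as 1.5 and 10)
❌ Currency symbols: $50 (only 50 is matched; the $ is not part of the result)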

Examples:
python
import re

text = "I have 5 apples, 3.14 pi, temperature 98.6°, and 1000 points."
numbers = re.findall(r'\d+\.?\d*', text)

print(numbers)  # Output: ['5', '3.14', '98.6', '1000']
Simple analogy:
Think of it as a pattern that finds:

Some digits + maybe a dot + maybe some more digits

So it catches both:

123 (digits + no dot + no digits)

123.45 (digits + dot + digits)

It's like a net that catches all the numbers floating in your text! 🎣

Output of the code at the start of Example 1:

text

All numbers found: ['5', '3.50', '2', '1.25', '10', '7.80']
Prices found: ['$3.50', '$1.25', '$7.80']
Whole numbers: ['5', '2', '10']
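
The \d+\.?\d* pattern from Example 1 drops minus signs and can swallow a sentence-ending period. One possible tightening, shown here as a sketch rather than a complete number parser, is to allow an optional leading minus and to require at least one digit after the dot. The sample sentence is invented for the comparison.

```python
import re

# Hypothetical sample text, chosen to exercise a minus sign and trailing periods.
sample = "Lows of -5 and highs of 12.5. Total: 100."

# Pattern from Example 1: the dot is optional on its own, so "100." is captured.
loose = re.findall(r"\d+\.?\d*", sample)

# Stricter variant: optional minus, and a dot only counts when digits follow it.
# (?:...) is a non-capturing group, so findall still returns the whole match.
strict = re.findall(r"-?\d+(?:\.\d+)?", sample)

print(loose)   # ['5', '12.5', '100.']
print(strict)  # ['-5', '12.5', '100']
```

The stricter variant still treats the hyphen in a range such as 3-5 as a minus sign, and it does not handle thousands separators or scientific notation.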

Example 2: Extracting Email Addresses

python

import re

text = """
Contact us at: support@company.com, sales@example.org 
or info@sub.domain.co.uk. For emergencies: emergency@company.com.
Invalid emails: user@com, @domain.com, user@.com
"""

# Extract all valid email addresses
emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
print("Email addresses found:")
for email in emails:
    print(f"  - {email}")

# Extract usernames and domains separately using groups
email_parts = re.findall(r'([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})', text)
print("\nEmail parts (username@domain):")
for username, domain in email_parts:
    print(f"  - Username: {username}, Domain: {domain}")

Output:

text

Email addresses found:
  - support@company.com
  - sales@example.org
  - info@sub.domain.co.uk
  - emergency@company.com

Email parts (username@domain):
  - Username: support, Domain: company.com
  - Username: sales, Domain: example.org
  - Username: info, Domain: sub.domain.co.uk
  - Username: emergency, Domain: company.com
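
Because re.findall() returns group contents whenever the pattern has capturing groups, adding parentheses only to group alternatives changes the result. The sketch below uses a deliberately simplified email pattern and an invented sample string to show the effect, and how a non-capturing group (?:...) avoids it.

```python
import re

# Hypothetical sample text for this illustration.
sample = "Write to support@company.com or sales@example.org."

# Capturing group around the alternatives: findall returns only the group text.
captured = re.findall(r"[\w.+-]+@[\w-]+\.(com|org)", sample)

# Non-capturing group: findall returns the whole match again.
whole = re.findall(r"[\w.+-]+@[\w-]+\.(?:com|org)", sample)

print(captured)  # ['com', 'org']
print(whole)     # ['support@company.com', 'sales@example.org']
```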

Example 3: Parsing HTML-like Content

python

import re

html_content = """
<div class="product">
    <h3>Laptop</h3>
    <p class="price">$999.99</p>
    <p class="rating">4.5 stars</p>
</div>
<div class="product">
    <h3>Smartphone</h3>
    <p class="price">$599.50</p>
    <p class="rating">4.2 stars</p>
</div>
<div class="product">
    <h3>Tablet</h3>
    <p class="price">$399.00</p>
    <p class="rating">4.7 stars</p>
</div>
"""

# Extract all product names (text between <h3> tags)
product_names = re.findall(r'<h3>(.*?)</h3>', html_content)
print("Product names:", product_names)

# Extract all prices
prices = re.findall(r'<p class="price">\$(.*?)</p>', html_content)
print("Prices: $", prices)

# Extract all ratings
ratings = re.findall(r'<p class="rating">(.*?) stars</p>', html_content)
print("Ratings:", ratings)

# Extract complete product info using multiple groups
product_info = re.findall(r'<h3>(.*?)</h3>.*?<p class="price">\$(.*?)</p>.*?<p class="rating">(.*?) stars</p>', 
                         html_content, re.DOTALL)
print("\nComplete product info:")
for name, price, rating in product_info:
    print(f"  - {name}: ${price}, Rating: {rating}/5")

Output:

text

Product names: ['Laptop', 'Smartphone', 'Tablet']
Prices: $ ['999.99', '599.50', '399.00']
Ratings: ['4.5', '4.2', '4.7']

Complete product info:
  - Laptop: $999.99, Rating: 4.5/5
  - Smartphone: $599.50, Rating: 4.2/5
  - Tablet: $399.00, Rating: 4.7/5
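
Every value returned by re.findall() is a string, including the prices and ratings above. A minimal sketch of turning the tuples into typed records follows; it reuses the Example 3 pattern on a shortened copy of the markup, and the dictionary keys are names chosen here for illustration.

```python
import re

# A shortened copy of the Example 3 markup (one product only).
html_content = """
<div class="product">
    <h3>Laptop</h3>
    <p class="price">$999.99</p>
    <p class="rating">4.5 stars</p>
</div>
"""

pattern = r'<h3>(.*?)</h3>.*?<p class="price">\$(.*?)</p>.*?<p class="rating">(.*?) stars</p>'

# Unpack each 3-tuple and convert the numeric fields from str to float.
products = [
    {"name": name, "price": float(price), "rating": float(rating)}
    for name, price, rating in re.findall(pattern, html_content, re.DOTALL)
]

print(products)  # [{'name': 'Laptop', 'price': 999.99, 'rating': 4.5}]
```

This approach depends on the markup keeping exactly this shape. For HTML that varies in attributes or nesting, a real HTML parser is usually the more reliable choice.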

Key Points to Remember:

  1. Returns list: Always returns a list (empty if no matches)
  2. Groups behavior:
    • No groups → list of strings (the full matches)
    • One group → list of strings (the group's text only, not the full match)
    • Multiple groups → list of tuples (one item per group)
  3. Non-overlapping: Finds all matches that don’t overlap
  4. Case sensitivity: Use re.IGNORECASE flag for case-insensitive matching
  5. Multiline matching: Use re.MULTILINE flag for ^ and $ to match line boundaries
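
Point 3 can be seen with a tiny sketch. The lookahead variant is a commonly used workaround for collecting overlapping matches and is shown here only for contrast; the sample string is made up.

```python
import re

text = "aaaa"

# Each match consumes its characters, so the scan resumes after the match.
non_overlapping = re.findall(r"aa", text)

# A capturing group inside a lookahead consumes nothing,
# so every starting position gets a chance to match.
overlapping = re.findall(r"(?=(aa))", text)

print(non_overlapping)  # ['aa', 'aa']
print(overlapping)      # ['aa', 'aa', 'aa']
```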

python

# Example with flags
import re

text = "Apple apple APPLE"
matches = re.findall(r'apple', text, re.IGNORECASE)
print(matches)  # Output: ['Apple', 'apple', 'APPLE']
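
The re.MULTILINE flag from point 5 can be sketched the same way. The log text below is invented for the example.

```python
import re

# Hypothetical three-line log used only for this illustration.
log = "ERROR disk full\nINFO started\nERROR timeout"

# Without the flag, ^ matches only at the very start of the string.
first_only = re.findall(r"^ERROR (.+)", log)

# With re.MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r"^ERROR (.+)", log, re.MULTILINE)

print(first_only)  # ['disk full']
print(every_line)  # ['disk full', 'timeout']
```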
