re.findall()

Python re.findall() Method Explained

The re.findall() method returns all non-overlapping matches of a pattern in a string as a list of strings or tuples.

Syntax

python

re.findall(pattern, string, flags=0)

Key Characteristics:

  • Returns all matches as a list
  • Returns empty list if no matches found
  • For patterns with one capturing group, returns a list of that group's strings; with two or more groups, returns a list of tuples
  • Finds all non-overlapping matches
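
The characteristics above can be checked with a short sketch. The date string and variable names here are illustrative assumptions, not part of the original examples:

```python
import re

text = "2024-01-15 and 2025-12-31"

# No groups: each item is the whole match
no_groups = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(no_groups)    # ['2024-01-15', '2025-12-31']

# One group: each item is that group's text only
one_group = re.findall(r"(\d{4})-\d{2}-\d{2}", text)
print(one_group)    # ['2024', '2025']

# Two or more groups: each item is a tuple of the group texts
many_groups = re.findall(r"(\d{4})-(\d{2})-(\d{2})", text)
print(many_groups)  # [('2024', '01', '15'), ('2025', '12', '31')]

# No match at all: an empty list, never None
no_match = re.findall(r"[a-z]{5}", text)
print(no_match)     # []
```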

Example 1: Extracting All Numbers from Text

python

import re

text = "I bought 5 apples for $3.50, 2 bananas for $1.25, and 10 oranges for $7.80."

# Find all numbers (integers and decimals)
numbers = re.findall(r'\d+\.?\d*', text)
print("All numbers found:", numbers)

# Find all prices (numbers with $ sign)
prices = re.findall(r'\$\d+\.?\d*', text)
print("Prices found:", prices)

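# Find only standalone whole numbers (skip digits that belong to a decimal such as 3.50;
# a plain \b\d+\b would also return '3', '50', '1', '25', ...)
whole_numbers = re.findall(r'(?<![\d.])\d+(?!\.?\d)', text)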
print("Whole numbers:", whole_numbers)

How the pattern works:

re.findall(r'\d+\.?\d*', text)

This pattern finds all numbers in a text, including:

  • Whole numbers (like 5, 100, 42)
  • Decimal numbers (like 3.14, 0.5, 99.99)

Breaking it down:

  • \d+ means "find one or more digits" (the whole number part)
    • \d = any digit (0-9)
    • + = one or more times
  • \.? means "maybe find a decimal point, if it exists"
    • \. = a literal dot (the decimal point)
    • ? = zero or one time (optional)
  • \d* means "find zero or more digits" (the decimal part)
    • \d = any digit (0-9)
    • * = zero or more times

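What it matches:
✅ Whole numbers: 123, 7, 0
✅ Decimal numbers: 3.14, 0.5, 99.99
✅ Numbers with a trailing dot: 100. (so a number that ends a sentence, as in "I have 10.", comes back as '10.')

What it doesn't capture as a single unit:
❌ Negative numbers: -5 is returned as '5' (no minus sign support)
❌ Numbers with commas: 1,000 is split into '1' and '000'
❌ Scientific notation: 1.5e10 is split into '1.5' and '10'
❌ Currency symbols: $50 is returned as '50', without the $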
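
To make these limits concrete, the sketch below runs the simple pattern on a made-up sample string and compares it with one possible extension. The extended pattern is an illustrative assumption, not a complete number grammar:

```python
import re

sample = "Values: -5, 1,000, 1.5e10 and $50"

# The simple pattern trims or splits these forms
simple = re.findall(r'\d+\.?\d*', sample)
print(simple)    # ['5', '1', '000', '1.5', '10', '50']

# Illustrative extension: optional minus sign, comma-grouped thousands,
# optional fraction, optional exponent. Only non-capturing groups (?:...)
# are used, so findall still returns whole matches.
extended_pattern = r'-?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?(?:[eE][+-]?\d+)?'
extended = re.findall(extended_pattern, sample)
print(extended)  # ['-5', '1,000', '1.5e10', '50']
```

One trade-off of this sketch: the leading -? also treats the hyphen in a range such as 3-5 as a minus sign, so it may need adjusting for text that contains ranges.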

Example:

python

import re

text = "I have 5 apples, 3.14 pi, temperature 98.6°, and 1000 points."
numbers = re.findall(r'\d+\.?\d*', text)

print(numbers)  # Output: ['5', '3.14', '98.6', '1000']

Simple analogy:

Think of it as a pattern that finds: some digits + maybe a dot + maybe some more digits.

So it catches both:

  • 123 (digits + no dot + no digits)
  • 123.45 (digits + dot + digits)

It's like a net that catches all the numbers floating in your text! 🎣

Output of the Example 1 code:

text

All numbers found: ['5', '3.50', '2', '1.25', '10', '7.80']
Prices found: ['$3.50', '$1.25', '$7.80']
Whole numbers: ['5', '2', '10']
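
Because findall returns strings, any arithmetic needs a conversion step first. A minimal sketch that reuses the Example 1 text; the capturing group and the variable names are illustrative choices:

```python
import re

text = "I bought 5 apples for $3.50, 2 bananas for $1.25, and 10 oranges for $7.80."

# The single group captures only the digits after the dollar sign,
# so findall returns plain number strings without the "$"
price_strings = re.findall(r'\$(\d+\.?\d*)', text)
print(price_strings)           # ['3.50', '1.25', '7.80']

# Convert to float before doing arithmetic
prices = [float(p) for p in price_strings]
total = sum(prices)
print(f"Total: ${total:.2f}")  # Total: $12.55
```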

Example 2: Extracting Email Addresses

python

import re

text = """
Contact us at: support@company.com, sales@example.org 
or info@sub.domain.co.uk. For emergencies: emergency@company.com.
Invalid emails: user@com, @domain.com, user@.com
"""

# Extract all valid email addresses
emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
print("Email addresses found:")
for email in emails:
    print(f"  - {email}")

# Extract usernames and domains separately using groups
email_parts = re.findall(r'([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})', text)
print("\nEmail parts (username@domain):")
for username, domain in email_parts:
    print(f"  - Username: {username}, Domain: {domain}")

Output:

text

Email addresses found:
  - support@company.com
  - sales@example.org
  - info@sub.domain.co.uk
  - emergency@company.com

Email parts (username@domain):
  - Username: support, Domain: company.com
  - Username: sales, Domain: example.org
  - Username: info, Domain: sub.domain.co.uk
  - Username: emergency, Domain: company.com
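
One pitfall follows from the group behavior used above: parentheses added only for alternation also count as a capturing group, which changes what findall returns. The sketch below uses a simplified email pattern (an assumption for illustration, narrower than the one in Example 2) to show the difference:

```python
import re

text = "Contact us at: support@company.com, sales@example.org"

# Capturing group used for alternation: findall returns just the group text
tlds_only = re.findall(r'[\w.+-]+@[\w-]+\.(com|org)', text)
print(tlds_only)     # ['com', 'org']

# Non-capturing group (?:...): findall returns the whole match again
full_matches = re.findall(r'[\w.+-]+@[\w-]+\.(?:com|org)', text)
print(full_matches)  # ['support@company.com', 'sales@example.org']
```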

Example 3: Parsing HTML-like Content

python

import re

html_content = """
<div class="product">
    <h3>Laptop</h3>
    <p class="price">$999.99</p>
    <p class="rating">4.5 stars</p>
</div>
<div class="product">
    <h3>Smartphone</h3>
    <p class="price">$599.50</p>
    <p class="rating">4.2 stars</p>
</div>
<div class="product">
    <h3>Tablet</h3>
    <p class="price">$399.00</p>
    <p class="rating">4.7 stars</p>
</div>
"""

# Extract all product names (text between <h3> tags)
product_names = re.findall(r'<h3>(.*?)</h3>', html_content)
print("Product names:", product_names)

# Extract all prices
prices = re.findall(r'<p class="price">\$(.*?)</p>', html_content)
print("Prices: $", prices)

# Extract all ratings
ratings = re.findall(r'<p class="rating">(.*?) stars</p>', html_content)
print("Ratings:", ratings)

# Extract complete product info using multiple groups
product_info = re.findall(r'<h3>(.*?)</h3>.*?<p class="price">\$(.*?)</p>.*?<p class="rating">(.*?) stars</p>', 
                         html_content, re.DOTALL)
print("\nComplete product info:")
for name, price, rating in product_info:
    print(f"  - {name}: ${price}, Rating: {rating}/5")

Output:

text

Product names: ['Laptop', 'Smartphone', 'Tablet']
Prices: $ ['999.99', '599.50', '399.00']
Ratings: ['4.5', '4.2', '4.7']

Complete product info:
  - Laptop: $999.99, Rating: 4.5/5
  - Smartphone: $599.50, Rating: 4.2/5
  - Tablet: $399.00, Rating: 4.7/5
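
The combined pattern in Example 3 relies on re.DOTALL because, by default, "." does not match a newline. A reduced sketch with a two-line snippet (the snippet and variable names are illustrative) shows the effect of the flag:

```python
import re

snippet = '<h3>Laptop</h3>\n<p class="price">$999.99</p>'
pattern = r'<h3>(.*?)</h3>.*?<p class="price">\$(.*?)</p>'

# Without re.DOTALL, "." stops at the newline, so the pattern cannot span both lines
without_flag = re.findall(pattern, snippet)
print(without_flag)  # []

# With re.DOTALL, "." also matches "\n"
with_flag = re.findall(pattern, snippet, re.DOTALL)
print(with_flag)     # [('Laptop', '999.99')]
```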

Key Points to Remember:

  1. Returns list: Always returns a list (empty if no matches)
  2. Groups behavior:
    • No groups → list of strings
    • One group → list of strings
    • Multiple groups → list of tuples
  3. Non-overlapping: Scans left to right and resumes after the end of each match, so returned matches never overlap
  4. Case sensitivity: Use re.IGNORECASE flag for case-insensitive matching
  5. Multiline matching: Use re.MULTILINE flag for ^ and $ to match line boundaries

python

import re

# Example with flags
text = "Apple apple APPLE"
matches = re.findall(r'apple', text, re.IGNORECASE)
print(matches)  # Output: ['Apple', 'apple', 'APPLE']
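
Points 3 and 5 can be illustrated the same way. In this sketch the sample strings are made up, and the lookahead approach is one common workaround for overlapping matches rather than a feature of findall itself:

```python
import re

# Non-overlapping: after a match, scanning resumes where that match ended
plain = re.findall(r'aa', 'aaaa')
print(plain)        # ['aa', 'aa']

# A zero-width lookahead with a group can collect overlapping occurrences
overlapping = re.findall(r'(?=(aa))', 'aaaa')
print(overlapping)  # ['aa', 'aa', 'aa']

log = "error: disk full\ninfo: started\nerror: timeout"

# Without re.MULTILINE, ^ and $ anchor only to the whole string
single = re.findall(r'^error: (.*)$', log)
print(single)       # []

# With re.MULTILINE, ^ and $ also match at each line boundary
per_line = re.findall(r'^error: (.*)$', log, re.MULTILINE)
print(per_line)     # ['disk full', 'timeout']
```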
