re.findall()

Python re.findall() Method Explained

The re.findall() method returns all non-overlapping matches of a pattern in a string as a list of strings or tuples.

Syntax

python

re.findall(pattern, string, flags=0)

Key Characteristics:

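  • Returns all non-overlapping matches as a list, in the order found (the string is scanned left to right)
  • Returns an empty list if no matches are found
  • With no capture groups, returns a list of the full matched strings
  • With one capture group, returns a list of that group's strings; with several groups, returns a list of tuples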
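The return shapes listed above can be checked with a short sketch. The sample date string is made up for illustration; only the number of capture groups changes between calls.

```python
import re

# Hypothetical sample text, chosen only to illustrate the return shapes.
text = "2024-01-15 and 2025-06-30"

# No groups: each match is returned as a plain string.
no_groups = re.findall(r"\d{4}-\d{2}-\d{2}", text)
print(no_groups)    # ['2024-01-15', '2025-06-30']

# One group: only the captured part of each match is returned.
one_group = re.findall(r"(\d{4})-\d{2}-\d{2}", text)
print(one_group)    # ['2024', '2025']

# Several groups: each match becomes a tuple of the captured parts.
many_groups = re.findall(r"(\d{4})-(\d{2})-(\d{2})", text)
print(many_groups)  # [('2024', '01', '15'), ('2025', '06', '30')]

# No match at all: an empty list, never None.
no_match = re.findall(r"[a-z]{10}", text)
print(no_match)     # []
```

Because the result is always a list, it is safe to loop over it or call len() on it without a None check.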

Example 1: Extracting All Numbers from Text

python

import re

text = "I bought 5 apples for $3.50, 2 bananas for $1.25, and 10 oranges for $7.80."

# Find all numbers (integers and decimals)
numbers = re.findall(r'\d+\.?\d*', text)
print("All numbers found:", numbers)

# Find all prices (numbers with $ sign)
prices = re.findall(r'\$\d+\.?\d*', text)
print("Prices found:", prices)

# Find only whole numbers (digits that are not part of a price or a decimal)
whole_numbers = re.findall(r'(?<![\d.$])\d+(?!\.?\d)', text)
print("Whole numbers:", whole_numbers)

How the number pattern works

The pattern r'\d+\.?\d*' used above finds both whole numbers (like 5, 100, 42) and decimal numbers (like 3.14, 0.5, 99.99).

Breaking it down:

  • \d+ : one or more digits (the whole-number part). \d is any digit (0-9; for str patterns it also covers other Unicode decimal digits unless re.ASCII is set), and + means "one or more times".
  • \.? : an optional decimal point. \. is a literal dot, and ? means "zero or one time".
  • \d* : zero or more digits (the decimal part). * means "zero or more times".

What it matches:
✅ Whole numbers: 123, 7, 0
✅ Decimal numbers: 3.14, 0.5, 99.99
✅ Digits followed by a bare dot: 100. (so a sentence-ending period directly after a number is captured too)

What it doesn't match as a whole:
❌ Negative numbers: -5 is returned as 5 (the minus sign is dropped)
❌ Numbers with commas: 1,000 is split into 1 and 000
❌ Scientific notation: 1.5e10 is split into 1.5 and 10
❌ Currency symbols: $50 is returned as 50 (the $ is dropped)
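To make these limits concrete, the sketch below runs the simple pattern on a made-up string containing each unsupported case, then tries one possible extension. The extended pattern is an illustrative assumption, not a complete number grammar: it adds an optional minus sign, requires digits after the dot, and allows an exponent, but it still ignores thousands separators.

```python
import re

# Hypothetical sample covering the unsupported cases listed above.
sample = "-5 1,000 1.5e10 $50"

# The simple pattern silently drops signs and symbols and splits some numbers.
simple = re.findall(r'\d+\.?\d*', sample)
print(simple)  # ['5', '1', '000', '1.5', '10', '50']

# Sketch of an extended pattern: optional minus, optional fraction, optional exponent.
# The (?:...) groups are non-capturing, so findall still returns whole strings.
extended_pattern = r'-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?'
extended = re.findall(extended_pattern, "-5 1.5e10 3.14 100.")
print(extended)  # ['-5', '1.5e10', '3.14', '100']
```

Note that the leading -? also picks up a hyphen in text such as "3-5", so whether to allow it depends on the input you expect.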

Example:

python

import re

text = "I have 5 apples, 3.14 pi, temperature 98.6°, and 1000 points."
numbers = re.findall(r'\d+\.?\d*', text)

print(numbers)  # Output: ['5', '3.14', '98.6', '1000']

Simple analogy:

Think of the pattern as "some digits, then maybe a dot, then maybe some more digits". It therefore catches both:

  • 123 (digits, no dot, no further digits)
  • 123.45 (digits, dot, digits)

It's like a net that catches all the numbers floating in your text! 🎣
Output of Example 1:

text

All numbers found: ['5', '3.50', '2', '1.25', '10', '7.80']
Prices found: ['$3.50', '$1.25', '$7.80']
Whole numbers: ['5', '2', '10']
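Every item that re.findall() returns is a string, even when it looks like a number. A minimal sketch of converting the results before doing arithmetic, reusing the text from Example 1 (the variable names price_values and number_values are illustrative):

```python
import re

text = "I bought 5 apples for $3.50, 2 bananas for $1.25, and 10 oranges for $7.80."

# findall returns strings such as '$3.50'; strip the symbol, then convert.
prices = re.findall(r'\$\d+\.?\d*', text)
price_values = [float(p.lstrip('$')) for p in prices]
print(price_values)   # [3.5, 1.25, 7.8]

# The same idea for the plain numbers.
numbers = re.findall(r'\d+\.?\d*', text)
number_values = [float(n) for n in numbers]
print(number_values)  # [5.0, 3.5, 2.0, 1.25, 10.0, 7.8]
```

For money, decimal.Decimal may be a better fit than float, since binary floats cannot represent every decimal fraction exactly.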

Example 2: Extracting Email Addresses

python

import re

text = """
Contact us at: support@company.com, sales@example.org 
or info@sub.domain.co.uk. For emergencies: emergency@company.com.
Invalid emails: user@com, @domain.com, user@.com
"""

# Extract email addresses (a simplified pattern, not a full address validator)
emails = re.findall(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', text)
print("Email addresses found:")
for email in emails:
    print(f"  - {email}")

# Extract usernames and domains separately using groups
email_parts = re.findall(r'([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})', text)
print("\nEmail parts (username@domain):")
for username, domain in email_parts:
    print(f"  - Username: {username}, Domain: {domain}")

Output:

text

Email addresses found:
  - support@company.com
  - sales@example.org
  - info@sub.domain.co.uk
  - emergency@company.com

Email parts (username@domain):
  - Username: support, Domain: company.com
  - Username: sales, Domain: example.org
  - Username: info, Domain: sub.domain.co.uk
  - Username: emergency, Domain: company.com
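Groups change what re.findall() returns, which can be surprising when parentheses are only needed for alternation. The sketch below uses a simplified, made-up pattern that restricts the ending to com or org: with a capturing group only that ending comes back, while a non-capturing group (?:...) keeps the full match.

```python
import re

# Hypothetical sample text for illustration.
text = "support@company.com, sales@example.org"

# Capturing group: findall returns only what the group captured.
captured = re.findall(r'[\w.+-]+@[\w-]+\.(com|org)', text)
print(captured)  # ['com', 'org']

# Non-capturing group: the parentheses still group, but findall returns the whole match.
full = re.findall(r'[\w.+-]+@[\w-]+\.(?:com|org)', text)
print(full)      # ['support@company.com', 'sales@example.org']
```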

Example 3: Parsing HTML-like Content

python

import re

html_content = """
<div class="product">
    <h3>Laptop</h3>
    <p class="price">$999.99</p>
    <p class="rating">4.5 stars</p>
</div>
<div class="product">
    <h3>Smartphone</h3>
    <p class="price">$599.50</p>
    <p class="rating">4.2 stars</p>
</div>
<div class="product">
    <h3>Tablet</h3>
    <p class="price">$399.00</p>
    <p class="rating">4.7 stars</p>
</div>
"""

# Extract all product names (text between <h3> tags)
product_names = re.findall(r'<h3>(.*?)</h3>', html_content)
print("Product names:", product_names)

# Extract all prices
prices = re.findall(r'<p class="price">\$(.*?)</p>', html_content)
print("Prices: $", prices)

# Extract all ratings
ratings = re.findall(r'<p class="rating">(.*?) stars</p>', html_content)
print("Ratings:", ratings)

# Extract complete product info using multiple groups
product_info = re.findall(r'<h3>(.*?)</h3>.*?<p class="price">\$(.*?)</p>.*?<p class="rating">(.*?) stars</p>', 
                         html_content, re.DOTALL)
print("\nComplete product info:")
for name, price, rating in product_info:
    print(f"  - {name}: ${price}, Rating: {rating}/5")

Output:

text

Product names: ['Laptop', 'Smartphone', 'Tablet']
Prices: $ ['999.99', '599.50', '399.00']
Ratings: ['4.5', '4.2', '4.7']

Complete product info:
  - Laptop: $999.99, Rating: 4.5/5
  - Smartphone: $599.50, Rating: 4.2/5
  - Tablet: $399.00, Rating: 4.7/5
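The last pattern in this example passes re.DOTALL because, by default, the dot does not match a newline, so .*? cannot cross from one line to the next. A small sketch of the difference, using a made-up snippet with a line break inside the tag:

```python
import re

# Hypothetical snippet with a newline inside the <h3> element.
snippet = "<h3>Gaming\nLaptop</h3>"

# Without the flag, '.' stops at the newline, so nothing matches.
without_flag = re.findall(r'<h3>(.*?)</h3>', snippet)
print(without_flag)  # []

# With re.DOTALL, '.' also matches '\n', so the content is captured.
with_flag = re.findall(r'<h3>(.*?)</h3>', snippet, re.DOTALL)
print(with_flag)     # ['Gaming\nLaptop']
```

Regular expressions work for small, predictable fragments like these; for arbitrary real-world HTML, a dedicated parser such as the standard library's html.parser is usually the more robust choice.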

Key Points to Remember:

  1. Returns list: Always returns a list (empty if no matches)
  2. Groups behavior:
    • No groups → list of strings
    • One group → list of strings
    • Multiple groups → list of tuples
  3. Non-overlapping: Finds all matches that don’t overlap
  4. Case sensitivity: Use re.IGNORECASE flag for case-insensitive matching
  5. Multiline matching: Use re.MULTILINE flag for ^ and $ to match line boundaries

python

import re

# Example with flags
text = "Apple apple APPLE"
matches = re.findall(r'apple', text, re.IGNORECASE)
print(matches)  # Output: ['Apple', 'apple', 'APPLE']
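Points 3 and 5 can be checked the same way. The sketch below uses made-up strings: the first half shows that scanning resumes after each match (with a lookahead as one common workaround when overlapping matches are wanted), and the second half shows how re.MULTILINE changes what ^ matches.

```python
import re

# Non-overlapping: after a match, scanning resumes at the end of that match.
pairs = re.findall(r'aa', 'aaaa')
print(pairs)        # ['aa', 'aa']

# A lookahead consumes no characters, so a group inside it can capture overlapping hits.
overlapping = re.findall(r'(?=(aa))', 'aaaa')
print(overlapping)  # ['aa', 'aa', 'aa']

# Hypothetical multi-line text for the MULTILINE flag.
log = "error: disk full\ninfo: retrying\nerror: timeout"

# Without the flag, ^ matches only at the very start of the string.
first_only = re.findall(r'^\w+', log)
print(first_only)   # ['error']

# With re.MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r'^\w+', log, re.MULTILINE)
print(every_line)   # ['error', 'info', 'error']
```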
