re module

The re module is Python’s built-in module for regular expressions (regex). Its functions let you search, extract, replace, and split text by pattern rather than by fixed substring.

Key Functions in the re Module

1. Searching and Matching

python

import re

text = "The quick brown fox jumps over the lazy dog"

# re.search() - finds first occurrence anywhere in string
result = re.search(r"fox", text)
print(result.group())  # Output: fox

# re.match() - checks only at the beginning of string
result = re.match(r"The", text)
print(result.group())  # Output: The
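
Both calls above return None when nothing matches, so calling .group() on the result raises AttributeError in that case. The sketch below shows one way to guard against that, and how re.match() and re.fullmatch() differ from re.search(). The helper name first_match is made up for this illustration and is not part of the re module.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

def first_match(pattern, string):
    """Return the first matched substring, or None if nothing matches.

    Hypothetical helper for illustration; not part of the re module.
    """
    m = re.search(pattern, string)
    return m.group() if m else None

print(first_match(r"fox", text))   # fox
print(first_match(r"cat", text))   # None

# re.match() only tries position 0, so a word later in the string fails
print(re.match(r"fox", text))      # None

# re.fullmatch() succeeds only if the entire string matches the pattern
print(re.fullmatch(r"The", text))  # None
```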

2. Finding All Matches

python

# re.findall() - returns all matches as a list
text = "Email me at john@example.com or jane@test.org"
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(emails)  # Output: ['john@example.com', 'jane@test.org']
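
One detail worth knowing about re.findall(): when the pattern contains capture groups, it returns the groups (as tuples) instead of the whole match. If you also need positions, re.finditer() yields match objects. A small sketch, using a deliberately simplified email pattern that is an assumption for this example only:

```python
import re

text = "Email me at john@example.com or jane@test.org"

# With capture groups, findall() returns tuples of the groups,
# not the full matched text
pairs = re.findall(r'([\w.+-]+)@([\w-]+\.[A-Za-z]{2,})', text)
print(pairs)  # [('john', 'example.com'), ('jane', 'test.org')]

# finditer() yields match objects, which also carry the match position
spans = [(m.group(), m.span())
         for m in re.finditer(r'[\w.+-]+@[\w-]+\.[A-Za-z]{2,}', text)]
print(spans)  # [('john@example.com', (12, 28)), ('jane@test.org', (32, 45))]
```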

3. Splitting Strings

python

# re.split() - split by pattern
text = "apple,banana;cherry:date"
result = re.split(r'[,;:]', text)
print(result)  # Output: ['apple', 'banana', 'cherry', 'date']
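
Two optional behaviours of re.split() are easy to miss: the maxsplit argument limits how many splits happen, and a capturing group in the pattern keeps the delimiters in the result. A short sketch on the same input:

```python
import re

text = "apple,banana;cherry:date"

# maxsplit limits the number of splits; the remainder is left intact
limited = re.split(r'[,;:]', text, maxsplit=2)
print(limited)  # ['apple', 'banana', 'cherry:date']

# A capturing group around the delimiter keeps the delimiters in the list
with_delims = re.split(r'([,;:])', text)
print(with_delims)  # ['apple', ',', 'banana', ';', 'cherry', ':', 'date']
```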

4. Replacing Text

python

# re.sub() - replace patterns
text = "My number is 123-456-7890"
masked = re.sub(r'\d{3}-\d{3}-\d{4}', 'XXX-XXX-XXXX', text)
print(masked)  # Output: My number is XXX-XXX-XXXX
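
The replacement passed to re.sub() does not have to be a fixed string. It can refer back to captured groups, or be a function that receives the match object. The sketch below illustrates both, plus re.subn(), which also reports how many replacements were made. The function name mask_digits is made up for this example.

```python
import re

text = "My number is 123-456-7890"

# Backreference \1 keeps the last four digits and masks the rest
partial = re.sub(r'\d{3}-\d{3}-(\d{4})', r'XXX-XXX-\1', text)
print(partial)  # My number is XXX-XXX-7890

def mask_digits(match):
    """Replace every digit in the matched text with 'X' (illustrative helper)."""
    return re.sub(r'\d', 'X', match.group())

# subn() returns the new string and the number of substitutions
masked, count = re.subn(r'\d{3}-\d{3}-\d{4}', mask_digits, text)
print(masked, count)  # My number is XXX-XXX-XXXX 1
```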

Common Regex Patterns

python

import re

# Match digits
re.findall(r'\d+', "Price: $100, $200")  # ['100', '200']

# Match words
re.findall(r'\w+', "Hello, world!")  # ['Hello', 'world']

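# Email validation (basic format check only)
# fullmatch() rejects a trailing newline, which $ with re.match() would allow
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
is_valid = re.fullmatch(email_pattern, "test@example.com") is not None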

# Phone number extraction
text = "Call 555-1234 or 555-5678"
phones = re.findall(r'\d{3}-\d{4}', text)  # ['555-1234', '555-5678']

Using Compiled Patterns

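When the same pattern is used many times, compile it once and reuse the pattern object. The re module also caches recently used patterns internally, so the speed gain is often modest, but a named pattern object keeps repeated use readable: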

python

import re

# Compile pattern once
pattern = re.compile(r'\d{3}-\d{3}-\d{4}')

# Use compiled pattern multiple times
texts = [
    "Call 123-456-7890",
    "Contact: 987-654-3210",
    "No number here"
]

for text in texts:
    match = pattern.search(text)
    if match:
        print(f"Found: {match.group()}")
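
A compiled pattern pairs well with named groups, which make the parts of a match accessible by name rather than by position. A minimal sketch follows. The group names area, exchange, and line are choices made for this example.

```python
import re

# (?P<name>...) defines a named group
phone = re.compile(r'(?P<area>\d{3})-(?P<exchange>\d{3})-(?P<line>\d{4})')

m = phone.search("Call 123-456-7890")
print(m.group('area'))  # 123
print(m.groupdict())    # {'area': '123', 'exchange': '456', 'line': '7890'}

# Pattern objects expose the same operations as the module-level functions
numbers = phone.findall("123-456-7890, 987-654-3210")
print(numbers)  # [('123', '456', '7890'), ('987', '654', '3210')]
```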

Flags for Pattern Matching

python

import re

text = "Hello\nWorld\nPython"

# re.IGNORECASE (or re.I) - case insensitive
re.findall(r'hello', text, re.IGNORECASE)  # ['Hello']

# re.MULTILINE (or re.M) - ^ and $ match start/end of lines
re.findall(r'^[A-Z]', text, re.MULTILINE)  # ['H', 'W', 'P']

# re.DOTALL (or re.S) - . matches newline too
re.findall(r'H.*P', text, re.DOTALL)  # ['Hello\nWorld\nP']
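
Flags can be combined with the | operator, and re.VERBOSE (or re.X) lets you spread a pattern over several lines with comments. A short sketch follows. The date pattern is an assumption chosen for illustration.

```python
import re

text = "Hello\nWorld\nPython"

# Combine flags with |: case-insensitive and line-anchored at once
words = re.findall(r'^[a-z]+$', text, re.IGNORECASE | re.MULTILINE)
print(words)  # ['Hello', 'World', 'Python']

# re.VERBOSE ignores whitespace and allows # comments inside the pattern
date = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)

parts = date.search("Released 2024-01-15").groups()
print(parts)  # ('2024', '01', '15')
```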

Common Use Cases

  1. Data Validation

python

import re

def validate_password(password):
    # At least 8 chars, one uppercase, one lowercase, one digit
    pattern = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$'
    return re.match(pattern, password) is not None

  2. Data Extraction

python

def extract_dates(text):
    # Match dates in format YYYY-MM-DD
    return re.findall(r'\d{4}-\d{2}-\d{2}', text)

  3. Text Cleaning

python

def clean_text(text):
    # Remove extra whitespace and special characters
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    text = re.sub(r'\s+', ' ', text)      # Normalize whitespace
    return text.strip()
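
A regex checks shape, not meaning: the YYYY-MM-DD pattern used for data extraction also accepts strings such as 2024-13-45. One possible way to tighten that, sketched here under the assumption that only real calendar dates are wanted, is to pass each candidate through datetime.strptime(). The function name extract_valid_dates is made up for this example.

```python
import re
from datetime import datetime

def extract_valid_dates(text):
    """Return YYYY-MM-DD strings from text that are real calendar dates.

    Hypothetical variant of extract_dates, for illustration only.
    """
    valid = []
    for candidate in re.findall(r'\b\d{4}-\d{2}-\d{2}\b', text):
        try:
            # strptime raises ValueError for impossible dates
            datetime.strptime(candidate, '%Y-%m-%d')
        except ValueError:
            continue
        valid.append(candidate)
    return valid

print(extract_valid_dates("Due 2024-02-29, not 2024-13-45 or 2023-02-29"))
# ['2024-02-29']
```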

The re module is essential for text processing, data validation, web scraping, and many other tasks where pattern matching in strings is required.
