re module

The re module is Python’s built-in module for regular expressions (regex). It provides functions and methods to work with strings using pattern matching, allowing you to search, extract, replace, and split text based on complex patterns.

Key Functions in the re Module

1. Searching and Matching

python

import re

text = "The quick brown fox jumps over the lazy dog"

# re.search() - finds first occurrence anywhere in string
result = re.search(r"fox", text)
print(result.group())  # Output: fox

# re.match() - checks only at the beginning of string
result = re.match(r"The", text)
print(result.group())  # Output: The
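Both functions return None when nothing matches, so calling .group() directly on the result raises AttributeError in that case. A minimal sketch of guarding against this, assuming a hypothetical helper named first_match that is not part of the re module:

```python
import re

text = "The quick brown fox jumps over the lazy dog"

def first_match(pattern, string):
    """Hypothetical helper: return the matched text, or None if there is no match."""
    result = re.search(pattern, string)
    return result.group() if result else None

print(first_match(r"fox", text))  # fox
print(first_match(r"cat", text))  # None

# re.match() only succeeds at position 0, so a word in the middle is not found
print(re.match(r"quick", text))   # None
```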

2. Finding All Matches

python

# re.findall() - returns all matches as a list
text = "Email me at john@example.com or jane@test.org"
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(emails)  # Output: ['john@example.com', 'jane@test.org']
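One detail worth knowing: when the pattern contains capturing groups, re.findall() returns the groups rather than the whole match. The sketch below illustrates this with the same sample text and also shows re.finditer(), which yields match objects instead of strings. The variable names are only illustrative.

```python
import re

text = "Email me at john@example.com or jane@test.org"

# With two capturing groups, findall() returns a list of (user, domain) tuples
pairs = re.findall(r'([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})', text)
print(pairs)  # [('john', 'example.com'), ('jane', 'test.org')]

# finditer() yields match objects, which also expose positions via start()/end()
addresses = []
for m in re.finditer(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}', text):
    addresses.append(m.group())
print(addresses)  # ['john@example.com', 'jane@test.org']
```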

3. Splitting Strings

python

# re.split() - split by pattern
text = "apple,banana;cherry:date"
result = re.split(r'[,;:]', text)
print(result)  # Output: ['apple', 'banana', 'cherry', 'date']
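re.split() also accepts a maxsplit argument, and it keeps the delimiters in the result when the pattern is wrapped in a capturing group. A short sketch of both behaviours, reusing the sample string:

```python
import re

text = "apple,banana;cherry:date"

# Stop after the first split
first_split = re.split(r'[,;:]', text, maxsplit=1)
print(first_split)  # ['apple', 'banana;cherry:date']

# A capturing group around the delimiter keeps the delimiters in the list
with_delims = re.split(r'([,;:])', text)
print(with_delims)  # ['apple', ',', 'banana', ';', 'cherry', ':', 'date']
```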

4. Replacing Text

python

# re.sub() - replace patterns
text = "My number is 123-456-7890"
masked = re.sub(r'\d{3}-\d{3}-\d{4}', 'XXX-XXX-XXXX', text)
print(masked)  # Output: "My number is XXX-XXX-XXXX"
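The replacement does not have to be a fixed string. It can refer back to captured groups with \1, \2, and so on, or it can be a function that receives the match object. The sketch below masks everything except the last four digits in both ways. The function name keep_last_four is a hypothetical helper chosen for this example.

```python
import re

text = "My number is 123-456-7890"

# Backreference: \1 inserts whatever the first group captured
masked_tail = re.sub(r'\d{3}-\d{3}-(\d{4})', r'XXX-XXX-\1', text)
print(masked_tail)  # My number is XXX-XXX-7890

def keep_last_four(match):
    """Hypothetical helper: mask every digit except the last four characters."""
    number = match.group()
    return re.sub(r'\d', 'X', number[:-4]) + number[-4:]

# Callable replacement: called once per match, must return a string
masked_func = re.sub(r'\d{3}-\d{3}-\d{4}', keep_last_four, text)
print(masked_func)  # My number is XXX-XXX-7890
```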

Common Regex Patterns

python

import re

# Match digits
re.findall(r'\d+', "Price: $100, $200")  # ['100', '200']

# Match words
re.findall(r'\w+', "Hello, world!")  # ['Hello', 'world']

# Email validation
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
is_valid = re.match(email_pattern, "test@example.com") is not None

# Phone number extraction
text = "Call 555-1234 or 555-5678"
phones = re.findall(r'\d{3}-\d{4}', text)  # ['555-1234', '555-5678']

Using Compiled Patterns

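When the same pattern is used many times, compile it once with re.compile() and reuse the resulting pattern object. The re module already caches recently used patterns, so the speed gain is usually modest, but a compiled pattern keeps the regex in one place: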

python

import re

# Compile pattern once
pattern = re.compile(r'\d{3}-\d{3}-\d{4}')

# Use compiled pattern multiple times
texts = [
    "Call 123-456-7890",
    "Contact: 987-654-3210",
    "No number here"
]

for text in texts:
    match = pattern.search(text)
    if match:
        print(f"Found: {match.group()}")

Flags for Pattern Matching

python

import re

text = "Hello\nWorld\nPython"

# re.IGNORECASE (or re.I) - case insensitive
re.findall(r'hello', text, re.IGNORECASE)  # ['Hello']

# re.MULTILINE (or re.M) - ^ and $ match start/end of lines
re.findall(r'^[A-Z]', text, re.MULTILINE)  # ['H', 'W', 'P']

# re.DOTALL (or re.S) - . matches newline too
re.findall(r'H.*P', text, re.DOTALL)  # ['Hello\nWorld\nP']
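Flags can be combined with the | operator, or written inside the pattern itself as inline flags such as (?im). A small sketch, using the same three-line sample text:

```python
import re

text = "Hello\nWorld\nPython"

# Combine flags with | : case-insensitive and line-by-line anchors
combined = re.findall(r'^[a-z]+$', text, re.IGNORECASE | re.MULTILINE)
print(combined)  # ['Hello', 'World', 'Python']

# The same flags written inline at the start of the pattern
inline = re.findall(r'(?im)^[a-z]+$', text)
print(inline)  # ['Hello', 'World', 'Python']
```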

Common Use Cases

  1. Data Validation

python

def validate_password(password):
    # At least 8 chars, one uppercase, one lowercase, one digit
    pattern = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$'
    return re.match(pattern, password) is not None
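
One subtlety with this kind of validator: $ also matches just before a trailing newline, so a pattern checked with re.match() and ending in $ accepts input such as "Passw0rdX\n". The sketch below shows a stricter variant based on re.fullmatch(). The names PASSWORD_PATTERN and validate_password_strict are assumptions made for this example only.

```python
import re

# Same three lookaheads, but without ^ and $ because fullmatch() anchors both ends
PASSWORD_PATTERN = r'(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}'

def validate_password_strict(password):
    """Hypothetical stricter variant: the whole string must match."""
    return re.fullmatch(PASSWORD_PATTERN, password) is not None

print(validate_password_strict("Passw0rdX"))    # True
print(validate_password_strict("password1"))    # False (no uppercase letter)
print(validate_password_strict("Sh0rt"))        # False (fewer than 8 characters)
print(validate_password_strict("Passw0rdX\n"))  # False (trailing newline rejected)

# For comparison, re.match() with ^...$ accepts the trailing newline
loose = re.match(r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$', "Passw0rdX\n")
print(loose is not None)  # True
```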

  2. Data Extraction

python

def extract_dates(text):
    # Match dates in format YYYY-MM-DD
    return re.findall(r'\d{4}-\d{2}-\d{2}', text)
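
The pattern above checks only the shape of the text, so a string like 2024-13-01 is matched even though it is not a real date. One possible way to filter such results, sketched here with the standard datetime module and a hypothetical helper named extract_valid_dates:

```python
import re
from datetime import datetime

def extract_valid_dates(text):
    """Hypothetical helper: keep only YYYY-MM-DD matches that are real calendar dates."""
    valid = []
    for candidate in re.findall(r'\b\d{4}-\d{2}-\d{2}\b', text):
        try:
            # strptime raises ValueError for impossible months or days
            datetime.strptime(candidate, "%Y-%m-%d")
        except ValueError:
            continue
        valid.append(candidate)
    return valid

print(extract_valid_dates("Released 2024-02-29, retired 2024-13-01"))  # ['2024-02-29']
```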

  3. Text Cleaning

python

def clean_text(text):
    # Remove extra whitespace and special characters
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    text = re.sub(r'\s+', ' ', text)      # Normalize whitespace
    return text.strip()
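
A quick usage check of the cleaning function, repeated here so the snippet runs on its own. Note that \w includes the underscore, so underscores survive the punctuation step.

```python
import re

def clean_text(text):
    # Same logic as above: drop punctuation, then collapse runs of whitespace
    text = re.sub(r'[^\w\s]', '', text)
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

print(clean_text("  Hello,   world!  It's  2024.  "))  # Hello world Its 2024
print(clean_text("snake_case & more"))                 # snake_case more
```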

The re module is essential for text processing, data validation, web scraping, and many other tasks where pattern matching in strings is required.
