re module

The re module is Python’s built-in module for regular expressions (regex). It provides functions and methods to work with strings using pattern matching, allowing you to search, extract, replace, and split text based on complex patterns.

Key Functions in the re Module

1. Searching and Matching

python

import re

text = "The quick brown fox jumps over the lazy dog"

# re.search() - finds first occurrence anywhere in string
result = re.search(r"fox", text)
print(result.group())  # Output: fox

# re.match() - checks only at the beginning of string
result = re.match(r"The", text)
print(result.group())  # Output: The
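
Both functions return None when nothing matches, so calling .group() on the result without a check raises AttributeError. The following is a minimal sketch of a guarded lookup; the variable names are illustrative, and re.fullmatch() is included only for contrast with re.match().

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# re.match() only tries position 0, so "fox" is not found there
match_result = re.match(r"fox", text)
print(match_result)  # None

# Check for None before calling .group() or .span()
search_result = re.search(r"fox", text)
if search_result is not None:
    print(search_result.group(), search_result.span())  # fox (16, 19)

# re.fullmatch() succeeds only when the entire string matches the pattern
whole = re.fullmatch(r"The", text)
print(whole)  # None
```
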

2. Finding All Matches

python

# re.findall() - returns all matches as a list
text = "Email me at john@example.com or jane@test.org"
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(emails)  # Output: ['john@example.com', 'jane@test.org']
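
One detail worth knowing about re.findall(): when the pattern contains capturing groups, it returns the groups rather than the whole match. The sketch below uses a deliberately simplified email pattern (an assumption for illustration, not a complete validator) and shows re.finditer() as an option when match positions are needed.

```python
import re

text = "Email me at john@example.com or jane@test.org"

# With two capturing groups, findall() returns (user, domain) tuples
pairs = re.findall(r'([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)', text)
print(pairs)  # [('john', 'example.com'), ('jane', 'test.org')]

# finditer() yields match objects, which also expose positions
starts = []
for m in re.finditer(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+', text):
    starts.append(m.start())
print(starts)  # [12, 32]
```
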

3. Splitting Strings

python

# re.split() - split by pattern
text = "apple,banana;cherry:date"
result = re.split(r'[,;:]', text)
print(result)  # Output: ['apple', 'banana', 'cherry', 'date']
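
re.split() also accepts a maxsplit argument, and a capturing group in the pattern keeps the delimiters in the result. A short sketch of both behaviors follows; the "messy" sample string is made up for illustration.

```python
import re

text = "apple,banana;cherry:date"

# maxsplit limits how many splits are performed
head_and_rest = re.split(r'[,;:]', text, maxsplit=1)
print(head_and_rest)  # ['apple', 'banana;cherry:date']

# A capturing group keeps the delimiters in the output list
with_delims = re.split(r'([,;:])', text)
print(with_delims)  # ['apple', ',', 'banana', ';', 'cherry', ':', 'date']

# Absorb optional whitespace around each delimiter
messy = "apple , banana;cherry :  date"
parts = re.split(r'\s*[,;:]\s*', messy)
print(parts)  # ['apple', 'banana', 'cherry', 'date']
```
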

4. Replacing Text

python

# re.sub() - replace patterns
text = "My number is 123-456-7890"
masked = re.sub(r'\d{3}-\d{3}-\d{4}', 'XXX-XXX-XXXX', text)
print(masked)  # Output: My number is XXX-XXX-XXXX
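
The replacement passed to re.sub() can also refer to captured groups or be a function. The sketch below shows one possible way to keep the last four digits visible; mask_digits is a hypothetical helper name, not part of the re module.

```python
import re

text = "My number is 123-456-7890"

# \1 in the replacement refers to the first capturing group
partial = re.sub(r'\d{3}-\d{3}-(\d{4})', r'XXX-XXX-\1', text)
print(partial)  # My number is XXX-XXX-7890

# A function replacement receives the match object and returns a string
def mask_digits(match):
    return re.sub(r'\d', 'X', match.group())

masked = re.sub(r'\d{3}-\d{3}-\d{4}', mask_digits, text)
print(masked)  # My number is XXX-XXX-XXXX

# re.subn() also reports how many replacements were made
masked_text, count = re.subn(r'\d', '#', text)
print(count)  # 10
```
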

Common Regex Patterns

python

import re

# Match digits
re.findall(r'\d+', "Price: $100, $200")  # ['100', '200']

# Match words
re.findall(r'\w+', "Hello, world!")  # ['Hello', 'world']

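# Email validation (simplified pattern; re.fullmatch() requires the whole
# string to match, so a trailing newline is rejected)
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
is_valid = re.fullmatch(email_pattern, "test@example.com") is not None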

# Phone number extraction
text = "Call 555-1234 or 555-5678"
phones = re.findall(r'\d{3}-\d{4}', text)  # ['555-1234', '555-5678']

Using Compiled Patterns

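When the same pattern is used many times, compile it once with re.compile() and reuse the resulting pattern object. The module-level functions already cache recently used patterns, so the speedup is usually modest, but a named pattern object also makes the code easier to read: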

python

import re

# Compile pattern once
pattern = re.compile(r'\d{3}-\d{3}-\d{4}')

# Use compiled pattern multiple times
texts = [
    "Call 123-456-7890",
    "Contact: 987-654-3210",
    "No number here"
]

for text in texts:
    match = pattern.search(text)
    if match:
        print(f"Found: {match.group()}")
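
A compiled pattern is also a convenient place to add named groups and the re.VERBOSE flag, which allows whitespace and comments inside the pattern. The sketch below is one way to write the same phone pattern; the name phone_pattern and the group names area, prefix, and line are assumptions chosen for illustration.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows # comments
phone_pattern = re.compile(
    r"""
    (?P<area>\d{3})     # first three digits
    -
    (?P<prefix>\d{3})   # middle three digits
    -
    (?P<line>\d{4})     # last four digits
    """,
    re.VERBOSE,
)

texts = ["Call 123-456-7890", "Contact: 987-654-3210", "No number here"]

area_codes = []
for text in texts:
    match = phone_pattern.search(text)
    if match:
        # Named groups are read by name instead of by index
        area_codes.append(match.group("area"))

print(area_codes)  # ['123', '987']
```
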

Flags for Pattern Matching

python

import re

text = "Hello\nWorld\nPython"

# re.IGNORECASE (or re.I) - case insensitive
re.findall(r'hello', text, re.IGNORECASE)  # ['Hello']

# re.MULTILINE (or re.M) - ^ and $ match start/end of lines
re.findall(r'^[A-Z]', text, re.MULTILINE)  # ['H', 'W', 'P']

# re.DOTALL (or re.S) - . matches newline too
re.findall(r'H.*P', text, re.DOTALL)  # ['Hello\nWorld\nP']
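
Flags can be combined with the | operator or written inline at the start of the pattern. The following sketch reuses the same sample text; the variable names are illustrative.

```python
import re

text = "Hello\nWorld\nPython"

# Combine flags with | : case-insensitive and line-anchored
lines = re.findall(r'^[a-z]+$', text, re.IGNORECASE | re.MULTILINE)
print(lines)  # ['Hello', 'World', 'Python']

# The same flags written inline as (?im) at the start of the pattern
inline = re.findall(r'(?im)^[a-z]+$', text)
print(inline == lines)  # True

# Without re.DOTALL, . stops at a newline, so H...P cannot span lines
no_dotall = re.findall(r'H.*P', text)
print(no_dotall)  # []
```
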

Common Use Cases

  1. Data Validation

python

def validate_password(password):
    # At least 8 chars, one uppercase, one lowercase, one digit
    pattern = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$'
    return re.match(pattern, password) is not None

  2. Data Extraction

python

def extract_dates(text):
    # Match dates in format YYYY-MM-DD
    return re.findall(r'\d{4}-\d{2}-\d{2}', text)

  3. Text Cleaning

python

def clean_text(text):
    # Remove extra whitespace and special characters
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    text = re.sub(r'\s+', ' ', text)      # Normalize whitespace
    return text.strip()
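
The date pattern in the Data Extraction example checks only the shape of the text, so a string like 2024-13-45 would also be returned. One possible refinement, sketched below, is to pass each candidate through datetime.strptime() and keep only real calendar dates. The function name extract_valid_dates and the sample string are hypothetical.

```python
import re
from datetime import datetime

def extract_valid_dates(text):
    """Find YYYY-MM-DD candidates and keep only real calendar dates."""
    valid = []
    for candidate in re.findall(r'\b\d{4}-\d{2}-\d{2}\b', text):
        try:
            # strptime raises ValueError for impossible dates
            datetime.strptime(candidate, "%Y-%m-%d")
        except ValueError:
            continue
        valid.append(candidate)
    return valid

sample = "Released 2024-02-29, patched 2024-13-45, retired 2025-01-15."
print(extract_valid_dates(sample))  # ['2024-02-29', '2025-01-15']
```
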

The re module is essential for text processing, data validation, web scraping, and many other tasks where pattern matching in strings is required.
