re module

The re module is Python’s built-in module for regular expressions (regex). It provides functions and methods to work with strings using pattern matching, allowing you to search, extract, replace, and split text based on complex patterns.

Key Functions in the re Module

1. Searching and Matching

python

import re

text = "The quick brown fox jumps over the lazy dog"

# re.search() - finds first occurrence anywhere in string
result = re.search(r"fox", text)
print(result.group())  # Output: fox

# re.match() - checks only at the beginning of string
result = re.match(r"The", text)
print(result.group())  # Output: The
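
Both functions return None when nothing matches, so calling .group() on the result directly can raise AttributeError. A minimal sketch of guarding against that, where first_match is a hypothetical helper name used only for illustration:

```python
import re

text = "The quick brown fox jumps over the lazy dog"

def first_match(pattern, string):
    """Return the matched text, or None when the pattern is not found."""
    match = re.search(pattern, string)
    return match.group() if match else None

found = first_match(r"fox", text)     # 'fox'
missing = first_match(r"cat", text)   # None, no exception raised

# re.match() only looks at the start, so "fox" is not found there
at_start = re.match(r"fox", text)     # None

# re.fullmatch() succeeds only if the entire string matches the pattern
whole = re.fullmatch(r"The.*dog", text)
```

Checking the result for None before using it is usually the safer habit when the input is not under your control.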

2. Finding All Matches

python

# re.findall() - returns all matches as a list
text = "Email me at john@example.com or jane@test.org"
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(emails)  # Output: ['john@example.com', 'jane@test.org']
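
When the pattern contains capturing groups, re.findall() returns the groups rather than the whole match, and re.finditer() gives access to full match objects. The sketch below uses a simplified email pattern chosen for illustration; it is not a complete validator:

```python
import re

text = "Email me at john@example.com or jane@test.org"

# With two capturing groups, findall() returns a list of (user, domain) tuples
pairs = re.findall(r'([\w.+-]+)@([\w.-]+\.[A-Za-z]{2,})', text)
# [('john', 'example.com'), ('jane', 'test.org')]

# finditer() yields match objects, which also expose positions
matches = list(re.finditer(r'[\w.+-]+@[\w.-]+', text))
starts = [m.start() for m in matches]
addresses = [m.group() for m in matches]
```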

3. Splitting Strings

python

# re.split() - split by pattern
text = "apple,banana;cherry:date"
result = re.split(r'[,;:]', text)
print(result)  # Output: ['apple', 'banana', 'cherry', 'date']
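
re.split() also accepts a maxsplit argument, and a capturing group in the pattern keeps the separators in the result. A short sketch of both, plus trimming whitespace around separators (the spaced input string is made up for illustration):

```python
import re

text = "apple,banana;cherry:date"

# Split only at the first separator
first_only = re.split(r'[,;:]', text, maxsplit=1)
# ['apple', 'banana;cherry:date']

# A capturing group keeps the separators in the output
with_seps = re.split(r'([,;:])', text)
# ['apple', ',', 'banana', ';', 'cherry', ':', 'date']

# Absorb optional whitespace around each separator
spaced = re.split(r'\s*[,;:]\s*', "apple , banana;  cherry : date")
# ['apple', 'banana', 'cherry', 'date']
```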

4. Replacing Text

python

# re.sub() - replace patterns
text = "My number is 123-456-7890"
masked = re.sub(r'\d{3}-\d{3}-\d{4}', 'XXX-XXX-XXXX', text)
print(masked)  # Output: "My number is XXX-XXX-XXXX"
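
The replacement can also refer back to captured groups, or be a function that receives each match. A minimal sketch, where mask_digits is a hypothetical helper name:

```python
import re

text = "My number is 123-456-7890"

# \1 in the replacement refers to the first captured group (the last four digits)
partial = re.sub(r'\d{3}-\d{3}-(\d{4})', r'XXX-XXX-\1', text)
# 'My number is XXX-XXX-7890'

def mask_digits(match):
    """Replace every digit in the matched text with 'X'."""
    return re.sub(r'\d', 'X', match.group())

# re.subn() works like re.sub() but also reports how many replacements were made
masked, count = re.subn(r'\d{3}-\d{3}-\d{4}', mask_digits, text)
# ('My number is XXX-XXX-XXXX', 1)
```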

Common Regex Patterns

python

import re

# Match digits
re.findall(r'\d+', "Price: $100, $200")  # ['100', '200']

# Match words
re.findall(r'\w+', "Hello, world!")  # ['Hello', 'world']

# Email validation
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
is_valid = re.match(email_pattern, "test@example.com") is not None

# Phone number extraction
text = "Call 555-1234 or 555-5678"
phones = re.findall(r'\d{3}-\d{4}', text)  # ['555-1234', '555-5678']
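
One detail worth knowing about the validation pattern above: $ also matches just before a trailing newline, so re.match() with ^...$ accepts "test@example.com\n". A sketch of a stricter check using re.fullmatch(), where is_valid_email is a hypothetical helper name:

```python
import re

# Same pattern as above, without the ^ and $ anchors
email_pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

def is_valid_email(candidate):
    """Return True only if the whole string matches the email pattern."""
    return re.fullmatch(email_pattern, candidate) is not None

# The anchored version lets a trailing newline through
anchored = r'^' + email_pattern + r'$'
loose = re.match(anchored, "test@example.com\n") is not None   # True

strict = is_valid_email("test@example.com\n")                  # False
ok = is_valid_email("test@example.com")                        # True
```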

Using Compiled Patterns

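re.compile() returns a pattern object whose methods (search, findall, sub, and so on) can be reused. The re module already caches recently used patterns, so the speedup is usually modest, but compiling once keeps repeated-use code tidy: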

python

import re

# Compile pattern once
pattern = re.compile(r'\d{3}-\d{3}-\d{4}')

# Use compiled pattern multiple times
texts = [
    "Call 123-456-7890",
    "Contact: 987-654-3210",
    "No number here"
]

for text in texts:
    match = pattern.search(text)
    if match:
        print(f"Found: {match.group()}")
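
A compiled pattern exposes the same operations as the module-level functions. The sketch below adds named groups to the phone pattern (the group names area, prefix, and line are chosen here for illustration) and reuses one pattern object for both searching and substitution:

```python
import re

# Named groups make each part of the number addressable by name
phone = re.compile(r'(?P<area>\d{3})-(?P<prefix>\d{3})-(?P<line>\d{4})')

texts = ["Call 123-456-7890", "Contact: 987-654-3210", "No number here"]

# Collect the area code from every text that contains a number
area_codes = []
for text in texts:
    match = phone.search(text)
    if match:
        area_codes.append(match.group('area'))
# ['123', '987']

# The same compiled pattern can be used for substitution
redacted = [phone.sub('[phone]', text) for text in texts]
```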

Flags for Pattern Matching

python

import re

text = "Hello\nWorld\nPython"

# re.IGNORECASE (or re.I) - case insensitive
re.findall(r'hello', text, re.IGNORECASE)  # ['Hello']

# re.MULTILINE (or re.M) - ^ and $ match start/end of lines
re.findall(r'^[A-Z]', text, re.MULTILINE)  # ['H', 'W', 'P']

# re.DOTALL (or re.S) - . matches newline too
re.findall(r'H.*P', text, re.DOTALL)  # ['Hello\nWorld\nP']
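
Flags can be combined with the | operator. The sketch below combines two of the flags above and also shows re.VERBOSE, a flag not covered in the list above, which allows whitespace and comments inside the pattern:

```python
import re

text = "Hello\nWorld\nPython"

# Combine flags with |: case-insensitive, and ^/$ match at each line
lines = re.findall(r'^[a-z]+$', text, re.IGNORECASE | re.MULTILINE)
# ['Hello', 'World', 'Python']

# re.VERBOSE (or re.X) ignores whitespace in the pattern and allows comments
date_pattern = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)

parts = date_pattern.search("Released 2023-12-25").groups()
# ('2023', '12', '25')
```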

Common Use Cases

  1. Data Validation

python

import re

def validate_password(password):
    # At least 8 chars, one uppercase, one lowercase, one digit
    pattern = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$'
    return re.match(pattern, password) is not None
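
To see how the lookaheads behave, here is a sketch that restates the function with re.fullmatch() (a small variation, so a trailing newline is not accepted) and runs it on a few made-up sample passwords:

```python
import re

def validate_password(password):
    # Each (?=...) lookahead checks one requirement without consuming text;
    # .{8,} then requires at least 8 characters overall
    pattern = r'(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}'
    return re.fullmatch(pattern, password) is not None

samples = ["Passw0rd", "password", "Sh0rt", "NoDigitsHere"]
results = {pw: validate_password(pw) for pw in samples}
# {'Passw0rd': True, 'password': False, 'Sh0rt': False, 'NoDigitsHere': False}
```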

  2. Data Extraction

python

def extract_dates(text):
    # Match dates in format YYYY-MM-DD
    return re.findall(r'\d{4}-\d{2}-\d{2}', text)
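
The pattern only checks the shape of a date, so a string like 2023-13-45 also matches. One possible way to filter those out, sketched here with the standard datetime module (extract_valid_dates is a hypothetical name, and the sample text is made up):

```python
import re
from datetime import datetime

def extract_valid_dates(text):
    """Return YYYY-MM-DD strings that are also real calendar dates."""
    valid = []
    for candidate in re.findall(r'\b\d{4}-\d{2}-\d{2}\b', text):
        try:
            # strptime raises ValueError for impossible dates
            datetime.strptime(candidate, '%Y-%m-%d')
        except ValueError:
            continue
        valid.append(candidate)
    return valid

sample = "Shipped 2023-12-25, returned 2023-13-45, closed 2024-01-01."
dates = extract_valid_dates(sample)
# ['2023-12-25', '2024-01-01']
```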

  3. Text Cleaning

python

def clean_text(text):
    # Remove extra whitespace and special characters
    text = re.sub(r'[^\w\s]', '', text)  # Remove punctuation
    text = re.sub(r'\s+', ' ', text)      # Normalize whitespace
    return text.strip()
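
A quick usage sketch of the function on a made-up input. Note that \w includes the underscore, so underscores survive the punctuation step:

```python
import re

def clean_text(text):
    text = re.sub(r'[^\w\s]', '', text)   # Drop anything that is not a word character or whitespace
    text = re.sub(r'\s+', ' ', text)      # Collapse runs of whitespace to one space
    return text.strip()

cleaned = clean_text("  Hello,   world!!  How's it\tgoing? ")
# 'Hello world Hows it going'

kept = clean_text("snake_case")
# 'snake_case'
```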

The re module is essential for text processing, data validation, web scraping, and many other tasks where pattern matching in strings is required.
