re module
The re module is Python’s standard-library module for regular expressions (regex). It provides functions and pattern-object methods for searching, extracting, replacing, and splitting text based on patterns.
Key Functions in the re Module
1. Searching and Matching
python
import re
text = "The quick brown fox jumps over the lazy dog"
# re.search() - finds first occurrence anywhere in string
result = re.search(r"fox", text)
print(result.group()) # Output: fox
# re.match() - checks only at the beginning of string
result = re.match(r"The", text)
print(result.group()) # Output: The
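Both functions return None when nothing matches, so calling .group() on the result without a check raises AttributeError. A minimal sketch of guarding against that, reusing the sentence above (the names `at_start`, `animal`, and `fox` are illustrative only):

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# re.match() only tries position 0, so "fox" is not found there.
at_start = re.match(r"fox", text)
print(at_start)  # None

# Guard against None before calling .group().
match = re.search(r"cat", text)
animal = match.group() if match else "no match"
print(animal)  # no match

# A match object also reports where the match was found.
fox = re.search(r"fox", text)
print(fox.span())  # (16, 19)
```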
2. Finding All Matches
python
# re.findall() - returns all matches as a list
text = "Email me at john@example.com or jane@test.org"
emails = re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text)
print(emails) # Output: ['john@example.com', 'jane@test.org']
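One behavior worth knowing: when the pattern contains capture groups, re.findall() returns the groups rather than the whole match, and re.finditer() yields match objects instead of strings. A small sketch, using a deliberately simplified email pattern that is an assumption for illustration, not a complete validator:

```python
import re

text = "Email me at john@example.com or jane@test.org"

# With capture groups, findall() returns tuples of the groups, not whole matches.
pairs = re.findall(r'([\w.+-]+)@([\w-]+\.[\w.-]+)', text)
print(pairs)  # [('john', 'example.com'), ('jane', 'test.org')]

# finditer() yields match objects, giving access to the text and its position.
found = [(m.group(), m.start()) for m in re.finditer(r'[\w.+-]+@[\w-]+\.[\w.-]+', text)]
print(found)  # [('john@example.com', 12), ('jane@test.org', 32)]
```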
3. Splitting Strings
python
# re.split() - split by pattern
text = "apple,banana;cherry:date"
result = re.split(r'[,;:]', text)
print(result) # Output: ['apple', 'banana', 'cherry', 'date']
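re.split() also accepts a maxsplit argument, and a capturing group around the delimiter keeps the delimiters in the result. A brief sketch on the same string (the variable names are illustrative):

```python
import re

text = "apple,banana;cherry:date"

# maxsplit limits the number of splits; the remainder stays in one piece.
limited = re.split(r'[,;:]', text, maxsplit=2)
print(limited)  # ['apple', 'banana', 'cherry:date']

# A capturing group around the delimiter keeps the delimiters in the output.
with_delims = re.split(r'([,;:])', text)
print(with_delims)  # ['apple', ',', 'banana', ';', 'cherry', ':', 'date']
```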
4. Replacing Text
python
# re.sub() - replace patterns
text = "My number is 123-456-7890"
masked = re.sub(r'\d{3}-\d{3}-\d{4}', 'XXX-XXX-XXXX', text)
print(masked) # Output: My number is XXX-XXX-XXXX
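The replacement passed to re.sub() can also refer back to captured groups or be a function that builds the replacement from each match. A sketch under those assumptions (`mask_digits` is a hypothetical helper written for this example):

```python
import re

text = "My number is 123-456-7890"

# \1 in the replacement reinserts the first captured group (the last four digits).
partial = re.sub(r'\d{3}-\d{3}-(\d{4})', r'XXX-XXX-\1', text)
print(partial)  # My number is XXX-XXX-7890

# A function replacement receives each match object and returns the new text.
def mask_digits(match):
    return "X" * len(match.group())

full = re.sub(r'\d+', mask_digits, text)
print(full)  # My number is XXX-XXX-XXXX

# subn() returns the new string together with the number of replacements.
result, count = re.subn(r'\d', '#', text)
print(result, count)  # My number is ###-###-#### 10
```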
Common Regex Patterns
python
import re
# Match digits
re.findall(r'\d+', "Price: $100, $200") # ['100', '200']
# Match words
re.findall(r'\w+', "Hello, world!") # ['Hello', 'world']
# Email validation (fullmatch() requires the whole string to match)
email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
is_valid = re.fullmatch(email_pattern, "test@example.com") is not None
# Phone number extraction
text = "Call 555-1234 or 555-5678"
phones = re.findall(r'\d{3}-\d{4}', text) # ['555-1234', '555-5678']
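One subtlety with anchored validation patterns: `$` also matches just before a trailing newline, so re.match() with `^...$` can accept input that ends in `\n`. A minimal sketch of the difference, reusing the email pattern from above (the variable names are illustrative):

```python
import re

email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

# $ matches just before a trailing newline, so match() accepts this input.
loose = re.match(email_pattern, "test@example.com\n") is not None
print(loose)  # True

# fullmatch() requires the pattern to consume the entire string.
strict = re.fullmatch(email_pattern, "test@example.com\n") is not None
print(strict)  # False

clean = re.fullmatch(email_pattern, "test@example.com") is not None
print(clean)  # True
```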
Using Compiled Patterns
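When the same pattern is used many times, compile it once and reuse the pattern object. The re module already caches recently used patterns, so the speed gain is usually modest, but compiling avoids the repeated cache lookup and keeps the pattern in one named place: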
python
import re
# Compile pattern once
pattern = re.compile(r'\d{3}-\d{3}-\d{4}')
# Use compiled pattern multiple times
texts = [
    "Call 123-456-7890",
    "Contact: 987-654-3210",
    "No number here"
]
for text in texts:
    match = pattern.search(text)
    if match:
        print(f"Found: {match.group()}")
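A compiled pattern offers the same operations as the module-level functions, as methods on the pattern object. A short sketch, with capture groups added to the phone pattern purely for illustration:

```python
import re

pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

# findall() on the pattern object; with groups it returns tuples.
parts = pattern.findall("123-456-7890, 987-654-3210")
print(parts)  # [('123', '456', '7890'), ('987', '654', '3210')]

# sub() on the pattern object, reformatting with backreferences.
pretty = pattern.sub(r'(\1) \2-\3', "Call 123-456-7890")
print(pretty)  # Call (123) 456-7890

# fullmatch() checks that the whole string is a phone number.
exact = pattern.fullmatch("123-456-7890") is not None
print(exact)  # True
```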
Flags for Pattern Matching
python
import re
text = "Hello\nWorld\nPython"
# re.IGNORECASE (or re.I) - case insensitive
re.findall(r'hello', text, re.IGNORECASE) # ['Hello']
# re.MULTILINE (or re.M) - ^ and $ match start/end of lines
re.findall(r'^[A-Z]', text, re.MULTILINE) # ['H', 'W', 'P']
# re.DOTALL (or re.S) - . matches newline too
re.findall(r'H.*P', text, re.DOTALL) # ['Hello\nWorld\nP']
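Flags can be combined with the bitwise OR operator, and re.VERBOSE (or re.X) allows whitespace and comments inside a pattern. A sketch of both, where the `phone` pattern is an illustrative example rather than part of the original text:

```python
import re

text = "Hello\nWorld\nPython"

# Combine flags with |: case-insensitive and line-anchored at the same time.
lines = re.findall(r'^[a-z]+$', text, re.IGNORECASE | re.MULTILINE)
print(lines)  # ['Hello', 'World', 'Python']

# re.VERBOSE ignores whitespace in the pattern and permits inline comments.
phone = re.compile(r"""
    \d{3}   # area code
    -
    \d{3}   # prefix
    -
    \d{4}   # line number
""", re.VERBOSE)
number = phone.search("Call 123-456-7890").group()
print(number)  # 123-456-7890
```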
Common Use Cases
- Data Validation
python
def validate_password(password):
    # At least 8 chars, one uppercase, one lowercase, one digit
    pattern = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$'
    return re.match(pattern, password) is not None
- Data Extraction
python
def extract_dates(text):
    # Match dates in format YYYY-MM-DD
    return re.findall(r'\d{4}-\d{2}-\d{2}', text)
- Text Cleaning
python
def clean_text(text):
    # Remove extra whitespace and special characters
    text = re.sub(r'[^\w\s]', '', text) # Remove punctuation
    text = re.sub(r'\s+', ' ', text) # Normalize whitespace
    return text.strip()
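The three helpers above can be exercised as follows. This is a usage sketch with made-up inputs; `is_real_date` is a hypothetical helper added here to show that the date pattern checks only the shape of the text, not whether the date exists on the calendar.

```python
import re
from datetime import datetime

def validate_password(password):
    pattern = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$'
    return re.match(pattern, password) is not None

def extract_dates(text):
    return re.findall(r'\d{4}-\d{2}-\d{2}', text)

def clean_text(text):
    text = re.sub(r'[^\w\s]', '', text)
    text = re.sub(r'\s+', ' ', text)
    return text.strip()

def is_real_date(candidate):
    # Hypothetical helper: let datetime decide whether the date is valid.
    try:
        datetime.strptime(candidate, "%Y-%m-%d")
        return True
    except ValueError:
        return False

print(validate_password("Passw0rdX"))  # True
print(validate_password("password1"))  # False: no uppercase letter

dates = extract_dates("Released 2024-01-15, patched 2024-13-45")
print(dates)  # ['2024-01-15', '2024-13-45']
print([d for d in dates if is_real_date(d)])  # ['2024-01-15']

print(clean_text("  Hello,   world!!  "))  # Hello world
```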
The re module is essential for text processing, data validation, web scraping, and many other tasks where pattern matching in strings is required.