Python re Flags: re.I, re.S, re.X Explained
Flags change how a regular expression is matched. They're passed as the optional flags argument of re functions such as re.search(), re.findall(), re.sub(), and re.compile().
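A minimal sketch of the ways to pass a flag (the sample text is illustrative, not from the original article). One detail worth knowing: for re.sub() the fourth positional parameter is count, and for re.split() the third is maxsplit, so with those functions it is safest to pass flags by keyword.

```python
import re

text = "Apple apple APPLE"

# re.findall(pattern, string, flags): flags is the third positional argument
print(re.findall(r"apple", text, re.I))  # ['Apple', 'apple', 'APPLE']

# re.sub(pattern, repl, string, count=0, flags=0): pass flags by keyword,
# because the fourth positional argument is count, not flags
print(re.sub(r"apple", "pear", text, flags=re.I))  # pear pear pear

# Flags given to re.compile() are stored on the compiled pattern
pattern = re.compile(r"apple", re.I)
print(pattern.findall(text))  # ['Apple', 'apple', 'APPLE']
```

Compiling once is convenient when the same pattern and flags are reused many times.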
```python
string = "The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February."
```
Source: https://www.theguardian.com/
Examples:
```python
>>> import re
>>> result = re.findall(r"the", string)
>>> result
['the', 'the', 'the', 'the']
>>> result = re.findall(r"the", string, re.I)
>>> result
['The', 'the', 'the', 'The', 'the', 'the']
>>> string2 = "Hello\nPython"
>>> result = re.search(r".+", string2)
>>> result
<re.Match object; span=(0, 5), match='Hello'>
>>> result = re.search(r".+", string2, re.S)
>>> result
<re.Match object; span=(0, 12), match='Hello\nPython'>
>>> result = re.search(r""".+\s    # Beginning of the string
... (.+ex)                        # Searching for index
... .+                            # Middle of the string
... (\d\d\s.+).                   # Date at the end""", string, re.X)
>>> result.groups()
('index', '19 February')
```
1. re.I or re.IGNORECASE
Purpose: Makes the pattern matching case-insensitive
Without re.I (Case-sensitive):
```python
import re

text = "Hello WORLD hello World"
# Case-sensitive search
matches = re.findall(r'hello', text)
print("Case-sensitive:", matches)  # Output: ['hello']
# Only finds lowercase 'hello'
```
With re.I (Case-insensitive):
```python
# Case-insensitive search
matches = re.findall(r'hello', text, re.I)
print("Case-insensitive:", matches)  # Output: ['Hello', 'hello']
# Finds both 'Hello' and 'hello'
```
Practical Example:
```python
emails = "John@Email.com, mary@EMAIL.COM, bob@email.com"
# Extract emails regardless of case
email_matches = re.findall(r'[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}', emails, re.I)
print("Emails found:", email_matches)
# Output: ['John@Email.com', 'mary@EMAIL.COM', 'bob@email.com']
```
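re.I is not limited to literal letters: it also applies to character-class ranges such as [a-z] (which the email example above relies on) and to backreferences. The sketch below uses a made-up sentence to find accidentally doubled words; the \1 backreference only matches the differently cased repeat "The the" when re.I is set.

```python
import re

sentence = "This is is a test. The the cat sat."

# A word, some whitespace, then the same word again (\1 refers to group 1)
doubled = r"\b(\w+)\s+\1\b"

print(re.findall(doubled, sentence))        # ['is']
print(re.findall(doubled, sentence, re.I))  # ['is', 'The']
```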
2. re.S or re.DOTALL
Purpose: Makes the dot . match any character, including a newline (by default, . matches everything except \n)
Without re.S (Default behavior):
```python
text = "First line\nSecond line\nThird line"
# Dot doesn't match newlines by default
match = re.search(r'First.*line', text)
print("Without DOTALL:", repr(match.group()) if match else None)  # Output: 'First line'
# Dot stops at the newline, so the match cannot extend past the first line
```
With re.S (Dot matches everything):
```python
# Dot matches everything including newlines
match = re.search(r'First.*line', text, re.S)
print("With DOTALL:", repr(match.group()) if match else None)
# Output: 'First line\nSecond line\nThird line'
```
Practical Example:
```python
html_content = """
<div>
    <h3>Product Name</h3>
    <p>Product Description</p>
</div>
"""
# Extract content across multiple lines
match = re.search(r'<h3>(.*?)</h3>.*?<p>(.*?)</p>', html_content, re.S)
if match:
    print("Title:", match.group(1).strip())  # Output: 'Product Name'
    print("Description:", match.group(2).strip())  # Output: 'Product Description'
```
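To confirm that re.S is what makes the practical example work, here is a compact sketch with similar markup written on one line with explicit \n characters. It runs the same pattern with and without the flag, and also shows that re.S only changes the dot: \s matches a newline with or without any flag.

```python
import re

html_content = "<div>\n<h3>Product Name</h3>\n<p>Product Description</p>\n</div>"
pattern = r"<h3>(.*?)</h3>.*?<p>(.*?)</p>"

# Without re.S, .*? cannot cross the newline between </h3> and <p>
print(re.search(pattern, html_content))  # None

# With re.S, the dot also matches the newline
print(re.search(pattern, html_content, re.S).groups())
# ('Product Name', 'Product Description')

# \s matches newlines regardless of flags, so this works without re.S
print(re.search(r"<h3>(.*?)</h3>\s*<p>(.*?)</p>", html_content).groups())
# ('Product Name', 'Product Description')
```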
3. re.X or re.VERBOSE
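Purpose: Lets you write readable regex spread over several lines: whitespace in the pattern is ignored (except inside a character class or when escaped with a backslash), and # starts a comment that runs to the end of the line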
Without re.X (Normal regex):
```python
# Hard to read complex pattern
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
```
With re.X (Verbose mode):
```python
# Same pattern, but readable with comments and spacing
pattern = r"""
^                    # Start of string
[a-zA-Z0-9._%+-]+    # Local part (username)
@                    # Literal @ symbol
[a-zA-Z0-9.-]+       # Domain name
\.                   # Literal dot
[a-zA-Z]{2,}         # TLD (2+ letters)
$                    # End of string
"""
email = "user@example.com"
match = re.search(pattern, email, re.X)
print("Valid email:", bool(match))  # Output: True
```
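Because re.X only strips unescaped whitespace and comments before compiling, the verbose pattern should accept exactly the same strings as the compact one. A quick sketch that checks this on a few sample addresses (the samples are made up for illustration):

```python
import re

compact = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
verbose = r"""
    ^ [a-zA-Z0-9._%+-]+    # local part
    @ [a-zA-Z0-9.-]+       # domain name
    \. [a-zA-Z]{2,} $      # literal dot, then a TLD of 2+ letters
"""

samples = ["user@example.com", "first.last@mail.co.uk", "no-at-sign.com", "a@b.c"]

for s in samples:
    a = bool(re.search(compact, s))
    b = bool(re.search(verbose, s, re.X))
    print(f"{s:25} compact={a} verbose={b}")
```

The first two samples pass and the last two fail under both patterns.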
Practical Example:
```python
# Complex pattern for phone numbers without VERBOSE
phone_pattern = r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
# Same pattern with VERBOSE (much more readable)
phone_pattern_verbose = r"""
\(?         # Optional opening parenthesis
\d{3}       # Area code (3 digits)
\)?         # Optional closing parenthesis
[-.\s]?     # Optional separator: hyphen, dot, or space
\d{3}       # Exchange code (3 digits)
[-.\s]?     # Optional separator
\d{4}       # Line number (4 digits)
"""
phones = "Call (123) 456-7890 or 123.456.7890"
matches = re.findall(phone_pattern_verbose, phones, re.X)
print("Phone numbers found:", matches)
# Output: ['(123) 456-7890', '123.456.7890']
```
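One pitfall with re.X: since unescaped whitespace is ignored and # begins a comment, a pattern that needs a literal space or # must escape them (\ followed by a space, and \#) or put them in a character class such as [ ] or [#]. The sketch below uses a hypothetical order-status string to show a broken and a fixed version.

```python
import re

text = "Order #42 shipped, order #7 pending"

# Broken: in verbose mode the space is dropped and '#' starts a comment,
# so the effective pattern is just "order" and (\d+) is never compiled
broken = re.compile(r"order # (\d+)", re.X | re.I)
print(broken.findall(text))  # ['Order', 'order']

# Fixed: escape the space and the hash so they are matched literally
fixed = re.compile(r"""
    order\ \#    # literal "order #"
    (\d+)        # the order number
""", re.X | re.I)
print(fixed.findall(text))  # ['42', '7']
```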
Combining Multiple Flags
You can combine flags using the | operator:
```python
text = "Hello\nWORLD\nhello\nWorld"
# Case-insensitive + DOTALL
matches = re.findall(r'hello.*world', text, re.I | re.S)
print("Combined flags:", matches)
# Output: ['Hello\nWORLD\nhello\nWorld']
# Verbose + Case-insensitive
pattern = r"""
hello    # Match hello
\s+      # One or more whitespace
world    # Match world
"""
matches = re.findall(pattern, text, re.X | re.I)
print("Verbose + Case-insensitive:", matches)
# Output: ['Hello\nWORLD', 'hello\nWorld']
```
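The | operator works because each flag is a single bit in an integer mask (the flags are members of the re.RegexFlag enum), so OR-ing them sets several bits at once. A small sketch that stores a combination in a variable, compiles with it, and tests individual bits through the compiled pattern's flags attribute:

```python
import re

# Each flag is one bit; | sets both bits in a single value
combined = re.I | re.S
pattern = re.compile(r"hello.*world", combined)

# Test individual bits with &
print(bool(pattern.flags & re.I))  # True
print(bool(pattern.flags & re.S))  # True
print(bool(pattern.flags & re.X))  # False

print(pattern.findall("Hello\nWORLD"))  # ['Hello\nWORLD']
```

Note that pattern.flags may also contain implicit flags (for example re.UNICODE for str patterns), which is why the check uses & rather than ==.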
Summary Table:
| Flag | Full Name | Purpose |
|---|---|---|
| re.I | IGNORECASE | Case-insensitive matching |
| re.S | DOTALL | Dot matches everything (including newlines) |
| re.X | VERBOSE | Allow comments and whitespace in patterns |
re.I and re.S change what a pattern matches, while re.X only changes how the pattern is written, which makes long patterns much easier to read and maintain.