re.I, re.S, re.X

Python re Flags: re.I, re.S, re.X Explained

Flags modify how regular expressions work. They’re used as optional parameters in re functions like re.search()re.findall(), etc.

string = "The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February."

Source: https://www.theguardian.com/



Examples:

>>> import re
>>> result = re.findall(r"the", string)
>>> result
['the', 'the', 'the', 'the']
>>> result = re.findall(r"the", string, re.I)
>>> result
['The', 'the', 'the', 'The', 'the', 'the']


>>> string2 = "Hello\nPython"
>>> result = re.search(r".+", string2)
>>> result
<re.Match object; span=(0, 5), match='Hello'>
>>> result = re.search(r".+", string2, re.S)
>>> result
<re.Match object; span=(0, 12), match='Hello\nPython'>


>>> result = re.search(r""".+\s #Beginning of the string
			(.+ex) #Searching for index
			.+ #Middle of the string
			(\d\d\s.+). #Date at the end""", string, re.X)
>>> result.groups()
('index', '19 February')

1. re.I or re.IGNORECASE

Purpose: Makes the pattern matching case-insensitive

Without re.I (Case-sensitive):

python

import re

text = "Hello WORLD hello World"

# Case-sensitive search
matches = re.findall(r'hello', text)
print("Case-sensitive:", matches)  # Output: ['hello']

# Only finds lowercase 'hello'

With re.I (Case-insensitive):

python

# Case-insensitive search
matches = re.findall(r'hello', text, re.I)
print("Case-insensitive:", matches)  # Output: ['Hello', 'hello']

# Finds both 'Hello' and 'hello'

Practical Example:

python

emails = "John@Email.com, mary@EMAIL.COM, bob@email.com"
# Extract emails regardless of case
email_matches = re.findall(r'[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}', emails, re.I)
print("Emails found:", email_matches)
# Output: ['John@Email.com', 'mary@EMAIL.COM', 'bob@email.com']

2. re.S or re.DOTALL

Purpose: Makes the dot . match EVERYTHING including newlines

Without re.S (Default behavior):

python

text = "First line\nSecond line\nThird line"

# Dot doesn't match newlines by default
match = re.search(r'First.*line', text)
print("Without DOTALL:", match)  # Output: None (fails because of \n)

# Dot stops at newline

With re.S (Dot matches everything):

python

# Dot matches everything including newlines
match = re.search(r'First.*line', text, re.S)
print("With DOTALL:", match.group() if match else None)
# Output: 'First line\nSecond line\nThird line'

Practical Example:

python

html_content = """
<div>
    <h3>Product Name</h3>
    <p>Product Description</p>
</div>
"""

# Extract content across multiple lines
match = re.search(r'<h3>(.*?)</h3>.*?<p>(.*?)</p>', html_content, re.S)
if match:
    print("Title:", match.group(1).strip())    # Output: 'Product Name'
    print("Description:", match.group(2).strip())  # Output: 'Product Description'

3. re.X or re.VERBOSE

Purpose: Allows you to write readable regex with comments and whitespace

Without re.X (Normal regex):

python

# Hard to read complex pattern
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

With re.X (Verbose mode):

python

# Same pattern, but readable with comments and spacing
pattern = r"""
^                   # Start of string
[a-zA-Z0-9._%+-]+   # Local part (username)
@                   # Literal @ symbol
[a-zA-Z0-9.-]+      # Domain name
\.                  # Literal dot
[a-zA-Z]{2,}        # TLD (2+ letters)
$                   # End of string
"""

email = "user@example.com"
match = re.search(pattern, email, re.X)
print("Valid email:", bool(match))  # Output: True

Practical Example:

python

# Complex pattern for phone numbers without VERBOSE
phone_pattern = r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'

# Same pattern with VERBOSE (much more readable)
phone_pattern_verbose = r"""
\(?                # Optional opening parenthesis
\d{3}              # Area code (3 digits)
\)?                # Optional closing parenthesis
[-.\s]?            # Optional separator: hyphen, dot, or space
\d{3}              # Exchange code (3 digits)
[-.\s]?            # Optional separator
\d{4}              # Line number (4 digits)
"""

phones = "Call (123) 456-7890 or 123.456.7890"
matches = re.findall(phone_pattern_verbose, phones, re.X)
print("Phone numbers found:", matches)
# Output: ['(123) 456-7890', '123.456.7890']

Combining Multiple Flags

You can combine flags using the | operator:

python

text = "Hello\nWORLD\nhello\nWorld"

# Case-insensitive + DOTALL
matches = re.findall(r'hello.*world', text, re.I | re.S)
print("Combined flags:", matches)
# Output: ['Hello\nWORLD\nhello\nWorld']

# Verbose + Case-insensitive
pattern = r"""
hello   # Match hello
\s+     # One or more whitespace
world   # Match world
"""

matches = re.findall(pattern, text, re.X | re.I)
print("Verbose + Case-insensitive:", matches)
# Output: ['Hello\nWORLD', 'hello\nWorld']

Summary Table:

FlagFull NamePurpose
re.IIGNORECASECase-insensitive matching
re.SDOTALLDot matches everything (including newlines)
re.XVERBOSEAllow comments and whitespace in patterns

These flags make regex patterns more powerful and maintainable!

Similar Posts

  • How to create Class

    🟥 Rectangle Properties Properties are the nouns that describe a rectangle. They are the characteristics that define a specific rectangle’s dimensions and position. Examples: 📐 Rectangle Methods Methods are the verbs that describe what a rectangle can do or what can be done to it. They are the actions that allow you to calculate information…

  • Dynamically Typed vs. Statically Typed Languages 🔄↔️

    Dynamically Typed vs. Statically Typed Languages 🔄↔️ Dynamically Typed Languages 🚀 Python Pros: Cons: Statically Typed Languages 🔒 Java Pros: Cons: Key Differences 🆚 Feature Dynamically Typed Statically Typed Type Checking Runtime Compile-time Variable Types Can change during execution Fixed after declaration Error Detection Runtime exceptions Compile-time failures Speed Slower (runtime checks) Faster (optimized binaries)…

  • Raw Strings in Python

    Raw Strings in Python’s re Module Raw strings (prefixed with r) are highly recommended when working with regular expressions because they treat backslashes (\) as literal characters, preventing Python from interpreting them as escape sequences. path = ‘C:\Users\Documents’ pattern = r’C:\Users\Documents’ .4.1.1. Escape sequences Unless an ‘r’ or ‘R’ prefix is present, escape sequences in string and bytes literals are interpreted according…

  • Demo And course Content

    What is Python? Python is a high-level, interpreted, and general-purpose programming language known for its simplicity and readability. It supports multiple programming paradigms, including: Python’s design philosophy emphasizes code readability (using indentation instead of braces) and developer productivity. History of Python Fun Fact: Python is named after Monty Python’s Flying Circus (a British comedy show), not the snake! 🐍🎭 Top Career Paths After Learning Core Python 🐍…

  • re.subn()

    Python re.subn() Method Explained The re.subn() method is similar to re.sub() but with one key difference: it returns a tuple containing both the modified string and the number of substitutions made. This is useful when you need to know how many replacements occurred. Syntax python re.subn(pattern, repl, string, count=0, flags=0) Returns: (modified_string, number_of_substitutions) Example 1: Basic Usage with Count Tracking python import re…

Leave a Reply

Your email address will not be published. Required fields are marked *