Alternation and Grouping

Complete List of Alternation and Grouping Constructs in Python Regular Expressions

Grouping Constructs

Capturing Groups

| Pattern | Description | Example |
|---|---|---|
| `(...)` | Capturing group | `(abc)` |
| `(?P<name>...)` | Named capturing group | `(?P<word>\w+)` |
| `\1`, `\2`, etc. | Backreferences to groups | `(a)\1` matches “aa” |
| `(?P=name)` | Named backreference | `(?P<word>\w+) (?P=word)` |

Non-Capturing Groups

| Pattern | Description | Example |
|---|---|---|
| `(?:...)` | Non-capturing group | `(?:abc)+` |
| `(?i:...)` | Case-insensitive group | `(?i:hello)` |
| `(?s:...)` | DOTALL group (`.` matches newline) | `(?s:.*)` |
| `(?m:...)` | MULTILINE group | `(?m:^.*$)` |
| `(?x:...)` | VERBOSE group (ignore whitespace) | `(?x:a b c)` |

Lookaround Groups (Zero-width Assertions)

| Pattern | Description | Example |
|---|---|---|
| `(?=...)` | Positive lookahead | `word(?=ing)` |
| `(?!...)` | Negative lookahead | `word(?!ing)` |
| `(?<=...)` | Positive lookbehind | `(?<=pre)word` |
| `(?<!...)` | Negative lookbehind | `(?<!un)word` |

Alternation Constructs

| Pattern | Description | Example |
|---|---|---|
| `\|` | Alternation (OR) | `cat\|dog` |
| `[abc]` | Character class (single char OR) | `[abc]` |
| `(?(id)yes\|no)` | Conditional pattern | `(?(1)a\|b)` |

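Atomic and Definition Groups (regex module; atomic groups also in re since Python 3.11)

| Pattern | Description | Example |
|---|---|---|
| `(?>...)` | Atomic group (third-party regex module; built into re since Python 3.11) | `(?>a+)` |
| `(?(DEFINE)...)` | Definition group (regex module only) | `(?(DEFINE)(?<word>\w+))` |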

Detailed Explanation with Examples

1. (...) – Capturing Groups

Example 1: Basic capturing

python

import re
text = "John Doe, Jane Smith"
result = re.findall(r'(\w+) (\w+)', text)
print(result)  # [('John', 'Doe'), ('Jane', 'Smith')]

Example 2: Extracting date components

python

text = "2024-01-15 2023-12-25"
result = re.findall(r'(\d{4})-(\d{2})-(\d{2})', text)
print(result)  # [('2024', '01', '15'), ('2023', '12', '25')]
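What re.findall() returns depends on how many capturing groups the pattern contains. Here is a short sketch using the same dates as Example 2:

```python
import re

text = "2024-01-15 2023-12-25"

# No groups: each item is the whole match
no_groups = re.findall(r'\d{4}-\d{2}-\d{2}', text)

# One group: each item is that group's text
one_group = re.findall(r'(\d{4})-\d{2}-\d{2}', text)

# Two or more groups: each item is a tuple of all groups
two_groups = re.findall(r'(\d{4})-(\d{2})-\d{2}', text)

print(no_groups)   # ['2024-01-15', '2023-12-25']
print(one_group)   # ['2024', '2023']
print(two_groups)  # [('2024', '01'), ('2023', '12')]
```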

2. (?P<name>...) – Named Capturing Groups

Example 1: Named groups

python

text = "John: 25, Jane: 30"
result = re.findall(r'(?P<name>\w+): (?P<age>\d+)', text)
print(result)  # [('John', '25'), ('Jane', '30')]

Example 2: Using groupdict()

python

text = "Name: John, Age: 25"
match = re.search(r'Name: (?P<name>\w+), Age: (?P<age>\d+)', text)
print(match.groupdict())  # {'name': 'John', 'age': '25'}
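You can also read a named group by name from a match object, and reference it in a re.sub() replacement with \g<name>. The date_re name and the sample text are illustrative:

```python
import re

text = "Release date: 2024-01-15"
date_re = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

match = date_re.search(text)
# group('name') and match['name'] both look up a named group
print(match.group('year'), match['month'])  # 2024 01

# \g<name> refers to a named group inside the replacement string
reformatted = date_re.sub(r'\g<day>/\g<month>/\g<year>', text)
print(reformatted)  # Release date: 15/01/2024
```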

3. \1, \2 – Backreferences

Example 1: Matching repeated words

python

text = "the the quick brown fox fox jumps"
result = re.findall(r'(\b\w+\b) \1', text)
print(result)  # ['the', 'fox']

Example 2: Matching paired quotes

python

text = 'She said "hello" and he said "goodbye"'
result = re.findall(r'(["\'])(.*?)\1', text)
print(result)  # [('"', 'hello'), ('"', 'goodbye')]
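Backreferences also work in replacement strings. This sketch uses re.sub() to collapse the repeated words from Example 1. The extra \b after \1 is an addition here, so that a phrase like "the theory" is not treated as a repeat:

```python
import re

text = "the the quick brown fox fox jumps"

# \1 in the pattern requires the same word again; \1 in the
# replacement writes that word back once
deduped = re.sub(r'\b(\w+) \1\b', r'\1', text)
print(deduped)  # the quick brown fox jumps
```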

4. (?P=name) – Named Backreferences

Example 1: Named backreference

python

text = "John said: hello hello"
result = re.findall(r'(?P<word>\w+) (?P=word)', text)
print(result)  # ['hello']

Example 2: HTML tag matching

python

text = "<div>content</div> <p>text</p>"
result = re.findall(r'<(?P<tag>\w+)>.*?</(?P=tag)>', text)
print(result)  # ['div', 'p']
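The backreference must repeat the exact tag name, so a closing tag that does not match its opening tag is rejected. A short sketch with one mismatched pair:

```python
import re

text = "<div>oops</p> <p>ok</p>"
tag_re = r'<(?P<tag>\w+)>.*?</(?P=tag)>'

# <div> is never closed by </div>, so only the <p> element matches
matches = re.findall(tag_re, text)
print(matches)  # ['p']
```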

5. (?:...) – Non-capturing Groups

Example 1: Grouping without capturing

python

text = "abcabc ababab"
result = re.findall(r'(?:abc){2}', text)
print(result)  # ['abcabc']

Example 2: Alternation without capturing

python

text = "cat dog bird"
result = re.findall(r'(?:cat|dog|bird)', text)
print(result)  # ['cat', 'dog', 'bird']
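The difference between capturing and non-capturing groups shows up clearly when you call re.findall() with a repeated group:

```python
import re

text = "ababab"

# A repeated capturing group keeps only its last repetition,
# and findall returns that group instead of the whole match
repeated_capture = re.findall(r'(ab)+', text)

# A non-capturing group lets findall return the whole match
non_capture = re.findall(r'(?:ab)+', text)

print(repeated_capture)  # ['ab']
print(non_capture)       # ['ababab']
```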

6. (?i:...) – Case-insensitive Groups

Example 1: Case-insensitive matching

python

text = "Hello HELLO hello"
result = re.findall(r'(?i:hello)', text)
print(result)  # ['Hello', 'HELLO', 'hello']

Example 2: Mixed case sensitivity

python

text = "Python python PYTHON"
result = re.findall(r'(?i:py)thon', text)
print(result)  # ['Python', 'python'] (PYTHON fails: "thon" is still case-sensitive)
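Scoped flags also work in the other direction. With re.IGNORECASE set for the whole pattern, (?-i:...) turns case sensitivity back on for a single group:

```python
import re

text = "Python python PYTHON"

# IGNORECASE applies everywhere except inside (?-i:...),
# so the leading "P" must be uppercase
result = re.findall(r'(?-i:P)ython', text, re.IGNORECASE)
print(result)  # ['Python', 'PYTHON']
```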

7. (?s:...) – DOTALL Groups

Example 1: Dot matches newline within group

python

text = "start\nmiddle\nend"
result = re.findall(r'(?s:start.*end)', text)
print(result)  # ['start\nmiddle\nend']

Example 2: Mixed dot behavior

python

text = "line1\nline2\nline3"
result = re.findall(r'line1(?s:.*)line3', text)
print(result)  # ['line1\nline2\nline3']
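For contrast, without the scoped flag the dot does not cross newlines, so the same pattern finds nothing:

```python
import re

text = "start\nmiddle\nend"

# Without DOTALL, '.' stops at '\n', so the pattern cannot span lines
without_flag = re.findall(r'start.*end', text)
# The scoped flag enables DOTALL for this group only
with_flag = re.findall(r'(?s:start.*end)', text)

print(without_flag)  # []
print(with_flag)     # ['start\nmiddle\nend']
```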

8. (?m:...) – MULTILINE Groups

Example 1: Multiline matching within group

python

text = "first line\nsecond line\nthird line"
result = re.findall(r'(?m:^.*$)', text)
print(result)  # ['first line', 'second line', 'third line']
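To see what the scoped flag changes, compare the same anchor with and without it:

```python
import re

text = "first line\nsecond line\nthird line"

# Without MULTILINE, ^ matches only at the very start of the string
plain = re.findall(r'^\w+', text)
# With the scoped flag, ^ also matches right after each newline
multiline = re.findall(r'(?m:^\w+)', text)

print(plain)      # ['first']
print(multiline)  # ['first', 'second', 'third']
```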

9. (?x:...) – VERBOSE Groups

Example 1: Escaping a literal space in verbose mode

python

text = "hello world"
# Verbose mode ignores unescaped spaces, so the space is written as '\ '
pattern = r'(?x:hello\ world)'
result = re.findall(pattern, text)
print(result)  # ['hello world']
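The main use of verbose mode is laying a pattern out over several lines with comments. This sketch assumes a simple ddd-dddd number format, purely for illustration:

```python
import re

text = "call 555-1234 or 555-9876"

# Inside (?x:...), whitespace and # comments are ignored,
# so each piece of the pattern can sit on its own line
pattern = r'''(?x:
    \d{3}   # three-digit prefix
    -       # literal hyphen
    \d{4}   # four-digit line number
)'''

result = re.findall(pattern, text)
print(result)  # ['555-1234', '555-9876']
```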

10. (?=...) – Positive Lookahead

Example 1: Match the part of a word before "ing"

python

text = "running sitting walking"
result = re.findall(r'\w+(?=ing)', text)
print(result)  # ['runn', 'sitt', 'walk']

Example 2: Checking one password rule (contains a digit)

python

password = "Pass123!"
has_digit = bool(re.search(r'(?=.*\d)', password))
print(has_digit)  # True
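A full password check usually combines several lookaheads, one per rule, all evaluated from the start of the string. The policy below is a hypothetical example, not a recommendation, and is_valid is an illustrative helper name:

```python
import re

# Hypothetical policy: at least 8 characters, with at least one lowercase
# letter, one uppercase letter and one digit. Each lookahead checks one
# rule without consuming input; .{8,} then consumes the whole string.
PASSWORD_RE = re.compile(r'(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}')

def is_valid(password: str) -> bool:
    return PASSWORD_RE.fullmatch(password) is not None

print(is_valid("Pass123!"))  # True
print(is_valid("pass123!"))  # False: no uppercase letter
print(is_valid("Pass1"))     # False: too short
```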

11. (?!...) – Negative Lookahead

Example 1: Exclude words that start with a prefix

python

text = "happy unhappy really"
# At the start of each word, the lookahead rejects words beginning with "un"
result = re.findall(r'\b(?!un)\w+', text)
print(result)  # ['happy', 'really']
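A negative lookahead at the start of a word can also exclude words by their ending, because it can look ahead across the whole word. This sketch keeps only the words that do not end in "ing":

```python
import re

text = "running fast walking slowly"

# At each word boundary, (?!\w*ing\b) fails if the word ends in "ing"
result = re.findall(r'\b(?!\w*ing\b)\w+', text)
print(result)  # ['fast', 'slowly']
```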

12. (?<=...) – Positive Lookbehind

Example 1: Match after specific pattern

python

text = "$100 €200 ¥300"
result = re.findall(r'(?<=\$)\d+', text)
print(result)  # ['100']
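Lookbehind and lookahead are often combined to match positions rather than text. A common example is inserting thousands separators. This sketch assumes the input is a plain string of digits, and add_commas is a hypothetical helper name:

```python
import re

# Insert a comma at every position that has a digit behind it and a
# whole number of three-digit blocks ahead of it up to the end
def add_commas(digits: str) -> str:
    return re.sub(r'(?<=\d)(?=(?:\d{3})+$)', ',', digits)

print(add_commas("1234567"))  # 1,234,567
print(add_commas("100"))      # 100
```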

13. (?<!...) – Negative Lookbehind

Example 1: Exclude specific prefixes

python

text = "happy unhappy happy"
# "happy" matches only where it is not directly preceded by "un"
result = re.findall(r'(?<!un)happy', text)
print(result)  # ['happy', 'happy']
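In the built-in re module, a lookbehind must match a fixed number of characters. Alternatives of different lengths inside (?<!...) are therefore rejected when the pattern is compiled; the third-party regex module is more permissive. This sketch shows the error and the usual workaround, chaining one lookbehind per prefix:

```python
import re

text = "like unlike dislike"

# "un" (2 chars) and "dis" (3 chars) differ in width, so re refuses the pattern
try:
    re.compile(r'(?<!un|dis)like')
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Workaround: one fixed-width lookbehind per prefix
result = re.findall(r'(?<!un)(?<!dis)like', text)

print(variable_width_ok)  # False
print(result)             # ['like']
```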

14. | – Alternation

Example 1: Basic OR operation

python

text = "cat dog bird fish"
result = re.findall(r'cat|dog|bird', text)
print(result)  # ['cat', 'dog', 'bird']

Example 2: Complex alternation

python

text = "color colour favor favour"
result = re.findall(r'colou?r|favou?r', text)
print(result)  # ['color', 'colour', 'favor', 'favour']
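Alternatives are tried left to right, and the engine takes the first one that succeeds at a given position, not the longest. This sketch shows why order matters when one alternative is a prefix of another:

```python
import re

text = "category"

# "cat" is tried first and succeeds, so "category" is never considered
first_wins = re.findall(r'cat|category', text)
# Listing the longer alternative first gives the full word
longest_first = re.findall(r'category|cat', text)

print(first_wins)     # ['cat']
print(longest_first)  # ['category']
```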

15. (?(id)yes|no) – Conditional Patterns

Example 1: Conditional matching

python

text = "<div>content</div> <p>text</p>"
# Group 1 always participates here, so the "yes" branch (\1) is always the one used
result = re.findall(r'<(div|p)>(.*?)</(?(1)\1|div)>', text)
print(result)  # [('div', 'content'), ('p', 'text')]
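A conditional is most useful when the referenced group is optional. The sketch below follows the pattern used in the Python re documentation. The address part is deliberately simplified and is not a real email validator:

```python
import re

# Group 1 is an optional '<'. (?(1)>|$) then requires '>' if group 1
# matched, and the end of the string otherwise.
pattern = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'

candidates = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
results = {c: re.match(pattern, c) is not None for c in candidates}

for candidate, ok in results.items():
    print(candidate, ok)
# <user@host.com> True
# user@host.com True
# <user@host.com False
# user@host.com> False
```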

16. (?>...) – Atomic Groups (regex module; also built into re since Python 3.11)

Example 1: Prevent backtracking

python

import regex
text = "aaaab"
# (?>a+) takes all four a's and will not give any back; 'b' follows, so the match still succeeds
result = regex.findall(r'(?>a+)b', text)
print(result)  # ['aaaab']
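In the example above, the match succeeds whether or not backtracking is allowed. The sketch below uses input where giving back a character would be required. On Python 3.11+, re accepts (?>a+)a directly, and it fails on "aaaa". To stay runnable on older versions, this block uses the common lookahead-plus-backreference workaround (?=(a+))\1. That is an emulation of an atomic group, not the (?>...) syntax itself:

```python
import re

text = "aaaa"

# Ordinary group: a+ first takes all four a's, then gives one back
# so that the final 'a' in the pattern can match
normal = re.search(r'(?:a+)a', text)

# Emulated atomic group: the lookahead captures the longest run of a's
# into group 1, and \1 then consumes exactly that run. re does not
# backtrack into a lookahead that has already succeeded, so the run
# is never shortened and the trailing 'a' can never match.
emulated = re.search(r'(?=(a+))\1a', text)

print(normal.group())  # aaaa
print(emulated)        # None
```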

17. (?(DEFINE)...) – Definition Groups (regex module)

Example 1: Define reusable patterns

python

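import regex
pattern = r'(?(DEFINE)(?<number>\d+))(?&number)'
text = "123 456"
# The only named group lives inside DEFINE, so collect whole matches (group 0)
result = [m.group() for m in regex.finditer(pattern, text)]
print(result)  # ['123', '456']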
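The built-in re module has no (?(DEFINE)...) or (?&name) syntax. A rough substitute, which is a different technique rather than an equivalent, is to keep reusable sub-patterns in Python strings and compose them with an f-string. NUMBER, WORD and item are illustrative names:

```python
import re

# Reusable pieces kept as plain strings (illustrative names)
NUMBER = r'\d+'
WORD = r'[A-Za-z]+'

# Compose them; wrapping each piece in (?:...) keeps its quantifiers
# and any alternation contained when it is inserted
item = re.compile(rf'(?:{NUMBER}) (?:{WORD})')

result = item.findall("5 elephants, 12 mice")
print(result)  # ['5 elephants', '12 mice']
```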

Advanced Examples

Complex Grouping with Multiple Techniques

python

import re

# Email extraction with named groups (simplified; not a full email validator)
email_pattern = r'''
    (?P<local>[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+)   # Local part
    @                                              # @ symbol
    (?P<domain>[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)  # First domain label
    (?:\.(?P<tld>[a-zA-Z]{2,}))+                   # Dot-separated labels; tld keeps only the last
'''

text = "user@example.com admin@test.co.uk"
matches = re.finditer(email_pattern, text, re.VERBOSE)
for match in matches:
    print(match.groupdict())
# {'local': 'user', 'domain': 'example', 'tld': 'com'}
# {'local': 'admin', 'domain': 'test', 'tld': 'uk'}

Nested Groups (One Level Deep)

python

# Matching parentheses nested one level deep (re cannot match arbitrary nesting)
text = "((a)(b)) ((c)(d))"
result = re.findall(r'\(([^()]*(?:\([^()]*\)[^()]*)*)\)', text)
print(result)  # ['(a)(b)', '(c)(d)']

Lookaround with Groups

python

# Extract numbers preceded by currency symbols
text = "Price: $100, Cost: €200, Value: ¥300"
result = re.findall(r'(?<=[$€¥])(\d+)', text)
print(result)  # ['100', '200', '300']

Group Numbering Rules

  1. Left to right: Groups are numbered in the order of their opening parentheses
  2. Nested groups: Outer groups get lower numbers
  3. Non-capturing groups: Don’t affect numbering
  4. Named groups: Still get numbers in addition to names

python

# Group numbering example
text = "2024-01-15"
pattern = r'((\d{4})-(\d{2}))-(\d{2})'
match = re.search(pattern, text)
print("All groups:", match.groups())  # ('2024-01', '2024', '01', '15')
print("Group 1:", match.group(1))  # 2024-01
print("Group 2:", match.group(2))  # 2024
print("Group 3:", match.group(3))  # 01
print("Group 4:", match.group(4))  # 15
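You can check the numbering rules for non-capturing and named groups directly. In this sketch the month sits in a non-capturing group, so the two named groups become groups 1 and 2:

```python
import re

text = "2024-01-15"
# (?:...) is skipped when numbering; named groups still get numbers
pattern = r'(?P<year>\d{4})-(?:\d{2})-(?P<day>\d{2})'
match = re.search(pattern, text)

print(match.groups())                         # ('2024', '15')
print(match.group(1) == match.group('year'))  # True
print(match.group(2) == match.group('day'))   # True
```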
