Alternation and Grouping

Complete List of Alternation and Grouping in Python Regular Expressions

Grouping Constructs

Capturing Groups

Pattern         Description                 Example
(...)           Capturing group             (abc)
(?P<name>...)   Named capturing group       (?P<word>\w+)
\1, \2, etc.    Backreferences to groups    (a)\1 matches “aa”
(?P=name)       Named backreference         (?P<word>\w+) (?P=word)

Non-Capturing Groups

Pattern    Description                          Example
(?:...)    Non-capturing group                  (?:abc)+
(?i:...)   Case-insensitive group               (?i:hello)
(?s:...)   DOTALL group (. matches newline)     (?s:.*)
(?m:...)   MULTILINE group                      (?m:^.*$)
(?x:...)   VERBOSE group (ignore whitespace)    (?x:a b c)

Lookaround Groups (Zero-width Assertions)

Pattern     Description           Example
(?=...)     Positive lookahead    word(?=ing)
(?!...)     Negative lookahead    word(?!ing)
(?<=...)    Positive lookbehind   (?<=pre)word
(?<!...)    Negative lookbehind   (?<!un)word

Alternation Constructs

Pattern          Description                        Example
|                Alternation (OR)                   cat|dog
[abc]            Character class (single char OR)   [abc]
(?(id)yes|no)    Conditional pattern                (?(1)a|b)

Atomic and Definition Groups (third-party regex module)

Pattern          Description                                            Example
(?>...)          Atomic group (also supported by re since Python 3.11)  (?>a+)
(?(DEFINE)...)   Definition group (regex module only)                   (?(DEFINE)(?<word>\w+))

Detailed Explanation with Examples

1. (...) – Capturing Groups

Example 1: Basic capturing

python

import re
text = "John Doe, Jane Smith"
result = re.findall(r'(\w+) (\w+)', text)
print(result)  # [('John', 'Doe'), ('Jane', 'Smith')]

Example 2: Extracting date components

python

text = "2024-01-15 2023-12-25"
result = re.findall(r'(\d{4})-(\d{2})-(\d{2})', text)
print(result)  # [('2024', '01', '15'), ('2023', '12', '25')]
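
The shape of the list that re.findall returns depends on how many capturing groups the pattern contains, which is easy to overlook in the two examples above. A minimal sketch of the three cases (the sample string is chosen only for illustration):

```python
import re

text = "2024-01-15"

# No groups: each item is the whole match
no_groups = re.findall(r'\d{4}-\d{2}-\d{2}', text)
# One group: each item is that group's text
one_group = re.findall(r'(\d{4})-\d{2}-\d{2}', text)
# Several groups: each item is a tuple with one entry per group
many_groups = re.findall(r'(\d{4})-(\d{2})-(\d{2})', text)

print(no_groups)    # ['2024-01-15']
print(one_group)    # ['2024']
print(many_groups)  # [('2024', '01', '15')]
```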

2. (?P<name>...) – Named Capturing Groups

Example 1: Named groups

python

text = "John: 25, Jane: 30"
result = re.findall(r'(?P<name>\w+): (?P<age>\d+)', text)
print(result)  # [('John', '25'), ('Jane', '30')]

Example 2: Using groupdict()

python

text = "Name: John, Age: 25"
match = re.search(r'Name: (?P<name>\w+), Age: (?P<age>\d+)', text)
print(match.groupdict())  # {'name': 'John', 'age': '25'}
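
re.findall returns plain tuples, so the group names are lost in its result. When several matches are expected and the names should stay usable, re.finditer is one option. A short sketch, reusing the sample text from Example 1:

```python
import re

text = "John: 25, Jane: 30"
pattern = re.compile(r'(?P<name>\w+): (?P<age>\d+)')

# finditer yields match objects, so each group stays addressable by name
people = [(m.group('name'), int(m['age'])) for m in pattern.finditer(text)]
print(people)  # [('John', 25), ('Jane', 30)]
```

Both m.group('age') and m['age'] look up the same group; the second form is just shorthand.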

3. \1, \2 – Backreferences

Example 1: Matching repeated words

python

text = "the the quick brown fox fox jumps"
result = re.findall(r'(\b\w+\b) \1', text)
print(result)  # ['the', 'fox']
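
Backreferences are also available in replacement strings. As an illustrative sketch, the same idea can collapse each doubled word into a single copy with re.sub:

```python
import re

text = "the the quick brown fox fox jumps"

# \1 in the pattern requires the same word again;
# \1 in the replacement keeps a single copy of it
deduped = re.sub(r'\b(\w+) \1\b', r'\1', text)
print(deduped)  # the quick brown fox jumps
```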

Example 2: Validating quotes

python

text = 'She said "hello" and he said "goodbye"'
result = re.findall(r'(["\'])(.*?)\1', text)
print(result)  # [('"', 'hello'), ('"', 'goodbye')]

4. (?P=name) – Named Backreferences

Example 1: Named backreference

python

text = "John said: hello hello"
result = re.findall(r'(?P<word>\w+) (?P=word)', text)
print(result)  # ['hello']

Example 2: HTML tag matching

python

text = "<div>content</div> <p>text</p>"
result = re.findall(r'<(?P<tag>\w+)>.*?</(?P=tag)>', text)
print(result)  # ['div', 'p']
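
The (?P=name) syntax only works inside the pattern. In a replacement string passed to re.sub, a named group is referenced with \g<name> instead. A small sketch (the sample sentence and the group names are made up for illustration):

```python
import re

text = "Released 2024-01-15, patched 2023-12-25"
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

# \g<name> in the replacement refers to a named group from the pattern
reordered = re.sub(pattern, r'\g<day>/\g<month>/\g<year>', text)
print(reordered)  # Released 15/01/2024, patched 25/12/2023
```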

5. (?:...) – Non-capturing Groups

Example 1: Grouping without capturing

python

text = "abcabc ababab"
result = re.findall(r'(?:abc){2}', text)
print(result)  # ['abcabc']

Example 2: Alternation without capturing

python

text = "cat dog bird"
result = re.findall(r'(?:cat|dog|bird)', text)
print(result)  # ['cat', 'dog', 'bird']
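
The practical difference from a capturing group shows up most clearly with re.findall and a repeated group. A minimal comparison:

```python
import re

text = "abcabc"

# Capturing group: findall reports the group, which holds only the last repetition
capturing = re.findall(r'(abc)+', text)
# Non-capturing group: findall reports the whole match
non_capturing = re.findall(r'(?:abc)+', text)

print(capturing)      # ['abc']
print(non_capturing)  # ['abcabc']
```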

6. (?i:...) – Case-insensitive Groups

Example 1: Case-insensitive matching

python

text = "Hello HELLO hello"
result = re.findall(r'(?i:hello)', text)
print(result)  # ['Hello', 'HELLO', 'hello']

Example 2: Mixed case sensitivity

python

text = "Python python PYTHON"
result = re.findall(r'(?i:py)thon', text)
print(result)  # ['Python', 'python'] ('PYTHON' is skipped: "thon" stays case-sensitive)
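
A scoped flag applies only to the text inside its group, while a flag passed to the function applies to the whole pattern. A brief side-by-side sketch:

```python
import re

text = "Python python PYTHON"

# Only "py" is case-insensitive; "thon" must be lowercase
scoped = re.findall(r'(?i:py)thon', text)
# The whole pattern is case-insensitive
whole = re.findall(r'python', text, re.IGNORECASE)

print(scoped)  # ['Python', 'python']
print(whole)   # ['Python', 'python', 'PYTHON']
```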

7. (?s:...) – DOTALL Groups

Example 1: Dot matches newline within group

python

text = "start\nmiddle\nend"
result = re.findall(r'(?s:start.*end)', text)
print(result)  # ['start\nmiddle\nend']

Example 2: Mixed dot behavior

python

text = "line1\nline2\nline3"
result = re.findall(r'line1(?s:.*)line3', text)
print(result)  # ['line1\nline2\nline3']

8. (?m:...) – MULTILINE Groups

Example 1: Multiline matching within group

python

text = "first line\nsecond line\nthird line"
result = re.findall(r'(?m:^.*$)', text)
print(result)  # ['first line', 'second line', 'third line']

9. (?x:...) – VERBOSE Groups

Example 1: Ignore whitespace in pattern

python

text = "hello world"
pattern = r'(?x:hello\ world)'
result = re.findall(pattern, text)
print(result)  # ['hello world']

10. (?=...) – Positive Lookahead

Example 1: Match word before specific pattern

python

text = "running sitting walking"
result = re.findall(r'\w+(?=ing)', text)
print(result)  # ['runn', 'sitt', 'walk']

Example 2: Password validation

python

password = "Pass123!"
has_digit = bool(re.search(r'(?=.*\d)', password))
print(has_digit)  # True
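
Because a lookahead consumes no text, several of them can be stacked at the same position to check independent rules. The sketch below assumes a hypothetical policy (at least 8 characters with a digit, a lowercase letter and an uppercase letter); the name is_strong is illustrative only:

```python
import re

# Each lookahead scans from the start of the string without consuming anything;
# .{8,} then enforces the minimum length
STRONG = re.compile(r'^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$')

def is_strong(password):
    """Return True if the password satisfies the assumed policy."""
    return STRONG.match(password) is not None

print(is_strong("Pass123!"))  # True
print(is_strong("password"))  # False (no digit, no uppercase letter)
print(is_strong("Pa1"))       # False (too short)
```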

11. (?!...) – Negative Lookahead

Example 1: Exclude words with a specific prefix

python

text = "happy unhappy really"
# The lookahead must sit at the start of the word, before \w+ consumes it
result = re.findall(r'\b(?!un)\w+\b', text)
print(result)  # ['happy', 'really']

12. (?<=...) – Positive Lookbehind

Example 1: Match after specific pattern

python

text = "$100 €200 ¥300"
result = re.findall(r'(?<=\$)\d+', text)
print(result)  # ['100']

13. (?<!...) – Negative Lookbehind

Example 1: Exclude specific prefixes

python

text = "happy unhappy really"
# The lookbehind inspects the two characters just before "happy"
result = re.findall(r'(?<!un)happy', text)
print(result)  # ['happy'] (the "happy" inside "unhappy" is skipped)
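
One restriction worth knowing: in the standard re module, a lookbehind (positive or negative) must match text of a fixed length. The sketch below shows a pattern that is accepted and one that is rejected at compile time:

```python
import re

# Fixed width: the character class always matches exactly one character
amounts = re.findall(r'(?<=[$€])\d+', "$100 €200 ¥300")
print(amounts)  # ['100', '200']

# Variable width: \$+ can match one or more characters, so re refuses it
rejected = False
try:
    re.compile(r'(?<=\$+)\d+')
except re.error as exc:
    rejected = True
    print("rejected:", exc)
```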

14. | – Alternation

Example 1: Basic OR operation

python

text = "cat dog bird fish"
result = re.findall(r'cat|dog|bird', text)
print(result)  # ['cat', 'dog', 'bird']

Example 2: Complex alternation

python

text = "color colour favor favour"
result = re.findall(r'colou?r|favou?r', text)
print(result)  # ['color', 'colour', 'favor', 'favour']
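
Two details of alternation often cause surprises: alternatives are tried from left to right, and | has the lowest precedence of any operator. A short sketch of both (the sample words are arbitrary):

```python
import re

# The first alternative that matches wins, even if a later one is longer
first_wins = re.findall(r'cat|category', "category")
longest_first = re.findall(r'category|cat', "category")
print(first_wins)     # ['cat']
print(longest_first)  # ['category']

# Without a group, ^ belongs only to "cat" and $ only to "dog"
loose = bool(re.search(r'^cat|dog$', "catalog"))
strict = bool(re.search(r'^(?:cat|dog)$', "catalog"))
print(loose)   # True
print(strict)  # False
```

Wrapping the alternatives in a non-capturing group limits the scope of | without adding a group number.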

15. (?(id)yes|no) – Conditional Patterns

Example 1: Conditional matching

python

text = "<div>content</div> <p>text</p>"
result = re.findall(r'<(div|p)>(.*?)</(?(1)\1|div)>', text)
print(result)  # [('div', 'content'), ('p', 'text')]
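
A conditional is most useful when the referenced group is optional, so that both branches can actually be taken. The sketch below is adapted from the pattern shown in Python's re documentation: the closing ">" is required only if an opening "<" was matched.

```python
import re

# (<)? is optional; (?(1)>|$) demands '>' if group 1 matched, otherwise end of string
pattern = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

samples = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
results = {s: bool(pattern.match(s)) for s in samples}

for sample, ok in results.items():
    print(sample, ok)
# <user@host.com> True
# user@host.com True
# <user@host.com False
# user@host.com> False
```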

16. (?>...) – Atomic Groups (regex module)

Example 1: Prevent backtracking

python

import regex
text = "aaaab"
result = regex.findall(r'(?>a+)b', text)
print(result)  # ['aaaab']
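
The effect of an atomic group becomes visible only when the rest of the pattern needs the group to give characters back. The sketch below uses the standard re module, which accepts (?>...) from Python 3.11 onward; the version check is there so the snippet still runs on older interpreters.

```python
import re
import sys

text = "aaab"

# Ordinary group: a+ gives back one 'a' so that "ab" can match
plain = re.search(r'(?:a+)ab', text)
print(plain.group())  # aaab

if sys.version_info >= (3, 11):
    # Atomic group: a+ keeps every 'a' it matched, so "ab" can never match
    atomic = re.search(r'(?>a+)ab', text)
    print(atomic)  # None
```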

17. (?(DEFINE)...) – Definition Groups (regex module)

Example 1: Define reusable patterns

python

import regex
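pattern = r'(?(DEFINE)(?<number>\d+))(?&number)'
text = "123 456"
# findall would return the contents of the "number" capture group, so collect whole matches instead
result = [m.group() for m in regex.finditer(pattern, text)]
print(result)  # ['123', '456']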

Advanced Examples

Complex Grouping with Multiple Techniques

python

# Email validation with detailed groups
email_pattern = r'''
    (?P<local>[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+)   # Local part
    @                                              # @ symbol
    (?P<domain>[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)  # Domain
    (?:\.(?P<tld>[a-zA-Z]{2,}))+                   # Dot-separated labels; "tld" keeps the last one
'''

text = "user@example.com admin@test.co.uk"
matches = re.finditer(email_pattern, text, re.VERBOSE)
for match in matches:
    print(match.groupdict())

Nested Groups and Backreferences

python

# Matching nested structures (simplified)
text = "((a)(b)) ((c)(d))"
result = re.findall(r'\(([^()]*(?:\([^()]*\)[^()]*)*)\)', text)
print(result)  # ['(a)(b)', '(c)(d)']

Lookaround with Groups

python

# Extract numbers preceded by currency symbols
text = "Price: $100, Cost: €200, Value: ¥300"
result = re.findall(r'(?<=[$€¥])(\d+)', text)
print(result)  # ['100', '200', '300']

Group Numbering Rules

  1. Left to right: Groups are numbered by opening parentheses
  2. Nested groups: Outer groups get lower numbers
  3. Non-capturing groups: Don’t affect numbering
  4. Named groups: Still get numbers in addition to names

python

# Group numbering example
text = "2024-01-15"
pattern = r'((\d{4})-(\d{2}))-(\d{2})'
match = re.search(pattern, text)
print("All groups:", match.groups())
print("Group 1:", match.group(1))  # 2024-01
print("Group 2:", match.group(2))  # 2024
print("Group 3:", match.group(3))  # 01
print("Group 4:", match.group(4))  # 15
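
Rules 3 and 4 can be checked in the same way. In this sketch the year is wrapped in a non-capturing group and the month is named (the group name is chosen for illustration):

```python
import re

pattern = re.compile(r'(?:\d{4})-(?P<month>\d{2})-(\d{2})')
match = pattern.search("2024-01-15")

# The non-capturing group is skipped, so numbering starts at the month
print(match.groups())            # ('01', '15')
# A named group is reachable by number and by name
print(match.group(1))            # 01
print(match.group('month'))      # 01
# groupindex maps each group name to its number
print(dict(pattern.groupindex))  # {'month': 1}
```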
