Alternation and Grouping

Complete List of Alternation and Grouping in Python Regular Expressions

Grouping Constructs

Capturing Groups

Pattern          Description                Example
(...)            Capturing group            (abc)
(?P<name>...)    Named capturing group      (?P<word>\w+)
\1, \2, etc.     Backreferences to groups   (a)\1 matches “aa”
(?P=name)        Named backreference        (?P<word>\w+) (?P=word)

Non-Capturing Groups

Pattern     Description                            Example
(?:...)     Non-capturing group                    (?:abc)+
(?i:...)    Case-insensitive group                 (?i:hello)
(?s:...)    DOTALL group (. matches newline)       (?s:.*)
(?m:...)    MULTILINE group                        (?m:^.*$)
(?x:...)    VERBOSE group (ignore whitespace)      (?x:a b c)

Lookaround Groups (Zero-width Assertions)

Pattern     Description            Example
(?=...)     Positive lookahead     word(?=ing)
(?!...)     Negative lookahead     word(?!ing)
(?<=...)    Positive lookbehind    (?<=pre)word
(?<!...)    Negative lookbehind    (?<!un)word

Alternation Constructs

Pattern          Description                         Example
|                Alternation (OR)                    cat|dog
[abc]            Character class (single char OR)    [abc]
(?(id)yes|no)    Conditional pattern                 (?(1)a|b)

Atomic and Definition Groups

Pattern           Description                                                   Example
(?>...)           Atomic group (re in Python 3.11+, or the regex module)        (?>a+)
(?(DEFINE)...)    Definition group (third-party regex module only)              (?(DEFINE)(?<word>\w+))

Detailed Explanation with Examples

1. (...) – Capturing Groups

Example 1: Basic capturing

python

import re
text = "John Doe, Jane Smith"
result = re.findall(r'(\w+) (\w+)', text)
print(result)  # [('John', 'Doe'), ('Jane', 'Smith')]

Example 2: Extracting date components

python

text = "2024-01-15 2023-12-25"
result = re.findall(r'(\d{4})-(\d{2})-(\d{2})', text)
print(result)  # [('2024', '01', '15'), ('2023', '12', '25')]

2. (?P<name>...) – Named Capturing Groups

Example 1: Named groups

python

text = "John: 25, Jane: 30"
result = re.findall(r'(?P<name>\w+): (?P<age>\d+)', text)
print(result)  # [('John', '25'), ('Jane', '30')]

Example 2: Using groupdict()

python

text = "Name: John, Age: 25"
match = re.search(r'Name: (?P<name>\w+), Age: (?P<age>\d+)', text)
print(match.groupdict())  # {'name': 'John', 'age': '25'}
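
Named groups can also be read one at a time and reused in a replacement string. The following is a minimal sketch; the date pattern and sample text are illustrative, not taken from the examples above. It reads a group by name, by subscript, and by number, then reorders the parts with `\g<name>`:

```python
import re

pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
text = "Released 2024-01-15"

match = re.search(pattern, text)
by_name = match.group('year')   # look up by group name
by_index = match['month']       # subscript access (Python 3.6+)
by_number = match.group(3)      # named groups are still numbered

# \g<name> refers to a named group inside a replacement string
reordered = re.sub(pattern, r'\g<day>/\g<month>/\g<year>', text)

print(by_name, by_index, by_number)  # 2024 01 15
print(reordered)                     # Released 15/01/2024
```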

3. \1, \2 – Backreferences

Example 1: Matching repeated words

python

text = "the the quick brown fox fox jumps"
result = re.findall(r'(\b\w+\b) \1', text)
print(result)  # ['the', 'fox']

Example 2: Validating quotes

python

text = 'She said "hello" and he said "goodbye"'
result = re.findall(r'(["\'])(.*?)\1', text)
print(result)  # [('"', 'hello'), ('"', 'goodbye')]
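
Backreferences are also available in replacement strings. As a small sketch (the variable name `deduped` is illustrative), `re.sub` can collapse immediately repeated words by replacing each run with group 1:

```python
import re

text = "the the quick brown fox fox jumps"

# (\w+) captures a word; ( \1\b)+ matches one or more repeats of that same word
deduped = re.sub(r'\b(\w+)( \1\b)+', r'\1', text)
print(deduped)  # the quick brown fox jumps
```

The trailing `\b` after `\1` keeps a phrase such as "the then" from being treated as a repeat.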

4. (?P=name) – Named Backreferences

Example 1: Named backreference

python

text = "John said: hello hello"
result = re.findall(r'(?P<word>\w+) (?P=word)', text)
print(result)  # ['hello']

Example 2: HTML tag matching

python

text = "<div>content</div> <p>text</p>"
result = re.findall(r'<(?P<tag>\w+)>.*?</(?P=tag)>', text)
print(result)  # ['div', 'p']

5. (?:...) – Non-capturing Groups

Example 1: Grouping without capturing

python

text = "abcabc ababab"
result = re.findall(r'(?:abc){2}', text)
print(result)  # ['abcabc']

Example 2: Alternation without capturing

python

text = "cat dog bird"
result = re.findall(r'(?:cat|dog|bird)', text)
print(result)  # ['cat', 'dog', 'bird']
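
The practical difference between the two group types shows up in `re.findall`, which returns group contents whenever the pattern has a capturing group. A short comparison, using an illustrative input string:

```python
import re

text = "abcabc"

# With a capturing group, findall returns the group (its last repetition)
captured = re.findall(r'(abc)+', text)
# With a non-capturing group, findall returns the whole match
uncaptured = re.findall(r'(?:abc)+', text)

print(captured)    # ['abc']
print(uncaptured)  # ['abcabc']
```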

6. (?i:...) – Case-insensitive Groups

Example 1: Case-insensitive matching

python

text = "Hello HELLO hello"
result = re.findall(r'(?i:hello)', text)
print(result)  # ['Hello', 'HELLO', 'hello']

Example 2: Mixed case sensitivity

python

text = "Python python PYTHON"
# Only "py" is case-insensitive; "thon" must still be lowercase
result = re.findall(r'(?i:py)thon', text)
print(result)  # ['Python', 'python']

7. (?s:...) – DOTALL Groups

Example 1: Dot matches newline within group

python

text = "start\nmiddle\nend"
result = re.findall(r'(?s:start.*end)', text)
print(result)  # ['start\nmiddle\nend']

Example 2: Mixed dot behavior

python

text = "line1\nline2\nline3"
result = re.findall(r'line1(?s:.*)line3', text)
print(result)  # ['line1\nline2\nline3']

8. (?m:...) – MULTILINE Groups

Example 1: Multiline matching within group

python

text = "first line\nsecond line\nthird line"
result = re.findall(r'(?m:^.*$)', text)
print(result)  # ['first line', 'second line', 'third line']

9. (?x:...) – VERBOSE Groups

Example 1: Ignore whitespace in pattern

python

text = "hello world"
pattern = r'(?x:hello\ world)'
result = re.findall(pattern, text)
print(result)  # ['hello world']
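
The example above escapes its only space, so it does not show whitespace actually being skipped. A sketch that does, assuming Python 3.6+ for the scoped flag syntax and using an illustrative phone-number-like pattern:

```python
import re

text = "Call 555-1234 or 555-9876"

# Inside (?x:...), unescaped spaces in the pattern are ignored
pattern = r'(?x: \d{3} - \d{4} )'
numbers = re.findall(pattern, text)
print(numbers)  # ['555-1234', '555-9876']
```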

10. (?=...) – Positive Lookahead

Example 1: Match word before specific pattern

python

text = "running sitting walking"
result = re.findall(r'\w+(?=ing)', text)
print(result)  # ['runn', 'sitt', 'walk']

Example 2: Password validation

python

password = "Pass123!"
has_digit = bool(re.search(r'(?=.*\d)', password))
print(has_digit)  # True
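
Lookaheads become useful for validation when several are chained at the same position, since each one checks a separate condition without consuming text. The sketch below assumes a hypothetical policy (at least 8 characters, with a digit, a lowercase letter, and an uppercase letter); the policy is an illustration, not a recommendation:

```python
import re

# Hypothetical policy: 8+ characters with a digit, a lowercase and an uppercase letter
strong = re.compile(r'(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}')

candidates = ["Pass123!", "password", "SHORT1a"]
verdicts = {c: bool(strong.fullmatch(c)) for c in candidates}
print(verdicts)  # {'Pass123!': True, 'password': False, 'SHORT1a': False}
```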

11. (?!...) – Negative Lookahead

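Example 1: Exclude words with a specific prefix

python

text = "happy unhappy really"
# The lookahead must come before \w+; placed after it, it would only test the text that follows the word
result = re.findall(r'\b(?!un)\w+\b', text)
print(result)  # ['happy', 'really']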
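
A negative lookahead is checked at the exact position where it appears in the pattern. As a further sketch with illustrative file names, placing it right after the dot skips files with a given extension:

```python
import re

text = "notes.txt cache.tmp script.py"

# (?!tmp\b) is tested right after the dot, before the extension is consumed
kept = re.findall(r'\b\w+\.(?!tmp\b)\w+', text)
print(kept)  # ['notes.txt', 'script.py']
```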

12. (?<=...) – Positive Lookbehind

Example 1: Match after specific pattern

python

text = "$100 €200 ¥300"
result = re.findall(r'(?<=\$)\d+', text)
print(result)  # ['100']
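
One constraint worth knowing: the standard `re` module only accepts lookbehinds of fixed width. The sketch below, with an illustrative currency prefix, shows the error raised for a variable-width lookbehind and a common workaround using a capturing group:

```python
import re

text = "USD 100, USD  250"

try:
    re.compile(r'(?<=USD\s*)\d+')  # \s* makes the lookbehind variable-width
    lookbehind_ok = True
except re.error as exc:
    lookbehind_ok = False
    print("rejected:", exc)

# Workaround: match the prefix normally and capture only the number
amounts = re.findall(r'USD\s*(\d+)', text)
print(amounts)  # ['100', '250']
```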

13. (?<!...) – Negative Lookbehind

Example 1: Exclude matches preceded by a specific prefix

python

text = "happy unhappy really"
# (?<!un) rejects "happy" only where "un" comes immediately before it
result = re.findall(r'(?<!un)happy', text)
print(result)  # ['happy']

14. | – Alternation

Example 1: Basic OR operation

python

text = "cat dog bird fish"
result = re.findall(r'cat|dog|bird', text)
print(result)  # ['cat', 'dog', 'bird']

Example 2: Complex alternation

python

text = "color colour favor favour"
result = re.findall(r'colou?r|favou?r', text)
print(result)  # ['color', 'colour', 'favor', 'favour']
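
Alternation has the lowest precedence of any regex operator, so anchors and other neighbors bind to a single alternative unless the alternatives are grouped. A short sketch with an illustrative word list:

```python
import re

words = ["cat", "dog", "catalog", "hotdog"]

# ^cat|dog$ means "starts with cat" OR "ends with dog"
loose = [w for w in words if re.search(r'^cat|dog$', w)]
# Grouping applies both anchors to every alternative
strict = [w for w in words if re.search(r'^(?:cat|dog)$', w)]

print(loose)   # ['cat', 'dog', 'catalog', 'hotdog']
print(strict)  # ['cat', 'dog']
```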

15. (?(id)yes|no) – Conditional Patterns

Example 1: Conditional matching

python

text = "<div>content</div> <p>text</p>"
result = re.findall(r'<(div|p)>(.*?)</(?(1)\1|div)>', text)
print(result)  # [('div', 'content'), ('p', 'text')]
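
In the example above group 1 always participates, so the "no" branch is never taken. A conditional matters when the tested group is optional. The sketch below uses the pattern given in the Python `re` documentation for an address that must end with `>` only if it began with `<`; the sample strings are illustrative:

```python
import re

# Group 1 is an optional "<"; (?(1)>|$) requires ">" if it matched, end of string otherwise
pattern = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

candidates = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
accepted = [c for c in candidates if pattern.match(c)]
print(accepted)  # ['<user@host.com>', 'user@host.com']
```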

16. (?>...) – Atomic Groups (re in Python 3.11+, or the regex module)

Example 1: Prevent backtracking

python

import regex
text = "aaaab"
result = regex.findall(r'(?>a+)b', text)
print(result)  # ['aaaab']
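
The example above matches with or without atomicity, so it does not show what is being prevented. To see the difference without extra dependencies, the sketch below emulates an atomic group with a lookahead plus a backreference, a common trick that works in the standard `re` module. It is an emulation, not the `(?>...)` syntax itself, and the input is illustrative:

```python
import re

text = "aaab"

# Ordinary group: a+ gives back one "a" so the trailing "ab" can match
backtracking = re.search(r'(?:a+)ab', text)
# Emulated atomic group: the lookahead grabs all the a's and is never re-entered
atomic_like = re.search(r'(?=(a+))\1ab', text)

print(backtracking.group())  # aaab
print(atomic_like)           # None
```

Once the atomic part has consumed every "a", nothing is left for the literal "a" that follows, and the engine is not allowed to shorten the earlier match to make room.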

17. (?(DEFINE)...) – Definition Groups (regex module)

Example 1: Define reusable patterns

python

import regex
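pattern = r'(?(DEFINE)(?<number>\d+))(?&number)'
text = "123 456"
# Collect whole matches: findall() would report the "number" group rather than the full match
result = [m.group() for m in regex.finditer(pattern, text)]
print(result)  # ['123', '456']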

Advanced Examples

Complex Grouping with Multiple Techniques

python

# Email validation with detailed groups
email_pattern = r'''
    (?P<local>[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+)   # Local part
    @                                              # @ symbol
    (?P<domain>[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)  # Domain
    (?:\.(?P<tld>[a-zA-Z]{2,}))+                   # Dot-separated labels; "tld" keeps only the last one
'''

text = "user@example.com admin@test.co.uk"
matches = re.finditer(email_pattern, text, re.VERBOSE)
for match in matches:
    print(match.groupdict())

Nested Groups and Backreferences

python

# Matching nested structures (simplified)
text = "((a)(b)) ((c)(d))"
result = re.findall(r'\(([^()]*(?:\([^()]*\)[^()]*)*)\)', text)
print(result)  # ['(a)(b)', '(c)(d)']

Lookaround with Groups

python

# Extract numbers preceded by currency symbols
text = "Price: $100, Cost: €200, Value: ¥300"
result = re.findall(r'(?<=[$€¥])(\d+)', text)
print(result)  # ['100', '200', '300']

Group Numbering Rules

  1. Left to right: Groups are numbered in the order their opening parentheses appear
  2. Nested groups: An outer group opens first, so it gets a lower number than the groups inside it
  3. Non-capturing groups: Don’t affect numbering
  4. Named groups: Still get numbers in addition to names

python

# Group numbering example
text = "2024-01-15"
pattern = r'((\d{4})-(\d{2}))-(\d{2})'
match = re.search(pattern, text)
print("All groups:", match.groups())
print("Group 1:", match.group(1))  # 2024-01
print("Group 2:", match.group(2))  # 2024
print("Group 3:", match.group(3))  # 01
print("Group 4:", match.group(4))  # 15
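
The example above covers rules 1 and 2. Rules 3 and 4 can be checked with a variation of the same pattern, sketched here with an illustrative group name `month`:

```python
import re

# The outer (?:...) is non-capturing, so the year is group 1 and the month is group 2
pattern = r'(?:(\d{4})-(?P<month>\d{2}))-(\d{2})'
match = re.search(pattern, "2024-01-15")

all_groups = match.groups()
month_by_number = match.group(2)
month_by_name = match.group('month')
month_index = match.re.groupindex['month']  # maps group names to numbers

print(all_groups)                      # ('2024', '01', '15')
print(month_by_number, month_by_name)  # 01 01
print(month_index)                     # 2
```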
