Alternation and Grouping

Complete List of Alternation and Grouping in Python Regular Expressions

Grouping Constructs

Capturing Groups

| Pattern | Description | Example |
| --- | --- | --- |
| `(...)` | Capturing group | `(abc)` |
| `(?P<name>...)` | Named capturing group | `(?P<word>\w+)` |
| `\1`, `\2`, etc. | Backreferences to groups | `(a)\1` matches “aa” |
| `(?P=name)` | Named backreference | `(?P<word>\w+) (?P=word)` |

Non-Capturing Groups

| Pattern | Description | Example |
| --- | --- | --- |
| `(?:...)` | Non-capturing group | `(?:abc)+` |
| `(?i:...)` | Case-insensitive group | `(?i:hello)` |
| `(?s:...)` | DOTALL group (. matches newline) | `(?s:.*)` |
| `(?m:...)` | MULTILINE group | `(?m:^.*$)` |
| `(?x:...)` | VERBOSE group (ignore whitespace) | `(?x:a b c)` |

Lookaround Groups (Zero-width Assertions)

| Pattern | Description | Example |
| --- | --- | --- |
| `(?=...)` | Positive lookahead | `word(?=ing)` |
| `(?!...)` | Negative lookahead | `word(?!ing)` |
| `(?<=...)` | Positive lookbehind | `(?<=pre)word` |
| `(?<!...)` | Negative lookbehind | `(?<!un)word` |

Alternation Constructs

| Pattern | Description | Example |
| --- | --- | --- |
| `\|` | Alternation (OR) | `cat\|dog` |
| `[abc]` | Character class (single char OR) | `[abc]` |
| `(?(id)yes\|no)` | Conditional pattern | `(?(1)a\|b)` |

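Atomic and Definition Groups (regex module; atomic groups also in re since Python 3.11)

| Pattern | Description | Example |
| --- | --- | --- |
| `(?>...)` | Atomic group | `(?>a+)` |
| `(?(DEFINE)...)` | Definition group (regex module only) | `(?(DEFINE)(?<word>\w+))` |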

Detailed Explanation with Examples

1. (...) – Capturing Groups

Example 1: Basic capturing

python

import re
text = "John Doe, Jane Smith"
result = re.findall(r'(\w+) (\w+)', text)
print(result)  # [('John', 'Doe'), ('Jane', 'Smith')]

Example 2: Extracting date components

python

text = "2024-01-15 2023-12-25"
result = re.findall(r'(\d{4})-(\d{2})-(\d{2})', text)
print(result)  # [('2024', '01', '15'), ('2023', '12', '25')]

2. (?P<name>...) – Named Capturing Groups

Example 1: Named groups

python

text = "John: 25, Jane: 30"
result = re.findall(r'(?P<name>\w+): (?P<age>\d+)', text)
print(result)  # [('John', '25'), ('Jane', '30')]

Example 2: Using groupdict()

python

text = "Name: John, Age: 25"
match = re.search(r'Name: (?P<name>\w+), Age: (?P<age>\d+)', text)
print(match.groupdict())  # {'name': 'John', 'age': '25'}
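
Because findall() returns plain tuples, the group names in Example 1 are lost in the result. One way to keep them is to iterate over match objects instead. This is a minimal sketch that reuses the pattern and text from Example 1:

```python
import re

text = "John: 25, Jane: 30"
pattern = re.compile(r'(?P<name>\w+): (?P<age>\d+)')

# finditer() yields match objects, so each group stays addressable by name
people = [m.groupdict() for m in pattern.finditer(text)]
print(people)  # [{'name': 'John', 'age': '25'}, {'name': 'Jane', 'age': '30'}]

# A single named group can be read with group('name') or match['name'] (Python 3.6+)
first = pattern.search(text)
print(first.group('name'), first['age'])  # John 25
```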

3. \1, \2 – Backreferences

Example 1: Matching repeated words

python

text = "the the quick brown fox fox jumps"
result = re.findall(r'(\b\w+\b) \1', text)
print(result)  # ['the', 'fox']

Example 2: Validating quotes

python

text = 'She said "hello" and he said "goodbye"'
result = re.findall(r'(["\'])(.*?)\1', text)
print(result)  # [('"', 'hello'), ('"', 'goodbye')]

4. (?P=name) – Named Backreferences

Example 1: Named backreference

python

text = "John said: hello hello"
result = re.findall(r'(?P<word>\w+) (?P=word)', text)
print(result)  # ['hello']

Example 2: HTML tag matching

python

text = "<div>content</div> <p>text</p>"
result = re.findall(r'<(?P<tag>\w+)>.*?</(?P=tag)>', text)
print(result)  # ['div', 'p']

5. (?:...) – Non-capturing Groups

Example 1: Grouping without capturing

python

text = "abcabc ababab"
result = re.findall(r'(?:abc){2}', text)
print(result)  # ['abcabc']

Example 2: Alternation without capturing

python

text = "cat dog bird"
result = re.findall(r'(?:cat|dog|bird)', text)
print(result)  # ['cat', 'dog', 'bird']

6. (?i:...) – Case-insensitive Groups

Example 1: Case-insensitive matching

python

text = "Hello HELLO hello"
result = re.findall(r'(?i:hello)', text)
print(result)  # ['Hello', 'HELLO', 'hello']

Example 2: Mixed case sensitivity

python

text = "Python python PYTHON"
result = re.findall(r'(?i:py)thon', text)
print(result)  # ['Python', 'python'] ('PYTHON' is skipped: only "py" is case-insensitive, "thon" is not)

7. (?s:...) – DOTALL Groups

Example 1: Dot matches newline within group

python

text = "start\nmiddle\nend"
result = re.findall(r'(?s:start.*end)', text)
print(result)  # ['start\nmiddle\nend']

Example 2: Mixed dot behavior

python

text = "line1\nline2\nline3"
result = re.findall(r'line1(?s:.*)line3', text)
print(result)  # ['line1\nline2\nline3']

8. (?m:...) – MULTILINE Groups

Example 1: Multiline matching within group

python

text = "first line\nsecond line\nthird line"
result = re.findall(r'(?m:^.*$)', text)
print(result)  # ['first line', 'second line', 'third line']

9. (?x:...) – VERBOSE Groups

Example 1: Ignore whitespace in pattern

python

text = "hello world"
pattern = r'(?x:hello\ world)'
result = re.findall(pattern, text)
print(result)  # ['hello world']
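
The example above shows that an escaped space still matches literally. The other half of VERBOSE behaviour, that unescaped whitespace is ignored, can be sketched as follows (the phone-like numbers are made up for illustration):

```python
import re

text = "Call 555-1234 or 555-9876"

# Inside (?x:...) unescaped whitespace is ignored, so the pattern can be
# spaced out for readability; it behaves like \d{3}-\d{4}.
spaced = re.findall(r'(?x: \d{3} - \d{4} )', text)
compact = re.findall(r'\d{3}-\d{4}', text)

print(spaced)             # ['555-1234', '555-9876']
print(spaced == compact)  # True
```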

10. (?=...) – Positive Lookahead

Example 1: Match word before specific pattern

python

text = "running sitting walking"
result = re.findall(r'\w+(?=ing)', text)
print(result)  # ['runn', 'sitt', 'walk']

Example 2: Password validation

python

password = "Pass123!"
has_digit = bool(re.search(r'(?=.*\d)', password))
print(has_digit)  # True
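
A single lookahead like the one above is equivalent to searching for \d directly. Lookaheads become useful for validation when several of them are stacked at the same position, because each one checks the whole string without consuming it. A sketch under an assumed policy (at least 8 characters, one digit, one uppercase letter, one punctuation character; the policy itself is hypothetical):

```python
import re

# Each (?=...) is tested from the start of the string and consumes nothing,
# so all three conditions must hold before .{8,} checks the length.
policy = re.compile(r'^(?=.*\d)(?=.*[A-Z])(?=.*[^\w\s]).{8,}$')

checks = {pw: bool(policy.match(pw)) for pw in ["Pass123!", "password", "Pass12!"]}
print(checks)  # {'Pass123!': True, 'password': False, 'Pass12!': False}
```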

11. (?!...) – Negative Lookahead

Example 1: Exclude words with a specific prefix

python

text = "happy unhappy really"
# The lookahead must sit right after \b so it is tested at the start of each word
result = re.findall(r'\b(?!un)\w+\b', text)
print(result)  # ['happy', 'really']

12. (?<=...) – Positive Lookbehind

Example 1: Match after specific pattern

python

text = "$100 €200 ¥300"
result = re.findall(r'(?<=\$)\d+', text)
print(result)  # ['100']

13. (?<!...) – Negative Lookbehind

Example 1: Exclude a specific prefix

python

text = "happy unhappy really"
# (?<!un) rejects any "happy" that is immediately preceded by "un"
result = re.findall(r'(?<!un)happy', text)
print(result)  # ['happy']
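
One restriction applies to both lookbehind forms in the built-in re module: the pattern inside the lookbehind must match a fixed number of characters. A small sketch of the difference (the currency strings are made up for illustration):

```python
import re

# A fixed-width lookbehind ("USD " is always 4 characters) compiles and matches
fixed = re.findall(r'(?<=USD )\d+', "USD 100, EUR 200")
print(fixed)  # ['100']

# A variable-width lookbehind (\s+ can be any length) is rejected at compile time
rejected = False
try:
    re.compile(r'(?<=USD\s+)\d+')
except re.error as exc:
    rejected = True
    print("re.error:", exc)
```

The third-party regex module is more permissive here, so this limit is specific to re.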

14. | – Alternation

Example 1: Basic OR operation

python

text = "cat dog bird fish"
result = re.findall(r'cat|dog|bird', text)
print(result)  # ['cat', 'dog', 'bird']

Example 2: Complex alternation

python

text = "color colour favor favour"
result = re.findall(r'colou?r|favou?r', text)
print(result)  # ['color', 'colour', 'favor', 'favour']
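
Two properties of | are easy to miss in the examples above: alternatives are tried left to right, and | has the lowest precedence of any operator. A short sketch of both (the sample words are chosen only for illustration):

```python
import re

# The first alternative that matches wins, even if a later one would match more
short_first = re.findall(r'cat|category', "category")
long_first = re.findall(r'category|cat', "category")
print(short_first)  # ['cat']
print(long_first)   # ['category']

# | splits the whole pattern unless a group limits its scope
grouped = re.findall(r'gr(?:a|e)y', "gray grey")
ungrouped = re.findall(r'gra|ey', "gray grey")
print(grouped)    # ['gray', 'grey']
print(ungrouped)  # ['gra', 'ey']
```

Putting the longer alternative first, or wrapping the alternation in a non-capturing group, is usually enough to avoid both surprises.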

15. (?(id)yes|no) – Conditional Patterns

Example 1: Conditional matching

python

text = "<div>content</div> <p>text</p>"
result = re.findall(r'<(div|p)>(.*?)</(?(1)\1|div)>', text)
print(result)  # [('div', 'content'), ('p', 'text')]
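
In the example above group 1 always participates, so the "no" branch is never taken. A conditional is more useful when the tested group is optional. The sketch below follows the pattern given in the Python re documentation for a loosely matched email address with optional angle brackets:

```python
import re

# Group 1 is an optional "<". The conditional (?(1)>|$) requires a closing ">"
# only if the "<" was matched; otherwise the string must end there.
pattern = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

candidates = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
outcome = {c: bool(pattern.match(c)) for c in candidates}
for candidate, ok in outcome.items():
    print(candidate, ok)
# <user@host.com> True
# user@host.com True
# <user@host.com False
# user@host.com> False
```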

16. (?>...) – Atomic Groups (regex module; built into re since Python 3.11)

Example 1: Prevent backtracking

python

import regex
text = "aaaab"
result = regex.findall(r'(?>a+)b', text)
print(result)  # ['aaaab']
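
The pattern above matches with or without the atomic group, so it does not show what "prevent backtracking" means. The sketch below uses a case where backtracking is required for a match. To stay runnable on Python versions before 3.11 without the regex module, it emulates the atomic group with a lookahead plus backreference, which is a common workaround and not the native syntax:

```python
import re
import sys

text = "aaab"

# Ordinary group: a+ first takes "aaa", then gives one "a" back so "ab" can match
backtracking = re.search(r'(?:a+)ab', text)
print(backtracking.group())  # aaab

# Emulated atomic group: the lookahead captures "aaa" and cannot be re-entered,
# so \1 must consume all three and the trailing "ab" can never match
emulated = re.search(r'(?=(a+))\1ab', text)
print(emulated)  # None

# Native syntax, only compiled where the built-in re module supports it
if sys.version_info >= (3, 11):
    print(re.search(r'(?>a+)ab', text))  # None
```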

17. (?(DEFINE)...) – Definition Groups (regex module)

Example 1: Define reusable patterns

python

import regex
pattern = r'(?(DEFINE)(?<number>\d+))(?&number)'
text = "123 456"
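# findall() would return the contents of the "number" group, not the whole match,
# so collect the full match from each match object instead
result = [m.group() for m in regex.finditer(pattern, text)]
print(result)  # ['123', '456']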

Advanced Examples

Complex Grouping with Multiple Techniques

python

# Email validation with detailed groups
email_pattern = r'''
    (?P<local>[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+)   # Local part
    @                                              # @ symbol
    (?P<domain>[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)  # Domain
    (?:\.(?P<tld>[a-zA-Z]{2,}))+                   # Dot-separated labels; "tld" keeps the last one
'''

text = "user@example.com admin@test.co.uk"
matches = re.finditer(email_pattern, text, re.VERBOSE)
for match in matches:
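    print(match.groupdict())
# {'local': 'user', 'domain': 'example', 'tld': 'com'}
# {'local': 'admin', 'domain': 'test', 'tld': 'uk'}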

Nested Groups and Backreferences

python

# Matching parentheses nested one level deep (simplified; re has no recursion for arbitrary depth)
text = "((a)(b)) ((c)(d))"
result = re.findall(r'\(([^()]*(?:\([^()]*\)[^()]*)*)\)', text)
print(result)  # ['(a)(b)', '(c)(d)']

Lookaround with Groups

python

# Extract numbers preceded by currency symbols
text = "Price: $100, Cost: €200, Value: ¥300"
result = re.findall(r'(?<=[$€¥])(\d+)', text)
print(result)  # ['100', '200', '300']

Group Numbering Rules

  1. Left to right: Groups are numbered by opening parentheses
  2. Nested groups: Outer groups get lower numbers
  3. Non-capturing groups: Don’t affect numbering
  4. Named groups: Still get numbers in addition to names

python

# Group numbering example
text = "2024-01-15"
pattern = r'((\d{4})-(\d{2}))-(\d{2})'
match = re.search(pattern, text)
print("All groups:", match.groups())  # ('2024-01', '2024', '01', '15')
print("Group 1:", match.group(1))  # 2024-01
print("Group 2:", match.group(2))  # 2024
print("Group 3:", match.group(3))  # 01
print("Group 4:", match.group(4))  # 15
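
Rules 3 and 4 are not exercised by the example above. A brief sketch that mixes a non-capturing group, named groups, and an unnamed group (the version string is made up for illustration):

```python
import re

# (?:v) takes no number; "major" is group 1, the unnamed group is 2, "patch" is 3
pattern = re.compile(r'(?:v)(?P<major>\d+)\.(\d+)\.(?P<patch>\d+)')
m = pattern.search("release v3.11.4")

print(m.groups())                     # ('3', '11', '4')
print(m.group(1), m.group('major'))   # 3 3
print(dict(pattern.groupindex))       # {'major': 1, 'patch': 3}
```

groupindex maps each group name to its number, which is a quick way to check how a longer pattern was numbered.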
