Alternation and Grouping
Complete List of Alternation and Grouping Constructs in Python Regular Expressions
Grouping Constructs
Capturing Groups
| Pattern | Description | Example |
|---|---|---|
| `(...)` | Capturing group | `(abc)` |
| `(?P<name>...)` | Named capturing group | `(?P<word>\w+)` |
| `\1`, `\2`, etc. | Backreferences to groups | `(a)\1` matches “aa” |
| `(?P=name)` | Named backreference | `(?P<word>\w+) (?P=word)` |
Non-Capturing Groups
| Pattern | Description | Example |
|---|---|---|
| `(?:...)` | Non-capturing group | `(?:abc)+` |
| `(?i:...)` | Case-insensitive group | `(?i:hello)` |
| `(?s:...)` | DOTALL group (. matches newline) | `(?s:.*)` |
| `(?m:...)` | MULTILINE group | `(?m:^.*$)` |
| `(?x:...)` | VERBOSE group (ignore whitespace) | `(?x:a b c)` |
Lookaround Groups (Zero-width Assertions)
| Pattern | Description | Example |
|---|---|---|
| `(?=...)` | Positive lookahead | `word(?=ing)` |
| `(?!...)` | Negative lookahead | `word(?!ing)` |
| `(?<=...)` | Positive lookbehind | `(?<=pre)word` |
| `(?<!...)` | Negative lookbehind | `(?<!un)word` |
Alternation Constructs
| Pattern | Description | Example |
|---|---|---|
| `\|` | Alternation (OR) | `cat\|dog` |
| `[abc]` | Character class (single char OR) | `[abc]` |
| `(?(id)yes\|no)` | Conditional pattern | `(?(1)a\|b)` |
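Atomic and Definition Groups (third-party regex module)
| Pattern | Description | Example |
|---|---|---|
| `(?>...)` | Atomic group (also supported by the built-in re module since Python 3.11) | `(?>a+)` |
| `(?(DEFINE)...)` | Definition group (regex module only) | `(?(DEFINE)(?<word>\w+))` |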
Detailed Explanation with Examples
1. (...) – Capturing Groups
Example 1: Basic capturing
python
import re
text = "John Doe, Jane Smith"
result = re.findall(r'(\w+) (\w+)', text)
print(result) # [('John', 'Doe'), ('Jane', 'Smith')]
Example 2: Extracting date components
python
text = "2024-01-15 2023-12-25"
result = re.findall(r'(\d{4})-(\d{2})-(\d{2})', text)
print(result) # [('2024', '01', '15'), ('2023', '12', '25')]
2. (?P<name>...) – Named Capturing Groups
Example 1: Named groups
python
text = "John: 25, Jane: 30"
result = re.findall(r'(?P<name>\w+): (?P<age>\d+)', text)
print(result) # [('John', '25'), ('Jane', '30')]
Example 2: Using groupdict()
python
text = "Name: John, Age: 25"
match = re.search(r'Name: (?P<name>\w+), Age: (?P<age>\d+)', text)
print(match.groupdict()) # {'name': 'John', 'age': '25'}
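Named groups can also be referenced in a replacement string with `\g<name>`. The following is a minimal sketch; the sample text and the group names `last` and `first` are illustrative assumptions, not taken from the examples above.

```python
import re

# Hypothetical sample data in "Last, First" form.
text = "Doe, John; Smith, Jane"

# \g<name> in the replacement string inserts the text of the named group.
swapped = re.sub(r'(?P<last>\w+), (?P<first>\w+)', r'\g<first> \g<last>', text)
print(swapped)  # John Doe; Jane Smith
```

Referring to groups by name keeps the replacement readable if groups are later added to or reordered in the pattern.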
3. \1, \2 – Backreferences
Example 1: Matching repeated words
python
text = "the the quick brown fox fox jumps"
result = re.findall(r'(\b\w+\b) \1', text)
print(result) # ['the', 'fox']
Example 2: Validating quotes
python
text = 'She said "hello" and he said "goodbye"'
result = re.findall(r'(["\'])(.*?)\1', text)
print(result) # [('"', 'hello'), ('"', 'goodbye')]
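Backreferences work in replacement strings as well as in patterns. As a small sketch of one common use, the snippet below collapses an immediately repeated word into a single occurrence; the trailing `\b` is an added safeguard so that a pair like "the then" is not treated as a repeat.

```python
import re

text = "the the quick brown fox fox jumps"

# (\w+) captures a word; " \1" requires the same word again after a space.
# In the replacement, \1 writes the captured word back once.
deduped = re.sub(r'\b(\w+) \1\b', r'\1', text)
print(deduped)  # the quick brown fox jumps
```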
4. (?P=name) – Named Backreferences
Example 1: Named backreference
python
text = "John said: hello hello"
result = re.findall(r'(?P<word>\w+) (?P=word)', text)
print(result) # ['hello']
Example 2: HTML tag matching
python
text = "<div>content</div> <p>text</p>"
result = re.findall(r'<(?P<tag>\w+)>.*?</(?P=tag)>', text)
print(result) # ['div', 'p']
5. (?:...) – Non-capturing Groups
Example 1: Grouping without capturing
python
text = "abcabc ababab"
result = re.findall(r'(?:abc){2}', text)
print(result) # ['abcabc']
Example 2: Alternation without capturing
python
text = "cat dog bird"
result = re.findall(r'(?:cat|dog|bird)', text)
print(result) # ['cat', 'dog', 'bird']
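The practical difference between capturing and non-capturing groups shows up in `re.findall`, which returns group contents whenever the pattern contains a capturing group. A short comparison, reusing the sample text from Example 1:

```python
import re

text = "abcabc ababab"

# With a capturing group, findall returns the group's last captured value.
capturing = re.findall(r'(abc){2}', text)

# With a non-capturing group, findall returns the whole match.
non_capturing = re.findall(r'(?:abc){2}', text)

print(capturing)      # ['abc']
print(non_capturing)  # ['abcabc']
```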
6. (?i:...) – Case-insensitive Groups
Example 1: Case-insensitive matching
python
text = "Hello HELLO hello"
result = re.findall(r'(?i:hello)', text)
print(result) # ['Hello', 'HELLO', 'hello']
Example 2: Mixed case sensitivity
python
text = "Python python PYTHON"
result = re.findall(r'(?i:py)thon', text)
print(result) # ['Python', 'python'] ("PYTHON" fails: "thon" is still case-sensitive)
7. (?s:...) – DOTALL Groups
Example 1: Dot matches newline within group
python
text = "start\nmiddle\nend"
result = re.findall(r'(?s:start.*end)', text)
print(result) # ['start\nmiddle\nend']
Example 2: Mixed dot behavior
python
text = "line1\nline2\nline3"
result = re.findall(r'line1(?s:.*)line3', text)
print(result) # ['line1\nline2\nline3']
8. (?m:...) – MULTILINE Groups
Example 1: Multiline matching within group
python
text = "first line\nsecond line\nthird line"
result = re.findall(r'(?m:^.*$)', text)
print(result) # ['first line', 'second line', 'third line']
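To see what the MULTILINE group changes, it may help to compare the same anchor with and without it. A brief sketch using the text from the example above:

```python
import re

text = "first line\nsecond line\nthird line"

# Without MULTILINE, ^ only matches at the very start of the string.
without_flag = re.findall(r'^\w+', text)

# Inside (?m:...), ^ also matches immediately after each newline.
with_flag = re.findall(r'(?m:^\w+)', text)

print(without_flag)  # ['first']
print(with_flag)     # ['first', 'second', 'third']
```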
9. (?x:...) – VERBOSE Groups
Example 1: Ignore whitespace in pattern
python
text = "hello world"
pattern = r'(?x:hello\ world)'
result = re.findall(pattern, text)
print(result) # ['hello world']
10. (?=...) – Positive Lookahead
Example 1: Match word before specific pattern
python
text = "running sitting walking"
result = re.findall(r'\w+(?=ing)', text)
print(result) # ['runn', 'sitt', 'walk']
Example 2: Password validation
python
password = "Pass123!"
has_digit = bool(re.search(r'(?=.*\d)', password))
print(has_digit) # True
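Because lookaheads consume no characters, several of them can be stacked at the same position to check independent requirements. The sketch below assumes a hypothetical policy (at least 8 characters with a digit, a lowercase letter, an uppercase letter, and a symbol); the name `is_strong` and the policy itself are illustrative only.

```python
import re

# Each lookahead checks one requirement from the start of the string;
# .{8,} then consumes the password and enforces the minimum length.
STRONG = re.compile(r'(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[^\w\s]).{8,}')

def is_strong(password):
    """Return True if the password satisfies the assumed policy."""
    return STRONG.fullmatch(password) is not None

print(is_strong("Pass123!"))  # True
print(is_strong("password"))  # False (no digit, uppercase letter, or symbol)
print(is_strong("Pa1!"))      # False (too short)
```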
11. (?!...) – Negative Lookahead
Example 1: Exclude words with a specific prefix
python
text = "happy unhappy really"
result = re.findall(r'\b(?!un)\w+\b', text)
print(result) # ['happy', 'really']
12. (?<=...) – Positive Lookbehind
Example 1: Match after specific pattern
python
text = "$100 €200 ¥300"
result = re.findall(r'(?<=\$)\d+', text)
print(result) # ['100']
13. (?<!...) – Negative Lookbehind
Example 1: Exclude specific prefixes
python
text = "happy unhappy really"
result = re.findall(r'(?<!un)happy', text)
print(result) # ['happy'] (the "happy" inside "unhappy" is rejected)
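One restriction worth knowing: the built-in `re` module only accepts lookbehind patterns of fixed width. The helper `compiles` below is a hypothetical name used only for this demonstration.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

fixed_width = compiles(r'(?<=\$)\d+')      # lookbehind is exactly one character
variable_width = compiles(r'(?<=\$+)\d+')  # \$+ can match one or more characters

print(fixed_width)     # True
print(variable_width)  # False
```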
14. | – Alternation
Example 1: Basic OR operation
python
text = "cat dog bird fish"
result = re.findall(r'cat|dog|bird', text)
print(result) # ['cat', 'dog', 'bird']
Example 2: Complex alternation
python
text = "color colour favor favour"
result = re.findall(r'colou?r|favou?r', text)
print(result) # ['color', 'colour', 'favor', 'favour']
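Two properties of `|` are easy to overlook: alternatives are tried left to right, so the first one that matches wins, and `|` has the lowest precedence, so anchors bind to a single alternative unless the alternation is grouped. A small sketch with made-up sample strings:

```python
import re

# Order matters: the first alternative that matches is used.
first_wins = re.findall(r'cat|category', "category")
longest_first = re.findall(r'category|cat', "category")
print(first_wins)     # ['cat']
print(longest_first)  # ['category']

# Precedence: ^cat|dog$ means (^cat) OR (dog$), not ^(cat|dog)$.
loose = bool(re.search(r'^cat|dog$', "catalog"))
strict = bool(re.search(r'^(?:cat|dog)$', "catalog"))
print(loose)   # True
print(strict)  # False
```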
15. (?(id)yes|no) – Conditional Patterns
Example 1: Conditional matching
python
text = "<div>content</div> <p>text</p>"
result = re.findall(r'<(div|p)>(.*?)</(?(1)\1|div)>', text)
print(result) # [('div', 'content'), ('p', 'text')]
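A conditional is most useful when the referenced group is optional, so that both branches can actually be taken. The sketch below follows the well-known example from the Python `re` documentation: a closing `>` is required only if an opening `<` was matched.

```python
import re

# Group 1 is the optional "<". If it matched, require ">"; otherwise require end of string.
pattern = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

candidates = ['<user@host.com>', 'user@host.com', '<user@host.com', 'user@host.com>']
results = {s: bool(pattern.match(s)) for s in candidates}

for s, ok in results.items():
    print(s, ok)
# <user@host.com> True
# user@host.com True
# <user@host.com False
# user@host.com> False
```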
16. (?>...) – Atomic Groups (regex module)
Example 1: Prevent backtracking
python
import regex
text = "aaaab"
result = regex.findall(r'(?>a+)b', text)
print(result) # ['aaaab']
17. (?(DEFINE)...) – Definition Groups (regex module)
Example 1: Define reusable patterns
python
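import regex
pattern = r'(?(DEFINE)(?<number>\d+))(?&number)'
text = "123 456"
# Collect whole matches; findall() would return the capture group's contents instead
result = [m.group() for m in regex.finditer(pattern, text)]
print(result) # ['123', '456']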
Advanced Examples
Complex Grouping with Multiple Techniques
python
# Email validation with detailed groups
email_pattern = r'''
(?P<local>[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+) # Local part
@ # @ symbol
(?P<domain>[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?) # Domain
(?:\.(?P<tld>[a-zA-Z]{2,}))+ # TLD (non-capturing)
'''
text = "user@example.com admin@test.co.uk"
matches = re.finditer(email_pattern, text, re.VERBOSE)
for match in matches:
print(match.groupdict())
Nested Groups and Backreferences
python
# Matching nested structures (simplified)
text = "((a)(b)) ((c)(d))"
result = re.findall(r'\(([^()]*(?:\([^()]*\)[^()]*)*)\)', text)
print(result) # ['(a)(b)', '(c)(d)']
Lookaround with Groups
python
# Extract numbers preceded by currency symbols
text = "Price: $100, Cost: €200, Value: ¥300"
result = re.findall(r'(?<=[$€¥])(\d+)', text)
print(result) # ['100', '200', '300']
Group Numbering Rules
- Left to right: Groups are numbered by opening parentheses
- Nested groups: Outer groups get lower numbers
- Non-capturing groups: Don’t affect numbering
- Named groups: Still get numbers in addition to names
python
# Group numbering example
text = "2024-01-15"
pattern = r'((\d{4})-(\d{2}))-(\d{2})'
match = re.search(pattern, text)
print("All groups:", match.groups())
print("Group 1:", match.group(1)) # 2024-01
print("Group 2:", match.group(2)) # 2024
print("Group 3:", match.group(3)) # 01
print("Group 4:", match.group(4)) # 15
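The rules about non-capturing and named groups can be checked in the same way. In this sketch the group name `year` is an illustrative choice: the named group takes number 1, the non-capturing group is skipped, and the final group becomes number 2.

```python
import re

pattern = re.compile(r'(?P<year>\d{4})-(?:\d{2})-(\d{2})')
match = pattern.search("2024-01-15")

print(match.groups())             # ('2024', '15')
print(match.group(1))             # 2024 (same group as match.group('year'))
print(match.group(2))             # 15
print(dict(pattern.groupindex))   # {'year': 1}
print(pattern.groups)             # 2 capturing groups in total
```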