Alternation and Grouping

Complete List of Alternation and Grouping Constructs in Python Regular Expressions

Grouping Constructs

Capturing Groups

Pattern          Description                 Example
(...)            Capturing group             (abc)
(?P<name>...)    Named capturing group       (?P<word>\w+)
\1, \2, etc.     Backreferences to groups    (a)\1 matches “aa”
(?P=name)        Named backreference         (?P<word>\w+) (?P=word)

Non-Capturing Groups

Pattern     Description                         Example
(?:...)     Non-capturing group                 (?:abc)+
(?i:...)    Case-insensitive group              (?i:hello)
(?s:...)    DOTALL group (. matches newline)    (?s:.*)
(?m:...)    MULTILINE group                     (?m:^.*$)
(?x:...)    VERBOSE group (ignore whitespace)   (?x:a b c)

Lookaround Groups (Zero-width Assertions)

Pattern     Description            Example
(?=...)     Positive lookahead     word(?=ing)
(?!...)     Negative lookahead     word(?!ing)
(?<=...)    Positive lookbehind    (?<=pre)word
(?<!...)    Negative lookbehind    (?<!un)word

Alternation Constructs

Pattern          Description                         Example
|                Alternation (OR)                    cat|dog
[abc]            Character class (single char OR)    [abc]
(?(id)yes|no)    Conditional pattern                 (?(1)a|b)

Atomic and Definition Groups (third-party regex module; the standard re module also supports atomic groups from Python 3.11)

Pattern           Description         Example
(?>...)           Atomic group        (?>a+)
(?(DEFINE)...)    Definition group    (?(DEFINE)(?<word>\w+))

Detailed Explanation with Examples

1. (...) – Capturing Groups

Example 1: Basic capturing

python

import re
text = "John Doe, Jane Smith"
result = re.findall(r'(\w+) (\w+)', text)
print(result)  # [('John', 'Doe'), ('Jane', 'Smith')]

Example 2: Extracting date components

python

text = "2024-01-15 2023-12-25"
result = re.findall(r'(\d{4})-(\d{2})-(\d{2})', text)
print(result)  # [('2024', '01', '15'), ('2023', '12', '25')]
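The shape of what findall() returns depends on how many capturing groups the pattern contains, which often surprises readers. A minimal side-by-side sketch, reusing the date string above as sample input:

```python
import re

text = "2024-01-15 2023-12-25"

# No groups: findall returns the whole matches as strings
no_groups = re.findall(r'\d{4}-\d{2}-\d{2}', text)

# Exactly one group: findall returns only that group's text
one_group = re.findall(r'(\d{4})-\d{2}-\d{2}', text)

# Several groups: findall returns one tuple per match
many_groups = re.findall(r'(\d{4})-(\d{2})-(\d{2})', text)

# With search(), group(0) is the whole match and groups() holds the captures
m = re.search(r'(\d{4})-(\d{2})-(\d{2})', text)

print(no_groups)    # ['2024-01-15', '2023-12-25']
print(one_group)    # ['2024', '2023']
print(many_groups)  # [('2024', '01', '15'), ('2023', '12', '25')]
print(m.group(0), m.groups())  # 2024-01-15 ('2024', '01', '15')
```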

2. (?P<name>...) – Named Capturing Groups

Example 1: Named groups

python

text = "John: 25, Jane: 30"
result = re.findall(r'(?P<name>\w+): (?P<age>\d+)', text)
print(result)  # [('John', '25'), ('Jane', '30')]

Example 2: Using groupdict()

python

text = "Name: John, Age: 25"
match = re.search(r'Name: (?P<name>\w+), Age: (?P<age>\d+)', text)
print(match.groupdict())  # {'name': 'John', 'age': '25'}
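Named groups can also be referenced in a re.sub() replacement string with \g<name>, and read from a match object by name. A short sketch that reorders an ISO-style date (the sample date and the target month/day/year layout are illustrative choices):

```python
import re

text = "2024-01-15"
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

# \g<name> in the replacement string refers to a named group
us_style = re.sub(pattern, r'\g<month>/\g<day>/\g<year>', text)

# Named groups can also be read by name from a match object
m = re.search(pattern, text)
year = m.group('year')
day = m['day']  # indexing a match object is shorthand for m.group('day')

print(us_style)   # 01/15/2024
print(year, day)  # 2024 15
```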

3. \1, \2 – Backreferences

Example 1: Matching repeated words

python

text = "the the quick brown fox fox jumps"
result = re.findall(r'(\b\w+\b) \1', text)
print(result)  # ['the', 'fox']
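Backreferences also work in the replacement string of re.sub(), which gives a compact way to collapse doubled words. This sketch adds a trailing \b to the pattern (an adjustment, not part of the example above) so that a word followed by a longer word with the same prefix, such as "the theory", is not treated as a repeat:

```python
import re

text = "the the quick brown fox fox jumps"

# \1 in the pattern requires a repeat; \1 in the replacement keeps one copy
deduped = re.sub(r'\b(\w+) \1\b', r'\1', text)

# The trailing \b stops "the theory" from counting as a doubled word
untouched = re.sub(r'\b(\w+) \1\b', r'\1', "the theory")

print(deduped)    # the quick brown fox jumps
print(untouched)  # the theory
```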

Example 2: Matching strings whose opening and closing quotes are the same

python

text = 'She said "hello" and he said "goodbye"'
result = re.findall(r'(["\'])(.*?)\1', text)
print(result)  # [('"', 'hello'), ('"', 'goodbye')]

4. (?P=name) – Named Backreferences

Example 1: Named backreference

python

text = "John said: hello hello"
result = re.findall(r'(?P<word>\w+) (?P=word)', text)
print(result)  # ['hello']

Example 2: HTML tag matching

python

text = "<div>content</div> <p>text</p>"
result = re.findall(r'<(?P<tag>\w+)>.*?</(?P=tag)>', text)
print(result)  # ['div', 'p']
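The practical benefit of the named backreference is that a mismatched closing tag is rejected. A small check, using fullmatch() so the whole string must be one tag pair (the sample tags are illustrative):

```python
import re

tag_pattern = re.compile(r'<(?P<tag>\w+)>.*?</(?P=tag)>')

# The closing tag must repeat the exact name captured by the opening tag
matched = tag_pattern.fullmatch("<b>bold</b>")
mismatched = tag_pattern.fullmatch("<b>bold</i>")

print(matched.group('tag'))  # b
print(mismatched)            # None
```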

5. (?:...) – Non-capturing Groups

Example 1: Grouping without capturing

python

text = "abcabc ababab"
result = re.findall(r'(?:abc){2}', text)
print(result)  # ['abcabc']

Example 2: Alternation without capturing

python

text = "cat dog bird"
result = re.findall(r'(?:cat|dog|bird)', text)
print(result)  # ['cat', 'dog', 'bird']
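The difference between capturing and non-capturing groups shows up clearly with findall(). In this sketch (sample text assumed for illustration) the same alternation is written both ways, followed by an optional plural "s":

```python
import re

text = "cats dog dogs"

# With a capturing group, findall returns only the captured alternative
captured = re.findall(r'(cat|dog)s?', text)

# With a non-capturing group, findall returns the whole match, "s" included
whole = re.findall(r'(?:cat|dog)s?', text)

print(captured)  # ['cat', 'dog', 'dog']
print(whole)     # ['cats', 'dog', 'dogs']
```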

6. (?i:...) – Case-insensitive Groups

Example 1: Case-insensitive matching

python

text = "Hello HELLO hello"
result = re.findall(r'(?i:hello)', text)
print(result)  # ['Hello', 'HELLO', 'hello']

Example 2: Mixed case sensitivity

python

text = "Python python PYTHON"
result = re.findall(r'(?i:py)thon', text)
print(result)  # ['Python', 'python'] ("PYTHON" fails: only "py" ignores case)
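Scoped flags can also switch a flag off. A minimal sketch that makes the whole pattern case-insensitive with re.IGNORECASE and then uses (?-i:...) (available since Python 3.6) to make "py" case-sensitive again:

```python
import re

text = "Python python PYTHON"

# "thon" ignores case via the flag; "py" inside (?-i:...) must be lowercase
result = re.findall(r'(?-i:py)thon', text, re.IGNORECASE)

print(result)  # ['python']
```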

7. (?s:...) – DOTALL Groups

Example 1: Dot matches newline within group

python

text = "start\nmiddle\nend"
result = re.findall(r'(?s:start.*end)', text)
print(result)  # ['start\nmiddle\nend']

Example 2: Mixed dot behavior

python

text = "line1\nline2\nline3"
result = re.findall(r'line1(?s:.*)line3', text)
print(result)  # ['line1\nline2\nline3']
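For contrast, here is the first DOTALL example without the scoped group, and with the global re.DOTALL flag instead. The comparison shows that the dot's newline behaviour is the only thing that changes:

```python
import re

text = "start\nmiddle\nend"

# Without DOTALL, "." stops at newlines, so nothing spans the three lines
plain = re.findall(r'start.*end', text)

# The re.DOTALL flag applies the same behaviour to the whole pattern
flagged = re.findall(r'start.*end', text, re.DOTALL)

print(plain)    # []
print(flagged)  # ['start\nmiddle\nend']
```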

8. (?m:...) – MULTILINE Groups

Example 1: Multiline matching within group

python

text = "first line\nsecond line\nthird line"
result = re.findall(r'(?m:^.*$)', text)
print(result)  # ['first line', 'second line', 'third line']
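MULTILINE changes what ^ and $ anchor to. A short comparison on the same text, pulling out the first word of each line with and without the re.MULTILINE flag:

```python
import re

text = "first line\nsecond line\nthird line"

# Without MULTILINE, ^ anchors only at the very start of the string
first_words_plain = re.findall(r'^\w+', text)

# With re.MULTILINE (or a (?m:...) group), ^ also anchors after every newline
first_words_multi = re.findall(r'^\w+', text, re.MULTILINE)

print(first_words_plain)  # ['first']
print(first_words_multi)  # ['first', 'second', 'third']
```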

9. (?x:...) – VERBOSE Groups

Example 1: Escape a space that must be matched literally

python

text = "hello world"
pattern = r'(?x:hello\ world)'  # "\ " keeps the space; a bare space would be ignored
result = re.findall(pattern, text)
print(result)  # ['hello world']
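Verbose mode is most useful for documenting longer patterns, usually through the re.VERBOSE flag. In this sketch the phone-number format (three digits, a hyphen, four digits) is an assumption chosen only to illustrate the comment syntax:

```python
import re

# In verbose mode, unescaped whitespace is ignored and "#" starts a comment
phone = re.compile(r'''
    (?P<area>\d{3})     # three-digit area code
    -                   # literal hyphen
    (?P<number>\d{4})   # four-digit local number
''', re.VERBOSE)

m = phone.fullmatch("555-1234")
print(m.groupdict())  # {'area': '555', 'number': '1234'}
```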

10. (?=...) – Positive Lookahead

Example 1: Match the part of a word that comes before "ing"

python

text = "running sitting walking"
result = re.findall(r'\w+(?=ing)', text)
print(result)  # ['runn', 'sitt', 'walk']

Example 2: Checking that a password contains a digit

python

password = "Pass123!"
has_digit = bool(re.search(r'(?=.*\d)', password))
print(has_digit)  # True
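Lookaheads become more useful when several are stacked at the start of a pattern, each checking one rule without consuming characters. The rules below (a digit, a lowercase letter, an uppercase letter, at least 8 characters) and the helper name is_strong are illustrative assumptions, not a recommended password policy:

```python
import re

# Each lookahead checks one rule from the start of the string;
# .{8,} then enforces the minimum length over the whole string
strong = re.compile(r'^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$')

def is_strong(password):
    # Hypothetical helper: True if every lookahead rule is satisfied
    return strong.search(password) is not None

print(is_strong("Pass123!"))  # True
print(is_strong("password"))  # False
print(is_strong("Pa1"))       # False
```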

11. (?!...) – Negative Lookahead

Example 1: Exclude words that start with "un"

python

text = "happy unhappy really"
# (?!un) right after \b rejects any word whose first two letters are "un"
result = re.findall(r'\b(?!un)\w+', text)
print(result)  # ['happy', 'really']
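A negative lookahead can also exclude one specific value while allowing similar ones. In this sketch (the file names are made up for illustration) names with the exact extension "py" are rejected, but "pyc" still passes because the lookahead requires the end of the string right after "py":

```python
import re

# (?!py$) rejects names whose extension is exactly "py"
not_python = re.compile(r'^\w+\.(?!py$)\w+$')

names = ["notes.txt", "script.py", "cache.pyc"]
kept = [name for name in names if not_python.match(name)]
print(kept)  # ['notes.txt', 'cache.pyc']
```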

12. (?<=...) – Positive Lookbehind

Example 1: Match after specific pattern

python

text = "$100 €200 ¥300"
result = re.findall(r'(?<=\$)\d+', text)
print(result)  # ['100']
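Python's re requires a lookbehind to have a fixed width: a|b is accepted, but alternatives of different lengths are not. A sketch of the error and a common workaround, one fixed-width lookbehind per alternative (the "USD" prefix is an illustrative assumption):

```python
import re

text = "USD100 $200 300"

# Alternatives of different lengths (3 characters vs 1) inside one
# lookbehind are rejected when the pattern is compiled
try:
    re.compile(r'(?<=USD|\$)\d+')
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Workaround: alternate between two separate fixed-width lookbehinds
amounts = re.findall(r'(?:(?<=USD)|(?<=\$))\d+', text)

print(variable_width_ok)  # False
print(amounts)            # ['100', '200']
```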

13. (?<!...) – Negative Lookbehind

Example 1: Match "happy" only when it is not preceded by "un"

python

text = "happy unhappy really"
result = re.findall(r'(?<!un)happy', text)
print(result)  # ['happy']
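A common pitfall with negative lookbehind is that the regex engine retries at every later position. This sketch (sample numbers assumed) tries to collect numbers without a minus sign: checking only for "-" still lets the "0" of "-40" match on its own, so the lookbehind must also reject a preceding digit:

```python
import re

text = "5 -3 12 -40"

# A lookbehind for "-" alone still lets the "0" of "-40" match on its own
naive = re.findall(r'(?<!-)\d+', text)

# Also rejecting a preceding digit keeps only numbers with no minus sign
non_negative = re.findall(r'(?<![-\d])\d+', text)

print(naive)         # ['5', '12', '0']
print(non_negative)  # ['5', '12']
```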

14. | – Alternation

Example 1: Basic OR operation

python

text = "cat dog bird fish"
result = re.findall(r'cat|dog|bird', text)
print(result)  # ['cat', 'dog', 'bird']

Example 2: Complex alternation

python

text = "color colour favor favour"
result = re.findall(r'colou?r|favou?r', text)
print(result)  # ['color', 'colour', 'favor', 'favour']
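Two details of alternation often cause bugs: alternatives are tried left to right, with the first one that fits winning, and | has the lowest precedence of any operator. A short sketch of both (sample words assumed):

```python
import re

# Alternatives are tried left to right and the first one that fits wins
short_first = re.findall(r'cat|category', "category")
long_first = re.findall(r'category|cat', "category")

# | has the lowest precedence: ^cat|dog$ means (^cat) OR (dog$)
loose = bool(re.search(r'^cat|dog$', "hotdog"))
grouped = bool(re.search(r'^(?:cat|dog)$', "hotdog"))

print(short_first, long_first)  # ['cat'] ['category']
print(loose, grouped)           # True False
```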

15. (?(id)yes|no) – Conditional Patterns

Example 1: Conditional matching

python

text = "<div>content</div> <p>text</p>"
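# Group 1 always participates here, so the conditional always takes its
# "yes" branch (\1) and the "no" branch (div) is never used
result = re.findall(r'<(div|p)>(.*?)</(?(1)\1|div)>', text)
print(result)  # [('div', 'content'), ('p', 'text')]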
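A conditional is most useful when the group it tests is optional. The sketch below is adapted from the example in the Python re documentation: an e-mail address may be wrapped in angle brackets, but only as a matched pair. The pattern is deliberately simple and is not a real e-mail validator:

```python
import re

# Group 1 is an optional "<". If it took part in the match, (?(1)>|$)
# demands a closing ">"; otherwise the address must end the string.
pattern = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

samples = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
results = {s: pattern.match(s) is not None for s in samples}

for s, ok in results.items():
    print(s, ok)
# <user@host.com> True
# user@host.com True
# <user@host.com False
# user@host.com> False
```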

16. (?>...) – Atomic Groups (regex module, or re in Python 3.11+)

Example 1: Prevent backtracking

python

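import regex  # third-party: pip install regex
text = "aaaab"
# a+ keeps all four a's inside the atomic group and b then matches,
# so here the result is the same as with a plain a+b
result = regex.findall(r'(?>a+)b', text)
print(result)  # ['aaaab']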
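The example above cannot show what an atomic group changes, because b matches whether or not backtracking is allowed. A contrasting sketch using the standard re module; it assumes Python 3.11 or later for (?>...) and simply skips that part on older versions, where the regex module would be needed instead:

```python
import re
import sys

text = "aaaab"

# An ordinary a+ gives back one "a" so that the following "ab" can match
backtracking = re.search(r'a+ab', text)
print(backtracking.group())  # aaaab

# An atomic group keeps everything a+ matched and never gives it back,
# leaving nothing for the literal "a", so the overall match fails
atomic_supported = sys.version_info >= (3, 11)
if atomic_supported:
    atomic = re.search(r'(?>a+)ab', text)
    print(atomic)  # None
```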

17. (?(DEFINE)...) – Definition Groups (regex module)

Example 1: Define reusable patterns

python

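import regex  # third-party: pip install regex
pattern = r'(?(DEFINE)(?<number>\d+))(?&number)'
text = "123 456"
# The pattern contains a capturing group, so findall() would return group
# contents; finditer() with group(0) returns the whole matches instead
result = [m.group(0) for m in regex.finditer(pattern, text)]
print(result)  # ['123', '456']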
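The standard re module has no (?(DEFINE)...) construct. One common stand-in, a different technique rather than an equivalent feature, is to keep reusable sub-patterns in ordinary Python strings and combine them with f-strings. The names NUMBER and DECIMAL below are hypothetical:

```python
import re

# Hypothetical reusable building blocks, kept as ordinary strings
NUMBER = r'\d+'
DECIMAL = rf'{NUMBER}\.{NUMBER}'

# Wrapping each piece in (?:...) keeps the | from splitting the pieces apart
# if they ever contain alternation of their own
pattern = re.compile(rf'(?:{DECIMAL})|(?:{NUMBER})')

numbers = pattern.findall("3.14 and 42 and 2.5")
print(numbers)  # ['3.14', '42', '2.5']
```

Quantifier braces such as \d{4} must be written as \d{{4}} inside an f-string, because single braces are parsed as f-string placeholders.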

Advanced Examples

Complex Grouping with Multiple Techniques

python

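import re

# Extract the parts of each e-mail address with named groups
# (a simplified pattern, not a complete e-mail validator)
email_pattern = r'''
    (?P<local>[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+)   # Local part
    @                                              # @ symbol
    (?P<domain>[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)  # First domain label
    (?:\.(?P<tld>[a-zA-Z]{2,}))+                   # Repeated ".label"; tld keeps only the last one
'''

text = "user@example.com admin@test.co.uk"
for match in re.finditer(email_pattern, text, re.VERBOSE):
    print(match.groupdict())
# {'local': 'user', 'domain': 'example', 'tld': 'com'}
# {'local': 'admin', 'domain': 'test', 'tld': 'uk'}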

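Nested Groups (One Level of Parentheses)

python

# Match a parenthesised group that contains at most one level of inner
# parentheses; deeper nesting needs recursion (regex module) or a parser
text = "((a)(b)) ((c)(d))"
result = re.findall(r'\(([^()]*(?:\([^()]*\)[^()]*)*)\)', text)
print(result)  # ['(a)(b)', '(c)(d)']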

Lookaround with Groups

python

# Extract numbers preceded by currency symbols
text = "Price: $100, Cost: €200, Value: ¥300"
result = re.findall(r'(?<=[$€¥])(\d+)', text)
print(result)  # ['100', '200', '300']
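Lookarounds can also match an empty position, which makes them handy for inserting text with re.sub(). A sketch that adds thousands separators to a string of digits; the helper name add_commas is hypothetical and the input is assumed to contain digits only:

```python
import re

def add_commas(digits):
    # Insert "," at every position that has a digit behind it and a
    # multiple of three digits ahead of it up to the end of the string
    return re.sub(r'(?<=\d)(?=(?:\d{3})+$)', ',', digits)

print(add_commas("1234567"))  # 1,234,567
print(add_commas("100"))      # 100
```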

Group Numbering Rules

  1. Left to right: Groups are numbered by opening parentheses
  2. Nested groups: Outer groups get lower numbers
  3. Non-capturing groups: Don’t affect numbering
  4. Named groups: Still get numbers in addition to names

python

# Group numbering example
text = "2024-01-15"
pattern = r'((\d{4})-(\d{2}))-(\d{2})'
match = re.search(pattern, text)
print("All groups:", match.groups())  # ('2024-01', '2024', '01', '15')
print("Group 1:", match.group(1))  # 2024-01
print("Group 2:", match.group(2))  # 2024
print("Group 3:", match.group(3))  # 01
print("Group 4:", match.group(4))  # 15
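Rules 3 and 4 can be checked directly on a compiled pattern: Pattern.groups gives the number of capturing groups, and Pattern.groupindex maps group names to numbers. A short sketch using the same date:

```python
import re

# Named groups are numbered too; the (?:...) group is skipped in the count
pattern = re.compile(r'(?P<year>\d{4})-(?:\d{2})-(?P<day>\d{2})')
m = pattern.search("2024-01-15")

print(pattern.groups)              # 2
print(dict(pattern.groupindex))    # {'year': 1, 'day': 2}
print(m.group(2), m.group('day'))  # 15 15
```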
