group() and groups()

Python re group() and groups() Methods Explained

The group() and groups() methods are called on match objects to extract the text captured by a regex's groups. Match objects are returned by re.search(), re.match(), and re.fullmatch(), and yielded one at a time by re.finditer().

>>> import re
>>> result = re.search(r".+\s(.+ex).+(\d\d\s.+).", string)
>>> result.groups()
('index', '19 February')
>>> result.group(1)
'index'
>>> result.group(2)
'19 February'
>>> result.group()
'The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February.'
>>> result.group(0)
'The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February.'
>>> result.group(1, 2)
('index', '19 February')
>>> result = re.findall(r".+\s(.+ex).+(\d\d\s.+).", string)
>>> result
[('index', '19 February')]
>>> result.group()
Traceback (most recent call last):
  File "<pyshell#130>", line 1, in <module>
    result.group()
AttributeError: 'list' object has no attribute 'group'
>>> result.groups()
Traceback (most recent call last):
  File "<pyshell#131>", line 1, in <module>
    result.groups()
AttributeError: 'list' object has no attribute 'groups'
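findall() returns plain strings (or tuples of strings), so there is no match object to call group() on. If you need group() or groups() for every match, iterate with re.finditer(), which yields match objects. A minimal sketch with a made-up sample string:

```python
import re

# Hypothetical sample text containing two "DD Month" dates
sample = "Peak on 19 February, trough on 16 March."
pattern = r"(\d\d)\s(\w+)"

# finditer() yields one match object per match, so group()/groups() work on each
for m in re.finditer(pattern, sample):
    print(m.group(0), "->", m.groups())

# Collecting groups() from each match gives the same list findall() returns
pairs = [m.groups() for m in re.finditer(pattern, sample)]
print(pairs)  # [('19', 'February'), ('16', 'March')]
```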
The Pattern: r".+\s(.+ex).+(\d\d\s.+)."
Breakdown:
1. .+ - Greedy Match
. = any character except newline

+ = one or more times

Matches: As many characters as possible, then gives characters back (backtracks) only as far as needed for the rest of the pattern to match

2. \s - Whitespace Character
\s = any whitespace (space, tab, newline)

Matches: A single whitespace character

3. (.+ex) - First Capture Group
.+ = one or more of any character

ex = literal characters "ex"

( ) = capture this group

Matches: One or more characters ending in "ex" (e.g., "index", "complex", "regex"); in this string it captures "index"

4. .+ - Greedy Match
Matches: As many characters as possible, again giving some back so that the second group can match

5. (\d\d\s.+) - Second Capture Group
\d\d = exactly two digits (e.g., "11", "25", "99")

\s = whitespace character

.+ = one or more of any character

( ) = capture this group

Matches: Two digits, a whitespace character, and then at least one more character

6. . - Final Character
. = any character except newline

Matches: One final character; here it consumes the closing period, which keeps "." out of group 2
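Greediness decides how much text lands in each group: in the pattern above, the greedy leading .+ is why group 1 is the last word ending in "ex" that still lets the rest match. A small sketch on a made-up digit string, contrasting a greedy quantifier with its lazy +? form:

```python
import re

digits = "12345"  # hypothetical input

# Greedy: (\d+) takes as much as it can and gives back one digit for (\d)
greedy = re.search(r"(\d+)(\d)", digits)
print(greedy.groups())  # ('1234', '5')

# Lazy: (\d+?) takes as little as it can, so (\d) gets the next digit
lazy = re.search(r"(\d+?)(\d)", digits)
print(lazy.groups())    # ('1', '2')
```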

group() Method

  • Returns specific captured group(s) from a match; with several arguments, e.g. group(1, 2), it returns a tuple
  • group(0) returns the entire match, and group() with no argument is the same as group(0)
  • group(1) returns the first capture group, group(2) the second, etc.

groups() Method

  • Returns all captured groups as a tuple
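  • Returns an empty tuple if the pattern has no capture groups; a group that did not take part in the match appears as None (or as the default value passed to groups())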
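The rules above can be checked with a few short, made-up inputs. This is a minimal sketch; the strings and patterns are illustrative only:

```python
import re

# group() with no argument is shorthand for group(0); several arguments give a tuple
m = re.search(r"(\w+)@(\w+)", "contact: alice@example")
print(m.group())       # alice@example
print(m.group(0))      # alice@example
print(m.group(1, 2))   # ('alice', 'example')
print(m.groups())      # ('alice', 'example')

# A pattern with no capture groups: groups() is an empty tuple
no_groups = re.search(r"\d+", "abc 42")
print(no_groups.groups())  # ()

# An optional group that did not take part in the match comes back as None,
# unless a default value is passed to groups()
optional = re.search(r"(\d+)(px)?", "width 42")
print(optional.groups())    # ('42', None)
print(optional.groups(""))  # ('42', '')
```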

The interactive session at the top of this post assumes string was defined beforehand as:

string = "The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February."

Example 1: Basic Group Extraction

python

import re

text = "John Doe, age 30, email: john.doe@email.com"

# Pattern with multiple capture groups
pattern = r'(\w+)\s+(\w+),\s+age\s+(\d+),\s+email:\s+([\w.]+@[\w.]+)'


///
The Pattern: r'(\w+)\s+(\w+),\s+age\s+(\d+),\s+email:\s+([\w.]+@[\w.]+)'
Breakdown by Capture Groups:
1. (\w+) - First Capture Group (First Name)
\w+ = one or more word characters (letters, digits, underscores)

( ) = capture this group

Matches: First name (e.g., "John")

2. \s+ - Whitespace
\s+ = one or more whitespace characters (spaces, tabs)

Matches: The space between first and last name

3. (\w+) - Second Capture Group (Last Name)
\w+ = one or more word characters

( ) = capture this group

Matches: Last name (e.g., "Doe")

4. ,\s+age\s+ - Literal Text
, = literal comma

\s+ = one or more whitespace characters

age = literal word "age"

\s+ = one or more whitespace characters

Matches: ", age " (at least one whitespace character on each side of "age")

5. (\d+) - Third Capture Group (Age)
\d+ = one or more digits

( ) = capture this group

Matches: Age (e.g., "30")

6. ,\s+email:\s+ - Literal Text
, = literal comma

\s+ = one or more whitespace characters

email: = literal "email:"

\s+ = one or more whitespace characters

Matches: ", email: " (at least one whitespace character after the comma and after "email:")

7. ([\w.]+@[\w.]+) - Fourth Capture Group (Email)
[\w.]+ = one or more word characters or dots (username part)

@ = literal @ symbol

[\w.]+ = one or more word characters or dots (domain part)

( ) = capture this group

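Matches: Email address (e.g., "john.doe@email.com"). This is a simplified pattern: [\w.] does not include characters such as "-" or "+", so addresses containing them will not match.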
///




python

match = re.search(pattern, text)

if match:
    print("Entire match:", match.group(0))
    print("First name:", match.group(1))
    print("Last name:", match.group(2))
    print("Age:", match.group(3))
    print("Email:", match.group(4))

    print("\nAll groups as tuple:", match.groups())

    # Access multiple groups at once
    print("First and last name:", match.group(1, 2))
else:
    print("No match found")

Output:

text

Entire match: John Doe, age 30, email: john.doe@email.com
First name: John
Last name: Doe
Age: 30
Email: john.doe@email.com

All groups as tuple: ('John', 'Doe', '30', 'john.doe@email.com')
First and last name: ('John', 'Doe')
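In Example 1 the pattern happens to cover the whole input, so group(0) equals text. That is not true in general: re.search() finds the first matching substring, and group(0) holds only that part. A quick check, using the same sentence with a hypothetical "Contact: " prefix:

```python
import re

pattern = r'(\w+)\s+(\w+),\s+age\s+(\d+),\s+email:\s+([\w.]+@[\w.]+)'
# Same data as Example 1, with an extra (hypothetical) prefix
text = "Contact: John Doe, age 30, email: john.doe@email.com"

match = re.search(pattern, text)
print(match.group(0))          # John Doe, age 30, email: john.doe@email.com
print(match.group(0) == text)  # False: the prefix is not part of the match
print(match.groups())          # ('John', 'Doe', '30', 'john.doe@email.com')
```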

Example 2: Date Parsing with Groups

python

import re

dates = [
    "2023-12-25",
    "12/25/2023", 
    "25-12-2023",
    "Invalid date"
]

# Different date formats with capture groups, each paired with the
# number of the group that holds the year
patterns = [
    (r'(\d{4})-(\d{2})-(\d{2})', 1),  # YYYY-MM-DD: year is group 1
    (r'(\d{2})/(\d{2})/(\d{4})', 3),  # MM/DD/YYYY: year is group 3
    (r'(\d{2})-(\d{2})-(\d{4})', 3),  # DD-MM-YYYY: year is group 3
]

for date in dates:
    matched = False
    for pattern, year_group in patterns:
        match = re.search(pattern, date)
        if match:
            print(f"Date: {date}")
            print(f"Groups: {match.groups()}")
            print(f"Year: {match.group(year_group)} in pattern: {pattern}")
            print("---")
            matched = True
            break

    if not matched:
        print(f"No valid date format found for: {date}")
        print("---")

Output:

text

Date: 2023-12-25
Groups: ('2023', '12', '25')
Year: 2023 in pattern: (\d{4})-(\d{2})-(\d{2})
---
Date: 12/25/2023
Groups: ('12', '25', '2023')
Year: 2023 in pattern: (\d{2})/(\d{2})/(\d{4})
---
Date: 25-12-2023
Groups: ('25', '12', '2023')
Year: 2023 in pattern: (\d{2})-(\d{2})-(\d{4})
---
No valid date format found for: Invalid date
---
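Because the loop uses re.search(), a pattern only has to match somewhere inside the string. If the goal is to validate that the whole string is a date (an assumption; the example above only extracts groups), re.fullmatch() is stricter. The malformed input below is made up to show the difference:

```python
import re

pattern = r'(\d{4})-(\d{2})-(\d{2})'  # YYYY-MM-DD, from Example 2
candidate = "2023-12-251"              # hypothetical malformed input

found = re.search(pattern, candidate)
print(found.groups())  # ('2023', '12', '25'): a prefix matched

strict = re.fullmatch(pattern, candidate)
print(strict)          # None: the entire string must match
```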

Example 3: Advanced Group Usage with finditer()

python

import re

html_content = """
<div class="product">
    <h3>Laptop</h3>
    <p class="price">$999.99</p>
    <p class="rating">4.5 stars</p>
</div>
<div class="product">
    <h3>Smartphone</h3>
    <p class="price">$599.50</p>
    <p class="rating">4.2 stars</p>
</div>
"""

# Extract product information using named groups (the f-strings below require Python 3.6+)
pattern = r'<h3>(?P<name>.*?)</h3>.*?<p class="price">\$(?P<price>.*?)</p>.*?<p class="rating">(?P<rating>.*?) stars</p>'

print("Product Information:")
for match in re.finditer(pattern, html_content, re.DOTALL):
    print(f"Product: {match.group('name')}")
    print(f"Price: ${match.group('price')}")
    print(f"Rating: {match.group('rating')}/5")
    print(f"All groups: {match.groups()}")
    print(f"Group dict: {match.groupdict()}")
    print("---")

# Example with optional groups
text = "Colors: red, green, blue"
pattern = r'Colors:\s*(?:(\w+)(?:,\s*(\w+))?(?:,\s*(\w+))?)?'

match = re.search(pattern, text)
if match:
    print("Color groups:", match.groups())
    print("Non-empty colors:", [color for color in match.groups() if color])

Output:

text

Product Information:
Product: Laptop
Price: $999.99
Rating: 4.5/5
All groups: ('Laptop', '999.99', '4.5')
Group dict: {'name': 'Laptop', 'price': '999.99', 'rating': '4.5'}
---
Product: Smartphone
Price: $599.50
Rating: 4.2/5
All groups: ('Smartphone', '599.50', '4.2')
Group dict: {'name': 'Smartphone', 'price': '599.50', 'rating': '4.2'}
---
Color groups: ('red', 'green', 'blue')
Non-empty colors: ['red', 'green', 'blue']
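All three colors are present in that input, so no optional group comes back empty. With hypothetical shorter inputs, the unmatched optional groups are None, which is exactly what the list comprehension filters out:

```python
import re

pattern = r'Colors:\s*(?:(\w+)(?:,\s*(\w+))?(?:,\s*(\w+))?)?'

# Hypothetical input with only two colors: the third group is None
short = re.search(pattern, "Colors: red, green")
print(short.groups())                    # ('red', 'green', None)
print([c for c in short.groups() if c])  # ['red', 'green']

# Hypothetical input with no colors: every group is None
empty = re.search(pattern, "Colors:")
print(empty.groups())                    # (None, None, None)
```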

Key Differences:

Method | Returns | Usage
group() | Specific group(s) | match.group(0), match.group(1), match.group(1, 2)
groups() | All groups as a tuple | match.groups()
groupdict() | Named groups as a dict | match.groupdict()

Important Notes:

  1. Group numbering starts at 1: group(1) is the first capture group
  2. group(0): Always returns the entire matched string
  3. None groups: If a group didn’t participate in the match, group() returns None for it and groups() puts None in its position
  4. Named groups: Use (?P<name>pattern) syntax and access with group('name')
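Group numbers follow the order of the opening parentheses, so an outer group gets a lower number than the groups nested inside it, while non-capturing (?:...) groups, like those in the Colors pattern, get no number at all. A small sketch with a made-up range string:

```python
import re

# (?:range ) is non-capturing; the outer (...) opens first, so it is group 1
match = re.search(r"(?:range )((\d+)-(\d+))", "range 10-20")
print(match.groups())  # ('10-20', '10', '20')
print(match.group(1))  # 10-20
print(match.group(3))  # 20
```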

python

import re

# Named groups example
text = "John: 30"
match = re.search(r'(?P<name>\w+):\s*(?P<age>\d+)', text)
if match:
    print(match.group('name'))  # John
    print(match.group('age'))   # 30
    print(match.groupdict())    # {'name': 'John', 'age': '30'}
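Named groups are still numbered, so group(1) and group('key') can return the same text, while groupdict() lists only the named groups. A sketch with a made-up key=value string:

```python
import re

# One named group and one unnamed group (hypothetical input)
m = re.search(r"(?P<key>\w+)=(\d+)", "retries=3")
print(m.group(1))      # retries
print(m.group("key"))  # retries
print(m.groups())      # ('retries', '3')
print(m.groupdict())   # {'key': 'retries'}
```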

These methods are essential for extracting specific parts of matched patterns in regular expressions!
