Non-Capturing Groups, Named Groups, groupdict()
To create a non-capturing group in Python’s re module, you use the syntax (?:...). This groups part of a regular expression together without capturing the matched text, so no numbered group or backreference is created for it.
A capturing group (...) saves the matched text. You can then access this captured text using methods like group(1), group(2), etc. A non-capturing group (?:...) allows you to apply quantifiers (like *, +, or ?) or alternatives (using |) to a part of the expression without saving the content of that group.
Here’s an example to illustrate the difference:
Capturing vs. Non-Capturing Groups
Let’s say you want to match the string “cat” or “dog” followed by “s”.
1. Using a capturing group (...)
Python
import re
text = "cats and dogs"
pattern = r'(cat|dog)s'
match = re.search(pattern, text)
if match:
    # The whole match is group(0)
    print(f"Whole match: {match.group(0)}")
    # The capturing group 'cat|dog' is group(1)
    print(f"Captured group: {match.group(1)}")
- Output:
- Whole match: cats
- Captured group: cat
The (cat|dog) part is a capturing group. When a match is found, re.search saves “cat” as group(1).
2. Using a non-capturing group (?:...)
Python
import re
text = "cats and dogs"
pattern = r'(?:cat|dog)s'
match = re.search(pattern, text)
if match:
    # The whole match is still group(0)
    print(f"Whole match: {match.group(0)}")
    # There is no group(1) because the group is non-capturing
    # Trying to access match.group(1) would raise an IndexError
- Output:
- Whole match: cats
Here, (?:cat|dog) is a non-capturing group. It groups the alternatives cat and dog together so that the s can follow either one, but it does not save the matched part. This avoids storing text you do not need (usually a small saving) and keeps the numbering of any other capturing groups in the pattern unchanged.
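The practical difference is easiest to see with re.findall, which returns the captured text when a pattern contains a capturing group and the whole match otherwise. The following is a minimal sketch reusing the "cats and dogs" text from above; the second pattern, with an extra (\w+) group, is added here purely for illustration.

```python
import re

text = "cats and dogs"

# With a capturing group, findall returns only the captured text.
captured = re.findall(r'(cat|dog)s', text)
print(captured)  # ['cat', 'dog']

# With a non-capturing group, findall returns each whole match.
whole = re.findall(r'(?:cat|dog)s', text)
print(whole)  # ['cats', 'dogs']

# A non-capturing group takes no group number, so the first
# capturing group that follows it is still group(1).
match = re.search(r'(?:cat|dog)s and (\w+)', text)
print(match.groups())  # ('dogs',)
```

If you only need the grouping for alternation or a quantifier, the non-capturing form keeps results like these free of text you did not ask for.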
In Python’s re module, you can use named groups to give a memorable name to a capturing group instead of referring to it by its number. This makes your code more readable and easier to maintain. You can then access the matched content of these named groups using the groupdict() method.
Named Groups
A named group is created using the syntax (?P<name>...), where name is the name you give to the group.
Example: Let’s say you want to extract the year, month, and day from a date string like “2025-09-19”. Instead of using numeric groups like group(1), group(2), and group(3), you can name them year, month, and day.
Python
import re
date_string = "2025-09-19"
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, date_string)
if match:
    # Access the captured data by name
    print(f"Year: {match.group('year')}")
    print(f"Month: {match.group('month')}")
    print(f"Day: {match.group('day')}")
This code is much clearer than match.group(1), match.group(2), and match.group(3).
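Named groups are still numbered like ordinary capturing groups, and their names can also be used in replacement strings. The sketch below reuses the date pattern from above to show both; the day/month/year output format is an arbitrary choice made for illustration.

```python
import re

date_string = "2025-09-19"
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

match = re.search(pattern, date_string)
if match:
    # A named group keeps its position number as well as its name.
    print(match.group(1) == match.group('year'))  # True
    # Match objects also support subscript access (Python 3.6+).
    print(match['month'])  # 09

# In a replacement string, \g<name> refers to a named group.
reordered = re.sub(pattern, r'\g<day>/\g<month>/\g<year>', date_string)
print(reordered)  # 19/09/2025
```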
groupdict()
The groupdict() method is a powerful way to access all named captured groups at once. It returns a dictionary where the keys are the group names and the values are the corresponding matched substrings.
Example: Using the same date pattern as above, you can use groupdict() to get all the named groups in a single dictionary.
Python
import re
date_string = "2025-09-19"
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
match = re.search(pattern, date_string)
if match:
    # Get a dictionary of all named groups
    date_info = match.groupdict()
    print(f"Date information: {date_info}")
    print(f"Year from dictionary: {date_info['year']}")
    print(f"Month from dictionary: {date_info['month']}")
- Output:
- Date information: {'year': '2025', 'month': '09', 'day': '19'}
- Year from dictionary: 2025
- Month from dictionary: 09
groupdict() is especially useful when you need to process multiple pieces of information from a string, as it provides a structured and readable way to access the data without needing to remember the order of the groups.
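One detail worth knowing is how groupdict() treats a named group that did not take part in the match: its value is None unless you pass a default. The sketch below assumes a variant of the date pattern in which the day is optional; the sample strings and the "01" fallback are hypothetical values chosen for illustration.

```python
import re

# Variant of the date pattern: the day part is optional (an assumption
# made for this example, not part of the pattern used above).
pattern = r'(?P<year>\d{4})-(?P<month>\d{2})(?:-(?P<day>\d{2}))?'

match = re.search(pattern, "2025-09")

# A named group that did not participate in the match maps to None.
partial = match.groupdict()
print(partial)  # {'year': '2025', 'month': '09', 'day': None}

# The default argument replaces None for such groups.
filled = match.groupdict(default="01")
print(filled)  # {'year': '2025', 'month': '09', 'day': '01'}

# Combined with finditer, groupdict() yields one dictionary per match.
sample = "2025-09-19 and 2024-01"
records = [m.groupdict() for m in re.finditer(pattern, sample)]
print(records)
```

Note that groupdict() only includes named groups; unnamed capturing groups are available through groups() instead.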