Quantifiers (Repetition) in Python Regular Expressions – Detailed Explanation

Basic Quantifiers

1. * – 0 or more occurrences (Greedy)

Description: Matches the preceding element zero or more times

Example 1: Match zero or more digits

python

import re
text = "123 4567 89"
result = re.findall(r'\d*', text)
print(result)  # ['123', '', '4567', '', '89', '']
# \d* can match zero digits, so it also yields an empty string at each space and at the end of the text
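
The empty strings in the output above are easier to understand once you can see where each match sits. The following is a minimal sketch using re.finditer to print the position of every match, then filtering out the empty ones. The variable name digit_runs is only illustrative.

```python
import re

text = "123 4567 89"

# finditer exposes where each match starts and ends, including empty ones.
for m in re.finditer(r'\d*', text):
    print(m.span(), repr(m.group()))

# Dropping the empty strings leaves only the real digit runs.
digit_runs = [m.group() for m in re.finditer(r'\d*', text) if m.group()]
print(digit_runs)  # ['123', '4567', '89']
```

In practice, switching to \d+ is usually the simpler way to avoid empty matches.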

Example 2: Optional whitespace matching

python

text = "hello   world"
result = re.findall(r'hello\s*world', text)
print(result)  # ['hello   world']
# Matches "hello" followed by any amount of whitespace followed by "world"

2. + – 1 or more occurrences (Greedy)

Description: Matches the preceding element one or more times

Example 1: Extract all numbers from text

python

text = "Prices: $10, $25, $100, $500"
result = re.findall(r'\d+', text)
print(result)  # ['10', '25', '100', '500']
# Matches one or more consecutive digits

Example 2: Split text into runs of word characters

python

text = "aaa bb cccc d eeeee"
result = re.findall(r'\w+', text)
print(result)  # ['aaa', 'bb', 'cccc', 'd', 'eeeee']
# Matches sequences of one or more word characters
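
Note that \w+ matches any run of word characters, not only runs of the same character. If the goal is to find genuinely repeated characters, one possible approach (a sketch, not the only way) is to combine + with a backreference:

```python
import re

text = "aaa bb cccc d eeeee"

# (\w) captures one word character; \1+ requires that same character
# to repeat at least once more, so the lone 'd' is skipped.
runs = [m.group(0) for m in re.finditer(r'(\w)\1+', text)]
print(runs)  # ['aaa', 'bb', 'cccc', 'eeeee']
```

finditer is used here instead of findall because findall would return only the captured group (a single character) rather than the whole run.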

3. ? – 0 or 1 occurrence (Greedy)

Description: Makes the preceding element optional (zero or one occurrence)

Example 1: Optional plural forms

python

text = "apple apples cat cats dog dogs"
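result = re.findall(r'\b(?:apple|cat|dog)s?\b', text)
print(result)  # ['apple', 'apples', 'cat', 'cats', 'dog', 'dogs']
# Matches each listed word with or without a trailing 's'
# (with r'\w+s?' the greedy \w+ would already consume the 's', leaving s? with nothing to do)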

Example 2: Optional protocol in URLs

python

text = "http://example.com https://test.com ftp://files.com example.org"
result = re.findall(r'https?://\S+', text)
print(result)  # ['http://example.com', 'https://test.com']
# Matches http or https URLs (the 's' is optional); ftp://files.com and the bare example.org are not matched
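
Another common use of ? is accepting spelling variants. A small sketch with made-up sample text:

```python
import re

text = "color colour colouur"

# u? allows zero or one 'u', so a doubled 'u' is rejected.
result = re.findall(r'colou?r', text)
print(result)  # ['color', 'colour']
```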

4. {n} – Exactly n occurrences

Description: Matches exactly n occurrences of the preceding element

Example 1: Match exactly 3 digits

python

text = "123 4567 89 012 3456"
result = re.findall(r'\d{3}', text)
print(result)  # ['123', '456', '012', '345']
# Matches exactly 3 digits (4567 becomes 456, 3456 becomes 345)
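
If truncated pieces of longer numbers are not wanted, one option is to wrap the pattern in word boundaries so that only standalone 3-digit numbers match. A minimal sketch, reusing the text from the example above:

```python
import re

text = "123 4567 89 012 3456"

# \b on both sides means the 3 digits must not be part of a longer run.
standalone = re.findall(r'\b\d{3}\b', text)
print(standalone)  # ['123', '012']
```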

Example 2: Validate phone number format

python

phone = "555-123-4567"
is_valid = bool(re.fullmatch(r'\d{3}-\d{3}-\d{4}', phone))
print(is_valid)  # True
# Exactly 3 digits, hyphen, 3 digits, hyphen, 4 digits
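
The choice of re.fullmatch matters for validation. As a quick illustration (the candidate string is made up for this sketch), re.match only anchors at the start and would accept input with trailing extra digits:

```python
import re

pattern = r'\d{3}-\d{3}-\d{4}'
candidate = "555-123-45678"  # one digit too many at the end

# match succeeds because the first 12 characters fit the pattern.
prefix_ok = bool(re.match(pattern, candidate))
# fullmatch requires the entire string to fit, so it fails.
whole_ok = bool(re.fullmatch(pattern, candidate))

print(prefix_ok)  # True
print(whole_ok)   # False
```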

5. {n,} – n or more occurrences

Description: Matches n or more occurrences of the preceding element

Example 1: Find long numbers

python

text = "1 12 123 1234 12345 123456"
result = re.findall(r'\d{3,}', text)
print(result)  # ['123', '1234', '12345', '123456']
# Matches runs of 3 or more digits

Example 2: Find words with minimum length

python

text = "I am learning Python programming"
result = re.findall(r'\b\w{4,}\b', text)
print(result)  # ['learning', 'Python', 'programming']
# Matches words with 4 or more characters

6. {n,m} – Between n and m occurrences

Description: Matches between n and m occurrences of the preceding element

Example 1: Find medium-length numbers

python

text = "1 12 123 1234 12345 123456"
result = re.findall(r'\d{2,4}', text)
print(result)  # ['12', '123', '1234', '1234', '1234', '56']
# Matches runs of 2 to 4 digits; longer numbers are split
# (12345 gives '1234' and leaves a lone '5' that cannot match; 123456 gives '1234' and '56')

Example 2: Validate password length

python

passwords = ["short", "good123", "verylongpassword", "ok"]
valid = [pwd for pwd in passwords if re.fullmatch(r'\w{6,12}', pwd)]
print(valid)  # ['good123']
# Passwords of 6 to 12 word characters (letters, digits, underscore)
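
The "preceding element" in the descriptions above can be a single character, a character class, or a whole group. The following sketch (sample text invented for illustration) shows how grouping changes what a quantifier repeats:

```python
import re

text = "ab abab ababab abababab"

# Without a group, {2,3} applies only to the 'b', so 'abb' or 'abbb' is required.
ungrouped = re.findall(r'ab{2,3}', text)
print(ungrouped)  # []

# With a non-capturing group, {2,3} repeats the whole 'ab' unit.
grouped = re.findall(r'(?:ab){2,3}', text)
print(grouped)  # ['abab', 'ababab', 'ababab']
```

A non-capturing group (?:...) is used so that findall returns the full match rather than only the last repetition of the group.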

Greedy vs Lazy Quantifiers

7. * vs *? – Greedy vs Lazy

Example 1: Greedy matching

python

text = "<div>content</div><p>more</p>"
result = re.findall(r'<.*>', text)
print(result)  # ['<div>content</div><p>more</p>']
# Greedy - matches everything between first < and last >

Example 2: Lazy matching

python

text = "<div>content</div><p>more</p>"
result = re.findall(r'<.*?>', text)
print(result)  # ['<div>', '</div>', '<p>', '</p>']
# Lazy - matches each tag individually
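
A frequently used alternative to the lazy quantifier is a negated character class, which says directly that a tag may not contain '>'. A minimal sketch on the same text:

```python
import re

text = "<div>content</div><p>more</p>"

# [^>]* matches any characters except '>', so it stops at the first '>'.
tags = re.findall(r'<[^>]*>', text)
print(tags)  # ['<div>', '</div>', '<p>', '</p>']
```

For this input both approaches give the same result. Neither is a substitute for a real HTML parser on arbitrary markup.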

8. + vs +? – Greedy vs Lazy

Example 1: Greedy matching

python

text = "aaaabaaaab"
result = re.findall(r'a+b', text)
print(result)  # ['aaaab', 'aaaab']
# Greedy - matches as many 'a's as possible before 'b'

Example 2: Lazy matching

python

text = "aaaabaaaab"
result = re.findall(r'a+?b', text)
print(result)  # ['aaaab', 'aaaab']
# Same result as greedy: each match starts at the leftmost 'a' and cannot end until it reaches the 'b'
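
The difference between + and +? becomes visible when nothing after the quantifier forces it to keep going. A small sketch with an invented input:

```python
import re

text = "aaaa"

# Greedy: one match that takes every 'a'.
greedy = re.findall(r'a+', text)
print(greedy)  # ['aaaa']

# Lazy: each match stops after the minimum of one 'a'.
lazy = re.findall(r'a+?', text)
print(lazy)  # ['a', 'a', 'a', 'a']
```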

9. ? vs ?? – Greedy vs Lazy

Example 1: Greedy optional

python

text = "abc"
result = re.findall(r'ab?c', text)
print(result)  # ['abc']
# Greedy - prefers matching the 'b' if possible

Example 2: Lazy optional

python

text = "abc"
result = re.findall(r'ab??c', text)
print(result)  # ['abc']
# Still matches 'b' because it's necessary for the pattern to match
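
To see ? and ?? actually behave differently, remove the trailing 'c' so the pattern can succeed without the 'b'. A minimal sketch:

```python
import re

text = "ab"

# Greedy: takes the optional 'b' when it is available.
greedy = re.search(r'ab?', text).group()
print(greedy)  # 'ab'

# Lazy: skips the optional 'b' because the pattern already succeeds without it.
lazy = re.search(r'ab??', text).group()
print(lazy)  # 'a'
```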

10. {n,} vs {n,}? – Greedy vs Lazy

Example 1: Greedy range

python

text = "aaaaab"
result = re.findall(r'a{2,}b', text)
print(result)  # ['aaaaab']
# Greedy - matches as many 'a's as possible (5)

Example 2: Lazy range

python

text = "aaaaab"
result = re.findall(r'a{2,}?b', text)
print(result)  # ['aaaaab']
# Lazy - but still matches all 'a's because pattern requires 'b' at the end

11. {n,m} vs {n,m}? – Greedy vs Lazy

Example 1: Greedy bounded range

python

text = "aaaaab"
result = re.findall(r'a{2,4}b', text)
print(result)  # ['aaaab']
# Greedy - takes the maximum of 4 'a's; the match starts at the second 'a' because 5 'a's precede the 'b'

Example 2: Lazy bounded range

python

text = "aaaaab"
result = re.findall(r'a{2,4}?b', text)
print(result)  # ['aaaab']
# Lazy - tries 2 'a's first, but must expand to 4 to reach the 'b'; the match again starts at the second 'a'
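
When no trailing 'b' forces the quantifier to expand, the lazy range forms stop at the minimum count. A short sketch with an invented input of six 'a's:

```python
import re

text = "aaaaaa"

# Bounded range: greedy takes 4 then the remaining 2; lazy takes 2 at a time.
bounded_greedy = re.findall(r'a{2,4}', text)
bounded_lazy = re.findall(r'a{2,4}?', text)
print(bounded_greedy)  # ['aaaa', 'aa']
print(bounded_lazy)    # ['aa', 'aa', 'aa']

# Open-ended range: greedy takes everything; lazy again stops at 2.
open_greedy = re.findall(r'a{2,}', text)
open_lazy = re.findall(r'a{2,}?', text)
print(open_greedy)  # ['aaaaaa']
print(open_lazy)    # ['aa', 'aa', 'aa']
```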

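Possessive Quantifiers (Python 3.11+ re, or the regex module)

Note: The standard re module supports possessive quantifiers from Python 3.11 onward. On earlier versions they require the third-party regex module (pip install regex), which the examples below use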

12. *+ – Possessive quantifier

python

import regex
text = "aaaab"
result = regex.findall(r'a*+b', text)
print(result)  # ['aaaab']
# Possessive - matches all 'a's and doesn't backtrack

13. ++ – Possessive quantifier

python

text = "aaaab"
result = regex.findall(r'a++b', text)
print(result)  # ['aaaab']
# Possessive - matches all 'a's without backtracking

14. ?+ – Possessive quantifier

python

text = "ab"
result = regex.findall(r'a?+b', text)
print(result)  # ['ab']
# Possessive optional - matches 'a' if available without backtracking

15. {n}+ – Possessive quantifier

python

text = "aaaab"
result = regex.findall(r'a{3}+b', text)
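print(result)  # ['aaab']
# Possessive exact count - matches exactly 3 'a's, so the match starts at the second 'a'
# (with a fixed count there is nothing to give back, so the result is the same as a{3}b)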

16. {n,}+ – Possessive quantifier

python

text = "aaaab"
result = regex.findall(r'a{2,}+b', text)
print(result)  # ['aaaab']
# Possessive range - matches 2+ 'a's without backtracking

17. {n,m}+ – Possessive quantifier

python

text = "aaaab"
result = regex.findall(r'a{2,4}+b', text)
print(result)  # ['aaaab']
# Possessive bounded range - matches up to 4 'a's without backtracking

Key Differences Summary

Greedy: Matches as much as possible while still allowing the overall pattern to match
Lazy: Matches as little as possible while still allowing the overall pattern to match
Possessive: Matches as much as possible and never gives back (no backtracking)

python

# Demonstration of all three types
text = "aaaab"

# Greedy - matches all 'a's but can backtrack if needed
print(re.findall(r'a*b', text))    # ['aaaab']

# Lazy - matches minimal 'a's but enough to satisfy pattern  
print(re.findall(r'a*?b', text))   # ['aaaab'] - still needs to match 'b'

# Possessive - matches all 'a's and never backtracks
import regex
print(regex.findall(r'a*+b', text)) # ['aaaab']
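
All three patterns above return the same result, so the demonstration does not yet show what "never gives back" means. The sketch below uses a pattern that needs backtracking to succeed. It assumes Python 3.11 or later for the possessive syntax in the standard re module and skips that part on older versions (the third-party regex module could be used there instead).

```python
import re
import sys

text = "aaaab"

# Greedy: a* first takes all four 'a's, then gives one back so 'ab' can match.
greedy = re.findall(r'a*ab', text)
print(greedy)  # ['aaaab']

# Possessive syntax needs Python 3.11+ in the standard re module.
if sys.version_info >= (3, 11):
    # a*+ keeps every 'a' it took, so the literal 'a' that follows can never match.
    possessive = re.findall(r'a*+ab', text)
    print(possessive)  # []
else:
    possessive = None  # not supported by re on this Python version
```

The possessive version fails on purpose: refusing to backtrack is what makes possessive quantifiers useful for preventing runaway backtracking, at the cost of rejecting some inputs a greedy pattern would accept.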
