Quantifiers (Repetition) in Python Regular Expressions – Detailed Explanation
Basic Quantifiers
1. * – 0 or more occurrences (Greedy)
Description: Matches the preceding element zero or more times
Example 1: Match zero or more digits
python
import re

text = "123 4567 89"
result = re.findall(r'\d*', text)
print(result) # ['123', '', '4567', '', '89', '']
# Matches runs of digits; because * also accepts zero digits,
# findall also reports an empty match at each space and at the end of the string
Example 2: Optional whitespace matching
python
import re

text = "hello world"
result = re.findall(r'hello\s*world', text)
print(result) # ['hello world']
# Matches "hello" followed by any amount of whitespace (including none) followed by "world"
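Because * also accepts zero occurrences, the same pattern matches when there is no space at all. A minimal sketch with made-up sample text containing zero, one, and three spaces:

```python
import re

# Made-up sample text: no space, one space, three spaces
text = "helloworld hello world hello   world"

# \s* accepts zero or more whitespace characters between the two words
matches = re.findall(r'hello\s*world', text)
print(matches)  # ['helloworld', 'hello world', 'hello   world']
```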
2. + – 1 or more occurrences (Greedy)
Description: Matches the preceding element one or more times
Example 1: Extract all numbers from text
python
import re

text = "Prices: $10, $25, $100, $500"
result = re.findall(r'\d+', text)
print(result) # ['10', '25', '100', '500']
# Matches one or more consecutive digits
Example 2: Match runs of word characters
python
import re

text = "aaa bb cccc d eeeee"
result = re.findall(r'\w+', text)
print(result) # ['aaa', 'bb', 'cccc', 'd', 'eeeee']
# Matches sequences of one or more word characters;
# \w+ does not require the characters to be identical
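\w+ matches any run of word characters, not only runs of the same character. To find genuinely repeated characters, one common approach (a sketch, not the only option) combines + with a backreference. re.finditer is used here because re.findall would return only the captured group:

```python
import re

text = "aaa bb cccc d eeeee hello"

# (\w) captures one character; \1+ requires that same character to
# appear at least once more, so single letters such as 'd' are skipped
runs = [m.group(0) for m in re.finditer(r'(\w)\1+', text)]
print(runs)  # ['aaa', 'bb', 'cccc', 'eeeee', 'll']
```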
3. ? – 0 or 1 occurrence (Greedy)
Description: Makes the preceding element optional (zero or one occurrence)
Example 1: Optional plural forms
python
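import re

text = "apple apples cat cats dog dogs"
result = re.findall(r'\b(?:apple|cat|dog)s?\b', text)
print(result) # ['apple', 'apples', 'cat', 'cats', 'dog', 'dogs']
# Matches each base word with an optional trailing 's'
# (r'\w+s?' returns the same list, but only because the greedy \w+
# already consumes the 's', so s? never does any work there)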
Example 2: Optional protocol in URLs
python
import re

text = "http://example.com https://test.com ftp://files.com example.org"
result = re.findall(r'https?://\S+', text)
print(result) # ['http://example.com', 'https://test.com']
# The 's' is optional, so both http and https URLs match;
# ftp://files.com and example.org do not start with http, so they are skipped
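The ? quantifier applies to whatever immediately precedes it: a single character, or a whole group when the element is wrapped in parentheses. A short sketch with made-up sample words:

```python
import re

text = "color colour Jan January"

# u? makes the single character 'u' optional
spellings = re.findall(r'colou?r', text)
print(spellings)  # ['color', 'colour']

# ? after a non-capturing group makes the whole group optional
months = re.findall(r'\bJan(?:uary)?\b', text)
print(months)  # ['Jan', 'January']
```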
4. {n} – Exactly n occurrences
Description: Matches exactly n occurrences of the preceding element
Example 1: Match exactly 3 digits
python
import re

text = "123 4567 89 012 3456"
result = re.findall(r'\d{3}', text)
print(result) # ['123', '456', '012', '345']
# Each match is exactly 3 digits; inside longer numbers only the first 3 digits match (456 from 4567, 345 from 3456)
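To match only numbers that are exactly three digits long, rather than the first three digits of longer numbers, word boundaries can be placed around the quantified element. A sketch on the same text:

```python
import re

text = "123 4567 89 012 3456"

# \b on both sides stops a match from starting or ending inside a longer number
exact_three = re.findall(r'\b\d{3}\b', text)
print(exact_three)  # ['123', '012']
```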
Example 2: Validate phone number format
python
import re

phone = "555-123-4567"
is_valid = bool(re.fullmatch(r'\d{3}-\d{3}-\d{4}', phone))
print(is_valid) # True
# Exactly 3 digits, hyphen, 3 digits, hyphen, 4 digits
5. {n,} – n or more occurrences
Description: Matches n or more occurrences of the preceding element
Example 1: Find long numbers
python
import re

text = "1 12 123 1234 12345 123456"
result = re.findall(r'\d{3,}', text)
print(result) # ['123', '1234', '12345', '123456']
# Matches numbers with 3 or more digits
Example 2: Find words with minimum length
python
import re

text = "I am learning Python programming"
result = re.findall(r'\b\w{4,}\b', text)
print(result) # ['learning', 'Python', 'programming']
# Matches words with 4 or more characters
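The basic quantifiers can also be written in brace form: * is the same as {0,}, + is the same as {1,}, and ? is the same as {0,1}. A quick check on the sample string from Example 1 (the assertions only confirm the equivalence):

```python
import re

text = "1 12 123 1234 12345 123456"

# Each shorthand quantifier paired with its brace-form equivalent
pairs = [(r'\d*', r'\d{0,}'), (r'\d+', r'\d{1,}'), (r'\d?', r'\d{0,1}')]

for short, brace in pairs:
    # Both patterns in a pair should return identical match lists
    assert re.findall(short, text) == re.findall(brace, text)
    print(short, '==', brace)
```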
6. {n,m} – Between n and m occurrences
Description: Matches between n and m occurrences of the preceding element
Example 1: Find medium-length numbers
python
import re

text = "1 12 123 1234 12345 123456"
result = re.findall(r'\d{2,4}', text)
print(result) # ['12', '123', '1234', '1234', '1234', '56']
# Matches runs of 2-4 digits: 12345 yields '1234' (the leftover '5' is too short),
# and 123456 yields '1234' followed by '56'
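To keep only numbers that are 2 to 4 digits long in full, instead of slices taken from longer numbers, a sketch with word boundaries:

```python
import re

text = "1 12 123 1234 12345 123456"

# \b on both sides means the whole number must be 2-4 digits long
whole = re.findall(r'\b\d{2,4}\b', text)
print(whole)  # ['12', '123', '1234']
```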
Example 2: Validate password length
python
import re

passwords = ["short", "good123", "verylongpassword", "ok"]
valid = [pwd for pwd in passwords if re.fullmatch(r'\w{6,12}', pwd)]
print(valid) # ['good123']
# Keeps passwords of 6 to 12 word characters (letters, digits, underscore)
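This check depends on re.fullmatch. With re.search, a long password would still pass, because the pattern only needs to find some substring of 6 to 12 characters. A sketch contrasting the two:

```python
import re

pwd = "verylongpassword"  # 16 characters

# search() succeeds on the first 12 characters of the string
found = re.search(r'\w{6,12}', pwd).group()
print(found)  # verylongpass

# fullmatch() requires the entire string to fit the length range
full = re.fullmatch(r'\w{6,12}', pwd)
print(full)  # None
```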
Greedy vs Lazy Quantifiers
7. * vs *? – Greedy vs Lazy
Example 1: Greedy matching
python
import re

text = "<div>content</div><p>more</p>"
result = re.findall(r'<.*>', text)
print(result) # ['<div>content</div><p>more</p>']
# Greedy - matches everything between the first < and the last >
Example 2: Lazy matching
python
import re

text = "<div>content</div><p>more</p>"
result = re.findall(r'<.*?>', text)
print(result) # ['<div>', '</div>', '<p>', '</p>']
# Lazy - matches each tag individually
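A negated character class is a common alternative to the lazy quantifier for this kind of input: [^>]* cannot run past a >, so it cannot overshoot the end of a tag. A sketch on the same text:

```python
import re

text = "<div>content</div><p>more</p>"

# [^>]* stops at the first '>', so each match is a single tag
tags = re.findall(r'<[^>]*>', text)
print(tags)  # ['<div>', '</div>', '<p>', '</p>']
```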
8. + vs +? – Greedy vs Lazy
Example 1: Greedy matching
python
import re

text = "aaaabaaaab"
result = re.findall(r'a+b', text)
print(result) # ['aaaab', 'aaaab']
# Greedy - matches as many 'a's as possible before 'b'
Example 2: Lazy matching
python
import re

text = "aaaabaaaab"
result = re.findall(r'a+?b', text)
print(result) # ['aaaab', 'aaaab']
# Same result as greedy: each match starts at the leftmost available 'a',
# and the lazy +? keeps adding 'a's one at a time until 'b' can match
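The lazy form gives a different result when nothing after the quantifier forces it to keep going. A minimal sketch:

```python
import re

text = "aaaa"

greedy = re.findall(r'a+', text)   # one match that takes every 'a'
lazy = re.findall(r'a+?', text)    # each match stops after a single 'a'
print(greedy)  # ['aaaa']
print(lazy)    # ['a', 'a', 'a', 'a']
```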
9. ? vs ?? – Greedy vs Lazy
Example 1: Greedy optional
python
import re

text = "abc"
result = re.findall(r'ab?c', text)
print(result) # ['abc']
# Greedy - prefers matching the 'b' if possible
Example 2: Lazy optional
python
import re

text = "abc"
result = re.findall(r'ab??c', text)
print(result) # ['abc']
# Lazy - first tries skipping the 'b', but then 'c' fails against 'b',
# so the engine backtracks and matches the 'b'
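The difference becomes visible when nothing follows the optional element. A sketch using re.match on the same text:

```python
import re

text = "abc"

# With nothing after the optional 'b', greedy takes it and lazy skips it
greedy = re.match(r'ab?', text).group()
lazy = re.match(r'ab??', text).group()
print(greedy)  # ab
print(lazy)    # a
```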
10. {n,} vs {n,}? – Greedy vs Lazy
Example 1: Greedy range
python
import re

text = "aaaaab"
result = re.findall(r'a{2,}b', text)
print(result) # ['aaaaab']
# Greedy - matches as many 'a's as possible (5)
Example 2: Lazy range
python
import re

text = "aaaaab"
result = re.findall(r'a{2,}?b', text)
print(result) # ['aaaaab']
# Lazy - but still matches all 'a's because pattern requires 'b' at the end
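Without a trailing 'b' to reach, the lazy quantifier stops at its minimum count. A sketch on the same text:

```python
import re

text = "aaaaab"

greedy = re.match(r'a{2,}', text).group()   # takes every 'a' it can
lazy = re.match(r'a{2,}?', text).group()    # stops at the minimum of 2
print(greedy)  # aaaaa
print(lazy)    # aa
```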
11. {n,m} vs {n,m}? – Greedy vs Lazy
Example 1: Greedy bounded range
python
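import re

text = "aaaaab"
result = re.findall(r'a{2,4}b', text)
print(result) # ['aaaab']
# Greedy - takes up to 4 'a's. A match starting at index 0 would need 5 'a's
# before 'b', more than the maximum, so the match starts at index 1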
Example 2: Lazy bounded range
python
import re

text = "aaaaab"
result = re.findall(r'a{2,4}?b', text)
print(result) # ['aaaab']
# Lazy - tries 2, then 3, then 4 'a's, stopping once 'b' follows;
# as in the greedy case, the match can only start at index 1
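The span of the match shows where it actually starts. A short sketch:

```python
import re

text = "aaaaab"

m = re.search(r'a{2,4}b', text)
# Five 'a's exceed the maximum of 4, so no match can start at index 0;
# the engine moves on and finds 'aaaab' starting at index 1
print(m.span(), m.group())  # (1, 6) aaaab
```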
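Possessive Quantifiers (Python 3.11+ re module, or the regex module)
Note: The standard re module supports possessive quantifiers starting with Python 3.11. On earlier versions, use the third-party regex module (pip install regex). The examples below use regex, but on Python 3.11+ the same patterns also work with re.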
12. *+ – Possessive quantifier
python
import regex

text = "aaaab"
result = regex.findall(r'a*+b', text)
print(result) # ['aaaab']
# Possessive - matches all 'a's and doesn't backtrack
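In these examples the possessive and greedy quantifiers give the same result. They differ when the possessive part consumes characters that the rest of the pattern needs. The sketch below assumes Python 3.11+ for the standard re module and otherwise falls back to the third-party regex module:

```python
import sys

# Assumption: re supports possessive quantifiers on Python 3.11+;
# older versions fall back to the third-party regex module
if sys.version_info >= (3, 11):
    import re as engine
else:
    import regex as engine

text = "aaaa"

# Greedy a* gives back one 'a' so the final 'a' in the pattern can match
greedy_result = engine.findall(r'a*a', text)
# Possessive a*+ keeps every 'a', leaving nothing for the final 'a'
possessive_result = engine.findall(r'a*+a', text)
print(greedy_result)      # ['aaaa']
print(possessive_result)  # []
```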
13. ++ – Possessive quantifier
python
import regex

text = "aaaab"
result = regex.findall(r'a++b', text)
print(result) # ['aaaab']
# Possessive - matches all 'a's without backtracking
14. ?+ – Possessive quantifier
python
import regex

text = "ab"
result = regex.findall(r'a?+b', text)
print(result) # ['ab']
# Possessive optional - matches 'a' if available without backtracking
15. {n}+ – Possessive quantifier
python
import regex

text = "aaaab"
result = regex.findall(r'a{3}+b', text)
print(result) # ['aaab']
# Exactly 3 'a's followed by 'b', so the match starts at index 1;
# with a fixed count there is nothing to give back, so a{3}+ behaves like a{3}
16. {n,}+ – Possessive quantifier
python
import regex

text = "aaaab"
result = regex.findall(r'a{2,}+b', text)
print(result) # ['aaaab']
# Possessive range - matches 2+ 'a's without backtracking
17. {n,m}+ – Possessive quantifier
python
import regex

text = "aaaab"
result = regex.findall(r'a{2,4}+b', text)
print(result) # ['aaaab']
# Possessive bounded range - matches up to 4 'a's without backtracking
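Possessive quantifiers are shorthand for atomic groups: the Python 3.11+ re documentation describes x*+, x++ and x?+ as equivalent to (?>x*), (?>x+) and (?>x?), and x{m,n}+ as equivalent to (?>x{m,n}). A sketch that checks this equivalence on a few sample strings, again assuming re on Python 3.11+ and the regex module otherwise:

```python
import sys

if sys.version_info >= (3, 11):
    import re as engine     # standard library: Python 3.11+
else:
    import regex as engine  # third-party module: pip install regex

samples = ["aaaab", "ab", "aaaaaab", "b"]
pairs = [
    (r'a*+b', r'(?>a*)b'),
    (r'a++b', r'(?>a+)b'),
    (r'a{2,4}+b', r'(?>a{2,4})b'),
]

for possessive, atomic in pairs:
    for s in samples:
        # Each possessive pattern should match exactly like its atomic-group form
        assert engine.findall(possessive, s) == engine.findall(atomic, s)
    print(possessive, 'behaves like', atomic)
```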
Key Differences Summary
Greedy: Matches as much as possible while still allowing the overall pattern to match
Lazy: Matches as little as possible while still allowing the overall pattern to match
Possessive: Matches as much as possible and never gives back (no backtracking)
python
import re
import regex  # third-party module: pip install regex

# Demonstration of all three types
text = "aaaab"

# Greedy - matches all 'a's but can backtrack if needed
print(re.findall(r'a*b', text)) # ['aaaab']

# Lazy - matches minimal 'a's but enough to satisfy pattern
print(re.findall(r'a*?b', text)) # ['aaaab'] - still needs to match 'b'

# Possessive - matches all 'a's and never backtracks
print(regex.findall(r'a*+b', text)) # ['aaaab']