Quantifiers (Repetition)

Quantifiers (Repetition) in Python Regular Expressions – Detailed Explanation

Basic Quantifiers

1. * – 0 or more occurrences (Greedy)

Description: Matches the preceding element zero or more times

Example 1: Match zero or more digits

python

import re
text = "123 4567 89"
result = re.findall(r'\d*', text)
print(result)  # ['123', '', '4567', '', '89', '']
# * allows zero repetitions, so findall also returns an empty string
# at every position where no digit can be matched
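
To see where those empty strings come from, the following sketch uses re.finditer to record the span of every match. The variable names are illustrative only.

```python
import re

text = "123 4567 89"

# Record (start, end) for every match of \d*, including the empty ones.
spans = [m.span() for m in re.finditer(r'\d*', text)]
print(spans)

# Keeping only non-empty matches leaves the digit runs themselves.
non_empty = [m.group() for m in re.finditer(r'\d*', text) if m.group()]
print(non_empty)  # ['123', '4567', '89']
```

A span whose start equals its end is an empty match: the pattern succeeded without consuming any character.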

Example 2: Optional whitespace matching

python

text = "hello   world"
result = re.findall(r'hello\s*world', text)
print(result)  # ['hello   world']
# Matches "hello" followed by any amount of whitespace followed by "world"

2. + – 1 or more occurrences (Greedy)

Description: Matches the preceding element one or more times

Example 1: Extract all numbers from text

python

text = "Prices: $10, $25, $100, $500"
result = re.findall(r'\d+', text)
print(result)  # ['10', '25', '100', '500']
# Matches one or more consecutive digits

Example 2: Match runs of word characters

python

text = "aaa bb cccc d eeeee"
result = re.findall(r'\w+', text)
print(result)  # ['aaa', 'bb', 'cccc', 'd', 'eeeee']
# Matches sequences of one or more word characters

3. ? – 0 or 1 occurrence (Greedy)

Description: Makes the preceding element optional (zero or one occurrence)

Example 1: Optional plural forms

python

text = "apple apples cat cats dog dogs"
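result = re.findall(r'\b(?:apple|cat|dog)s?\b', text)
print(result)  # ['apple', 'apples', 'cat', 'cats', 'dog', 'dogs']
# The optional 's' lets one pattern match both singular and plural forms
# (with \w+s? the greedy \w+ would already consume the 's', leaving s? nothing to match)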

Example 2: Optional protocol in URLs

python

text = "http://example.com https://test.com ftp://files.com example.org"
result = re.findall(r'https?://\S+', text)
print(result)  # ['http://example.com', 'https://test.com']
# Matches http or https URLs (the 's' is optional); ftp:// and the bare example.org are skipped

4. {n} – Exactly n occurrences

Description: Matches exactly n occurrences of the preceding element

Example 1: Match exactly 3 digits

python

text = "123 4567 89 012 3456"
result = re.findall(r'\d{3}', text)
print(result)  # ['123', '456', '012', '345']
# Matches exactly 3 digits (4567 becomes 456, 3456 becomes 345)
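
If the goal is to match only standalone 3-digit numbers rather than the first three digits of longer ones, one possible approach is to add word boundaries around the quantifier. This is a minimal sketch using the same sample text:

```python
import re

text = "123 4567 89 012 3456"

# \b on both sides rejects digit runs that are longer than 3.
standalone = re.findall(r'\b\d{3}\b', text)
print(standalone)  # ['123', '012']
```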

Example 2: Validate phone number format

python

phone = "555-123-4567"
is_valid = bool(re.fullmatch(r'\d{3}-\d{3}-\d{4}', phone))
print(is_valid)  # True
# Exactly 3 digits, hyphen, 3 digits, hyphen, 4 digits
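
The same pattern can be checked against several inputs at once. The sketch below compiles it first; the sample numbers and variable names are made up for illustration.

```python
import re

# Compile once, reuse for every candidate.
phone_pattern = re.compile(r'\d{3}-\d{3}-\d{4}')

candidates = ["555-123-4567", "5551234567", "555-123-45678"]
checks = [bool(phone_pattern.fullmatch(c)) for c in candidates]
print(checks)  # [True, False, False]
```

The second candidate fails because the hyphens are missing, and the third because {4} allows exactly four trailing digits.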

5. {n,} – n or more occurrences

Description: Matches n or more occurrences of the preceding element

Example 1: Find long numbers

python

text = "1 12 123 1234 12345 123456"
result = re.findall(r'\d{3,}', text)
print(result)  # ['123', '1234', '12345', '123456']
# Matches runs of 3 or more digits

Example 2: Find words with minimum length

python

text = "I am learning Python programming"
result = re.findall(r'\b\w{4,}\b', text)
print(result)  # ['learning', 'Python', 'programming']
# Matches words with 4 or more characters

6. {n,m} – Between n and m occurrences

Description: Matches between n and m occurrences of the preceding element

Example 1: Find medium-length numbers

python

text = "1 12 123 1234 12345 123456"
result = re.findall(r'\d{2,4}', text)
print(result)  # ['12', '123', '1234', '1234', '1234', '56']
# Matches runs of 2-4 digits; longer runs are split
# (12345 -> '1234', the leftover '5' is too short; 123456 -> '1234' and '56')
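
To keep only numbers whose total length is 2 to 4 digits, one option is to forbid a digit immediately before and after the match with lookarounds. This is a sketch of that idea, not the only way to do it:

```python
import re

text = "1 12 123 1234 12345 123456"

# (?<!\d) : no digit directly before the match
# (?!\d)  : no digit directly after the match
whole_numbers = re.findall(r'(?<!\d)\d{2,4}(?!\d)', text)
print(whole_numbers)  # ['12', '123', '1234']
```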

Example 2: Validate password length

python

passwords = ["short", "good123", "verylongpassword", "ok"]
valid = [pwd for pwd in passwords if re.fullmatch(r'\w{6,12}', pwd)]
print(valid)  # ['good123']
# Keeps passwords made of 6 to 12 word characters (letters, digits, underscore)
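
The choice of re.fullmatch matters for length checks. The following sketch contrasts it with re.search on a password that is too long:

```python
import re

pattern = r'\w{6,12}'
candidate = "verylongpassword"  # 16 characters, over the limit

# search is satisfied by any 6-12 character stretch inside the string.
found_inside = re.search(pattern, candidate)
print(found_inside.group())  # 'verylongpass'

# fullmatch requires the whole string to fit the pattern.
whole_string = re.fullmatch(pattern, candidate)
print(whole_string)  # None
```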

Greedy vs Lazy Quantifiers

7. * vs *? – Greedy vs Lazy

Example 1: Greedy matching

python

text = "<div>content</div><p>more</p>"
result = re.findall(r'<.*>', text)
print(result)  # ['<div>content</div><p>more</p>']
# Greedy - matches everything between first < and last >

Example 2: Lazy matching

python

text = "<div>content</div><p>more</p>"
result = re.findall(r'<.*?>', text)
print(result)  # ['<div>', '</div>', '<p>', '</p>']
# Lazy - matches each tag individually
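
A common alternative to the lazy quantifier is a negated character class, which cannot run past the closing '>' in the first place. This sketch gives the same result on the sample text:

```python
import re

text = "<div>content</div><p>more</p>"

# [^>]* matches any characters except '>', so each match stops at the first '>'.
tags = re.findall(r'<[^>]*>', text)
print(tags)  # ['<div>', '</div>', '<p>', '</p>']
```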

8. + vs +? – Greedy vs Lazy

Example 1: Greedy matching

python

text = "aaaabaaaab"
result = re.findall(r'a+b', text)
print(result)  # ['aaaab', 'aaaab']
# Greedy - matches as many 'a's as possible before 'b'

Example 2: Lazy matching

python

text = "aaaabaaaab"
result = re.findall(r'a+?b', text)
print(result)  # ['aaaab', 'aaaab']
# In this case, lazy behaves same as greedy due to the requirement of 'b'
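
The difference becomes visible once nothing after the quantifier forces it to grow. A short sketch:

```python
import re

text = "aaaab"

greedy_runs = re.findall(r'a+', text)
lazy_runs = re.findall(r'a+?', text)

print(greedy_runs)  # ['aaaa']
print(lazy_runs)    # ['a', 'a', 'a', 'a']
```

With no trailing 'b' in the pattern, the lazy version stops after a single 'a' each time.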

9. ? vs ?? – Greedy vs Lazy

Example 1: Greedy optional

python

text = "abc"
result = re.findall(r'ab?c', text)
print(result)  # ['abc']
# Greedy - prefers matching the 'b' if possible

Example 2: Lazy optional

python

text = "abc"
result = re.findall(r'ab??c', text)
print(result)  # ['abc']
# Still matches 'b' because it's necessary for the pattern to match
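
Dropping the trailing 'c' shows what ?? actually prefers. This is a minimal sketch:

```python
import re

text = "abc"

greedy_optional = re.findall(r'ab?', text)
lazy_optional = re.findall(r'ab??', text)

print(greedy_optional)  # ['ab']
print(lazy_optional)    # ['a']
```

The lazy form tries zero occurrences of 'b' first, and since nothing else in the pattern needs it, the match ends after 'a'.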

10. {n,} vs {n,}? – Greedy vs Lazy

Example 1: Greedy range

python

text = "aaaaab"
result = re.findall(r'a{2,}b', text)
print(result)  # ['aaaaab']
# Greedy - matches as many 'a's as possible (5)

Example 2: Lazy range

python

text = "aaaaab"
result = re.findall(r'a{2,}?b', text)
print(result)  # ['aaaaab']
# Lazy - but still matches all 'a's because pattern requires 'b' at the end

11. {n,m} vs {n,m}? – Greedy vs Lazy

Example 1: Greedy bounded range

python

text = "aaaaab"
result = re.findall(r'a{2,4}b', text)
print(result)  # ['aaaab']
# Greedy - takes up to 4 'a's; with 5 in the text, the match starts at the second 'a'

Example 2: Lazy bounded range

python

text = "aaaaab"
result = re.findall(r'a{2,4}?b', text)
print(result)  # ['aaaab']
# Lazy - starts with 2 'a's but must grow to 4 to reach 'b', so the result is the same
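
Two details of the bounded range are easier to see in isolation: where the match starts, and how greedy and lazy differ when no 'b' is required. A sketch on the same text:

```python
import re

text = "aaaaab"  # five 'a's followed by 'b'

# The match cannot start at index 0, because at most 4 'a's may precede 'b'.
start_end = re.search(r'a{2,4}b', text).span()
print(start_end)  # (1, 6)

# Without the trailing 'b', greedy takes 4 and lazy takes 2 at a time.
greedy_chunks = re.findall(r'a{2,4}', text)
lazy_chunks = re.findall(r'a{2,4}?', text)
print(greedy_chunks)  # ['aaaa']
print(lazy_chunks)    # ['aa', 'aa']
```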

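Possessive Quantifiers (re in Python 3.11+, or the regex module)

Note: The standard re module supports these from Python 3.11 onward. On earlier versions they require the third-party regex module (pip install regex), which the examples below use.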

12. *+ – Possessive quantifier

python

import regex
text = "aaaab"
result = regex.findall(r'a*+b', text)
print(result)  # ['aaaab']
# Possessive - matches all 'a's and doesn't backtrack

13. ++ – Possessive quantifier

python

text = "aaaab"
result = regex.findall(r'a++b', text)
print(result)  # ['aaaab']
# Possessive - matches all 'a's without backtracking

14. ?+ – Possessive quantifier

python

text = "ab"
result = regex.findall(r'a?+b', text)
print(result)  # ['ab']
# Possessive optional - matches 'a' if available without backtracking

15. {n}+ – Possessive quantifier

python

text = "aaaab"
result = regex.findall(r'a{3}+b', text)
print(result)  # ['aaab']
# Possessive exact count - takes exactly 3 'a's, so the match starts at the second 'a'

16. {n,}+ – Possessive quantifier

python

text = "aaaab"
result = regex.findall(r'a{2,}+b', text)
print(result)  # ['aaaab']
# Possessive range - matches 2+ 'a's without backtracking

17. {n,m}+ – Possessive quantifier

python

text = "aaaab"
result = regex.findall(r'a{2,4}+b', text)
print(result)  # ['aaaab']
# Possessive bounded range - matches up to 4 'a's without backtracking

Key Differences Summary

Greedy: Matches as much as possible while still allowing the overall pattern to match
Lazy: Matches as little as possible while still allowing the overall pattern to match
Possessive: Matches as much as possible and never gives back (no backtracking)

python

# Demonstration of all three types
text = "aaaab"

# Greedy - matches all 'a's but can backtrack if needed
print(re.findall(r'a*b', text))    # ['aaaab']

# Lazy - matches minimal 'a's but enough to satisfy pattern  
print(re.findall(r'a*?b', text))   # ['aaaab'] - still needs to match 'b'

# Possessive - matches all 'a's and never backtracks
import regex
print(regex.findall(r'a*+b', text)) # ['aaaab']
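
All three patterns above return the same result, so here is a sketch of a case where greedy and possessive actually differ. It assumes either the third-party regex module is installed or Python is 3.11 or newer; the helper supports_possessive is hypothetical and only guards against older setups.

```python
import re

try:
    import regex as engine  # third-party module, if installed
except ImportError:
    engine = re  # possessive syntax needs Python 3.11+ here


def supports_possessive(mod):
    """Hypothetical helper: True if the module accepts possessive syntax."""
    try:
        mod.compile(r'a*+')
        return True
    except mod.error:
        return False


text = "aaaab"

# Greedy: a* first takes all four 'a's, then gives one back so 'ab' can match.
greedy = re.findall(r'a*ab', text)
print(greedy)  # ['aaaab']

# Possessive: a*+ keeps every 'a' it took, so the literal 'a' never matches.
possessive = engine.findall(r'a*+ab', text) if supports_possessive(engine) else None
print(possessive)  # [] when possessive quantifiers are supported
```

The empty result is the point: a possessive quantifier can make a pattern fail where the greedy version would succeed by backtracking.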
