Python re.subn() Method Explained
The re.subn() function works like re.sub() with one difference: instead of returning only the modified string, it returns a tuple containing the modified string and the number of substitutions made. This is useful when you need to know how many replacements occurred, or whether any occurred at all.
Syntax
python
re.subn(pattern, repl, string, count=0, flags=0)
Returns: (modified_string, number_of_substitutions)
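To make the return value concrete, here is a minimal sketch comparing re.sub() and re.subn() on the same input, including the case where nothing matches. The sample string is made up for illustration.

```python
import re

sample = "a-b-c"  # illustrative input, not from the examples in this article

# re.sub() returns only the new string.
only_text = re.sub(r"-", "+", sample)

# re.subn() returns a (new_string, count) tuple.
new_text, n = re.subn(r"-", "+", sample)

# With no match, the string comes back unchanged and the count is 0.
unchanged, zero = re.subn(r"x", "+", sample)

print(only_text)        # a+b+c
print(new_text, n)      # a+b+c 2
print(unchanged, zero)  # a-b-c 0
```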
Example 1: Basic Usage with Count Tracking
python
import re
text = "The color of the sky is blue. My favorite color is blue too."
# Replace 'color' with 'colour' and get count
result, count = re.subn(r'color', 'colour', text)
print("Modified text:", result)
print("Number of substitutions:", count)
# Replace with limit and get count
result_limited, count_limited = re.subn(r'color', 'colour', text, count=1)
print("\nLimited replacement:")
print("Modified text:", result_limited)
print("Number of substitutions:", count_limited)
# Case-insensitive replacement with count
text2 = "Color, COLOR, color, CoLoR"
result_ci, count_ci = re.subn(r'color', 'colour', text2, flags=re.IGNORECASE)
print("\nCase-insensitive:")
print("Modified text:", result_ci)
print("Number of substitutions:", count_ci)
Output:
text
Modified text: The colour of the sky is blue. My favorite colour is blue too.
Number of substitutions: 2

Limited replacement:
Modified text: The colour of the sky is blue. My favorite color is blue too.
Number of substitutions: 1

Case-insensitive:
Modified text: colour, colour, colour, colour
Number of substitutions: 4
Example 2: Data Cleaning with Validation
python
import re
# Clean phone numbers and count valid ones
contacts = """
John: 123-456-7890
Jane: invalid-number
Bob: 555-1234
Alice: 987-654-3210
Invalid: 123
"""
# Strip every non-digit character; the count is the number of characters removed
cleaned, removed_chars = re.subn(r'\D', '', contacts)
print("Cleaned numbers:", cleaned)
# Alternative: Format phone numbers and count valid ones
def format_phone(match):
    # Strip separators; the pattern allows '-' or '.'
    digits = re.sub(r'\D', '', match.group(1))
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return match.group(0)  # Return original if invalid
formatted, formatted_count = re.subn(r'(\d{3}[-.]?\d{3}[-.]?\d{4})', format_phone, contacts)
print("\nFormatted contacts:")
print(formatted)
print("Valid phone numbers found:", formatted_count)
# Remove invalid lines (those without a proper phone number).
# The lookahead sits at the start of the line so backtracking cannot step around it.
clean_list, removed_count = re.subn(r'^(?!.*\d{3}[-.]?\d{3}[-.]?\d{4}).*:.*\n?', '', contacts, flags=re.MULTILINE)
print("\nCleaned contact list:")
print(clean_list)
print("Invalid entries removed:", removed_count)
Output:
text
Cleaned numbers: 123456789055512349876543210123

Formatted contacts:

John: (123) 456-7890
Jane: invalid-number
Bob: 555-1234
Alice: (987) 654-3210
Invalid: 123

Valid phone numbers found: 2

Cleaned contact list:

John: 123-456-7890
Alice: 987-654-3210

Invalid entries removed: 3
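One point worth keeping in mind when the replacement is a function: the count returned by re.subn() is the number of matches, including matches for which the function hands back the original text. If you need the number of strings actually changed, one option is to count inside the callback. The sketch below is only an illustration; the helper name `format_and_count`, the looser digit pattern, and the sample input are assumptions, not part of the example above.

```python
import re

def format_and_count(text):
    """Format 10-digit numbers; return (new_text, matches, changed)."""
    changed = 0

    def repl(match):
        nonlocal changed
        digits = re.sub(r"\D", "", match.group(0))
        if len(digits) != 10:
            return match.group(0)  # left as-is, but still counted by subn
        changed += 1
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

    # Loose pattern (an assumption): any run of digits, dots and hyphens
    # that starts and ends with a digit.
    new_text, matches = re.subn(r"\d[\d.-]*\d", repl, text)
    return new_text, matches, changed

new_text, matches, changed = format_and_count(
    "John: 123-456-7890, Bob: 555-1234, Alice: 987.654.3210"
)
print(new_text)          # John: (123) 456-7890, Bob: 555-1234, Alice: (987) 654-3210
print(matches, changed)  # 3 2
```

Here re.subn() reports three matches, while only two numbers were reformatted.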
Example 3: Advanced Text Processing with Statistics
python
import re
from decimal import Decimal, ROUND_HALF_UP
# Analyze text and perform multiple substitutions
text = """
The product costs $100.99, but with discount it's $89.99.
Shipping: $15.50. Total: $105.49. Tax: $9.45.
"""
# Convert USD to EUR (approximate conversion rate)
def usd_to_eur(match):
    # Decimal avoids binary floating-point rounding surprises with money
    usd_amount = Decimal(match.group(1))
    eur_amount = (usd_amount * Decimal("0.85")).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return f"€{eur_amount}"
# Apply currency conversion and count conversions
converted, conversion_count = re.subn(r'\$(\d+\.?\d*)', usd_to_eur, text)
print("Converted to EUR:")
print(converted)
print(f"Number of price conversions: {conversion_count}")
# Mask sensitive numbers (credit card-like patterns)
def mask_sensitive(match):
    number = match.group(1).replace(' ', '').replace('-', '')
    if len(number) >= 12:  # Only mask if it looks like a card number
        return 'XXXX-XXXX-XXXX-' + number[-4:]
    return match.group(0)  # Keep original if not sensitive
sensitive_text = "Card: 4111 1111 1111 1111, ID: 123-45-6789, Phone: 555-1234"
masked, masked_count = re.subn(r'(\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4})', mask_sensitive, sensitive_text)
print(f"\nMasked sensitive data ({masked_count} items masked):")
print(masked)
# Count specific word occurrences by replacing with themselves
word_count_text = "python is great. I love Python. python programming is fun."
python_count = re.subn(r'python', 'python', word_count_text, flags=re.IGNORECASE)[1]
print(f"\nThe word 'python' appears {python_count} times (case-insensitive)")
Output:
text
Converted to EUR:

The product costs €85.84, but with discount it's €76.49.
Shipping: €13.18. Total: €89.67. Tax: €8.03.

Number of price conversions: 5

Masked sensitive data (1 items masked):
Card: XXXX-XXXX-XXXX-1111, ID: 123-45-6789, Phone: 555-1234

The word 'python' appears 3 times (case-insensitive)
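Replacing a word with itself works for counting, but it builds a new string that is then thrown away. When only the count matters, re.findall() or re.finditer() may express the intent more directly. A small sketch comparing the three approaches on the same sentence:

```python
import re

text = "python is great. I love Python. python programming is fun."

# The subn trick from the example: keep only the count.
via_subn = re.subn(r"python", "python", text, flags=re.IGNORECASE)[1]

# Alternatives that do not build a replacement string.
via_findall = len(re.findall(r"python", text, flags=re.IGNORECASE))
via_finditer = sum(1 for _ in re.finditer(r"python", text, flags=re.IGNORECASE))

print(via_subn, via_findall, via_finditer)  # 3 3 3
```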
Key Advantages of re.subn():
- Returns substitution count: Perfect for validation and statistics
- Same functionality as re.sub(): All features of re.sub() are available
- Useful for debugging: Know exactly how many changes were made
- Data quality checks: Count successful/unsuccessful processing operations
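As an illustration of the "same functionality" point, group backreferences in the replacement and the compiled-pattern form both work with subn just as they do with sub. The date strings below are hypothetical sample data.

```python
import re

# Hypothetical input: dates in MM/DD/YYYY form.
log_line = "start=03/14/2024 end=04/01/2024"

# Group backreferences work in the replacement exactly as with re.sub().
iso, n = re.subn(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", log_line)

# A compiled pattern exposes the same operation as a method.
date_re = re.compile(r"(\d{2})/(\d{2})/(\d{4})")
iso_again, n_again = date_re.subn(r"\3-\1-\2", log_line)

print(iso, n)  # start=2024-03-14 end=2024-04-01 2
```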
When to Use re.subn():
- When you need to know how many replacements occurred
- For data validation and quality control
- When processing logs and need statistics
- For debugging complex text transformations
- When you need to verify that substitutions actually happened
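The last point, verifying that a substitution actually happened, can be sketched as a thin wrapper that raises when the count is zero. The helper name `replace_required` and the template string are hypothetical, chosen only for this illustration.

```python
import re

def replace_required(pattern, repl, text):
    """Like re.sub(), but raise if nothing was replaced (hypothetical helper)."""
    new_text, n = re.subn(pattern, repl, text)
    if n == 0:
        raise ValueError(f"pattern {pattern!r} did not match")
    return new_text

template = "Hello, {{name}}!"
print(replace_required(r"\{\{name\}\}", "Ada", template))  # Hello, Ada!

try:
    replace_required(r"\{\{title\}\}", "Dr.", template)
except ValueError as exc:
    print("error:", exc)
```

With plain re.sub(), the second call would silently return the template unchanged; the count from re.subn() is what makes the failure detectable.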
The re.subn() function is a good fit whenever you need both the transformed text and a count of how many replacements produced it.