re.subn()

Python re.subn() Method Explained

The re.subn() method is similar to re.sub() but with one key difference: it returns a tuple containing both the modified string and the number of substitutions made. This is useful when you need to know how many replacements occurred.

Syntax

python

re.subn(pattern, repl, string, count=0, flags=0)

Returns: (modified_string, number_of_substitutions)
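The sketch below checks that return shape using made-up sample strings. It shows three things:

- The first element of the tuple is exactly what re.sub() would return.
- A pattern with no matches gives back the original string and a count of 0.
- A compiled pattern offers the same operation as its own subn() method.

Passing count and flags as keyword arguments, as the examples below do, also avoids the positional form that Python 3.13 deprecated.

```python
import re

text = "cat hat bat"

# re.subn returns a 2-tuple: (new_string, number_of_substitutions)
new_text, n = re.subn(r"at", "og", text)
print(new_text, n)          # cog hog bog 3

# The string part is exactly what re.sub would produce
assert new_text == re.sub(r"at", "og", text)

# No match: the original string comes back unchanged with a count of 0
unchanged, zero = re.subn(r"xyz", "!", text)
print(unchanged, zero)      # cat hat bat 0

# Compiled patterns expose the same operation as Pattern.subn(repl, string, count=0)
pattern = re.compile(r"at")
print(pattern.subn("og", text, count=2))   # ('cog hog bat', 2)
```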


Example 1: Basic Usage with Count Tracking

python

import re

text = "The color of the sky is blue. My favorite color is blue too."

# Replace 'color' with 'colour' and get count
result, count = re.subn(r'color', 'colour', text)
print("Modified text:", result)
print("Number of substitutions:", count)

# Replace with limit and get count
result_limited, count_limited = re.subn(r'color', 'colour', text, count=1)
print("\nLimited replacement:")
print("Modified text:", result_limited)
print("Number of substitutions:", count_limited)

# Case-insensitive replacement with count
text2 = "Color, COLOR, color, CoLoR"
result_ci, count_ci = re.subn(r'color', 'colour', text2, flags=re.IGNORECASE)
print("\nCase-insensitive:")
print("Modified text:", result_ci)
print("Number of substitutions:", count_ci)

Output:

text

Modified text: The colour of the sky is blue. My favorite colour is blue too.
Number of substitutions: 2

Limited replacement:
Modified text: The colour of the sky is blue. My favorite color is blue too.
Number of substitutions: 1

Case-insensitive:
Modified text: colour, colour, colour, colour
Number of substitutions: 4
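The count covers every match, so it can also reveal a pattern that matches more than intended. In this sketch, which uses an invented sentence, the bare pattern also rewrites the "color" inside "colorful". Adding \b word boundaries limits the replacement to the standalone word, and the two counts make the difference visible.

```python
import re

text = "A colorful sign in my favorite color."

# Bare pattern: also matches the 'color' inside 'colorful'
loose, loose_count = re.subn(r"color", "colour", text)

# \b word boundaries: only the standalone word 'color' matches
strict, strict_count = re.subn(r"\bcolor\b", "colour", text)

print(loose_count, loose)    # 2 A colourful sign in my favorite colour.
print(strict_count, strict)  # 1 A colorful sign in my favorite colour.
```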

Example 2: Data Cleaning with Validation

python

import re

# Sample contact list mixing valid and invalid phone numbers
contacts = """
John: 123-456-7890
Jane: invalid-number
Bob: 555-1234
Alice: 987-654-3210
Invalid: 123
"""

# Strip every non-digit character (letters included); each removed character is one substitution
cleaned, removed_chars = re.subn(r'\D', '', contacts)
print("Cleaned numbers:", cleaned)
print("Non-digit characters removed:", removed_chars)

# Alternative: format 10-digit phone numbers and count the matches
def format_phone(match):
    digits = re.sub(r'\D', '', match.group(1))  # drop '-' or '.' separators
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return match.group(0)  # Return original if invalid (re.subn still counts the match)

formatted, formatted_count = re.subn(r'(\d{3}[-.]?\d{3}[-.]?\d{4})', format_phone, contacts)
print("\nFormatted contacts:")
print(formatted)
print("Valid phone numbers found:", formatted_count)

# Remove whole lines (newline included) that lack a full 10-digit phone number after the colon.
# The negative lookahead sits at the start of the line so that backtracking on the
# whitespace cannot let a valid line slip through.
clean_list, removed_count = re.subn(
    r'^(?!.*:[ \t]*\d{3}[-.]?\d{3}[-.]?\d{4}$).+\n?', '', contacts, flags=re.MULTILINE
)
print("\nCleaned contact list:")
print(clean_list)
print("Invalid entries removed:", removed_count)

Output:

text

Cleaned numbers: 123456789055512349876543210123
Non-digit characters removed: 58

Formatted contacts:

John: (123) 456-7890
Jane: invalid-number
Bob: 555-1234
Alice: (987) 654-3210
Invalid: 123

Valid phone numbers found: 2

Cleaned contact list:

John: 123-456-7890
Alice: 987-654-3210

Invalid entries removed: 3
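One detail matters when re.subn() is used with a replacement function, as with format_phone() above. The count reports how many matches were found, not how many were actually changed, so a match that the function returns untouched is still counted.

The minimal sketch below tracks real changes separately. It assumes a closure-based counter, and the names pad_ids and pad_short are illustrative only; they are not part of the re module.

```python
import re

def pad_ids(text):
    """Zero-pad numbers shorter than 3 digits and report both counts."""
    changed = 0  # matches the callback actually altered

    def pad_short(match):
        nonlocal changed
        digits = match.group(0)
        if len(digits) < 3:
            changed += 1
            return digits.zfill(3)
        return digits  # returned unchanged, yet re.subn still counts it

    result, matched = re.subn(r"\d+", pad_short, text)
    return result, matched, changed

result, matched, changed = pad_ids("IDs: 42, 7, 1234, 99")
print(result)                                    # IDs: 042, 007, 1234, 099
print("matched:", matched, "changed:", changed)  # matched: 4 changed: 3
```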

Example 3: Advanced Text Processing with Statistics

python

import re

# Analyze text and perform multiple substitutions
text = """
The product costs $100.99, but with discount it's $89.99.
Shipping: $15.50. Total: $105.49. Tax: $9.45.
"""

# Convert USD to EUR (approx conversion rate)
def usd_to_eur(match):
    usd_amount = float(match.group(1))
    eur_amount = usd_amount * 0.85  # Approximate conversion
    return f"€{eur_amount:.2f}"

# Apply currency conversion and count conversions
# (the optional decimal group keeps a sentence-ending period out of the captured amount)
converted, conversion_count = re.subn(r'\$(\d+(?:\.\d+)?)', usd_to_eur, text)
print("Converted to EUR:")
print(converted)
print(f"Number of price conversions: {conversion_count}")

# Mask sensitive numbers (credit card-like patterns)
def mask_sensitive(match):
    number = match.group(1).replace(' ', '').replace('-', '')
    if len(number) >= 12:  # Only mask if it looks like a card number
        return 'XXXX-XXXX-XXXX-' + number[-4:]
    return match.group(0)  # Keep original if not sensitive

sensitive_text = "Card: 4111 1111 1111 1111, ID: 123-45-6789, Phone: 555-1234"
masked, masked_count = re.subn(r'(\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4})', mask_sensitive, sensitive_text)
print(f"\nMasked sensitive data ({masked_count} items masked):")
print(masked)

# Count specific word occurrences by replacing each match with itself (\g<0> re-inserts the whole match)
word_count_text = "python is great. I love Python. python programming is fun."
python_count = re.subn(r'python', r'\g<0>', word_count_text, flags=re.IGNORECASE)[1]
print(f"\nThe word 'python' appears {python_count} times (case-insensitive)")

Output:

text

Converted to EUR:
The product costs €85.84, but with discount it's €76.49.
Shipping: €13.17. Total: €89.67. Tax: €8.03.
Number of price conversions: 5

Masked sensitive data (1 items masked):
Card: XXXX-XXXX-XXXX-1111, ID: 123-45-6789, Phone: 555-1234

The word 'python' appears 3 times (case-insensitive)
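Replacing each match with itself works for counting. When the modified text is not needed, re.findall() or re.finditer() can count occurrences without building a new string. The comparison below runs all three approaches on the same sentence:

```python
import re

text = "python is great. I love Python. python programming is fun."
pattern = re.compile(r"python", re.IGNORECASE)

# r"\g<0>" re-inserts the whole match, so the text is left exactly as it was
same_text, via_subn = pattern.subn(r"\g<0>", text)

# Counting directly, without producing a modified copy
via_findall = len(pattern.findall(text))
via_finditer = sum(1 for _ in pattern.finditer(text))

print(via_subn, via_findall, via_finditer)  # 3 3 3
print(same_text == text)                    # True
```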

Key Advantages of re.subn():

  1. Returns substitution count: Perfect for validation and statistics
  2. Same interface as re.sub(): The pattern, repl (string or function), count, and flags arguments all work the same way
  3. Useful for debugging: Know exactly how many changes were made
  4. Data quality checks: Compare the number of matches against the number you expected
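For debugging and data-quality checks, a common pattern is to compare the count with the number you expected and fail loudly when they differ. The helper below, sub_expecting, is a hypothetical example and is not part of the re module.

```python
import re

def sub_expecting(pattern, repl, text, expected, **kwargs):
    """Hypothetical helper: run re.subn and raise if the count is unexpected."""
    result, n = re.subn(pattern, repl, text, **kwargs)
    if n != expected:
        raise ValueError(f"expected {expected} substitution(s), got {n}")
    return result

config = "host=localhost\nport=8080\n"

# Exactly one 'port=' line should be rewritten
updated = sub_expecting(r"^port=\d+$", "port=9090", config, expected=1, flags=re.MULTILINE)
print(updated, end="")
# host=localhost
# port=9090

# A typo in the pattern matches nothing, so the helper raises instead of silently doing nothing
try:
    sub_expecting(r"^prot=\d+$", "port=9090", config, expected=1, flags=re.MULTILINE)
except ValueError as err:
    print("Error:", err)  # Error: expected 1 substitution(s), got 0
```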

When to Use re.subn():

  • When you need to know how many replacements occurred
  • For data validation and quality control
  • When processing logs and need statistics
  • For debugging complex text transformations
  • When you need to verify that substitutions actually happened
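For the log-processing case, the sketch below masks IPv4-like addresses in a few made-up log lines and reports how many were redacted per line and in total. The simple dotted-quad pattern is an assumption for this example and does not check that each octet is in range.

```python
import re

log_lines = [
    "2024-05-01 GET /index from 10.0.0.5",
    "2024-05-01 POST /login from 192.168.1.20 via 10.0.0.1",
    "2024-05-01 health check ok",
]

# Rough IPv4 pattern: four 1-3 digit groups separated by dots (no range checks)
ip_pattern = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

total = 0
for line in log_lines:
    redacted, n = ip_pattern.subn("<ip>", line)
    total += n
    print(f"[{n}] {redacted}")

print("Total addresses redacted:", total)
# [1] 2024-05-01 GET /index from <ip>
# [2] 2024-05-01 POST /login from <ip> via <ip>
# [0] 2024-05-01 health check ok
# Total addresses redacted: 3
```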

Use re.subn() whenever an application needs both the transformed text and a count of how many substitutions produced it.
