re.subn()

Python re.subn() Function Explained

The re.subn() function works like re.sub() with one difference: instead of returning only the modified string, it returns a tuple containing the modified string and the number of substitutions made. This is useful whenever you need to know how many replacements occurred, or whether any occurred at all.

Syntax

python

re.subn(pattern, repl, string, count=0, flags=0)

Returns: the tuple (modified_string, number_of_substitutions). With the default count=0, every non-overlapping match is replaced.
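As a quick check of the signature above, here is a minimal sketch (the sample string and variable names are illustrative assumptions, not part of the original examples). It shows that the return value is an ordinary tuple, that its first element equals what re.sub() returns for the same arguments, and that a compiled pattern offers the same operation as a method.

```python
import re

sample = "a1b22c333"  # illustrative input, not from the article

# re.subn() returns a plain 2-tuple: (new_string, number_of_substitutions)
outcome = re.subn(r"\d+", "#", sample)
print(outcome)  # ('a#b#c#', 3)

# The first element matches what re.sub() returns for the same arguments
assert outcome[0] == re.sub(r"\d+", "#", sample)

# count limits how many matches are replaced; the returned number reflects that limit
limited = re.subn(r"\d+", "#", sample, count=2)
print(limited)  # ('a#b#c333', 2)

# A compiled pattern exposes the same operation as Pattern.subn()
digits = re.compile(r"\d+")
print(digits.subn("#", sample))  # ('a#b#c#', 3)

# When nothing matches, the string comes back unchanged with a count of 0
print(re.subn(r"x", "y", sample))  # ('a1b22c333', 0)
```

Unpacking the tuple directly, as in `new_text, n = re.subn(...)`, is usually the most readable form.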


Example 1: Basic Usage with Count Tracking

python

import re

text = "The color of the sky is blue. My favorite color is blue too."

# Replace 'color' with 'colour' and get count
result, count = re.subn(r'color', 'colour', text)
print("Modified text:", result)
print("Number of substitutions:", count)

# Replace with limit and get count
result_limited, count_limited = re.subn(r'color', 'colour', text, count=1)
print("\nLimited replacement:")
print("Modified text:", result_limited)
print("Number of substitutions:", count_limited)

# Case-insensitive replacement with count
text2 = "Color, COLOR, color, CoLoR"
result_ci, count_ci = re.subn(r'color', 'colour', text2, flags=re.IGNORECASE)
print("\nCase-insensitive:")
print("Modified text:", result_ci)
print("Number of substitutions:", count_ci)

Output:

text

Modified text: The colour of the sky is blue. My favorite colour is blue too.
Number of substitutions: 2

Limited replacement:
Modified text: The colour of the sky is blue. My favorite color is blue too.
Number of substitutions: 1

Case-insensitive:
Modified text: colour, colour, colour, colour
Number of substitutions: 4

Example 2: Data Cleaning with Validation

python

import re

# Clean phone numbers and count valid ones
contacts = """
John: 123-456-7890
Jane: invalid-number
Bob: 555-1234
Alice: 987-654-3210
Invalid: 123
"""

# Strip every non-digit character. Note that the count returned here is the
# number of characters removed, not the number of valid phone numbers.
cleaned, chars_removed = re.subn(r'\D', '', contacts)
print("Cleaned numbers:", cleaned)

# Alternative: Format phone numbers and count valid ones
def format_phone(match):
    digits = match.group(1).replace('-', '')
    if len(digits) == 10:
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return match.group(0)  # Return original if invalid

formatted, formatted_count = re.subn(r'(\d{3}[-.]?\d{3}[-.]?\d{4})', format_phone, contacts)
print("\nFormatted contacts:")
print(formatted)
print("Valid phone numbers found:", formatted_count)

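# Remove invalid lines (those that do not end in a 10-digit phone number).
# The negative lookahead sits at the start of the line so backtracking cannot
# slip past it, and the trailing \n? removes the line break as well.
invalid_line = r'^(?!.*:\s*\d{3}[-.]?\d{3}[-.]?\d{4}$).*:.*\n?'
clean_list, removed_count = re.subn(invalid_line, '', contacts, flags=re.MULTILINE)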
print("\nCleaned contact list:")
print(clean_list)
print("Invalid entries removed:", removed_count)

Output:

text

Cleaned numbers: 123456789055512349876543210123

Formatted contacts:
John: (123) 456-7890
Jane: invalid-number
Bob: 555-1234
Alice: (987) 654-3210
Invalid: 123

Valid phone numbers found: 2

Cleaned contact list:
John: 123-456-7890
Alice: 987-654-3210

Invalid entries removed: 3
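One subtlety in the example above: the number returned by re.subn() counts every match of the pattern, including matches for which the replacement function hands back the original text. If the pattern is looser than the validation inside the callback, the count can overstate how many values really changed. The following is a minimal sketch of one way to track this separately; the make_formatter() helper, the loose pattern, and the sample data are assumptions for illustration and are not part of the original example.

```python
import re

def make_formatter():
    """Build a replacement callback plus a counter of real reformats (hypothetical helper)."""
    stats = {"reformatted": 0}

    def format_phone(match):
        digits = re.sub(r"\D", "", match.group(0))
        if len(digits) == 10:
            stats["reformatted"] += 1
            return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
        return match.group(0)  # left unchanged, but still counted by subn()

    return format_phone, stats

sample = "A: 123-456-7890, B: 555-1234, C: 987.654.3210"
callback, stats = make_formatter()

# Deliberately loose pattern: a run of digits, dots, and hyphens
# that starts and ends with a digit
result, matched = re.subn(r"\d[\d.-]*\d", callback, sample)

print(result)
print("Matches seen by subn():", matched)             # 3
print("Actually reformatted:", stats["reformatted"])  # 2
```

Keeping the two numbers apart makes it clear whether a count describes "patterns found" or "values changed".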

Example 3: Advanced Text Processing with Statistics

python

import re

# Analyze text and perform multiple substitutions
text = """
The product costs $100.99, but with discount it's $89.99.
Shipping: $15.50. Total: $105.49. Tax: $9.45.
"""

# Convert USD to EUR (approx conversion rate)
def usd_to_eur(match):
    usd_amount = float(match.group(1))
    eur_amount = usd_amount * 0.85  # Approximate conversion
    return f"€{eur_amount:.2f}"

# Apply currency conversion and count conversions
converted, conversion_count = re.subn(r'\$(\d+\.?\d*)', usd_to_eur, text)
print("Converted to EUR:")
print(converted)
print(f"Number of price conversions: {conversion_count}")

# Mask sensitive numbers (credit card-like patterns)
def mask_sensitive(match):
    number = match.group(1).replace(' ', '').replace('-', '')
    if len(number) >= 12:  # Only mask if it looks like a card number
        return 'XXXX-XXXX-XXXX-' + number[-4:]
    return match.group(0)  # Keep original if not sensitive

sensitive_text = "Card: 4111 1111 1111 1111, ID: 123-45-6789, Phone: 555-1234"
masked, masked_count = re.subn(r'(\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4})', mask_sensitive, sensitive_text)
print(f"\nMasked sensitive data ({masked_count} items masked):")
print(masked)

# Count specific word occurrences by replacing with themselves
word_count_text = "python is great. I love Python. python programming is fun."
python_count = re.subn(r'python', 'python', word_count_text, flags=re.IGNORECASE)[1]
print(f"\nThe word 'python' appears {python_count} times (case-insensitive)")

Output:

text

Converted to EUR:
The product costs €85.84, but with discount it's €76.49.
Shipping: €13.17. Total: €89.67. Tax: €8.03.
Number of price conversions: 5

Masked sensitive data (1 items masked):
Card: XXXX-XXXX-XXXX-1111, ID: 123-45-6789, Phone: 555-1234

The word 'python' appears 3 times (case-insensitive)
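Replacing a word with itself works for counting, but it builds a new string only to discard it. As a hedged alternative, the sketch below compares that approach with re.findall() and re.finditer(), which count matches without substituting anything. The second test sentence and the word-boundary variant are illustrative assumptions.

```python
import re

word_count_text = "python is great. I love Python. python programming is fun."
pattern = re.compile(r"python", re.IGNORECASE)

# Count via subn(): builds a new string that is then thrown away
count_subn = pattern.subn("python", word_count_text)[1]

# Count via findall(): collects the matched substrings into a list
count_findall = len(pattern.findall(word_count_text))

# Count via finditer(): iterates over match objects without building a list
count_finditer = sum(1 for _ in pattern.finditer(word_count_text))

print(count_subn, count_findall, count_finditer)  # 3 3 3

# With word boundaries, "Pythonic" is not counted as "python"
strict = re.compile(r"\bpython\b", re.IGNORECASE)
strict_count = len(strict.findall("Pythonic code in python"))
print(strict_count)  # 1
```

re.subn() remains the natural choice when the replaced text is needed as well as the count.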

Key Advantages of re.subn():

  1. Returns substitution count: Perfect for validation and statistics
  2. Same functionality as re.sub(): All features of re.sub() are available
  3. Useful for debugging: Know exactly how many changes were made
  4. Data quality checks: Count successful/unsuccessful processing operations
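The debugging and verification points above can be sketched as a small helper that fails loudly when a substitution did not happen. The function name replace_required(), its expected parameter, and the template text are assumptions made for this illustration.

```python
import re

def replace_required(pattern, repl, text, expected=None):
    """Substitute, raising if nothing matched or the count differs from expected."""
    new_text, n = re.subn(pattern, repl, text)
    if n == 0:
        raise ValueError(f"pattern {pattern!r} matched nothing")
    if expected is not None and n != expected:
        raise ValueError(f"expected {expected} substitutions, got {n}")
    return new_text

template = "Hello {name}, your order {order_id} has shipped."

filled = replace_required(r"\{name\}", "Ada", template, expected=1)
print(filled)  # Hello Ada, your order {order_id} has shipped.

# A placeholder that is missing from the template is reported instead of ignored
try:
    replace_required(r"\{email\}", "ada@example.com", template)
    missing_detected = False
except ValueError as exc:
    missing_detected = True
    print("Caught:", exc)
```

With plain re.sub(), the second call would silently return the template unchanged.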

When to Use re.subn():

  • When you need to know how many replacements occurred
  • For data validation and quality control
  • When processing logs and need statistics
  • For debugging complex text transformations
  • When you need to verify that substitutions actually happened
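To illustrate the log-processing case from the list above, here is a minimal sketch that redacts IP-like strings and keeps running statistics. The log lines, the loose IPv4 pattern, and the variable names are all assumptions for illustration.

```python
import re

# Hypothetical log lines; the format is an assumption for illustration
log_lines = [
    "GET /index.html from 192.168.0.1",
    "POST /login from 10.0.0.7 via proxy 10.0.0.1",
    "healthcheck ok",
]

# Loose IPv4 shape (four dot-separated groups of 1-3 digits), not a validator
ip_pattern = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

redacted_lines = []
total_redactions = 0
lines_changed = 0

for line in log_lines:
    new_line, n = ip_pattern.subn("[REDACTED]", line)
    redacted_lines.append(new_line)
    total_redactions += n
    lines_changed += n > 0  # True counts as 1

print("\n".join(redacted_lines))
print(f"{total_redactions} addresses redacted across {lines_changed} of {len(log_lines)} lines")
```

Calling subn() once per line gives both a total and a per-line view, which a single call over the whole file would not.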

In short, re.subn() is the right choice whenever you need both the transformed text and a count of how many substitutions produced it.
