Special Sequences in Python

Special Sequences in Python Regular Expressions – Detailed Explanation

Special sequences are escape sequences that represent specific character types or positions in regex patterns.

1. \A – Start of String Anchor

Description: Matches only at the absolute start of the string (unaffected by re.MULTILINE flag)

Example 1: Match only at absolute beginning

python

import re

text = "Start here\nStart again"
result = re.findall(r'\AStart', text)
print(result)  # ['Start'] - Only matches at very beginning

Example 2: Comparison with ^ in multiline mode

python

text = "First line\nSecond line"
result_caret = re.findall(r'^.*', text, re.MULTILINE)
result_A = re.findall(r'\A.*', text)
print("^ matches:", result_caret)  # ['First line', 'Second line']
print(r"\A matches:", result_A)    # ['First line'] - Only first line

2. \Z – End of String Anchor

Description: Matches only at the absolute end of the string (unaffected by re.MULTILINE flag)

Example 1: Match only at absolute end

python

text = "Line one\nLine two\nEnd"
result = re.findall(r'End\Z', text)
print(result)  # ['End'] - Only matches at very end

Example 2: Validate file extension at end

python

filenames = ["document.txt", "image.jpg", "script.py.backup"]
valid_extensions = [f for f in filenames if re.search(r'\.txt\Z', f)]
print(valid_extensions)  # ['document.txt'] - Only exact .txt at end

3. \n – Newline Character

Description: Matches a newline character (line feed)

Example 1: Count lines in text

python

text = "Line 1\nLine 2\nLine 3\nLine 4"
line_count = len(re.findall(r'\n', text)) + 1
print("Number of lines:", line_count)  # 4

Example 2: Split on newlines

python

text = "Hello\nWorld\nPython\nProgramming"
lines = re.split(r'\n', text)
print(lines)  # ['Hello', 'World', 'Python', 'Programming']

4. \t – Tab Character

Description: Matches a tab character

Example 1: Find tab-separated values

python

text = "Name\tAge\tCity\nJohn\t25\tNYC\nJane\t30\tLA"
rows = re.split(r'\n', text)
for row in rows:
    columns = re.split(r'\t', row)
    print(columns)
# Output: ['Name', 'Age', 'City'], ['John', '25', 'NYC'], ['Jane', '30', 'LA']

Example 2: Replace tabs with spaces

python

text = "Column1\tColumn2\tColumn3"
spaced_text = re.sub(r'\t', '    ', text)  # Replace tabs with 4 spaces
print(spaced_text)  # "Column1    Column2    Column3"

5. \r – Carriage Return

Description: Matches a carriage return character

Example 1: Handle Windows line endings

python

text = "Line 1\r\nLine 2\r\nLine 3"
# Remove carriage returns
clean_text = re.sub(r'\r', '', text)
print(repr(clean_text))  # 'Line 1\nLine 2\nLine 3'

Example 2: Count carriage returns

python

text = "Hello\rWorld\r\nTest\r"
cr_count = len(re.findall(r'\r', text))
print("Carriage returns:", cr_count)  # 3

6. \f – Form Feed

Description: Matches a form feed character (rarely used in modern text)

Example 1: Find form feed characters

python

text = "Page 1\fPage 2\fPage 3"
page_breaks = re.findall(r'\f', text)
print("Form feeds found:", len(page_breaks))  # 2

Example 2: Split on form feeds

python

text = "Section 1\fSection 2\fSection 3"
sections = re.split(r'\f', text)
print(sections)  # ['Section 1', 'Section 2', 'Section 3']

7. \v – Vertical Tab

Description: Matches a vertical tab character (rarely used)

Example 1: Detect vertical tabs

python

text = "Row1\vRow2\vRow3"
vertical_tabs = re.findall(r'\v', text)
print("Vertical tabs:", len(vertical_tabs))  # 2

Example 2: Replace vertical tabs

python

text = "Item1\vItem2\vItem3"
clean_text = re.sub(r'\v', '\n', text)  # Replace with newlines
print(clean_text)  # "Item1\nItem2\nItem3"

Practical Examples with Multiple Special Sequences

Example: Normalize different line endings

python

def normalize_line_endings(text):
    """Convert all line endings to Unix style (\n)"""
    # Replace Windows (\r\n) and old Mac (\r) line endings
    text = re.sub(r'\r\n', '\n', text)  # Windows to Unix
    text = re.sub(r'\r', '\n', text)    # Old Mac to Unix
    return text

mixed_text = "Line 1\r\nLine 2\nLine 3\rLine 4"
normalized = normalize_line_endings(mixed_text)
print(repr(normalized))  # 'Line 1\nLine 2\nLine 3\nLine 4'

Example: Parse CSV-like data with various whitespace

python

def parse_data(text):
    """Parse data with mixed whitespace separators"""
    # Replace various whitespace with single comma
    text = re.sub(r'[\t\v]+', ',', text)  # Tabs and vertical tabs to commas
    text = re.sub(r'[ ]+', ',', text)     # Multiple spaces to commas
    lines = re.split(r'\n', text)         # Split into lines
    return [line.split(',') for line in lines if line.strip()]

data_text = "Name Age  City\nJohn  25\tNYC\nJane  30  LA"
parsed = parse_data(data_text)
print(parsed)  # [['Name', 'Age', 'City'], ['John', '25', 'NYC'], ['Jane', '30', 'LA']]

Example: Validate complete string patterns

python

def is_valid_pattern(text, pattern):
    """Check if entire text matches pattern from start to finish"""
    return bool(re.fullmatch(pattern, text))

# Validate email (entire string must match)
emails = ["user@example.com", "invalid@", "@domain.com"]
email_pattern = r'\A[\w\.-]+@[\w\.-]+\.\w+\Z'

for email in emails:
    valid = is_valid_pattern(email, email_pattern)
    print(f"'{email}': {valid}")
# Output: 'user@example.com': True, 'invalid@': False, '@domain.com': False

Example: Extract content between specific markers

python

def extract_sections(text, start_marker, end_marker):
    """Extract content between specific markers"""
    pattern = fr'{re.escape(start_marker)}(.*?){re.escape(end_marker)}'
    return re.findall(pattern, text, re.DOTALL)

text = """
[START_SECTION]
This is section 1
With multiple lines
[END_SECTION]

[START_SECTION]
Section 2 content
[END_SECTION]
"""

sections = extract_sections(text, "[START_SECTION]", "[END_SECTION]")
for i, section in enumerate(sections, 1):
    print(f"Section {i}:\n{section.strip()}\n")

Important Notes

  1. String vs. Pattern: Special sequences work in regex patterns, not in ordinary strings
  2. Raw strings: Always use raw strings (r'pattern') to avoid interpretation of backslashes
  3. Multiline mode\A and \Z are unaffected by re.MULTILINE, unlike ^ and $
  4. Platform differences: Line ending characters vary by platform (\n on Unix, \r\n on Windows, \r on old Mac)

python

# Platform-aware line ending handling
def get_line_endings(text):
    """Detect the predominant line ending style"""
    crlf_count = len(re.findall(r'\r\n', text))
    lf_count = len(re.findall(r'(?<!\r)\n', text))  # \n not preceded by \r
    cr_count = len(re.findall(r'\r(?!\n)', text))   # \r not followed by \n
    
    if crlf_count > max(lf_count, cr_count):
        return 'Windows (\\r\\n)'
    elif lf_count > max(crlf_count, cr_count):
        return 'Unix (\\n)'
    elif cr_count > max(crlf_count, lf_count):
        return 'Old Mac (\\r)'
    else:
        return 'Mixed or none'

text = "Line 1\r\nLine 2\nLine 3\rLine 4"
print("Line ending style:", get_line_endings(text))

Similar Posts

  • re Programs

    The regular expression r’;\s*(.*?);’ is used to find and extract text that is located between two semicolons. In summary, this expression finds a semicolon, then non-greedily captures all characters up to the next semicolon. This is an effective way to extract the middle value from a semicolon-separated string. Title 1 to 25 chars The regular…

  • binary files

    # Read the original image and write to a new file original_file = open(‘image.jpg’, ‘rb’) # ‘rb’ = read binary copy_file = open(‘image_copy.jpg’, ‘wb’) # ‘wb’ = write binary # Read and write in chunks to handle large files while True: chunk = original_file.read(4096) # Read 4KB at a time if not chunk: break copy_file.write(chunk)…

  • Bank Account Class with Minimum Balance

    Challenge Summary: Bank Account Class with Minimum Balance Objective: Create a BankAccount class that automatically assigns account numbers and enforces a minimum balance rule. 1. Custom Exception Class python class MinimumBalanceError(Exception): “””Custom exception for minimum balance violation””” pass 2. BankAccount Class Requirements Properties: Methods: __init__(self, name, initial_balance) deposit(self, amount) withdraw(self, amount) show_details(self) 3. Key Rules: 4. Testing…

  • Generalization vs. Specialization

    Object-Oriented Programming: Generalization vs. Specialization Introduction Inheritance in OOP serves two primary purposes: Let’s explore these concepts with clear examples. 1. Specialization (Extending Functionality) Specialization involves creating a new class that inherits all features from a parent class and then adds new, specific features. The core idea is reusability—you build upon what already exists. Key Principle: Child Class =…

  • group() and groups()

    Python re group() and groups() Methods Explained The group() and groups() methods are used with match objects to extract captured groups from regex patterns. They work on the result of re.search(), re.match(), or re.finditer(). group() Method groups() Method Example 1: Basic Group Extraction python import retext = “John Doe, age 30, email: john.doe@email.com”# Pattern with multiple capture groupspattern = r'(\w+)\s+(\w+),\s+age\s+(\d+),\s+email:\s+([\w.]+@[\w.]+)’///The Pattern: r'(\w+)\s+(\w+),\s+age\s+(\d+),\s+email:\s+([\w.]+@[\w.]+)’Breakdown by Capture…

  • Functions Returning Functions

    Understanding Functions Returning Functions In Python, functions can return other functions, which is a powerful feature of functional programming. Basic Example python def outer(): def inner(): print(“Welcome!”) return inner # Return the inner function (without calling it) # Calling outer() returns the inner function f = outer() # f now refers to the inner function…

Leave a Reply

Your email address will not be published. Required fields are marked *