Special Sequences in Python

Special Sequences in Python Regular Expressions – Detailed Explanation

Special sequences are backslash escapes used in regex patterns. Some, such as \A and \Z, are zero-width anchors that match a position; others, such as \n and \t, match a single control character.

1. \A – Start of String Anchor

Description: Matches only at the absolute start of the string (unaffected by re.MULTILINE flag)

Example 1: Match only at absolute beginning

python

import re

text = "Start here\nStart again"
result = re.findall(r'\AStart', text)
print(result)  # ['Start'] - Only matches at very beginning

Example 2: Comparison with ^ in multiline mode

python

text = "First line\nSecond line"
result_caret = re.findall(r'^.*', text, re.MULTILINE)
result_A = re.findall(r'\A.*', text)
print("^ matches:", result_caret)  # ['First line', 'Second line']
print(r"\A matches:", result_A)    # ['First line'] - Only first line
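To make the "unaffected by re.MULTILINE" claim concrete, the sketch below runs the same search with ^ and with \A while the flag is set. The sample text and variable names are illustrative only.

```python
import re

text = "Start here\nStart again"

# With re.MULTILINE, ^ matches at the start of every line...
caret_hits = re.findall(r'^Start', text, re.MULTILINE)
# ...while \A still matches only at index 0 of the whole string.
anchor_hits = re.findall(r'\AStart', text, re.MULTILINE)

print(caret_hits)   # ['Start', 'Start']
print(anchor_hits)  # ['Start']
```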

2. \Z – End of String Anchor

Description: Matches only at the absolute end of the string (unaffected by re.MULTILINE flag)

Example 1: Match only at absolute end

python

text = "Line one\nLine two\nEnd"
result = re.findall(r'End\Z', text)
print(result)  # ['End'] - Only matches at very end

Example 2: Validate file extension at end

python

filenames = ["document.txt", "image.jpg", "script.py.backup"]
valid_extensions = [f for f in filenames if re.search(r'\.txt\Z', f)]
print(valid_extensions)  # ['document.txt'] - Only exact .txt at end
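One difference between \Z and $ shows up even without re.MULTILINE: $ also matches just before a newline that ends the string, while \Z does not. The following minimal sketch uses a made-up filename with a trailing newline to show the effect.

```python
import re

name = "document.txt\n"  # note the trailing newline

# $ matches at the end of the string OR just before a final newline
dollar_match = re.search(r'\.txt$', name)
# \Z matches only at the very end, so the trailing newline blocks it
z_match = re.search(r'\.txt\Z', name)

print(bool(dollar_match))  # True
print(bool(z_match))       # False
```

This is why \Z is the safer choice when validating input that must not carry a trailing newline.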

3. \n – Newline Character

Description: Matches a newline character (line feed)

Example 1: Count lines in text

python

text = "Line 1\nLine 2\nLine 3\nLine 4"
line_count = len(re.findall(r'\n', text)) + 1
print("Number of lines:", line_count)  # 4

Example 2: Split on newlines

python

text = "Hello\nWorld\nPython\nProgramming"
lines = re.split(r'\n', text)
print(lines)  # ['Hello', 'World', 'Python', 'Programming']
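When the text ends with a newline, splitting on \n leaves an empty string at the end of the result. The sketch below compares re.split with the built-in str.splitlines(), which may be the simpler tool when no regex is needed; the sample text is illustrative.

```python
import re

text = "Hello\nWorld\n"  # ends with a newline

regex_lines = re.split(r'\n', text)  # keeps the empty piece after the last \n
plain_lines = text.splitlines()      # drops it

print(regex_lines)  # ['Hello', 'World', '']
print(plain_lines)  # ['Hello', 'World']
```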

4. \t – Tab Character

Description: Matches a tab character

Example 1: Find tab-separated values

python

text = "Name\tAge\tCity\nJohn\t25\tNYC\nJane\t30\tLA"
rows = re.split(r'\n', text)
for row in rows:
    columns = re.split(r'\t', row)
    print(columns)
# Output (one list per line):
# ['Name', 'Age', 'City']
# ['John', '25', 'NYC']
# ['Jane', '30', 'LA']

Example 2: Replace tabs with spaces

python

text = "Column1\tColumn2\tColumn3"
spaced_text = re.sub(r'\t', '    ', text)  # Replace tabs with 4 spaces
print(spaced_text)  # "Column1    Column2    Column3"

5. \r – Carriage Return

Description: Matches a carriage return character

Example 1: Handle Windows line endings

python

text = "Line 1\r\nLine 2\r\nLine 3"
# Remove carriage returns
clean_text = re.sub(r'\r', '', text)
print(repr(clean_text))  # 'Line 1\nLine 2\nLine 3'

Example 2: Count carriage returns

python

text = "Hello\rWorld\r\nTest\r"
cr_count = len(re.findall(r'\r', text))
print("Carriage returns:", cr_count)  # 3

6. \f – Form Feed

Description: Matches a form feed character (rarely used in modern text)

Example 1: Find form feed characters

python

text = "Page 1\fPage 2\fPage 3"
page_breaks = re.findall(r'\f', text)
print("Form feeds found:", len(page_breaks))  # 2

Example 2: Split on form feeds

python

text = "Section 1\fSection 2\fSection 3"
sections = re.split(r'\f', text)
print(sections)  # ['Section 1', 'Section 2', 'Section 3']

7. \v – Vertical Tab

Description: Matches a vertical tab character (rarely used)

Example 1: Detect vertical tabs

python

text = "Row1\vRow2\vRow3"
vertical_tabs = re.findall(r'\v', text)
print("Vertical tabs:", len(vertical_tabs))  # 2

Example 2: Replace vertical tabs

python

text = "Item1\vItem2\vItem3"
clean_text = re.sub(r'\v', '\n', text)  # Replace with newlines
print(repr(clean_text))  # 'Item1\nItem2\nItem3'
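All of the control characters covered so far (\n, \t, \r, \f, \v) are also matched by the \s whitespace class, together with the ordinary space. A short sketch with an illustrative sample string:

```python
import re

sample = " \t\n\r\f\v"  # space, tab, newline, carriage return, form feed, vertical tab

# \s matches every character in the sample
whitespace = re.findall(r'\s', sample)

print(whitespace)                      # [' ', '\t', '\n', '\r', '\x0c', '\x0b']
print(len(whitespace) == len(sample))  # True
```

Use the individual escapes when only one specific character should match, and \s when any whitespace will do.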

Practical Examples with Multiple Special Sequences

Example: Normalize different line endings

python

def normalize_line_endings(text):
    r"""Convert all line endings to Unix style (\n)"""
    # Replace Windows (\r\n) and old Mac (\r) line endings
    text = re.sub(r'\r\n', '\n', text)  # Windows to Unix
    text = re.sub(r'\r', '\n', text)    # Old Mac to Unix
    return text

mixed_text = "Line 1\r\nLine 2\nLine 3\rLine 4"
normalized = normalize_line_endings(mixed_text)
print(repr(normalized))  # 'Line 1\nLine 2\nLine 3\nLine 4'
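The two substitutions can also be folded into one pass, because \r\n? matches a carriage return with an optional line feed after it. The function name normalize_single_pass below is a hypothetical one chosen for this sketch.

```python
import re

def normalize_single_pass(text):
    r"""Convert \r\n and lone \r to \n in one substitution."""
    # \r\n? matches a carriage return plus an optional following line feed
    return re.sub(r'\r\n?', '\n', text)

mixed_text = "Line 1\r\nLine 2\nLine 3\rLine 4"
normalized_once = normalize_single_pass(mixed_text)
print(repr(normalized_once))  # 'Line 1\nLine 2\nLine 3\nLine 4'
```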

Example: Parse CSV-like data with various whitespace

python

def parse_data(text):
    """Parse data with mixed whitespace separators"""
    # Collapse each run of spaces, tabs, or vertical tabs into a single comma
    # (one pass, so a space next to a tab cannot produce two commas)
    text = re.sub(r'[ \t\v]+', ',', text)
    lines = re.split(r'\n', text)         # Split into lines
    return [line.split(',') for line in lines if line.strip()]

data_text = "Name Age  City\nJohn  25\tNYC\nJane  30  LA"
parsed = parse_data(data_text)
print(parsed)  # [['Name', 'Age', 'City'], ['John', '25', 'NYC'], ['Jane', '30', 'LA']]

Example: Validate complete string patterns

python

def is_valid_pattern(text, pattern):
    """Check if entire text matches pattern from start to finish"""
    return bool(re.fullmatch(pattern, text))

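# Validate email (entire string must match)
# re.fullmatch already anchors both ends, so \A and \Z are redundant here;
# they would be required if the check used re.search instead.
emails = ["user@example.com", "invalid@", "@domain.com"]
email_pattern = r'\A[\w\.-]+@[\w\.-]+\.\w+\Z'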

for email in emails:
    valid = is_valid_pattern(email, email_pattern)
    print(f"'{email}': {valid}")
# Output: 'user@example.com': True, 'invalid@': False, '@domain.com': False
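The relationship between re.fullmatch and the \A and \Z anchors can be checked directly. The sketch below strips the anchors from the email pattern and compares three approaches; the variable names and test strings are illustrative.

```python
import re

body = r'[\w.-]+@[\w.-]+\.\w+'  # the email pattern without anchors

results = []
for candidate in ["user@example.com", "user@example.com\n", "see user@example.com"]:
    via_fullmatch = bool(re.fullmatch(body, candidate))
    via_anchors = bool(re.search(r'\A(?:' + body + r')\Z', candidate))
    unanchored = bool(re.search(body, candidate))
    results.append((via_fullmatch, via_anchors, unanchored))
    print(repr(candidate), via_fullmatch, via_anchors, unanchored)

# 'user@example.com' True True True
# 'user@example.com\n' False False True
# 'see user@example.com' False False True
```

re.fullmatch and re.search with \A...\Z agree on every input here, while the unanchored search accepts strings that merely contain an address.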

Example: Extract content between specific markers

python

def extract_sections(text, start_marker, end_marker):
    """Extract content between specific markers"""
    pattern = fr'{re.escape(start_marker)}(.*?){re.escape(end_marker)}'
    return re.findall(pattern, text, re.DOTALL)

text = """
[START_SECTION]
This is section 1
With multiple lines
[END_SECTION]

[START_SECTION]
Section 2 content
[END_SECTION]
"""

sections = extract_sections(text, "[START_SECTION]", "[END_SECTION]")
for i, section in enumerate(sections, 1):
    print(f"Section {i}:\n{section.strip()}\n")
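The re.DOTALL flag matters in this example because, by default, the dot does not match a newline. A minimal sketch with shortened, made-up markers shows what happens without it:

```python
import re

text = "[START]\nline one\nline two\n[END]"
pattern = re.escape("[START]") + r'(.*?)' + re.escape("[END]")

# Without DOTALL, . cannot cross the newlines between the markers
without_dotall = re.findall(pattern, text)
# With DOTALL, . matches \n as well, so the multi-line body is captured
with_dotall = re.findall(pattern, text, re.DOTALL)

print(without_dotall)  # []
print(with_dotall)     # ['\nline one\nline two\n']
```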

Important Notes

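  1. String vs. Pattern: \A and \Z are meaningful only in regex patterns. \n, \t, \r, \f, and \v are also ordinary Python string escapes, and re accepts them in either form
  2. Raw strings: Use raw strings (r'pattern') for patterns so that backslashes reach the regex engine unchanged. This matters most for sequences such as \b, which means backspace in an ordinary string
  3. Multiline mode: \A and \Z are unaffected by re.MULTILINE, unlike ^ and $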
  4. Platform differences: Line ending characters vary by platform (\n on Unix, \r\n on Windows, \r on old Mac)

python

# Platform-aware line ending handling
def get_line_endings(text):
    """Detect the predominant line ending style"""
    crlf_count = len(re.findall(r'\r\n', text))
    lf_count = len(re.findall(r'(?<!\r)\n', text))  # \n not preceded by \r
    cr_count = len(re.findall(r'\r(?!\n)', text))   # \r not followed by \n
    
    if crlf_count > max(lf_count, cr_count):
        return 'Windows (\\r\\n)'
    elif lf_count > max(crlf_count, cr_count):
        return 'Unix (\\n)'
    elif cr_count > max(crlf_count, lf_count):
        return 'Old Mac (\\r)'
    else:
        return 'Mixed or none'

text = "Line 1\r\nLine 2\nLine 3\rLine 4"
print("Line ending style:", get_line_endings(text))
# Line ending style: Mixed or none  (each style occurs exactly once here)
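The raw-string advice from the notes can be verified with a short experiment. This sketch compares ordinary and raw string literals for \n, where both happen to work, and for \b, where they do not; the sample strings are illustrative.

```python
import re

# In an ordinary string literal, Python itself converts \n to one character
print(len('\n'), len(r'\n'))  # 1 2

# Both forms match a newline: re also understands the two-character escape
plain_hits = re.findall('\n', "a\nb")
raw_hits = re.findall(r'\n', "a\nb")
print(plain_hits == raw_hits == ['\n'])  # True

# \b shows why raw strings matter: '\b' is a backspace character,
# while r'\b' reaches the regex engine as a word boundary
backspace_hits = re.findall('\bcat\b', "a cat sat")
boundary_hits = re.findall(r'\bcat\b', "a cat sat")
print(backspace_hits)  # []
print(boundary_hits)   # ['cat']
```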
