Special Sequences in Python Regex

1. \A and \Z – String Anchors

\A – Matches ONLY at the beginning of the string

\Z – Matches ONLY at the end of the string

Example 1: \A – Start of string

python

import re

text = "Hello World\nHello Python"
pattern = r"\AHello"  # Match "Hello" only at the VERY beginning

matches = re.findall(pattern, text)
print("\\A matches:", matches)  # Output: ['Hello']

# Compare with ^ which matches start of each line
pattern_caret = r"^Hello"
matches_caret = re.findall(pattern_caret, text, re.MULTILINE)
print("^ matches:", matches_caret)  # Output: ['Hello', 'Hello']
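
The comparison above depends on re.MULTILINE. As a small sketch reusing the same sample text, the following checks that \A ignores the flag, while ^ without the flag behaves like \A. The variable names are illustrative only.

```python
import re

text = "Hello World\nHello Python"

# \A is not affected by re.MULTILINE: it still matches only at position 0
a_with_flag = re.findall(r"\AHello", text, re.MULTILINE)

# Without re.MULTILINE, ^ also matches only at the start of the string
caret_no_flag = re.findall(r"^Hello", text)

print(a_with_flag)    # ['Hello']
print(caret_no_flag)  # ['Hello']
```

Using \A makes the intent explicit and keeps the pattern safe if someone later adds re.MULTILINE.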

Example 2: \Z – End of string

python

import re

text = "Hello World\nGoodbye World"
pattern = r"World\Z"  # Match "World" only at the VERY end

matches = re.findall(pattern, text)
print("\\Z matches:", matches)  # Output: ['World']

# Compare with $ which matches end of each line
pattern_dollar = r"World$"
matches_dollar = re.findall(pattern_dollar, text, re.MULTILINE)
print("$ matches:", matches_dollar)  # Output: ['World', 'World']
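
There is one more difference between \Z and $ that the example above does not show: even without re.MULTILINE, $ also matches just before a newline at the very end of the string, while \Z does not. A minimal sketch, using a made-up input string:

```python
import re

text = "done\n"  # hypothetical input that ends with a newline

# $ matches at the end of the string OR just before a trailing newline
dollar_match = re.search(r"done$", text)

# \Z matches only at the absolute end, which here comes after the newline
z_match = re.search(r"done\Z", text)

print(dollar_match is not None)  # True
print(z_match is not None)       # False
```

This matters when validating input such as lines read from a file, where a trailing newline is common.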

2. \b and \B – Word Boundaries

\b – Word boundary (between word and non-word characters)

\B – Non-word boundary (within words or between non-word characters)

Example 1: \b – Word boundaries

python

import re

text = "cat category catfish concat"
pattern = r"\bcat\b"  # Match "cat" as a whole word only

matches = re.findall(pattern, text)
print("\\b matches:", matches)  # Output: ['cat']

pattern_no_boundary = r"cat"
matches_no_boundary = re.findall(pattern_no_boundary, text)
print("No boundary:", matches_no_boundary)  # Output: ['cat', 'cat', 'cat', 'cat']

Example 2: \B – Non-word boundaries

python

import re

text = "cat category catfish concat"
pattern = r"\Bcat\B"  # Match "cat" only when surrounded by word characters

matches = re.findall(pattern, text)
print("\\B matches:", matches)  # Output: [] (no "cat" in this text has word characters on both sides)

pattern_start = r"\Bcat"  # "cat" not at word start
matches_start = re.findall(pattern_start, text)
print("\\B at start:", matches_start)  # Output: ['cat'] (from "concat")
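
A common practical use of \b is whole-word replacement. The sketch below reuses the sample text with re.sub; the replacement word "dog" is just an illustration.

```python
import re

text = "cat category catfish concat"

# Replace only the standalone word "cat"
whole_word = re.sub(r"\bcat\b", "dog", text)

# Without boundaries, every "cat" substring is replaced
anywhere = re.sub(r"cat", "dog", text)

print(whole_word)  # dog category catfish concat
print(anywhere)    # dog dogegory dogfish condog
```

Note the raw string prefix: in a normal string literal, "\b" is a backspace character, so the pattern should be written as r"\b".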

3. \d and \D – Digits and Non-digits

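\d – Any decimal digit. For str patterns this includes Unicode digits from other scripts; it is equivalent to [0-9] only with the re.ASCII flag or in bytes patterns

\D – Any NON-digit – the opposite of \d (equivalent to [^0-9] under re.ASCII)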

Example 1: \d – Digits only

python

import re

text = "Room 25B, Price $199.99, Phone: 555-1234"
pattern = r"\d"  # Match any single digit

matches = re.findall(pattern, text)
print("\\d matches:", matches)  # Output: ['2', '5', '1', '9', '9', '9', '9', '5', '5', '5', '1', '2', '3', '4']

pattern_multi = r"\d+"  # Match sequences of digits
matches_multi = re.findall(pattern_multi, text)
print("\\d+ matches:", matches_multi)  # Output: ['25', '199', '99', '555', '1234']

Example 2: \D – Non-digits only

python

import re

text = "Room 25B, Price $199.99"
pattern = r"\D"  # Match any single non-digit character

matches = re.findall(pattern, text)
print("\\D matches:", matches)  # Output: ['R', 'o', 'o', 'm', ' ', 'B', ',', ' ', 'P', 'r', 'i', 'c', 'e', ' ', '$', '.']

pattern_multi = r"\D+"  # Match sequences of non-digits
matches_multi = re.findall(pattern_multi, text)
print("\\D+ matches:", matches_multi)  # Output: ['Room ', 'B, Price $', '.']
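
In Python 3, \d in a str pattern is not limited to 0-9: it also matches decimal digits from other scripts. The sketch below shows the difference the re.ASCII flag makes, using a made-up string that contains Arabic-Indic digits (written as escapes so the source stays ASCII).

```python
import re

# "123" followed by the Arabic-Indic digits for 1, 2, 3
text = "123 \u0661\u0662\u0663"

# Default (Unicode) matching: both digit runs are found
unicode_digits = re.findall(r"\d+", text)

# With re.ASCII, \d behaves like [0-9]
ascii_digits = re.findall(r"\d+", text, re.ASCII)

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['123']
```

If only 0-9 should be accepted, for example before calling code that expects ASCII digits, use re.ASCII or write [0-9] explicitly.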

4. \s and \S – Whitespace and Non-whitespace

\s – Any whitespace character (space, tab, newline, etc.)

\S – Any NON-whitespace character

Example 1: \s – Whitespace characters

python

import re

text = "Hello\tWorld\nPython  Programming"
pattern = r"\s"  # Match any whitespace character

matches = re.findall(pattern, text)
print("\\s matches:", matches)  # Output: ['\t', '\n', ' ', ' ']

# Show whitespace types
for match in matches:
    print(f"Whitespace: {repr(match)}")

Example 2: \S – Non-whitespace characters

python

import re

text = "Hello\tWorld\nPython  Programming"
pattern = r"\S"  # Match any non-whitespace character

matches = re.findall(pattern, text)
print("\\S matches:", matches)  # Output: ['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd', 'P', 'y', 't', 'h', 'o', 'n', 'P', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g']

pattern_multi = r"\S+"  # Match words (sequences of non-whitespace)
matches_multi = re.findall(pattern_multi, text)
print("\\S+ matches:", matches_multi)  # Output: ['Hello', 'World', 'Python', 'Programming']
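
The same word list can also be produced from the other direction, by splitting on runs of whitespace with \s+. A short sketch using the same sample text:

```python
import re

text = "Hello\tWorld\nPython  Programming"

# Split on one or more whitespace characters (tab, newline, spaces)
words = re.split(r"\s+", text)

print(words)  # ['Hello', 'World', 'Python', 'Programming']
```

If the text starts or ends with whitespace, re.split returns empty strings at those ends, so r"\S+" with re.findall is often the simpler choice.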

5. \w and \W – Word and Non-word Characters

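\w – Any word character (letters, digits, underscore). For str patterns this includes Unicode letters and digits; it is equivalent to [a-zA-Z0-9_] only with the re.ASCII flag or in bytes patterns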

\W – Any NON-word character

Example 1: \w – Word characters

python

import re

text = "User_id: john_doe123, Email: test@example.com"
pattern = r"\w"  # Match any single word character

matches = re.findall(pattern, text)
print("\\w matches:", matches)  # Output: ['U', 's', 'e', 'r', '_', 'i', 'd', 'j', 'o', 'h', 'n', '_', 'd', 'o', 'e', '1', '2', '3', 'E', 'm', 'a', 'i', 'l', 't', 'e', 's', 't', 'e', 'x', 'a', 'm', 'p', 'l', 'e', 'c', 'o', 'm']

pattern_multi = r"\w+"  # Match words
matches_multi = re.findall(pattern_multi, text)
print("\\w+ matches:", matches_multi)  # Output: ['User_id', 'john_doe123', 'Email', 'test', 'example', 'com']

Example 2: \W – Non-word characters

python

import re

text = "User_id: john_doe123, Email: test@example.com"
pattern = r"\W"  # Match any single non-word character

matches = re.findall(pattern, text)
print("\\W matches:", matches)  # Output: [':', ' ', ',', ' ', ':', ' ', '@', '.']

pattern_multi = r"\W+"  # Match sequences of non-word characters
matches_multi = re.findall(pattern_multi, text)
print("\\W+ matches:", matches_multi)  # Output: [': ', ', ', ': ', '@', '.']
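
As with \d, the set of characters matched by \w depends on whether Unicode or ASCII matching is in effect. The sketch below uses a made-up string with accented letters (written as escapes for precomposed characters) to show the difference.

```python
import re

# "naïve café_1" with precomposed accented letters
text = "na\u00efve caf\u00e9_1"

# Default (Unicode) matching: accented letters count as word characters
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w behaves like [a-zA-Z0-9_], so accented letters split words
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(len(unicode_words))  # 2
print(ascii_words)         # ['na', 've', 'caf', '_1']
```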
