Special Sequences in Python Regex

1. \A and \Z – String Anchors

\A – Matches ONLY at the beginning of the string

\Z – Matches ONLY at the end of the string

Example 1: \A – Start of string

python

import re

text = "Hello World\nHello Python"
pattern = r"\AHello"  # Match "Hello" only at the VERY beginning

matches = re.findall(pattern, text)
print("\\A matches:", matches)  # Output: ['Hello']

# Compare with ^ which matches start of each line
pattern_caret = r"^Hello"
matches_caret = re.findall(pattern_caret, text, re.MULTILINE)
print("^ matches:", matches_caret)  # Output: ['Hello', 'Hello']

Example 2: \Z – End of string

python

import re

text = "Hello World\nGoodbye World"
pattern = r"World\Z"  # Match "World" only at the VERY end

matches = re.findall(pattern, text)
print("\\Z matches:", matches)  # Output: ['World']

# Compare with $ which matches end of each line
pattern_dollar = r"World$"
matches_dollar = re.findall(pattern_dollar, text, re.MULTILINE)
print("$ matches:", matches_dollar)  # Output: ['World', 'World']
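
One difference the two examples above do not show: even without re.MULTILINE, $ also matches just before a single trailing newline, while \Z matches only at the very end. A minimal sketch, using a made-up sample string (the variable names are illustrative only):

```python
import re

# Hypothetical sample: a string that ends with a trailing newline.
line = "Hello World\n"

# $ matches at the very end AND just before a final newline.
dollar_match = re.search(r"World$", line)

# \Z matches only at the very end, so the trailing "\n" prevents a match.
z_match = re.search(r"World\Z", line)

print("$ found a match:", dollar_match is not None)    # True
print("\\Z found a match:", z_match is not None)       # False
```

This matters when validating input read from files, where lines often keep their trailing newline.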

2. \b and \B – Word Boundaries

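\b – Word boundary (the position between a word character and a non-word character, including the start or end of the string when it borders a word character)

\B – Non-word boundary (any position that is not a word boundary: inside a word, or between two non-word characters)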

Example 1: \b – Word boundaries

python

import re

text = "cat category catfish concat"
pattern = r"\bcat\b"  # Match "cat" as a whole word only

matches = re.findall(pattern, text)
print("\\b matches:", matches)  # Output: ['cat']

pattern_no_boundary = r"cat"
matches_no_boundary = re.findall(pattern_no_boundary, text)
print("No boundary:", matches_no_boundary)  # Output: ['cat', 'cat', 'cat', 'cat']

Example 2: \B – Non-word boundaries

python

import re

text = "cat category catfish concat location"
pattern = r"\Bcat\B"  # Match "cat" only when surrounded by word characters

matches = re.findall(pattern, text)
print("\\B matches:", matches)  # Output: ['cat'] (from "location")

pattern_start = r"\Bcat"  # "cat" not at word start
matches_start = re.findall(pattern_start, text)
print("\\B at start:", matches_start)  # Output: ['cat', 'cat'] (from "concat" and "location")
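
A common practical use of \b is whole-word replacement. The sketch below also shows why the raw-string prefix matters here: in a normal Python string, "\b" is the backspace character, not a word boundary. The sample sentence is invented for illustration.

```python
import re

# Hypothetical sample sentence.
sentence = "The cat sat near the catalog."

# \b keeps the substitution from touching "catalog".
replaced = re.sub(r"\bcat\b", "dog", sentence)
print(replaced)  # The dog sat near the catalog.

# Without the r prefix, "\b" becomes a backspace character (\x08),
# so the pattern looks for backspace + "cat" + backspace and finds nothing.
no_raw = re.findall("\bcat\b", sentence)
print(no_raw)  # []
```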

3. \d and \D – Digits and Non-digits

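\d – Any decimal digit – equivalent to [0-9] for ASCII text; in str patterns it also matches other Unicode decimal digits unless re.ASCII is set

\D – Any NON-digit – the complement of \d (equivalent to [^0-9] when re.ASCII is set)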

Example 1: \d – Digits only

python

import re

text = "Room 25B, Price $199.99, Phone: 555-1234"
pattern = r"\d"  # Match any single digit

matches = re.findall(pattern, text)
print("\\d matches:", matches)  # Output: ['2', '5', '1', '9', '9', '9', '9', '5', '5', '5', '1', '2', '3', '4']

pattern_multi = r"\d+"  # Match sequences of digits
matches_multi = re.findall(pattern_multi, text)
print("\\d+ matches:", matches_multi)  # Output: ['25', '199', '99', '555', '1234']

Example 2: \D – Non-digits only

python

import re

text = "Room 25B, Price $199.99"
pattern = r"\D"  # Match any single non-digit character

matches = re.findall(pattern, text)
print("\\D matches:", matches)  # Output: ['R', 'o', 'o', 'm', ' ', 'B', ',', ' ', 'P', 'r', 'i', 'c', 'e', ' ', '$', '.']

pattern_multi = r"\D+"  # Match sequences of non-digits
matches_multi = re.findall(pattern_multi, text)
print("\\D+ matches:", matches_multi)  # Output: ['Room ', 'B, Price $', '.']
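
In Python 3, \d on a str pattern is not strictly limited to 0-9: it matches any Unicode decimal digit. The following sketch, with an invented sample string, compares the default behavior with the re.ASCII flag and with an explicit [0-9] class.

```python
import re

# Hypothetical sample: "42" followed by 42 written in Arabic-Indic digits.
mixed = "42 \u0664\u0662"

# Default (Unicode) matching: both scripts count as digits.
unicode_digits = re.findall(r"\d", mixed)

# re.ASCII restricts \d to 0-9, the same as the explicit class [0-9].
ascii_digits = re.findall(r"\d", mixed, re.ASCII)
class_digits = re.findall(r"[0-9]", mixed)

print(len(unicode_digits))  # 4
print(ascii_digits)         # ['4', '2']
print(class_digits)         # ['4', '2']
```

If the input must contain only the characters 0 to 9 (for example before calling code that expects ASCII), re.ASCII or [0-9] is the safer choice.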

4. \s and \S – Whitespace and Non-whitespace

\s – Any whitespace character (space, tab, newline, etc.)

\S – Any NON-whitespace character

Example 1: \s – Whitespace characters

python

import re

text = "Hello\tWorld\nPython  Programming"
pattern = r"\s"  # Match any whitespace character

matches = re.findall(pattern, text)
print("\\s matches:", matches)  # Output: ['\t', '\n', ' ', ' ']

# Show whitespace types
for match in matches:
    print(f"Whitespace: {repr(match)}")

Example 2: \S – Non-whitespace characters

python

import re

text = "Hello\tWorld\nPython  Programming"
pattern = r"\S"  # Match any non-whitespace character

matches = re.findall(pattern, text)
print("\\S matches:", matches)  # Output: ['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd', 'P', 'y', 't', 'h', 'o', 'n', 'P', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g']

pattern_multi = r"\S+"  # Match words (sequences of non-whitespace)
matches_multi = re.findall(pattern_multi, text)
print("\\S+ matches:", matches_multi)  # Output: ['Hello', 'World', 'Python', 'Programming']
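
Two everyday uses of \s+ are splitting text into tokens and collapsing runs of whitespace. This is a small sketch that reuses the sample string from the examples above; the variable names are illustrative.

```python
import re

text = "Hello\tWorld\nPython  Programming"

# Split on any run of whitespace (tab, newline, or multiple spaces).
words = re.split(r"\s+", text)
print(words)  # ['Hello', 'World', 'Python', 'Programming']

# Collapse every run of whitespace into a single space.
single_spaced = re.sub(r"\s+", " ", text)
print(single_spaced)  # Hello World Python Programming
```

Note that re.split returns empty strings at the ends if the text starts or ends with whitespace, so stripping the text first is often useful.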

5. \w and \W – Word and Non-word Characters

\w – Any word character (letters, digits, underscore) – equivalent to [a-zA-Z0-9_] only when re.ASCII is set; in str patterns it also matches Unicode letters and digits

\W – Any NON-word character

Example 1: \w – Word characters

python

import re

text = "User_id: john_doe123, Email: test@example.com"
pattern = r"\w"  # Match any single word character

matches = re.findall(pattern, text)
print("\\w matches:", matches)  # Output: ['U', 's', 'e', 'r', '_', 'i', 'd', 'j', 'o', 'h', 'n', '_', 'd', 'o', 'e', '1', '2', '3', 'E', 'm', 'a', 'i', 'l', 't', 'e', 's', 't', 'e', 'x', 'a', 'm', 'p', 'l', 'e', 'c', 'o', 'm']

pattern_multi = r"\w+"  # Match words
matches_multi = re.findall(pattern_multi, text)
print("\\w+ matches:", matches_multi)  # Output: ['User_id', 'john_doe123', 'Email', 'test', 'example', 'com']

Example 2: \W – Non-word characters

python

import re

text = "User_id: john_doe123, Email: test@example.com"
pattern = r"\W"  # Match any single non-word character

matches = re.findall(pattern, text)
print("\\W matches:", matches)  # Output: [':', ' ', ',', ' ', ':', ' ', '@', '.']

pattern_multi = r"\W+"  # Match sequences of non-word characters
matches_multi = re.findall(pattern_multi, text)
print("\\W+ matches:", matches_multi)  # Output: [': ', ', ', ': ', '@', '.']
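
As with \d, \w on a str pattern follows Unicode rules by default, so accented letters count as word characters. A minimal sketch with an invented sample string shows how re.ASCII changes the result:

```python
import re

# Hypothetical sample: "café naïve_42", written with escapes for clarity.
sample = "caf\u00e9 na\u00efve_42"

# Default (Unicode) matching: accented letters are word characters.
unicode_words = re.findall(r"\w+", sample)

# With re.ASCII, \w means [a-zA-Z0-9_], so accented letters split the words.
ascii_words = re.findall(r"\w+", sample, re.ASCII)

print(unicode_words)  # ['café', 'naïve_42']
print(ascii_words)    # ['caf', 'na', 've_42']
```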
