Special Sequences in Python Regex
1. \A and \Z – String Anchors
\A – Matches ONLY at the beginning of the string
\Z – Matches ONLY at the end of the string
Example 1: \A – Start of string
python
import re
text = "Hello World\nHello Python"
pattern = r"\AHello" # Match "Hello" only at the VERY beginning
matches = re.findall(pattern, text)
print("\\A matches:", matches) # Output: ['Hello']
# Compare with ^ which matches start of each line
pattern_caret = r"^Hello"
matches_caret = re.findall(pattern_caret, text, re.MULTILINE)
print("^ matches:", matches_caret) # Output: ['Hello', 'Hello']
Example 2: \Z – End of string
python
import re
text = "Hello World\nGoodbye World"
pattern = r"World\Z" # Match "World" only at the VERY end
matches = re.findall(pattern, text)
print("\\Z matches:", matches) # Output: ['World']
# Compare with $ which matches end of each line
pattern_dollar = r"World$"
matches_dollar = re.findall(pattern_dollar, text, re.MULTILINE)
print("$ matches:", matches_dollar) # Output: ['World', 'World']
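One difference the comparison above does not show: without re.MULTILINE, $ also matches just before a trailing newline, whereas \Z matches only at the very end. A minimal sketch, using a made-up sample string that ends in a newline:

```python
import re

text = "Goodbye World\n"  # illustrative input; note the trailing newline

# $ (no re.MULTILINE) matches at the end of the string
# and also just before a newline that ends the string.
dollar_matches = re.findall(r"World$", text)
print("$ matches:", dollar_matches)   # ['World']

# \Z matches only at the very end, so the trailing newline blocks the match.
z_matches = re.findall(r"World\Z", text)
print("\\Z matches:", z_matches)      # []

# re.fullmatch requires the whole string to match the pattern.
full_with_newline = bool(re.fullmatch(r"\w+ \w+", text))
full_stripped = bool(re.fullmatch(r"\w+ \w+", text.rstrip()))
print(full_with_newline, full_stripped)  # False True
```

When validating input that must not carry a stray trailing newline, \Z or re.fullmatch is usually the safer choice than $.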
2. \b and \B – Word Boundaries
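\b – Word boundary: the position between a word character (\w) and a non-word character (\W), or between a word character and the start or end of the string
\B – Non-word boundary: any position where \b does not match (inside a word, or between two non-word characters)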
Example 1: \b – Word boundaries
python
import re
text = "cat category catfish concat"
pattern = r"\bcat\b" # Match "cat" as a whole word only
matches = re.findall(pattern, text)
print("\\b matches:", matches) # Output: ['cat']
pattern_no_boundary = r"cat"
matches_no_boundary = re.findall(pattern_no_boundary, text)
print("No boundary:", matches_no_boundary) # Output: ['cat', 'cat', 'cat', 'cat']
Example 2: \B – Non-word boundaries
python
import re
text = "cat category catfish concat"
pattern = r"\Bcat\B" # Match "cat" only when surrounded by word characters
matches = re.findall(pattern, text)
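print("\\B matches:", matches) # Output: [] (every "cat" in this text touches a word boundary on at least one side)
print("\\B in 'education':", re.findall(pattern, "education")) # Output: ['cat'] (word characters on both sides)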
pattern_start = r"\Bcat" # "cat" not at word start
matches_start = re.findall(pattern_start, text)
print("\\B at start:", matches_start) # Output: ['cat'] (from "concat")
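The patterns above all use raw strings (the r prefix), which matters for \b in particular: in an ordinary Python string literal, "\b" is the backspace character, not a word boundary. The sketch below shows the difference and then uses \b for a whole-word replacement; the replacement word "dog" is only an illustration.

```python
import re

text = "cat category catfish concat"

# Without the r prefix, "\b" is a backspace character (\x08),
# so the pattern looks for backspace + "cat" + backspace.
not_raw = re.findall("\bcat\b", text)
print("non-raw:", not_raw)   # []

# With the r prefix, \b reaches the regex engine as a word boundary.
raw = re.findall(r"\bcat\b", text)
print("raw:", raw)           # ['cat']

# Replace only the standalone word, leaving longer words untouched.
replaced = re.sub(r"\bcat\b", "dog", text)
print(replaced)              # dog category catfish concat
```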
3. \d and \D – Digits and Non-digits
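\d – Any decimal digit. For str patterns this includes Unicode digits; it is equivalent to [0-9] only with the re.ASCII flag (or in bytes patterns)
\D – Any NON-digit (the complement of \d) – equivalent to [^0-9] with the re.ASCII flag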
Example 1: \d – Digits only
python
import re
text = "Room 25B, Price $199.99, Phone: 555-1234"
pattern = r"\d" # Match any single digit
matches = re.findall(pattern, text)
print("\\d matches:", matches) # Output: ['2', '5', '1', '9', '9', '9', '9', '5', '5', '5', '1', '2', '3', '4']
pattern_multi = r"\d+" # Match sequences of digits
matches_multi = re.findall(pattern_multi, text)
print("\\d+ matches:", matches_multi) # Output: ['25', '199', '99', '555', '1234']
Example 2: \D – Non-digits only
python
import re
text = "Room 25B, Price $199.99"
pattern = r"\D" # Match any single non-digit character
matches = re.findall(pattern, text)
print("\\D matches:", matches) # Output: ['R', 'o', 'o', 'm', ' ', 'B', ',', ' ', 'P', 'r', 'i', 'c', 'e', ' ', '$', '.']
pattern_multi = r"\D+" # Match sequences of non-digits
matches_multi = re.findall(pattern_multi, text)
print("\\D+ matches:", matches_multi) # Output: ['Room ', 'B, Price $', '.']
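For str patterns, \d is not limited to the ASCII digits 0-9: it also matches decimal digits from other scripts unless re.ASCII is set. A small sketch, assuming a sample string that mixes ASCII digits with the Arabic-Indic digits four and two (written as escapes to keep the source unambiguous):

```python
import re

# "\u0664\u0662" is the Arabic-Indic rendering of 42 (illustrative sample data)
text = "Latin 42, Arabic-Indic \u0664\u0662"

# Default (Unicode) matching: both digit runs are found.
unicode_digits = re.findall(r"\d+", text)
print(unicode_digits == ["42", "\u0664\u0662"])  # True

# With re.ASCII, \d is restricted to [0-9].
ascii_digits = re.findall(r"\d+", text, re.ASCII)
print(ascii_digits)                              # ['42']
```

If the matched text is later passed to code that expects only 0-9, using re.ASCII or an explicit [0-9] class avoids surprises.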
4. \s and \S – Whitespace and Non-whitespace
\s – Any whitespace character (space, tab, newline, etc.)
\S – Any NON-whitespace character
Example 1: \s – Whitespace characters
python
import re
text = "Hello\tWorld\nPython Programming"
pattern = r"\s" # Match any whitespace character
matches = re.findall(pattern, text)
print("\\s matches:", matches) # Output: ['\t', '\n', ' ']
# Show whitespace types
for match in matches:
    print(f"Whitespace: {repr(match)}")
Example 2: \S – Non-whitespace characters
python
import re
text = "Hello\tWorld\nPython Programming"
pattern = r"\S" # Match any non-whitespace character
matches = re.findall(pattern, text)
print("\\S matches:", matches) # Output: ['H', 'e', 'l', 'l', 'o', 'W', 'o', 'r', 'l', 'd', 'P', 'y', 't', 'h', 'o', 'n', 'P', 'r', 'o', 'g', 'r', 'a', 'm', 'm', 'i', 'n', 'g']
pattern_multi = r"\S+" # Match words (sequences of non-whitespace)
matches_multi = re.findall(pattern_multi, text)
print("\\S+ matches:", matches_multi) # Output: ['Hello', 'World', 'Python', 'Programming']
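Beyond findall, \s+ is commonly used to split on or normalize runs of mixed whitespace. A minimal sketch, assuming a sample string with extra spaces added for illustration:

```python
import re

text = "Hello\tWorld\nPython   Programming "  # illustrative input with mixed whitespace

# Split on any run of whitespace (strip first to avoid an empty trailing item).
tokens = re.split(r"\s+", text.strip())
print(tokens)     # ['Hello', 'World', 'Python', 'Programming']

# Collapse every run of whitespace into a single space.
collapsed = re.sub(r"\s+", " ", text).strip()
print(collapsed)  # Hello World Python Programming
```

For simple splitting, text.split() with no arguments gives the same tokens; re.split becomes useful when the separator pattern is more specific.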
5. \w and \W – Word and Non-word Characters
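\w – Any word character (letters, digits, underscore). For str patterns this includes Unicode letters and digits; it is equivalent to [a-zA-Z0-9_] only with the re.ASCII flag (or in bytes patterns)
\W – Any NON-word character (the complement of \w)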
Example 1: \w – Word characters
python
import re
text = "User_id: john_doe123, Email: test@example.com"
pattern = r"\w" # Match any single word character
matches = re.findall(pattern, text)
print("\\w matches:", matches) # Output: ['U', 's', 'e', 'r', '_', 'i', 'd', 'j', 'o', 'h', 'n', '_', 'd', 'o', 'e', '1', '2', '3', 'E', 'm', 'a', 'i', 'l', 't', 'e', 's', 't', 'e', 'x', 'a', 'm', 'p', 'l', 'e', 'c', 'o', 'm']
pattern_multi = r"\w+" # Match words
matches_multi = re.findall(pattern_multi, text)
print("\\w+ matches:", matches_multi) # Output: ['User_id', 'john_doe123', 'Email', 'test', 'example', 'com']
Example 2: \W – Non-word characters
python
import re
text = "User_id: john_doe123, Email: test@example.com"
pattern = r"\W" # Match any single non-word character
matches = re.findall(pattern, text)
print("\\W matches:", matches) # Output: [':', ' ', ',', ' ', ':', ' ', '@', '.']
pattern_multi = r"\W+" # Match sequences of non-word characters
matches_multi = re.findall(pattern_multi, text)
print("\\W+ matches:", matches_multi) # Output: [': ', ', ', ': ', '@', '.']
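Like \d, \w follows Unicode rules for str patterns, so accented letters count as word characters unless re.ASCII is set. A short sketch, assuming a made-up sample string with accented letters (written as escapes for the precomposed characters é and ï):

```python
import re

# "caf\u00e9" is "café" and "na\u00efve" is "naïve" (illustrative sample data)
text = "caf\u00e9 na\u00efve_user42"

# Default (Unicode) matching keeps accented letters inside words.
unicode_words = re.findall(r"\w+", text)
print(unicode_words)  # ['café', 'naïve_user42']

# With re.ASCII, \w is [a-zA-Z0-9_], so accented letters split the words.
ascii_words = re.findall(r"\w+", text, re.ASCII)
print(ascii_words)    # ['caf', 'na', 've_user42']
```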