Negative lookbehind assertion
A negative lookbehind assertion in Python’s re module is a zero-width assertion that checks if a pattern is not present immediately before the current position. It is written as (?<!...). It’s the opposite of a positive lookbehind and allows you to exclude matches based on what precedes them.
Similar to the positive lookbehind, the pattern inside (?<!...) must match text of a fixed length, so variable-length quantifiers such as + or * cannot appear inside it.
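The fixed-length rule can be checked directly. The following is a minimal sketch, assuming a standard CPython re module; the variable names fixed, compiled and message are only illustrative. It compiles one fixed-width lookbehind and then tries a variable-width one, which re rejects when the pattern is compiled.

```python
import re

# A fixed-width lookbehind ("abc" is always three characters) compiles fine.
fixed = re.compile(r'(?<!abc)\d')

# A variable-width lookbehind such as (?<!a+) is rejected at compile time.
message = ""
try:
    re.compile(r'(?<!a+)\d')
    compiled = True
except re.error as exc:
    compiled = False
    message = str(exc)

print(compiled)  # False
print(message)   # the error text explains that a fixed-width pattern is required
```

Because the error is raised when the pattern is compiled, a lookbehind of this kind fails early instead of silently matching nothing.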
Example: Finding Numbers Not Preceded by a Specific Label
Let’s say you want to find all numbers in a string that are not preceded by the label “price: ”. You want to match the numbers but not the preceding text.
Python
import re
text = "The cost: 50. The price: 20."
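# This pattern looks for a whole number (\d+) that is NOT preceded by "price: ".
# The extra (?<!\d) stops a match from starting in the middle of a number.
pattern = r'(?<!price: )(?<!\d)\d+'
matches = re.findall(pattern, text)
print(matches)
- Output:
['50']
How it Works: Step-by-Step
- (?<!price: ): This is the negative lookbehind assertion. Before matching any digits, the regex engine “looks behind” the current position and rejects it if the seven preceding characters are “price: ”.
- (?<!\d): A second negative lookbehind that rejects any position that directly follows a digit, so a match can only start at the beginning of a number.
- \d+: This part of the pattern then matches one or more digits.
- At the 5 in 50, the preceding characters are “ cost: ”, not “price: ”, and no digit comes before it. Both assertions succeed, and 50 is returned as a match.
- At the 2 in 20, the engine looks behind and sees that “price: ” is present. The negative lookbehind fails, so no match starts here.
- At the 0 in 20, the preceding characters are “rice: 2”, so (?<!price: ) on its own would succeed and 0 would be returned as a match. The (?<!\d) check rejects this position because a digit precedes it, which is why 20 is excluded entirely.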
This is a powerful way to exclude specific cases from your search results. It allows you to define a set of conditions that must not exist before the desired match.
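The same idea can be wrapped in a small helper so that the excluded label becomes a parameter. This is only a sketch: the function name numbers_not_after is hypothetical and not part of the re module. It uses re.escape so that any special characters in the label are treated literally, and it adds a (?<!\d) guard so that a match cannot begin in the middle of a number.

```python
import re

def numbers_not_after(label, text):
    """Return whole numbers in text that are not directly preceded by label.

    Hypothetical helper for illustration; label must be a plain string,
    which keeps the lookbehind fixed-width.
    """
    # (?<!label) excludes numbers that follow the label.
    # (?<!\d) prevents a match from starting inside a longer number.
    pattern = rf'(?<!{re.escape(label)})(?<!\d)\d+'
    return re.findall(pattern, text)

text = "The cost: 50. The price: 20."
print(numbers_not_after("price: ", text))  # ['50']
print(numbers_not_after("cost: ", text))   # ['20']
```

Without the (?<!\d) guard, the engine would retry at the second digit of an excluded number, where the label no longer sits immediately behind the current position.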
1. Finding a filename extension that is not preceded by a specific word
Let’s say you have a list of filenames and you want to find all .txt files that are not part of a “backup” naming convention (e.g., backup.txt).
Python
import re
filenames = "notes.txt, backup.txt, report.txt"
# This pattern looks for a .txt extension that is NOT preceded by the word "backup"
pattern = r'(?<!backup)\.txt'
matches = re.findall(pattern, filenames)
print(matches)
- Output:
['.txt', '.txt']
How it works
- The pattern looks for the literal string .txt.
- The lookbehind (?<!backup) checks whether the six characters immediately before .txt are “backup”.
- The .txt in notes.txt is found, and since “backup” does not precede it, the match is successful.
- The .txt in backup.txt is preceded by “backup”, so the negative lookbehind fails and the match is ignored.
- The .txt in report.txt is not preceded by “backup”, so the match is successful.
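In practice you often want the filenames themselves and not just the extension. The sketch below assumes the names are already available as a Python list (the list contents and the variable names names and kept are made up for illustration) and anchors the pattern with $ so that only a trailing .txt counts.

```python
import re

# Hypothetical list of filenames for illustration.
names = ["notes.txt", "backup.txt", "report.txt", "mybackup.txt", "backup_notes.txt"]

# Keep names ending in .txt unless "backup" sits directly before the extension.
pattern = re.compile(r'(?<!backup)\.txt$')
kept = [name for name in names if pattern.search(name)]

print(kept)  # ['notes.txt', 'report.txt', 'backup_notes.txt']
```

Note that mybackup.txt is excluded while backup_notes.txt is kept: the lookbehind only inspects the six characters directly before .txt, not the whole filename.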
2. Extracting prices not in a specific currency
Imagine you have a list of prices and you want to extract all the numbers that are not preceded by a dollar sign ($).
Python
import re
text = "The cost is $50. The value is €10. The price is $200. The discount is 5."
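# This pattern matches any whole number that is NOT preceded by a dollar sign ($).
# Including \d in the lookbehind stops a match from starting in the middle of a number.
pattern = r'(?<![$\d])\d+'
matches = re.findall(pattern, text)
print(matches)
- Output:
['10', '5']
How it works
- The pattern \d+ looks for one or more digits.
- The negative lookbehind (?<![$\d]) checks that neither a dollar sign nor another digit precedes the digits. Inside a character class the $ is a literal dollar sign, so it does not need escaping.
- The numbers 10 and 5 meet this condition and are returned in the list.
- The numbers 50 and 200 are ignored because they are preceded by a dollar sign, causing the negative lookbehind to fail. Without \d in the lookbehind, the engine would retry one character later and return the partial matches 0 and 00, since those digits follow another digit and not a dollar sign.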
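Real price lists often contain decimals, which the example above does not cover. The following sketch is one possible extension, not a general currency parser: it assumes prices look like 19.99, and the sample text is made up for illustration. The lookbehind also rejects positions that follow a digit or a decimal point, so a match cannot begin inside an excluded price.

```python
import re

# Hypothetical sample text with decimal prices.
text = "Fees: $19.99, €7.50, 3"

# [$\d.] rejects positions right after "$", a digit, or a decimal point.
# (?:\.\d+)? optionally includes a decimal part in the match.
pattern = r'(?<![$\d.])\d+(?:\.\d+)?'

decimal_matches = re.findall(pattern, text)
print(decimal_matches)  # ['7.50', '3']
```

The group is non-capturing, (?:...), so re.findall returns the whole match and not just the decimal part.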