Negative lookbehind assertion

A negative lookbehind assertion in Python’s re module is a zero-width assertion that succeeds only if a given pattern does not match immediately before the current position. It is written as (?<!...). Because it is zero-width, it consumes no characters, so the text it checks never becomes part of the match. It is the opposite of a positive lookbehind, (?<=...), and lets you exclude matches based on what precedes them.
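A minimal sketch using only the standard re module contrasts the two lookbehinds on the same illustrative string. The match objects show that the assertion consumes nothing: only the b is returned, and its span starts after the character that was checked.

```python
import re

text = "ab cb"  # illustrative input

# Negative lookbehind: match "b" only when it is NOT directly after "a".
neg = re.search(r'(?<!a)b', text)
print(neg.group(), neg.span())  # b (4, 5) -> the "b" in "cb"

# Positive lookbehind, for comparison: match "b" only when it IS directly after "a".
pos = re.search(r'(?<=a)b', text)
print(pos.group(), pos.span())  # b (1, 2) -> the "b" in "ab"
```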

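As with positive lookbehind, the pattern inside (?<!...) must match a fixed number of characters. Quantifiers such as + or *, and alternatives of different lengths (for example cost|price), make re.compile() raise re.error.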
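The fixed-width rule can be checked directly. The sketch below uses a small helper, compiles(), which is hypothetical and written only for this illustration, along with made-up sample strings. It also shows a common workaround: place one fixed-width lookbehind per excluded prefix side by side, since each is checked at the same position.

```python
import re


def compiles(pattern):
    """Hypothetical helper: True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Variable-width lookbehinds are rejected by the standard re module.
print(compiles(r'(?<!\d+)x'))         # False: \d+ can match any number of digits
print(compiles(r'(?<!cost|price)x'))  # False: the alternatives are 4 and 5 characters long
print(compiles(r'(?<!ab|cd)x'))       # True: both alternatives are 2 characters long

# Workaround: one fixed-width lookbehind per excluded prefix, placed side by side.
# (?<!\d) additionally keeps a match from starting in the middle of a number.
pattern = r'(?<!cost: )(?<!price: )(?<!\d)\d+'
totals = re.findall(pattern, "cost: 50, price: 20, total: 70")
print(totals)  # ['70']
```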

Example: Finding Numbers Not Preceded by a Specific Word

Let’s say you want to find all numbers in a string that are not preceded by “price: ” (the word, a colon, and a space). You want to match only the numbers, not the preceding text.

Python

import re

text = "The cost: 50. The price: 20."

# This pattern looks for one or more digits (\d+) that are NOT preceded by "price: ".
# The extra (?<!\d) stops a match from starting in the middle of a number.
pattern = r'(?<!price: )(?<!\d)\d+'

matches = re.findall(pattern, text)

print(matches)
  • Output:
    • ['50']

How it Works: Step-by-Step

  1. \d+: This part of the pattern looks for one or more digits (50, 20).
  2. (?<!price: ): This is the negative lookbehind assertion. The engine checks it at the position where a match would start, before \d+ consumes any digits.
    • Before 50, the seven preceding characters are “ cost: ”, not “price: ”, so the assertion succeeds and 50 is returned as a match.
    • Before 20, the engine finds “price: ”, so the assertion fails and no match can start there.
  3. (?<!\d): This second lookbehind prevents a match from starting in the middle of a number. Without it, the engine would move one position forward to the 0 of 20, whose preceding characters are “rice: 2” rather than “price: ”, and return '0'.
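Running the lookbehind-only pattern next to guarded versions shows a common pitfall of negative lookbehind: when the assertion fails at the start of a number, the engine simply retries one character later. This is a small sketch with the standard re module; the \b variant is an alternative guard, not something the original example uses.

```python
import re

text = "The cost: 50. The price: 20."

# Lookbehind only: the engine retries at the "0" of "20", whose 7 preceding
# characters are "rice: 2", so the assertion passes there.
naive = re.findall(r'(?<!price: )\d+', text)
print(naive)  # ['50', '0']

# Adding (?<!\d) forbids a match from starting right after another digit.
guarded = re.findall(r'(?<!price: )(?<!\d)\d+', text)
print(guarded)  # ['50']

# \b is another common guard: there is no word boundary between "2" and "0".
with_boundary = re.findall(r'(?<!price: )\b\d+', text)
print(with_boundary)  # ['50']
```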

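Negative lookbehind lets you exclude specific cases from your results based on what comes before them, without that context becoming part of the match. Several lookbehinds can be placed side by side; each one adds a condition that must not hold immediately before the match.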

1. Finding a filename extension that is not preceded by a specific word

Let’s say you have a list of filenames and you want to find all .txt files that are not part of a “backup” naming convention (e.g., backup.txt).

Python

import re

filenames = "notes.txt, backup.txt, report.txt"

# This pattern looks for a .txt extension that is NOT preceded by the word "backup"
pattern = r'(?<!backup)\.txt'

matches = re.findall(pattern, filenames)

print(matches)
  • Output:
    • ['.txt', '.txt']

How it works

  • The pattern looks for the literal string .txt.
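  • The lookbehind (?<!backup) checks whether the six characters immediately before the dot are “backup”. It compares characters, not whole words, so a file such as mybackup.txt would be skipped too.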
  • The first .txt in notes.txt is found, and since “backup” does not precede it, the match is successful.
  • The .txt in backup.txt is preceded by “backup,” so the negative lookbehind fails, and the match is ignored.
  • The .txt in report.txt is not preceded by “backup,” so the match is successful.
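The pattern above returns only the matched extensions, so the result does not say which files were kept. A minimal sketch that returns the full names instead, assuming filenames consist of word characters (letters, digits, underscore); old_backup.txt is a hypothetical name added to show the character-level check:

```python
import re

filenames = "notes.txt, backup.txt, report.txt, old_backup.txt"

# \b\w+ captures the name itself. The lookbehind is evaluated right after the
# name, just before the dot, so any name ending in "backup" is rejected.
pattern = r'\b\w+(?<!backup)\.txt\b'

kept = re.findall(pattern, filenames)
print(kept)  # ['notes.txt', 'report.txt']
```

Because the lookbehind sits after \w+ rather than at the start of the pattern, it tests the end of the name, which is where the “backup” suffix would be.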

2. Extracting prices not in a specific currency

Imagine you have a list of prices and you want to extract all the numbers that are not preceded by a dollar sign ($).

Python

import re

text = "The cost is $50. The value is €10. The price is $200. The discount is 5."

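# This pattern matches any number that is NOT preceded by a dollar sign ($).
# The extra (?<!\d) stops a match from starting in the middle of a number.
pattern = r'(?<!\$)(?<!\d)\d+'

matches = re.findall(pattern, text)

print(matches)
  • Output:
    • ['10', '5']

How it works

  • The pattern \d+ looks for one or more digits.
  • The negative lookbehind (?<!\$) checks that a dollar sign does not immediately precede the digits.
  • The second lookbehind, (?<!\d), checks that the match does not start right after another digit. Without it, the engine would reject the 5 of $50 but then match the 0 after it, because that 0 is preceded by 5 rather than $.
  • The numbers 10 and 5 meet both conditions and are returned in the list.
  • The numbers 50 and 200 are ignored: their first digit is preceded by a dollar sign, and every later digit is preceded by another digit.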
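When every excluded predecessor is a single character, one lookbehind with a character class can replace several stacked ones, because a class always matches exactly one character and so stays fixed-width. The sketch below compares variants on the same text; excluding € as well is an extra variation for illustration, not part of the example above.

```python
import re

text = "The cost is $50. The value is €10. The price is $200. The discount is 5."

# Without a digit guard, matches can start inside "$50" and "$200".
naive = re.findall(r'(?<!\$)\d+', text)
print(naive)  # ['0', '10', '00', '5']

# One class lists all forbidden predecessors: "$" (literal inside a class) or a digit.
with_class = re.findall(r'(?<![$\d])\d+', text)
print(with_class)  # ['10', '5']

# Adding "€" to the class also excludes euro amounts.
no_euro = re.findall(r'(?<![$€\d])\d+', text)
print(no_euro)  # ['5']
```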
