Positive Lookahead Assertion

A positive lookahead assertion in Python’s re module is a zero-width assertion. It checks whether the text immediately after the current position matches a given pattern, without including that text in the overall match. It is written as (?=...).

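It is called a “lookahead” because the regex engine looks ahead in the string to see whether the pattern inside the parentheses matches at that point. If it does, the assertion succeeds and matching continues from the same position: the characters it examined are not consumed and are not returned as part of the match. If it does not, the assertion fails and that match attempt fails with it.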
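Inspecting a match object is a quick way to see this zero-width behavior. The following minimal sketch uses the made-up strings 'foobar' and 'foobaz' purely for illustration:

```python
import re

# The lookahead (?=bar) requires "bar" to come next but does not consume it.
m = re.search(r'foo(?=bar)', 'foobar')
print(m.group())  # foo
print(m.span())   # (0, 3): the match ends before "bar"

# If "bar" is not next, the assertion fails and there is no match.
print(re.search(r'foo(?=bar)', 'foobaz'))  # None

# Because nothing was consumed, a pattern placed after the lookahead
# starts at the same position the lookahead inspected.
print(re.search(r'foo(?=bar)bar', 'foobar').group())  # foobar
```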

Example: Finding Numbers Followed by a Specific Word

Let’s say you want to find all numbers in a string that are followed by the word “dollars”, but you only want to match the numbers, not the word “dollars” itself.

Python

import re

text = "The price is 500 dollars, but the tax is 50 dollars."

# This pattern looks for one or more digits (\d+), followed by a positive lookahead
# that checks for a space and the word "dollars".
pattern = r'\d+(?= dollars)'

matches = re.findall(pattern, text)

print(matches)
  • Output:
    • ['500', '50']

How it Works: Step-by-Step

  1. \d+: This part of the pattern matches one or more digits (500, 50).
  2. (?= dollars): This is the positive lookahead assertion.
    • The regex engine, after matching \d+, “looks ahead” to see if the next characters are a space followed by the word “dollars”.
    • Since “ dollars” follows both 500 and 50, the assertion succeeds in each case.
    • Crucially, the (?= dollars) part itself is not included in the final match. It just verifies a condition.

Without the positive lookahead, a plain \d+ dollars pattern would match “500 dollars” and “50 dollars”, which includes the word we wanted to leave out.
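To make the difference concrete, this sketch runs both patterns on the same text and then uses re.sub(), where the difference matters most: only consumed text is replaced, so the lookahead version rewrites the numbers and leaves “dollars” in place. The replacement string 'N' is just a placeholder chosen for illustration.

```python
import re

text = "The price is 500 dollars, but the tax is 50 dollars."

# Consuming pattern: " dollars" becomes part of every match.
print(re.findall(r'\d+ dollars', text))      # ['500 dollars', '50 dollars']

# Lookahead pattern: " dollars" is checked but not consumed.
print(re.findall(r'\d+(?= dollars)', text))  # ['500', '50']

# re.sub replaces only the matched (consumed) text, so "dollars" survives.
print(re.sub(r'\d+(?= dollars)', 'N', text))
# The price is N dollars, but the tax is N dollars.
```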

Why use it?

Positive lookahead assertions are useful for:

  • Constraining a match: Requiring that certain text appear at a given position without including that text in the match result.
  • Overlapping matches: Because a lookahead consumes no characters, a capturing group placed inside one, as in (?=(\d\d)), lets re.findall() report matches that overlap, which a consuming pattern cannot do.
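Both uses can be sketched briefly. The digit string '12345' and the password-style rule below (at least one digit, at least one uppercase letter, 8 or more characters) are hypothetical examples chosen for illustration, not requirements from any particular system.

```python
import re

# Overlapping matches: the lookahead consumes nothing, so the engine
# advances one character at a time and the capturing group inside it
# records every two-digit window.
print(re.findall(r'(?=(\d\d))', '12345'))  # ['12', '23', '34', '45']
print(re.findall(r'\d\d', '12345'))        # ['12', '34'] (consuming, no overlap)

# Constraining a match: several lookaheads anchored at the start each
# test one condition before .{8,} consumes anything.
# Hypothetical rule: at least one digit, one uppercase letter, 8+ characters.
rule = re.compile(r'^(?=.*\d)(?=.*[A-Z]).{8,}$')
for candidate in ["Secret123", "secret123", "Secret"]:
    print(candidate, bool(rule.match(candidate)))
```

Stacking lookaheads this way keeps each condition independent, since each one re-examines the string from the same starting position.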

Example: Matching Street Names

Suppose you want to find every street name in an address string that is followed by “Street” or “St.”, but you only want to match the name of the street itself.

Python

import re

address = "123 Main Street, 45 Elm St., and 67 Oak Avenue."

# This pattern finds any word (\w+) that is immediately followed by a space
# and either "Street" or "St."
pattern = r'\w+(?= Street| St\.)'

matches = re.findall(pattern, address)

print(matches)
  • Output:
    • ['Main', 'Elm']

How it Works

  • \w+: This part matches one or more word characters (letters, digits, or underscore).
  • (?= Street| St\.): This is the positive lookahead assertion.
    • It looks ahead to see whether the matched word is followed by a space and then either the literal string “Street” or “St.”.
    • The | acts as an “OR”, so either alternative satisfies the assertion.
    • The \. is an escaped period; an unescaped period is a special character in regex that matches any character except a newline.
    • The assertion succeeds for “Main” (followed by “ Street”) and “Elm” (followed by “ St.”), but not for “Oak”, which is followed by “ Avenue”.
  • Only “Main” and “Elm” are returned in matches; the “ Street” and “ St.” text is not included, because it was only checked by the lookahead.
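A few variations on the same address help show how the lookahead interacts with grouping. This is a sketch using the address string above; the non-capturing-group form is an equivalent rewrite, not a change the original example requires.

```python
import re

address = "123 Main Street, 45 Elm St., and 67 Oak Avenue."

# Same result as the original pattern: the shared space is factored out
# and the alternatives sit in a non-capturing group (?:...).
print(re.findall(r'\w+(?= (?:Street|St\.))', address))  # ['Main', 'Elm']

# Pitfall: a capturing group inside the lookahead changes what findall
# returns. It now reports the group's text instead of the street name.
print(re.findall(r'\w+(?= (Street|St\.))', address))    # ['Street', 'St.']

# The lookahead can follow a longer pattern, e.g. to keep the house number too.
print(re.findall(r'\d+ \w+(?= (?:Street|St\.))', address))  # ['123 Main', '45 Elm']
```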

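Negative Lookahead

The counterpart of a positive lookahead is the negative lookahead assertion, written (?!...). It succeeds only when the pattern inside it does not match at the current position, and, like (?=...), it consumes no characters. The two examples below use it.

1. Matching a word that is not followed by a suffix

Imagine you want to find every occurrence of “play” that is not followed by “ing” or “ed”, that is, the base form “play” rather than “playing” or “played”.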

Python

import re

text = "Let's go play a game. We played cards. He is playing football."

# This pattern matches the word "play" only if it's not followed by "ing" or "ed"
pattern = r'play(?!ing|ed)'

matches = re.findall(pattern, text)

print(matches)
  • Output:
    • ['play']

How it Works

  • The pattern first matches the literal characters play.
  • The negative lookahead (?!ing|ed) then checks what immediately follows.
  • In “Let’s go play a game.”, “play” is followed by “ a”, which is neither “ing” nor “ed”, so the lookahead succeeds and “play” is matched.
  • In “We played cards.”, “ed” follows, so the negative lookahead fails and this occurrence is not matched.
  • In “He is playing football.”, “ing” follows, so the negative lookahead fails again.
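Replacing each match makes it visible in context, and a second sample shows a limit of this pattern: the lookahead only inspects what comes after “play”, so it says nothing about the characters before it or about other suffixes. The bracket marker and the sample words below are illustrative choices.

```python
import re

text = "Let's go play a game. We played cards. He is playing football."

# Mark each match in place: only the bare "play" passes the negative lookahead.
marked = re.sub(r'play(?!ing|ed)', '[play]', text)
print(marked)
# Let's go [play] a game. We played cards. He is playing football.

# "display" and "plays" still contain a match. A leading \b (word boundary)
# rules out "display"; "plays" still matches because "s" is neither "ing" nor "ed".
sample = "play display plays played"
print([m.span() for m in re.finditer(r'play(?!ing|ed)', sample)])    # [(0, 4), (8, 12), (13, 17)]
print([m.span() for m in re.finditer(r'\bplay(?!ing|ed)', sample)])  # [(0, 4), (13, 17)]
```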

2. Matching filenames without a specific extension

Let’s say you have a comma-separated string of filenames and you want to find every .txt extension that is not immediately followed by a .bak extension.

Python

import re

filenames = "notes.txt, draft.txt.bak, final_report.txt, backup.txt.bak"

# This pattern looks for a .txt extension that is NOT followed by .bak.
# Both periods are escaped (\.) so they match a literal "." rather than any character.
pattern = r'\.txt(?!\.bak)'

matches = re.findall(pattern, filenames)

print(matches)
  • Output:
    • ['.txt', '.txt']

How it Works

  • The pattern looks for the literal string .txt.
  • The negative lookahead (?!\.bak) then checks whether .bak comes immediately after it.
  • In notes.txt, .txt is followed by a comma rather than .bak, so .txt is matched.
  • In draft.txt.bak, .txt is followed by .bak, so the lookahead fails and nothing is matched.
  • In final_report.txt, .txt is followed by a comma, so .txt is matched.
  • In backup.txt.bak, .bak follows again, so the lookahead fails.
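The output ['.txt', '.txt'] confirms that two extensions passed but not which files they belong to. A small extension of the pattern, sketched below, adds \w+ in front so the filename is part of the match; it assumes names consist only of word characters before .txt, which holds for this sample but not for names containing spaces or hyphens.

```python
import re

filenames = "notes.txt, draft.txt.bak, final_report.txt, backup.txt.bak"

# Include the name part (\w+) in the match so the result shows which
# files passed; the lookahead still rejects any .txt followed by .bak.
pattern = r'\w+\.txt(?!\.bak)'
print(re.findall(pattern, filenames))  # ['notes.txt', 'final_report.txt']

# The same pattern works on an actual Python list of names.
names = ["notes.txt", "draft.txt.bak", "final_report.txt", "backup.txt.bak"]
kept = [n for n in names if re.search(pattern, n)]
print(kept)  # ['notes.txt', 'final_report.txt']
```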
