positive lookahead assertion

A positive lookahead assertion in Python’s re module is a zero-width assertion that checks if the pattern that follows it is present, without including that pattern in the overall match. It is written as (?=...).

The key is that it’s a “lookahead”—the regex engine looks ahead in the string to see if the pattern inside the parentheses is there. If it is, the match succeeds, but the lookahead part itself is not consumed or returned as part of the match.

Example: Finding Words Followed by a Specific Word

Let’s say you want to find all numbers in a string that are followed by the word “dollars”, but you only want to match the numbers, not the word “dollars” itself.

Python

import re

text = "The price is 500 dollars, but the tax is 50 dollars."

# This pattern looks for one or more digits (\d+), followed by a positive lookahead
# that checks for a space and the word "dollars".
pattern = r'\d+(?= dollars)'

matches = re.findall(pattern, text)

print(matches)
  • Output:
    • ['500', '50']

How it Works: Step-by-Step

  1. \d+: This part of the pattern matches one or more digits (500, 50).
  2. (?= dollars): This is the positive lookahead assertion.
    • The regex engine, after matching \d+, “looks ahead” to see if the next characters are a space followed by the word “dollars”.
    • Since it finds ” dollars” after 500 and 50, the assertion is True.
    • Crucially, the (?= dollars) part itself is not included in the final match. It just verifies a condition.

Without the positive lookahead, using a simple \d+ dollars pattern would match 500 dollars and 50 dollars, which is not what we wanted.

Why use it?

Positive lookahead assertions are useful for:

  • Constraining a match: Ensuring a pattern is present under specific conditions without including that condition in the match result.
  • Overlapping matches: They can be used to find overlapping matches that might not be possible with standard regex patterns.

Let’s say you want to find all street names in an address string that are followed by “Street” or “St.”, but you only want to match the name of the street itself.

Python

import re

address = "123 Main Street, 45 Elm St., and 67 Oak Avenue."

# This pattern finds any word (\w+) that is immediately followed by a space
# and either "Street" or "St."
pattern = r'\w+(?= Street| St\.)'

matches = re.findall(pattern, address)

print(matches)
  • Output:
    • ['Main', 'Elm']

How it Works

  • \w+: This part matches one or more word characters (letters, numbers, or underscore).
  • (?= Street| St\.): This is the positive lookahead assertion.
    • It looks ahead to see if the matched word is followed by a space and then either the literal string “Street” or “St.”.
    • The | acts as an “OR” condition, checking for either option.
    • The \. is an escaped period, as a period is a special character in regex that matches any character.
    • The assertion is successful for “Main” (which is followed by ” Street”) and “Elm” (followed by ” St.”).
  • The street names “Main” and “Elm” are returned in the list matches, but the “Street” or ” St.” parts are not included, because they were only part of the lookahead check.

1. Finding words that aren’t verbs

Imagine you want to find all occurrences of the word “play” that are not followed by “ing” or “ed” (i.e., you want to find the root word “play” when it’s not a verb in a continuous or past tense).

Python

import re

text = "Let's go play a game. We played cards. He is playing football."

# This pattern matches the word "play" only if it's not followed by "ing" or "ed"
pattern = r'play(?!ing|ed)'

matches = re.findall(pattern, text)

print(matches)
  • Output:
    • ['play']

How it Works

  • The pattern first matches the literal word play.
  • The negative lookahead (?!ing|ed) then checks what immediately follows.
  • It finds “play” in “Let’s go play a game.” and checks what follows. Since ” a” is not “ing” or “ed”, the match is successful.
  • It finds “play” in “We played cards.” and checks what follows. The ed is present, so the negative lookahead fails, and “play” is not matched.
  • It finds “play” in “He is playing football.” and checks what follows. The ing is present, so the negative lookahead fails again.

2. Matching filenames without a specific extension

Let’s say you have a list of filenames and you want to find all .txt files that are not followed by a .bak extension.

Python

import re

filenames = "notes.txt, draft.txt.bak, final_report.txt, backup.txt.bak"

# This pattern looks for a .txt extension that is NOT followed by .bak
# The \. is an escaped period.
pattern = r'\.txt(?!.bak)'

matches = re.findall(pattern, filenames)

print(matches)
  • Output:
    • ['.txt', '.txt']

How it Works

  • The pattern looks for the literal string .txt.
  • The negative lookahead (?!.bak) checks if .bak follows.
  • The first . in .txt is found in notes.txt. The negative lookahead checks for .bak and doesn’t find it, so .txt is matched.
  • The first . in draft.txt.bak is found. The negative lookahead finds .bak and fails the match.
  • The . in final_report.txt is found, and since .bak isn’t after it, .txt is matched.
  • The . in backup.txt.bak is found, but the lookahead fails the match.

Similar Posts

  • file properties and methods

    1. file.closed – Is the file door shut? Think of a file like a door. file.closed tells you if the door is open or closed. python # Open the file (open the door) f = open(“test.txt”, “w”) f.write(“Hello!”) print(f.closed) # Output: False (door is open) # Close the file (close the door) f.close() print(f.closed) # Output: True (door is…

  • Python Primitive Data Types & Functions: Explained with Examples

    1. Primitive Data Types Primitive data types are the most basic building blocks in Python. They represent simple, single values and are immutable (cannot be modified after creation). Key Primitive Data Types Type Description Example int Whole numbers (positive/negative) x = 10 float Decimal numbers y = 3.14 bool Boolean (True/False) is_valid = True str…

  • String Alignment and Padding in Python

    String Alignment and Padding in Python In Python, you can align and pad strings to make them visually consistent in output. The main methods used for this are: 1. str.ljust(width, fillchar) Left-aligns the string and fills remaining space with a specified character (default: space). Syntax: python string.ljust(width, fillchar=’ ‘) Example: python text = “Python” print(text.ljust(10)) #…

  • Examples of Python Exceptions

    Comprehensive Examples of Python Exceptions Here are examples of common Python exceptions with simple programs: 1. SyntaxError 2. IndentationError 3. NameError 4. TypeError 5. ValueError 6. IndexError 7. KeyError 8. ZeroDivisionError 9. FileNotFoundError 10. PermissionError 11. ImportError 12. AttributeError 13. RuntimeError 14. RecursionError 15. KeyboardInterrupt 16. MemoryError 17. OverflowError 18. StopIteration 19. AssertionError 20. UnboundLocalError…

  • Password Strength Checker

    python Enhanced Password Strength Checker python import re def is_strong(password): “”” Check if a password is strong based on multiple criteria. Returns (is_valid, message) tuple. “”” # Define criteria and error messages criteria = [ { ‘check’: len(password) >= 8, ‘message’: “at least 8 characters” }, { ‘check’: bool(re.search(r'[A-Z]’, password)), ‘message’: “one uppercase letter (A-Z)”…

Leave a Reply

Your email address will not be published. Required fields are marked *