Positive Lookbehind Assertion

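A positive lookbehind assertion in Python’s re module is a zero-width assertion that checks whether a given pattern immediately precedes the current position, without including that pattern in the overall match. It is written as (?<=...) and is the mirror image of a positive lookahead, (?=...): a lookahead checks the text that follows the current position, while a lookbehind checks the text that precedes it.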
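A quick way to see both the zero-width behavior and the lookahead/lookbehind symmetry is to run the two assertions on the same string. The sample string below is made up for illustration.

```python
import re

text = "price: 40 USD"

# Lookbehind: match digits only if "price: " comes right before them
behind = re.search(r'(?<=price: )\d+', text)

# Lookahead: match digits only if " USD" comes right after them
ahead = re.search(r'\d+(?= USD)', text)

# An assertion on its own consumes no characters, so it matches an empty string
empty = re.search(r'(?<=price: )', text)

print(behind.group())  # 40
print(ahead.group())   # 40
print(empty.span())    # (7, 7): the position right after "price: ", zero characters wide
```

In both cases the match is just 40; the text checked by the assertion is never part of the result.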

The key constraint for lookbehind assertions in Python’s re module is that the pattern inside the parentheses must match strings of one fixed length. For example, (?<=abc) is valid, and so are (?<=a|b) and (?<=ab|cd), because every alternative in each has the same length. (?<=a|bc) is not valid because its alternatives have different lengths (one and two characters), and neither is a quantified pattern such as (?<=a+), whose length can vary.
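You can check the rule directly by compiling a few candidate patterns. The sketch below uses a small helper, is_valid_lookbehind (a hypothetical name made up for this example), that simply reports whether re.compile accepts a pattern or raises re.error.

```python
import re


def is_valid_lookbehind(pattern: str) -> bool:
    """Hypothetical helper: True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


candidates = [
    r'(?<=abc)x',    # one fixed-length string (3 characters)
    r'(?<=a|b)x',    # alternatives of equal length (1 and 1)
    r'(?<=ab|cd)x',  # alternatives of equal length (2 and 2)
    r'(?<=a|bc)x',   # alternatives of different lengths (1 and 2)
    r'(?<=a+)x',     # a quantifier makes the length variable
]

for pattern in candidates:
    # Prints valid=True for the first three patterns and valid=False for the last two
    print(f"{pattern:<14} valid={is_valid_lookbehind(pattern)}")
```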

Example: Finding Numbers After a Specific Word

Let’s say you want to find all numbers in a string that are preceded by “cost: ” (the word, the colon, and a space), but you only want to match the numbers, not the “cost: ” label.

```python
import re

text = "The total cost: 50. The final price: 20."

# This pattern looks for one or more digits (\d+), preceded by a positive lookbehind
# that checks for the literal string "cost: ".
pattern = r'(?<=cost: )\d+'

matches = re.findall(pattern, text)

print(matches)
```

Output:

```
['50']
```

How it Works: Step-by-Step

  1. \d+: This part of the pattern looks for one or more digits (50, 20).
  2. (?<=cost: ): This is the positive lookbehind assertion.
    • At each position where a match could start, the regex engine first “looks behind” to check whether the preceding characters are “cost: ”.
    • Before 50 it finds “cost: ”, so the assertion succeeds and \d+ goes on to match 50. Before 20 it finds “price: ”, so the assertion fails and 20 is skipped.
    • The lookbehind itself consumes no characters and is not part of the final match. It only verifies a condition.
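To make the check concrete, the sketch below emulates by hand the condition the lookbehind tests: it finds every run of digits, then looks at the characters directly in front of it. This only illustrates the condition; it is not how the re engine is implemented internally.

```python
import re

text = "The total cost: 50. The final price: 20."
prefix = "cost: "

results = []
for m in re.finditer(r'\d+', text):
    # Slice the characters immediately before the digits (same width as the prefix)
    preceding = text[max(0, m.start() - len(prefix)):m.start()]
    results.append((m.group(), preceding, preceding == prefix))

for digits, preceding, ok in results:
    print(digits, repr(preceding), ok)
# 50 'cost: ' True
# 20 'rice: ' False

# Keeping only the passing rows gives the same answer as the lookbehind pattern
print([d for d, _, ok in results if ok] == re.findall(r'(?<=cost: )\d+', text))  # True
```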

Without the positive lookbehind, the simpler pattern cost: \d+ would match cost: 50, which includes the label you wanted to leave out.
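A capture group such as cost: (\d+) is a common alternative, and with re.findall it returns the same list. The difference shows up when the match itself matters, for example with re.sub: because the lookbehind does not consume “cost: ”, only the digits are replaced. The replacement value 60 is just an illustration.

```python
import re

text = "The total cost: 50. The final price: 20."

# Plain pattern: the label is consumed and returned as part of the match
plain = re.findall(r'cost: \d+', text)

# Capture group: findall returns only the group, but the label is still consumed
grouped = re.findall(r'cost: (\d+)', text)

# Lookbehind: the label is checked but not consumed, so re.sub replaces only the digits
updated = re.sub(r'(?<=cost: )\d+', '60', text)

print(plain)    # ['cost: 50']
print(grouped)  # ['50']
print(updated)  # The total cost: 60. The final price: 20.
```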

Why use it?

Positive lookbehind assertions are useful for:

  • Targeted Matching: Finding a specific pattern only if it’s in a certain context.
  • Excluding Preceding Characters: Matching a string without including the characters that come before it.

1. Extracting currency values

Let’s say you have a string with different currencies and you only want to extract the dollar amounts.

```python
import re

text = "The cost is $50 and €10. The total is $200 and £5."

# This pattern matches any number (\d+) that is preceded by a dollar sign ($).
# The dollar sign is escaped with a backslash because $ is a special character
# in regex (it is the end-of-string anchor).
pattern = r'(?<=\$)\d+'

matches = re.findall(pattern, text)

print(matches)
```

Output:

```
['50', '200']
```

How it Works

  • The pattern \d+ looks for one or more digits.
  • The lookbehind (?<=\$) checks to make sure the dollar sign $ immediately precedes the digits.
  • The digits 50 and 200 meet this condition and are returned in the list; the $ symbol is not part of the match itself. The numbers 10 and 5 are ignored because they are preceded by € and £, not by a dollar sign.
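Because \d+ stops at the first non-digit, this pattern would return only 12 for $12.50 and only 1 for $1,200. Below is a sketch of one way to extend it, assuming amounts use commas as thousands separators and a period for decimals (an assumption about the input format, not a general currency parser). The lookbehind stays exactly the same; only the part after it changes.

```python
import re

text = "Lunch was $12.50, rent is $1,200, the tip was $5. The fee is €9.99."

# After the $ lookbehind: digits, then optional ",ddd" groups, then an optional decimal part.
# (?:...) groups are non-capturing, so findall still returns the whole match.
pattern = r'(?<=\$)\d+(?:,\d{3})*(?:\.\d+)?'

matches = re.findall(pattern, text)
amounts = [float(m.replace(',', '')) for m in matches]

print(matches)  # ['12.50', '1,200', '5']
print(amounts)  # [12.5, 1200.0, 5.0]
```

The optional groups require a digit after the comma or period, so the comma after “1,200” and the sentence-ending period after “5” are not swallowed.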

2. Getting first names after a title

Imagine you have a string of people’s names with titles, and you only want to extract the first names of the people titled “Mr.”

```python
import re

names = "Mr. John Smith, Ms. Jane Doe, Mr. Peter Jones"

# This pattern looks for a word (\w+) that is preceded by "Mr. ".
# The space after "Mr." is part of the lookbehind, so the match starts
# at the first letter of the name.
pattern = r'(?<=Mr\. )\w+'

matches = re.findall(pattern, names)

print(matches)
```

Output:

```
['John', 'Peter']
```

How it Works

  • The pattern \w+ looks for one or more word characters (the names themselves).
  • The lookbehind (?<=Mr\. ) checks that the preceding text is the literal string “Mr. ”. The backslash escapes the period, which would otherwise match any character.
  • The lookbehind finds “Mr. ” before “John” and “Peter”, but not before “Jane”, so only “John” and “Peter” are returned as matches.
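The fixed-width rule described earlier matters as soon as you want more than one title. The sample string below adds a “Mrs.” entry for illustration. “Mr. ” and “Ms. ” are both four characters, so they can share one lookbehind; “Mrs. ” is five, so putting it in the same alternation is rejected. One workaround is to alternate between two separate lookbehinds, each of which is fixed-width on its own.

```python
import re

names = "Mr. John Smith, Ms. Jane Doe, Mrs. Ann Lee, Mr. Peter Jones"

# Both alternatives are 4 characters long, so a single lookbehind is allowed
same_width = re.findall(r'(?<=Mr\. |Ms\. )\w+', names)

# Alternatives of 4 and 5 characters inside one lookbehind raise re.error
try:
    re.compile(r'(?<=Mr\. |Mrs\. )\w+')
    mixed_ok = True
except re.error:
    mixed_ok = False

# Two separate lookbehinds, each fixed-width, combined with a non-capturing alternation
either = re.findall(r'(?:(?<=Mr\. )|(?<=Mrs\. ))\w+', names)

print(same_width)  # ['John', 'Jane', 'Peter']
print(mixed_ok)    # False
print(either)      # ['John', 'Ann', 'Peter']
```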
