Positive Lookbehind Assertion

A positive lookbehind assertion in Python’s re module is a zero-width assertion that succeeds only if the text immediately before the current position matches a given pattern, without including that text in the overall match. It is the backward-looking counterpart of a positive lookahead, which checks the text that follows instead. It is written as (?<=...).

The key constraint for lookbehind assertions in Python’s re module is that the pattern inside the parentheses must match text of a fixed length; if it contains alternatives, every alternative must have the same length. For example, (?<=abc) is valid, and so are (?<=a|b) and (?<=a|b|c), because every alternative is exactly one character long. In contrast, (?<=a|bc) is rejected because a and bc have different lengths, and variable-width quantifiers such as (?<=a+) are rejected for the same reason.
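The fixed-length rule can be checked directly by asking re to compile a few patterns. The sketch below is only an illustration: the helper name compiles is made up for this example, and the patterns are the ones discussed above plus one variable-width quantifier.

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-width lookbehinds are accepted.
print(compiles(r'(?<=abc)\d'))    # True
print(compiles(r'(?<=a|b|c)\d'))  # True

# Alternatives of different lengths, or a variable-width quantifier, are rejected.
print(compiles(r'(?<=a|bc)\d'))   # False
print(compiles(r'(?<=a+)\d'))     # False
```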

Example: Finding Numbers After a Specific Word

Let’s say you want to find all numbers in a string that are preceded by the word “cost:”, but you only want to match the numbers, not the word “cost:”.

Python

import re

text = "The total cost: 50. The final price: 20."

# This pattern looks for one or more digits (\d+), preceded by a positive lookbehind
# that checks for the literal string "cost: ".
pattern = r'(?<=cost: )\d+'

matches = re.findall(pattern, text)

print(matches)
  • Output:
    • ['50']

How it Works: Step-by-Step

  1. \d+: This part of the pattern matches one or more digits. On its own it would match both 50 and 20.
  2. (?<=cost: ): This is the positive lookbehind assertion.
    • Before matching digits at a given position, the regex engine “looks behind” to check whether the characters immediately before that position are “cost: ”.
    • The text just before 50 is “cost: ”, so the assertion succeeds and 50 is matched. The text just before 20 is “price: ”, so the assertion fails and 20 is skipped.
    • The lookbehind part itself is not included in the final match. It only verifies a condition.

Without the positive lookbehind, a simple cost: \d+ pattern would match cost: 50, which is not what was intended.
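The difference can be seen by running both patterns on the same text. This is a small sketch that reuses the example string from above; the variable names are chosen only for illustration.

```python
import re

text = "The total cost: 50. The final price: 20."

# Plain prefix: "cost: " is consumed and becomes part of the match.
with_prefix = re.findall(r'cost: \d+', text)
print(with_prefix)        # ['cost: 50']

# Lookbehind: "cost: " is only checked, never consumed.
match = re.search(r'(?<=cost: )\d+', text)
print(match.group())      # 50

# The six characters just before the match are the label itself.
preceding = text[match.start() - 6:match.start()]
print(repr(preceding))    # 'cost: '
```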

Why use it?

Positive lookbehind assertions are useful for:

  • Targeted Matching: Finding a specific pattern only if it’s in a certain context.
  • Excluding Preceding Characters: Matching a string without including the characters that come before it.
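Both points show up clearly in a substitution: because the lookbehind text is not part of the match, re.sub replaces only the value and leaves the label in place. The replacement string 'N/A' below is an arbitrary choice for this sketch.

```python
import re

text = "The total cost: 50. The final price: 20."

# Only the digits after "cost: " are matched, so only they are replaced.
masked = re.sub(r'(?<=cost: )\d+', 'N/A', text)
print(masked)  # The total cost: N/A. The final price: 20.
```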

1. Extracting currency values

Let’s say you have a string with different currencies and you only want to extract the dollar amounts.

Python

import re

text = "The cost is $50 and €10. The total is $200 and £5."

# This pattern matches any number (\d+) that is preceded by a dollar sign ($)
# Note that we escape the dollar sign with a backslash since it's a special character in regex.
pattern = r'(?<=\$)\d+'

matches = re.findall(pattern, text)

print(matches)
  • Output:
    • ['50', '200']

How it Works

  • The pattern \d+ looks for one or more digits.
  • The lookbehind (?<=\$) checks to make sure the dollar sign $ immediately precedes the digits.
  • The digits 50 and 200 meet this condition and are returned in the list. The $ symbol is not part of the match itself. The numbers 10 and 5 are ignored because they are not preceded by a dollar sign.
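The same lookbehind can be combined with a slightly richer number pattern. The following sketch assumes amounts may carry a decimal part, which the example above does not cover; the sample string and the optional (?:\.\d+)? group are additions made for illustration.

```python
import re

# Hypothetical sample text with decimal amounts.
prices = "Prices: $19.99, €5.50, $7"

# \d+ matches the whole part; (?:\.\d+)? optionally matches a decimal part.
# The lookbehind still requires a dollar sign right before the first digit.
pattern = r'(?<=\$)\d+(?:\.\d+)?'

dollar_amounts = re.findall(pattern, prices)
print(dollar_amounts)  # ['19.99', '7']
```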

2. Getting names after a title

Imagine you have a list of people’s names with titles, and you only want to extract the names that are preceded by the title “Mr.”

Python

import re

names = "Mr. John Smith, Ms. Jane Doe, Mr. Peter Jones"

# This pattern looks for a word (\w+) that is preceded by "Mr. ".
# The space after "Mr." is part of the lookbehind because the name starts right after it.
pattern = r'(?<=Mr\. )\w+'

matches = re.findall(pattern, names)

print(matches)
  • Output:
    • ['John', 'Peter']

How it Works

  • The pattern \w+ looks for one or more word characters (the names themselves).
  • The lookbehind (?<=Mr\. ) checks that the preceding text is the literal string “Mr. ”. Note the backslash that escapes the period, which is otherwise a special regex character.
  • The lookbehind finds “Mr. ” before “John” and “Peter”, but not before “Jane”, so only “John” and “Peter” are returned as matches.
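The fixed-length rule matters as soon as more than one title is involved. “Mr. ” and “Ms. ” are both four characters long, so they can share one lookbehind. “Mrs. ” is five characters long, so one possible workaround (an assumption of this sketch, not the only option) is to give each title its own lookbehind and alternate between them. The second sample string is made up for the example.

```python
import re

names = "Mr. John Smith, Ms. Jane Doe, Mr. Peter Jones"

# Both alternatives are four characters wide, so one lookbehind is allowed.
same_width = re.findall(r'(?<=Mr\. |Ms\. )\w+', names)
print(same_width)  # ['John', 'Jane', 'Peter']

# Hypothetical sample with titles of different lengths.
more_names = "Mrs. Ann Lee, Mr. Bob Ray"

# Each lookbehind is fixed-width on its own; the alternation sits outside them.
mixed = re.findall(r'(?:(?<=Mr\. )|(?<=Mrs\. ))\w+', more_names)
print(mixed)  # ['Ann', 'Bob']
```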
