Positive Lookbehind Assertion

A positive lookbehind assertion in Python’s re module is a zero-width assertion that succeeds only if the text immediately before the current position matches a given pattern, without including that text in the overall match. It is the backward-looking counterpart of a lookahead, which checks the text that follows instead. It is written as (?<=...).

The key constraint for lookbehind assertions in Python’s re module is that the pattern inside the parentheses must match strings of a fixed length. If it contains alternation, every alternative must have the same length. For example, (?<=abc) is valid, and so are (?<=a|b) and (?<=a|b|c), because every alternative is exactly one character long. However, (?<=a|bc) is not valid because a and bc have different lengths, and quantified patterns such as (?<=a+) are rejected for the same reason.
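One way to see the fixed-width rule in action is to try compiling a few patterns. This is a minimal sketch; the specific patterns and the `rejected` list are illustrative choices, not part of the original example set.

```python
import re

# Fixed-width lookbehinds compile without complaint.
re.compile(r'(?<=abc)\d')     # exactly three characters
re.compile(r'(?<=a|b|c)\d')   # every alternative is one character long

# Variable-width lookbehinds are rejected when the pattern is compiled.
rejected = []
for bad in (r'(?<=a|bc)\d', r'(?<=a+)\d', r'(?<=a{1,3})\d'):
    try:
        re.compile(bad)
    except re.error as exc:
        rejected.append(bad)
        print(f"{bad!r} was rejected: {exc}")
```

The error is raised by re.compile() itself, so an invalid lookbehind fails before any text is searched.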

Example: Finding Numbers After a Specific Word

Let’s say you want to find all numbers in a string that are preceded by the label “cost: ”, but you only want to match the numbers, not the label itself.

Python

import re

text = "The total cost: 50. The final price: 20."

# This pattern looks for one or more digits (\d+), preceded by a positive lookbehind
# that checks for the literal string "cost: ".
pattern = r'(?<=cost: )\d+'

matches = re.findall(pattern, text)

print(matches)
  • Output:
    • ['50']

How it Works: Step-by-Step

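  1. (?<=cost: ): This is the positive lookbehind assertion. At each candidate position, the regex engine “looks behind” to see whether the six characters immediately before that position are “cost: ”.
  2. \d+: Where the lookbehind succeeds, this part matches one or more digits. Both 50 and 20 are runs of digits, but only 50 sits directly after “cost: ”.
    • Since “cost: ” comes right before 50, the assertion succeeds there. The text before 20 is “price: ”, so the assertion fails and 20 is skipped.
    • The lookbehind part itself is not included in the final match. It just verifies a condition.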

Without the positive lookbehind, a simple cost: \d+ pattern would match cost: 50, which is not what was intended.
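The difference is easy to check side by side. The sketch below reuses the same text and also includes a capturing-group variant for comparison; the variable names are illustrative.

```python
import re

text = "The total cost: 50. The final price: 20."

# Without a lookbehind, the label is part of the match.
with_label = re.findall(r'cost: \d+', text)

# A capturing group returns only the digits, but the match still consumes "cost: ".
with_group = re.findall(r'cost: (\d+)', text)

# With the lookbehind, the match itself starts at the first digit.
m = re.search(r'(?<=cost: )\d+', text)
before = text[m.start() - 6:m.start()]  # the six characters the lookbehind checked

print(with_label)    # ['cost: 50']
print(with_group)    # ['50']
print(m.group())     # 50
print(repr(before))  # 'cost: '
```

A capturing group is often enough when you only need the extracted value. The lookbehind matters when the match itself must exclude the prefix, for example in substitutions.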

Why use it?

Positive lookbehind assertions are useful for:

  • Targeted Matching: Finding a specific pattern only if it’s in a certain context.
  • Excluding Preceding Characters: Matching a string without including the characters that come before it.
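Both points show up clearly in a substitution. As a small sketch (the replacement value 75 is made up for illustration), re.sub() with a lookbehind rewrites only the number that follows “cost: ” and leaves the label untouched:

```python
import re

text = "The total cost: 50. The final price: 20."

# Only the digits after "cost: " are replaced; the label is never part of the match.
updated = re.sub(r'(?<=cost: )\d+', '75', text)

print(updated)  # The total cost: 75. The final price: 20.
```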

1. Extracting currency values

Let’s say you have a string with different currencies and you only want to extract the dollar amounts.

Python

import re

text = "The cost is $50 and €10. The total is $200 and £5."

# This pattern matches any number (\d+) that is preceded by a dollar sign ($)
# Note that we escape the dollar sign with a backslash since it's a special character in regex.
pattern = r'(?<=\$)\d+'

matches = re.findall(pattern, text)

print(matches)
  • Output:
    • ['50', '200']

How it Works

  • The pattern \d+ looks for one or more digits.
  • The lookbehind (?<=\$) checks to make sure the dollar sign $ immediately precedes the digits.
  • The digits 50 and 200 meet this condition and are returned in the list. The $ symbol is not part of the match itself. The numbers 10 and 5 are ignored because they are not preceded by a dollar sign.
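The same lookbehind still works if the amounts may carry cents. The following is a sketch under assumptions: the sample string and the optional two-digit decimal part are illustrative additions, not part of the example above.

```python
import re

# Hypothetical sample text with decimal amounts.
prices = "Items cost $19.99 and €5.50, plus $3 shipping."

# (?:\.\d{2})? optionally matches a decimal point followed by two digits.
# The group is non-capturing, so findall() returns the whole match.
dollar_amounts = re.findall(r'(?<=\$)\d+(?:\.\d{2})?', prices)

print(dollar_amounts)  # ['19.99', '3']
```

Only the part after the lookbehind grows more complex here. The lookbehind itself stays a single fixed-width character.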

2. Getting names after a title

Imagine you have a list of people’s names with titles, and you only want to extract the first names that are preceded by the title “Mr.”

Python

import re

names = "Mr. John Smith, Ms. Jane Doe, Mr. Peter Jones"

# This pattern looks for a word (\w+) that is preceded by "Mr. "
# We are specific here with the space after "Mr." to avoid matching other words.
pattern = r'(?<=Mr\. )\w+'

matches = re.findall(pattern, names)

print(matches)
  • Output:
    • ['John', 'Peter']

How it Works

  • The pattern \w+ looks for one or more word characters. Because \w does not match a space, it stops at the end of the first name.
  • The lookbehind (?<=Mr\. ) checks that the preceding text is the literal string “Mr. ”. Note the backslash to escape the period . which is a special regex character.
  • The lookbehind finds “Mr. ” before “John” and “Peter”, but not before “Jane”, so only “John” and “Peter” are returned as matches.
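Titles of different lengths, such as “Mr.” and “Mrs.”, cannot share one lookbehind in the re module because of the fixed-width rule. One possible workaround is to alternate between two separate lookbehinds, each of fixed width. The sample string below is a hypothetical variation on the one above.

```python
import re

# Hypothetical sample: titles with different lengths.
people = "Mr. John Smith, Mrs. Jane Doe, Dr. Peter Jones"

# Each lookbehind is fixed-width on its own (4 and 5 characters),
# so the alternation between them is allowed.
pattern = r'(?:(?<=Mr\. )|(?<=Mrs\. ))\w+'

first_names = re.findall(pattern, people)

print(first_names)  # ['John', 'Jane']
```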
