positive lookbehind assertion

A positive lookbehind assertion in Python’s re module is a zero-width assertion that succeeds only if the text immediately before the current position matches a given pattern, without including that text in the overall match. It is the mirror image of a positive lookahead, which checks the text that follows instead. It is written as (?<=...).

The key constraint for lookbehind assertions in Python’s re module is that the pattern inside the parentheses must match strings of a fixed length. Alternation is allowed only if every alternative has the same length. For example, (?<=abc) is valid, and (?<=a|b|c) is valid because every alternative is exactly one character long. However, (?<=a|bc) is not valid because a and bc have different lengths, and (?<=a+) is not valid because a+ can match any number of characters.
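The fixed-width rule can be checked directly by trying to compile a few patterns. The following is a minimal sketch; the helper name compiles is made up for this illustration and is not part of the re module.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error.

    'compiles' is a hypothetical helper used only for this demonstration.
    """
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r'(?<=abc)x'))    # fixed width of 3
print(compiles(r'(?<=a|b|c)x'))  # every alternative has width 1
print(compiles(r'(?<=ab|cd)x'))  # every alternative has width 2
print(compiles(r'(?<=a|bc)x'))   # widths 1 and 2 differ, so this is rejected
print(compiles(r'(?<=a+)x'))     # a+ has no fixed width, so this is rejected
```

The first three calls print True and the last two print False, because re rejects any lookbehind whose width is not fixed.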

Example: Finding Numbers After a Specific Word

Let’s say you want to find all numbers in a string that are preceded by the word “cost:”, but you only want to match the numbers, not the word “cost:”.

Python

import re

text = "The total cost: 50. The final price: 20."

# This pattern looks for one or more digits (\d+), preceded by a positive lookbehind
# that checks for the literal string "cost: ".
pattern = r'(?<=cost: )\d+'

matches = re.findall(pattern, text)

print(matches)
  • Output:
    • ['50']

How it Works: Step-by-Step

  1. \d+: This part of the pattern matches one or more digits. On its own it would match both 50 and 20.
  2. (?<=cost: ): This is the positive lookbehind assertion.
    • At each position where a match could start, the regex engine “looks behind” to see if the preceding characters are “cost: ”.
    • It finds “cost: ” immediately before 50, so the assertion succeeds there. Before 20 it finds “price: ”, so the assertion fails and 20 is skipped.
    • The lookbehind part itself is not included in the final match. It just verifies a condition.

Without the positive lookbehind, a simple cost: \d+ pattern would match cost: 50, which is not what was intended.
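To make the difference concrete, here is a small sketch that runs three variants against the same text from the example. The capture-group variant is not part of the original example; it is included only as a common alternative for comparison.

```python
import re

text = "The total cost: 50. The final price: 20."

# Plain pattern: the prefix "cost: " is part of the match.
with_prefix = re.findall(r'cost: \d+', text)

# Capture group: findall returns only the group, but the full match still spans the prefix.
with_group = re.findall(r'cost: (\d+)', text)

# Lookbehind: the prefix is checked but never consumed.
with_lookbehind = re.findall(r'(?<=cost: )\d+', text)

# The match object shows that the lookbehind match covers only the digits.
m = re.search(r'(?<=cost: )\d+', text)

print(with_prefix)       # ['cost: 50']
print(with_group)        # ['50']
print(with_lookbehind)   # ['50']
print(m.span())          # (16, 18)
```

The span (16, 18) covers only the two digits, which shows that the lookbehind itself has zero width.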

Why use it?

Positive lookbehind assertions are useful for:

  • Targeted Matching: Finding a specific pattern only if it’s in a certain context.
  • Excluding Preceding Characters: Matching a string without including the characters that come before it.
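Both points also apply to substitution. The sketch below uses a made-up string, inventory, to show re.sub replacing a number only when it follows “price: ”, leaving the prefix and the other number untouched.

```python
import re

# Hypothetical sample text for illustration only.
inventory = "price: 10, qty: 10"

# Replace the digits only where they are preceded by "price: ".
# The lookbehind is not consumed, so "price: " stays in the result.
updated = re.sub(r'(?<=price: )\d+', '99', inventory)

print(updated)  # price: 99, qty: 10
```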

1. Extracting currency values

Let’s say you have a string with different currencies and you only want to extract the dollar amounts.

Python

import re

text = "The cost is $50 and €10. The total is $200 and £5."

# This pattern matches any number (\d+) that is preceded by a dollar sign ($)
# Note that we escape the dollar sign with a backslash since it's a special character in regex.
pattern = r'(?<=\$)\d+'

matches = re.findall(pattern, text)

print(matches)
  • Output:
    • ['50', '200']

How it Works

  • The pattern \d+ looks for one or more digits.
  • The lookbehind (?<=\$) checks to make sure the dollar sign $ immediately precedes the digits.
  • The digits 50 and 200 meet this condition and are returned in the list. The $ symbol is not part of the match itself. The numbers 10 and 5 are ignored because they are not preceded by a dollar sign.
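The same lookbehind can be combined with a slightly richer pattern. The following is a sketch that assumes amounts may carry a decimal part; the sample string and the optional (?:\.\d+)? group are additions for illustration and do not appear in the example above.

```python
import re

# Hypothetical sample text with decimal amounts.
receipt = "Lunch was $12.50 and coffee was $3, plus €4.20 tip."

# Digits preceded by "$", optionally followed by a decimal part.
# The group is non-capturing, so findall returns the whole match.
pattern = r'(?<=\$)\d+(?:\.\d+)?'

amounts = re.findall(pattern, receipt)
total = sum(float(a) for a in amounts)

print(amounts)  # ['12.50', '3']
print(total)    # 15.5
```

The euro amount is skipped because neither 4 nor 20 is immediately preceded by a dollar sign.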

2. Getting names after a title

Imagine you have a list of people’s names with titles, and you only want to extract the first names that are preceded by the title “Mr.”

Python

import re

names = "Mr. John Smith, Ms. Jane Doe, Mr. Peter Jones"

# This pattern looks for a word (\w+) that is preceded by "Mr. "
# The space after "Mr." is part of the lookbehind, so the match starts right at the name.
pattern = r'(?<=Mr\. )\w+'

matches = re.findall(pattern, names)

print(matches)
  • Output:
    • ['John', 'Peter']

How it Works

  • The pattern \w+ looks for one or more word characters (here, the first name that follows the title).
  • The lookbehind (?<=Mr\. ) checks that the preceding text is the literal string “Mr. ”. The backslash escapes the period, which is otherwise a special regex character that matches any character.
  • The lookbehind finds “Mr. ” before “John” and “Peter”, but not before “Jane”, so only “John” and “Peter” are returned as matches.
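Matching more than one title runs into the fixed-length rule for lookbehinds. The sketch below shows two options: an alternation whose alternatives have the same length, and separate lookbehinds combined in a non-capturing group when the lengths differ. The second string, mixed, is made up for illustration.

```python
import re

names = "Mr. John Smith, Ms. Jane Doe, Mr. Peter Jones"

# "Mr. " and "Ms. " are both four characters, so they can share one lookbehind.
same_width = re.findall(r'(?<=Mr\. |Ms\. )\w+', names)

# Hypothetical text with titles of different lengths.
mixed = "Mr. John Smith, Mrs. Jane Doe, Dr. Peter Jones"

# "Mr. " (4 chars) and "Mrs. " (5 chars) cannot share one lookbehind in re,
# but each can have its own fixed-width lookbehind inside an alternation.
mixed_width = re.findall(r'(?:(?<=Mr\. )|(?<=Mrs\. ))\w+', mixed)

print(same_width)   # ['John', 'Jane', 'Peter']
print(mixed_width)  # ['John', 'Jane']
```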
