start(), end(), and span()

Python re start(), end(), and span() Methods Explained

These methods are used with match objects to get the positional information of where a pattern was found in the original string. They work on the result of re.search()re.match(), or re.finditer().

Methods Overview:

  • start(): Returns start position of the match
  • end(): Returns end position of the match
  • span(): Returns (start, end) as a tuple
string = "The Euro STOXX 600 index, which tracks all stock markets across Europe including the FTSE, fell by 11.48% – the worst day since it launched in 1998. The panic selling prompted by the coronavirus has wiped £2.7tn off the value of STOXX 600 shares since its all-time peak on 19 February."

Source: https://www.theguardian.com/



Example:

>>> import re
>>> result = re.search(r".+\s(.+ex).+(\d\d\s.+).", string)
>>> result.group(1)
'index'
>>> result.group(2)
'19 February'
>>> result.start(1)
19
>>> result.start(2)
273
>>> result.end(1)
24
>>> result.end(2)
284
>>> result.span(1)
(19, 24)
>>> result.span(2)
(273, 284)

Example 1: Basic Position Tracking

python

import re

text = "The quick brown fox jumps over the lazy dog."

# Search for a pattern
pattern = r'fox'
match = re.search(pattern, text)

if match:
    print(f"Found '{match.group()}' at:")
    print(f"Start position: {match.start()}")
    print(f"End position: {match.end()}")
    print(f"Span: {match.span()}")
    
    # Show the actual substring using slicing
    start, end = match.span()
    print(f"Text at this position: '{text[start:end]}'")
    
    # Show context around the match
    context_start = max(0, start - 5)
    context_end = min(len(text), end + 5)
    print(f"Context: '...{text[context_start:context_end]}...'")

Output:

text

Found 'fox' at:
Start position: 16
End position: 19
Span: (16, 19)
Text at this position: 'fox'
Context: '... brown fox jump...'

Example 2: Working with Capture Groups

python

import re

text = "John Doe, age 30, email: john.doe@email.com"

# Pattern with multiple capture groups
pattern = r'(\w+)\s+(\w+),\s+age\s+(\d+),\s+email:\s+([\w.]+@[\w.]+)'
match = re.search(pattern, text)

if match:
    print("Full match:")
    print(f"  Text: '{match.group(0)}'")
    print(f"  Span: {match.span()}")
    print(f"  Start: {match.start()}, End: {match.end()}")
    print()
    
    # Get positions for each capture group
    for i in range(1, len(match.groups()) + 1):
        print(f"Group {i} ('{match.group(i)}'):")
        print(f"  Span: {match.span(i)}")
        print(f"  Start: {match.start(i)}, End: {match.end(i)}")
        print(f"  Text at position: '{text[match.start(i):match.end(i)]}'")
        print()

# Example with finditer for multiple matches
text_multiple = "cat, dog, bird, fish"
print("All animal positions:")
for match in re.finditer(r'\b\w+\b', text_multiple):
    print(f"'{match.group()}' at {match.span()} -> '{text_multiple[match.start():match.end()]}'")

Output:

text

Full match:
  Text: 'John Doe, age 30, email: john.doe@email.com'
  Span: (0, 44)
  Start: 0, End: 44

Group 1 ('John'):
  Span: (0, 4)
  Start: 0, End: 4
  Text at position: 'John'

Group 2 ('Doe'):
  Span: (5, 8)
  Start: 5, End: 8
  Text at position: 'Doe'

Group 3 ('30'):
  Span: (14, 16)
  Start: 14, End: 16
  Text at position: '30'

Group 4 ('john.doe@email.com'):
  Span: (25, 44)
  Start: 25, End: 44
  Text at position: 'john.doe@email.com'

All animal positions:
'cat' at (0, 3) -> 'cat'
'dog' at (5, 8) -> 'dog'
'bird' at (10, 14) -> 'bird'
'fish' at (16, 20) -> 'fish'

Example 3: Advanced Text Processing with Positions

python

import re

# Extract and highlight specific information
text = """
Product: Laptop, Price: $999.99, Stock: 15
Product: Smartphone, Price: $599.50, Stock: 30
Product: Tablet, Price: $399.00, Stock: 8
"""

print("Product Information with Positions:")
for match in re.finditer(r'Product:\s*(\w+).*?Price:\s*\$(\d+\.\d{2}).*?Stock:\s*(\d+)', text):
    product, price, stock = match.groups()
    product_span = match.span(1)
    price_span = match.span(2)
    stock_span = match.span(3)
    
    print(f"\nProduct: {product} (position: {product_span})")
    print(f"Price: ${price} (position: {price_span})")
    print(f"Stock: {stock} (position: {stock_span})")
    
    # Show the actual text segments
    print(f"Product text: '{text[product_span[0]:product_span[1]]}'")
    print(f"Price text: '{text[price_span[0]:price_span[1]]}'")
    print(f"Stock text: '{text[stock_span[0]:stock_span[1]]}'")

# Create a highlighted version of the text
highlighted_text = text
for match in reversed(list(re.finditer(r'\$(\d+\.\d{2})', text))):
    start, end = match.span(1)
    price = match.group(1)
    highlighted_text = highlighted_text[:start] + f"**{price}**" + highlighted_text[end:]

print(f"\nHighlighted prices:\n{highlighted_text}")

# Find all numbers and their positions
print("\nAll numbers and their positions:")
for match in re.finditer(r'\d+\.?\d*', text):
    number = match.group()
    start, end = match.span()
    print(f"Number '{number}' at position {match.span()}: '{text[start:end]}'")

Output:

text

Product Information with Positions:

Product: Laptop (position: (12, 18))
Price: $999.99 (position: (27, 33))
Stock: 15 (position: (42, 44))
Product text: 'Laptop'
Price text: '999.99'
Stock text: '15'

Product: Smartphone (position: (56, 66))
Price: $599.50 (position: (75, 81))
Stock: 30 (position: (90, 92))
Product text: 'Smartphone'
Price text: '599.50'
Stock text: '30'

Product: Tablet (position: (104, 109))
Price: $399.00 (position: (118, 124))
Stock: 8 (position: (133, 134))
Product text: 'Tablet'
Price text: '399.00'
Stock text: '8'

Highlighted prices:

Product: Laptop, Price: $**999.99**, Stock: 15
Product: Smartphone, Price: $**599.50**, Stock: 30
Product: Tablet, Price: $**399.00**, Stock: 8

All numbers and their positions:
Number '999.99' at position (27, 33): '999.99'
Number '15' at position (42, 44): '15'
Number '599.50' at position (75, 81): '599.50'
Number '30' at position (90, 92): '30'
Number '399.00' at position (118, 124): '399.00'
Number '8' at position (133, 134): '8'

Key Features:

  1. start()/end() without arguments: Position of entire match
  2. start(n)/end(n): Position of nth capture group
  3. span(): Returns both start and end as a tuple
  4. Zero-based indexing: Positions start at 0
  5. Exclusive endtext[start:end] gets the matched text

Practical Uses:

  • Text highlighting and formatting
  • Error location reporting
  • Text extraction with precise positioning
  • Syntax highlighting
  • Data validation with location information

These methods are essential when you need to know not just WHAT was matched, but WHERE it was matched in the original text!

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *