re Programs

# find titles
import re

def extract_book_titles(text):
    """
    Extracts book titles from the given text using a regular expression.
    
    Args:
        text: A string containing the bookshelf data.
        
    Returns:
        A list of book titles.
    """
    # The regular expression to find book titles.
    # It looks for a semicolon, then captures everything up to the next semicolon.
    regex = r';\s*(.*?);'
    
    # Find all non-overlapping matches in the text.
    titles = re.findall(regex, text)
    
    return titles

# The full content of the 'bookshelf.txt' file provided by the user.
file_content = """
Terry-Thomas;Filling the Gap;1959
Harpo Marx;Harpo Speaks;1961
Charlie Chaplin;My Autobiography;1964
Moe Howard;Moe Howard and the Three Stooges, AKA I Came, I Stooged, I Conquered (released posthumously);1974
Sid Caesar;Where Have I Been?;1982
Bill Cosby;Fatherhood;1986
Mel Blanc;That's NOT All, Folks;1988
Gilda Radner;It's Always Something;1989
Richard Pryor;Pryor Convictions;1995
Damon Wayans;Bootleg;1996
Stephen Fry;Moab Is My Washpot;1997
Jenny McCarthy;Jen-X: My Open Book;1997
Chris Rock;Rock This;1997
Sandra Bernhard;Confessions of a Pretty Lady;1998
Danny Bonaduce;Random Acts of Badness;2001
Fran Drescher;Cancer Schmancer;2002
Alan Thicke;How Men Have Babies: a New Father's Survival Guide;2003
Rodney Dangerfield;It's Not Easy Being Me: a Lifetime of No Respect But Plenty of Sex and Drugs;2004
Tom Green;Hollywood Causes Cancer;2004
Rik Mayall;Bigger Than Hitler & Better Than Christ;2005
Tommy Chong;The I Chong: Meditations From the Joint;2006
Alan Thicke;How to Raise Kids Who Won't Hate You;2006
Steve Martin;Born Standing Up;2007
Denis Leary;Why We Suck;2008
Stephen Fry;Ernie: The Autobiography;2009
Frankie Boyle;My Shit Life So Far;2009
Craig Ferguson;American on Purpose;2009
Todd Bridges;Killing Willis;2010
Kevin Smith;Tough Sh*t: Life Advice from a Fat, Lazy Slob Who Still Made Good;2012
Jimmie Walker;Dyn-o-mite!;2012
Andrew Dice Clay;The Filthy Truth;2014
John Cleese;So, Anyway...;2014
Cheech Marin;Cheech Is Not My Real Name...But Don't Call Me Chong!;2017
Eric Idle;Always Look On the Bright Side Of Life;2018
"""

# Call the function and print the results.
book_titles = extract_book_titles(file_content)
for title in book_titles:
    print(title)

The regular expression r';\s*(.*?);' is used to find and extract text that is located between two semicolons.

  • r'' : The r prefix in front of the string denotes a “raw string” in Python. This tells the interpreter to treat backslashes as literal characters instead of escape sequences. This is a common practice when writing regular expressions to avoid issues with characters like \n or \t.
  • ; : This matches a literal semicolon. The pattern begins by looking for a semicolon character.
  • \s* : This part matches any whitespace character (such as spaces, tabs, or newlines) that appears zero or more times. It accounts for potential spaces between the semicolon and the text you want to capture.
  • (.*?) : This is the core part of the expression.
    • () : The parentheses create a capturing group. This means that whatever text is matched within these parentheses will be “captured” or “extracted” as a separate result.
    • .* : The dot . matches any character except a newline. The asterisk * means it will match the preceding character (in this case, any character) zero or more times.
    • ? : The question mark ? makes the * non-greedy (or lazy). Instead of matching as many characters as possible until the end of the line, it matches as few characters as possible until it finds the next part of the pattern, which is the final semicolon.
  • ; : This matches the literal closing semicolon, marking the end of the text to be captured.

In summary, this expression finds a semicolon, then non-greedily captures all characters up to the next semicolon. This is an effective way to extract the middle value from a semicolon-separated string.
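
To see what the ? after .* changes, it can help to run the lazy and greedy forms side by side. The sketch below uses a hypothetical four-field line (the bookshelf data itself has only three fields per line), because the two forms only differ when a line has more than two semicolons.

```python
import re

# Hypothetical line with an extra fourth field, used only to show the difference.
line = "Damon Wayans;Bootleg;1996;extra"

# Lazy: the group stops at the first semicolon it can reach.
lazy = re.findall(r';\s*(.*?);', line)

# Greedy: the group runs to the last semicolon on the line.
greedy = re.findall(r';\s*(.*);', line)

print(lazy)    # ['Bootleg']
print(greedy)  # ['Bootleg;1996']
```

On the three-field lines of bookshelf.txt both forms return the same titles, so the lazy form matters mainly as a safeguard if more fields are added.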

Titles 1 to 25 characters long

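import re

# Read the whole file; the with block closes it automatically.
with open(r"E:\bookshelf.txt") as f:
    string = f.read()

# Capture titles of 1 to 25 characters that sit between two semicolons.
result = re.findall(r".+?;(.{1,25});.+?", string)
print(result)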
['Filling the Gap', 'Harpo Speaks', 'My Autobiography', 'Where Have I Been?', 'Fatherhood', "That's NOT All, Folks", "It's Always Something", 'Pryor Convictions', 'Bootleg', 'Moab Is My Washpot', 'Jen-X: My Open Book', 'Rock This', 'Random Acts of Badness', 'Cancer Schmancer', 'Hollywood Causes Cancer', 'Born Standing Up', 'Why We Suck', 'Ernie: The Autobiography', 'My Shit Life So Far', 'American on Purpose', 'Killing Willis', 'Dyn-o-mite!', 'The Filthy Truth', 'So, Anyway...']

The regular expression r".+?;(.{1,25});.+?" is designed to extract a specific piece of text between two semicolons.

  • r: This indicates a raw string in Python, which prevents backslashes from being interpreted as escape sequences.
  • .+?: This matches one or more (+) of any character (.) in a non-greedy (?) way. It will match the first part of the string up to the first semicolon.
  • ;: This matches the literal semicolon that separates the first part of the string from the text you want to capture.
  • ( and ): These create a capturing group, which saves the text that matches the pattern inside.
  • .{1,25}: This is the core of the extraction. It matches any character (.) that is repeated at least once but no more than 25 times. This sets a length constraint on the captured text.
  • ;: This matches the literal semicolon that follows the captured text.
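  • .+?: This again matches one or more (+) of any character (.) in a non-greedy (?) way. Because it is lazy and nothing follows it in the pattern, it consumes only a single character. Its effect is to require at least one character after the second semicolon, not to match the rest of the line.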

In the context of the provided text file, this regular expression captures book titles that are between 1 and 25 characters long. For example, it matches “Bootleg” in the line Damon Wayans;Bootleg;1996 but fails to match “Moe Howard and the Three Stooges, AKA I Came, I Stooged, I Conquered (released posthumously)” because that title is longer than 25 characters.
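
One way to check what each part of the pattern consumes is to print the whole match next to the captured group. This is a small sketch that assumes a two-line sample taken from the list above; the names sample and matches are only for illustration.

```python
import re

# Two lines from the bookshelf data: one short title, one title of 28 characters.
sample = (
    "Damon Wayans;Bootleg;1996\n"
    "Sandra Bernhard;Confessions of a Pretty Lady;1998\n"
)
pattern = r".+?;(.{1,25});.+?"

# Collect (whole match, captured title) pairs.
matches = [(m.group(0), m.group(1)) for m in re.finditer(pattern, sample)]

for whole, title in matches:
    print(repr(whole), "->", repr(title))
# 'Damon Wayans;Bootleg;1' -> 'Bootleg'
```

The whole match ends one character into the year, since the trailing lazy .+? stops as soon as it has consumed a single character. The second line produces no match because its title exceeds 25 characters.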

Find authors of books released from 2000 to 2099

>>> result = re.findall(r"(.+?);.+?;20[0-9][0-9]", string)
>>> result
['Danny Bonaduce', 'Fran Drescher', 'Alan Thicke', 'Rodney Dangerfield', 'Tom Green', 'Rik Mayall', 'Tommy Chong', 'Alan Thicke', 'Steve Martin', 'Denis Leary', 'Stephen Fry', 'Frankie Boyle', 'Craig Ferguson', 'Todd Bridges', 'Kevin Smith', 'Jimmie Walker', 'Andrew Dice Clay', 'John Cleese', 'Cheech Marin', 'Eric Idle']

The regular expression r"(.+?);.+?;20[0-9][0-9]" is designed to extract a specific piece of information from the provided text, which follows the pattern Author;Title;Year.

  • r: This indicates a raw string in Python, which prevents backslashes from being treated as escape characters.
  • ( and ): These parentheses create a capturing group. The text matched by the pattern inside will be saved as a result.
  • .+?: This part matches one or more (+) of any character (.) in a non-greedy (?) way. It will match the author’s name at the beginning of the line up to the first semicolon.
  • ;: This matches the first literal semicolon.
  • .+?: This again matches one or more of any character in a non-greedy way, matching the book title up to the next semicolon. It sits outside the parentheses, so the title is matched but not captured.
  • ;: This matches the second literal semicolon.
  • 20[0-9][0-9]: This matches the publication year. It specifically looks for a number that starts with “20” followed by any two digits from 0 to 9.

In the context of the provided text, this regular expression captures the author’s name for any book whose year is between 2000 and 2099. It matches authors like Danny Bonaduce and Fran Drescher, but does not match authors like Terry-Thomas or Harpo Marx, whose books were published in the 1900s.
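
The pattern above accepts any year from 2000 to 2099. If only the years 2000 to 2009 are wanted, one option is to fix the third digit and use 200[0-9]. The sketch below assumes a three-line sample taken from the bookshelf data; the names decade and century are only for illustration.

```python
import re

# Three lines from the bookshelf data: one from the 1990s, one from the 2000s, one from the 2010s.
sample = (
    "Chris Rock;Rock This;1997\n"
    "Denis Leary;Why We Suck;2008\n"
    "Todd Bridges;Killing Willis;2010\n"
)

# Years 2000 to 2009 only: the third digit must be 0.
decade = re.findall(r"(.+?);.+?;200[0-9]", sample)

# Years 2000 to 2099: any two digits after "20".
century = re.findall(r"(.+?);.+?;20[0-9][0-9]", sample)

print(decade)   # ['Denis Leary']
print(century)  # ['Denis Leary', 'Todd Bridges']
```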
