re.split()
Python re.split() Method Explained
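The re.split() method splits a string at every match of a regular expression pattern. It works like the built-in str.split(), but the separator is a regex instead of a fixed string. A single call can therefore split on several different delimiters, or on runs of delimiters of varying length.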
Syntax
```python
re.split(pattern, string, maxsplit=0, flags=0)
```
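The maxsplit and flags parameters are easiest to understand side by side. The sketch below uses made-up sample strings. Both parameters are passed by keyword, which is what recent Python versions expect: since 3.13, passing them positionally is deprecated.

```python
import re

text = "one, two,three ,four"

# maxsplit=2: perform at most two splits; the remainder stays in the last element
limited = re.split(r'\s*,\s*', text, maxsplit=2)
print(limited)  # ['one', 'two', 'three ,four']

# flags=re.IGNORECASE: the separator "and" also matches "AND" and "And"
ingredients = re.split(r'\s+and\s+', "salt AND pepper And oil", flags=re.IGNORECASE)
print(ingredients)  # ['salt', 'pepper', 'oil']
```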
Example 1: Splitting by Multiple Delimiters
```python
import re

text = "apple,banana;orange grape|melon.peach"

# Split by any of these delimiters: , ; | . and whitespace
result = re.split(r'[,;|.\s]+', text)
print("Split by multiple delimiters:")
print(result)

# Remove empty strings by filtering
result_clean = [word for word in result if word]
print("\nClean result (no empty strings):")
print(result_clean)
```
Explaining re.split(r'[,;|.\s]+', text)
This regex pattern splits a string on several different delimiters at once. Breaking it down:
The Pattern: r'[,;|.\s]+'
1. [ ] - Character Class
Matches any one of the characters inside the brackets
Think of it as an "OR" operation between single characters
2. [,;|.\s] - What's inside the brackets:
, - comma
; - semicolon
| - pipe symbol
. - period/dot
\s - any whitespace character (spaces, tabs, newlines)
Inside a character class, | and . lose their special regex meaning and match literally, so they need no backslash
3. + - One or More
Matches one or more consecutive occurrences of any of these characters
This means a run of delimiters such as ", " is treated as a single split point
Output:
```text
Split by multiple delimiters:
['apple', 'banana', 'orange', 'grape', 'melon', 'peach']

Clean result (no empty strings):
['apple', 'banana', 'orange', 'grape', 'melon', 'peach']
```
Both lists are identical here. Because of the + quantifier, re.split() can only produce empty strings at the ends of the result, and this text neither starts nor ends with a delimiter. The filter only matters for input that does.
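To see where those empty strings come from, here is a small sketch with a hypothetical input that starts and ends with a delimiter. It also confirms that | and . inside the character class match literally.

```python
import re

pattern = r'[,;|.\s]+'

# Hypothetical input: the leading "," and trailing "." are delimiters too
edge_text = ",apple;banana|kiwi."
parts = re.split(pattern, edge_text)
print(parts)  # ['', 'apple', 'banana', 'kiwi', '']

# Filtering removes only the empty pieces at the edges
words = [p for p in parts if p]
print(words)  # ['apple', 'banana', 'kiwi']
```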
Example 2: Splitting and Keeping Delimiters
```python
import re

text = "Hello! How are you? I'm fine. Thanks!"

# Split by punctuation but keep the delimiters (note the capture group)
result = re.split(r'([!?.])', text)
print("Split with delimiters preserved:")
print(result)

# Clean up the result (strip whitespace and drop empty strings)
clean_result = [part.strip() for part in result if part.strip()]
print("\nCleaned up:")
print(clean_result)

# Another example: split a math expression but keep the operators
math_expr = "10+20-30*40/50"
math_parts = re.split(r'([+*/-])', math_expr)
print(f"\nMath expression parts: {math_parts}")
```
Explaining re.split(r'([!?.])', text)
This regex pattern splits a string at punctuation marks but keeps the delimiters in the result. The parentheses () are the key.
The Pattern: r'([!?.])'
1. [!?.] - Character Class
Matches any one of these punctuation marks:
! - exclamation mark
? - question mark
. - period
2. ( ) - Capture Group (THE IMPORTANT PART)
The parentheses capture whatever is matched inside them
When the pattern contains a capture group, re.split() inserts each captured delimiter into the result list, between the pieces it separates
3. No + quantifier
Matches exactly one punctuation character at a time
Each punctuation mark is a separate delimiter, so a run such as "..." produces empty strings between the marks
What It Does:
re.split() scans the text once, from left to right. At each match it appends the text before the match, then the captured delimiter, and then continues after the match:
"Hello" → "!"
" How are you" → "?"
" I'm fine" → "."
" Thanks" → "!"
Nothing follows the final "!", so the list ends with an empty string ''.
The math expression works the same way. In [+*/-], the - is placed last so that it is read as a literal minus sign rather than as a character range.
Output:
```text
Split with delimiters preserved:
['Hello', '!', ' How are you', '?', " I'm fine", '.', ' Thanks', '!', '']

Cleaned up:
['Hello', '!', 'How are you', '?', "I'm fine", '.', 'Thanks', '!']

Math expression parts: ['10', '+', '20', '-', '30', '*', '40', '/', '50']
```
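Here are two variations on the punctuation example, sketched with a made-up input. First, adding + inside the group turns a run like "..." into one delimiter, instead of several delimiters separated by empty strings. Second, splitting on whitespace preceded by a lookbehind assertion (?<=...) keeps each punctuation mark attached to its sentence rather than returning it as a separate item.

```python
import re

ellipsis_text = "Wait...what?!"

# Without +: every punctuation mark is its own delimiter
single = re.split(r'([!?.])', ellipsis_text)
print(single)  # ['Wait', '.', '', '.', '', '.', 'what', '?', '', '!', '']

# With +: a run of punctuation is captured as one delimiter
grouped = re.split(r'([!?.]+)', ellipsis_text)
print(grouped)  # ['Wait', '...', 'what', '?!', '']

# Lookbehind: split on whitespace that follows ., ! or ?,
# so the punctuation stays with its sentence
text = "Hello! How are you? I'm fine. Thanks!"
sentences = re.split(r'(?<=[.!?])\s+', text)
print(sentences)  # ['Hello!', 'How are you?', "I'm fine.", 'Thanks!']
```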
Example 3: Advanced Splitting with Capture Groups
```python
import re

# Case 1: split around dates but keep the dates
text = "Meeting on 2023-12-25 at 10:00, then lunch at 2023-12-25 13:00"
result = re.split(r'(\d{4}-\d{2}-\d{2})', text)
print("Split with date capture:")
print(result)

# Case 2: parse log entries, capturing the log level
log_data = "ERROR:2023-12-25:File not found|WARNING:2023-12-25:Low memory|INFO:2023-12-25:Process completed"
log_entries = re.split(r'(ERROR|WARNING|INFO):', log_data)
print(f"\nLog entries: {log_entries}")

# Case 3: split on any run of whitespace (spaces, tabs, newlines)
text_with_spaces = "Hello World\tThis is\na test"
result = re.split(r'\s+', text_with_spaces)
print(f"\nSplit by whitespace: {result}")
```
Output:
```text
Split with date capture:
['Meeting on ', '2023-12-25', ' at 10:00, then lunch at ', '2023-12-25', ' 13:00']

Log entries: ['', 'ERROR', '2023-12-25:File not found|', 'WARNING', '2023-12-25:Low memory|', 'INFO', '2023-12-25:Process completed']

Split by whitespace: ['Hello', 'World', 'This', 'is', 'a', 'test']
```
The log entries list starts with '' because log_data begins with a match. Each message also still carries the trailing | that separated the entries.
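The capture-group output of the log example alternates between levels and the text that follows them, so it can be paired up directly. This sketch assumes every entry has the LEVEL:DATE:MESSAGE shape shown above. The names parts and records are illustrative.

```python
import re

log_data = "ERROR:2023-12-25:File not found|WARNING:2023-12-25:Low memory|INFO:2023-12-25:Process completed"
parts = re.split(r'(ERROR|WARNING|INFO):', log_data)

# parts[0] is the text before the first level (empty here); after it,
# odd indices hold captured levels and even indices hold the text that follows
records = []
for level, rest in zip(parts[1::2], parts[2::2]):
    # Drop the trailing '|' separator, then split the date from the message once
    date, message = rest.rstrip('|').split(':', 1)
    records.append((level, date, message))

for record in records:
    print(record)
```

This prints one (level, date, message) tuple per entry, starting with ('ERROR', '2023-12-25', 'File not found'). Splitting on ':' only once keeps any colons inside the message intact.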
Key Features of re.split():
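- Regex patterns: the separator is a regular expression, not a fixed string
- Capture groups: if the pattern contains parentheses (), the text matched by each group is included in the result
- maxsplit parameter: limits the number of splits; the unsplit remainder is returned as the last element
- Multiple delimiters: a single pattern can split on several different separators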
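The capture-group rule applies to every group in the pattern, including alternatives that did not take part in a given match. This sketch uses a made-up string to contrast two groups, one group, and a non-capturing group:

```python
import re

text = "a,b;c"

# Two alternative groups: each match fills only one of them,
# and the group that did not take part contributes None
with_none = re.split(r'(,)|(;)', text)
print(with_none)  # ['a', ',', None, 'b', None, ';', 'c']

# One group around a character class captures either delimiter, no None values
one_group = re.split(r'([,;])', text)
print(one_group)  # ['a', ',', 'b', ';', 'c']

# A non-capturing group (?:...) groups the alternatives without keeping them
no_capture = re.split(r'(?:,|;)', text)
print(no_capture)  # ['a', 'b', 'c']
```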
Comparison with str.split():
```python
import re

text = "apple,banana;orange"

# Regular split: one fixed separator
print(text.split(','))  # Only splits by comma: ['apple', 'banana;orange']

# Regex split: comma OR semicolon
print(re.split(r'[,;]', text))  # Splits by comma OR semicolon: ['apple', 'banana', 'orange']
```
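There is one difference to watch for when replacing str.split() with a regex. Calling str.split() with no argument splits on runs of whitespace and ignores leading and trailing whitespace, while re.split(r'\s+', ...) returns empty strings at the edges. A small sketch with a made-up padded string:

```python
import re

padded = "  apple banana\torange \n"

# str.split() with no argument: runs of whitespace, edges ignored
by_str = padded.split()
print(by_str)  # ['apple', 'banana', 'orange']

# re.split on \s+: same split points, but edge whitespace yields ''
by_re = re.split(r'\s+', padded)
print(by_re)  # ['', 'apple', 'banana', 'orange', '']
```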
Practical Use Cases:
- Parsing simple delimited data with irregular separators (for real CSV files with quoted fields, the csv module is the safer choice)
- Processing log files
- Tokenizing text for natural language processing
- Splitting strings while preserving important delimiters
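One caveat for comma-separated data is that re.split() knows nothing about quoting, so a delimiter inside a quoted field still splits that field. The sketch below compares it with the standard library's csv module on a made-up line:

```python
import csv
import re

line = 'name,"Doe, Jane",42'

# re.split breaks the quoted field at its inner comma
regex_fields = re.split(r',', line)
print(regex_fields)  # ['name', '"Doe', ' Jane"', '42']

# csv.reader accepts any iterable of lines and honors the quotes
csv_fields = next(csv.reader([line]))
print(csv_fields)  # ['name', 'Doe, Jane', '42']
```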