Sets in Python
Sets in Python
A set in Python is an unordered collection of unique elements. Sets are mutable, meaning you can add or remove items, but the elements themselves must be immutable (like numbers, strings, or tuples).
Key Characteristics of Sets:
- Unordered: Elements don’t have a defined order
- Unique: No duplicate elements allowed
- Mutable: Can add/remove elements
- Can contain only immutable types: Elements must be hashable
Different Ways to Create Sets in Python
Here are various methods to create sets in Python, each with 3 examples:
1. Using Curly Braces {} (Most Common)
python
# Set of integers
numbers = {1, 2, 3, 4, 5}
# Set of strings
fruits = {'apple', 'banana', 'orange'}
# Mixed data types
mixed_set = {1, 'hello', 3.14, (1, 2)}
2. Using set() Constructor
python
# From a list
from_list = set([1, 2, 2, 3, 4]) # Duplicates removed: {1, 2, 3, 4}
# From a string
from_string = set('hello') # {'h', 'e', 'l', 'o'}
# From a tuple
from_tuple = set((1, 2, 3, 2, 1)) # {1, 2, 3}
3. Using Set Comprehensions
python
# Squares of numbers
squares = {x**2 for x in range(5)} # {0, 1, 4, 9, 16}
# First letters of words
letters = {word[0] for word in ['apple', 'banana', 'cherry']} # {'a', 'b', 'c'}
# Even numbers filter
evens = {x for x in [1, 2, 3, 4, 5, 6] if x % 2 == 0} # {2, 4, 6}
4. Creating Empty Sets
python
# Correct way to create empty set
empty_set1 = set() # set()
# These DON'T work for empty sets:
empty_set2 = {} # Actually creates a dictionary
empty_set3 = set({}) # Works but unnecessarily complex
5. From Other Iterables
python
# From a dictionary (gets keys)
from_dict = set({'a': 1, 'b': 2}) # {'a', 'b'}
# From a range
from_range = set(range(3)) # {0, 1, 2}
# From a generator expression
from_gen = set(x for x in [1, 2, 1, 3]) # {1, 2, 3}
6. Special Cases
python
# Set with one element
single = {'hello'} # {'hello'}
# Set containing a tuple
tuple_set = { (1, 2), (3, 4) } # {(1, 2), (3, 4)}
# Set from multiple iterables
combined = set().union([1, 2], (3,), 'ab') # {1, 2, 3, 'a', 'b'}
Remember that sets automatically remove duplicates and are unordered collections. The set() constructor is particularly useful when you need to create a set from an existing iterable.
How Set Values are Stored in Memory Using Hash Functions
Sets in Python use a hash table implementation to store their elements efficiently. Here’s how it works, with examples demonstrating the hash function behavior.
Hash Table Basics
When you add an element to a set:
- Python calls the element’s
__hash__()method to get a hash value - This hash determines the “bucket” where the element will be stored
- If two elements have the same hash (collision), Python checks equality with
__eq__()
Example 1: Storing Integers
python
numbers = {10, 20, 30}
# Get hash values
print(hash(10)) # Output: 10
print(hash(20)) # Output: 20
print(hash(30)) # Output: 30
# Memory layout (simplified):
# Bucket 10: contains 10
# Bucket 20: contains 20
# Bucket 30: contains 30
For integers, the hash value is typically the integer itself (except -1 which is special).
Example 2: Storing Strings
python
words = {"apple", "banana", "cherry"}
# Get hash values
print(hash("apple")) # Output: e.g., -4177190166883738454
print(hash("banana")) # Output: e.g., 4132319996380388884
print(hash("cherry")) # Output: e.g., 3384945018269088770
# Memory layout (conceptual):
# Bucket (hash % table_size): contains the string
Strings have more complex hash functions that produce large integers.
How Set Elements are Stored in a Hash Table
Sets in Python are implemented using hash tables, which provide efficient O(1) average time complexity for insertion, deletion, and membership operations. Here’s a detailed explanation with examples:
Hash Table Structure for Sets
A Python set’s hash table consists of:
- An array of buckets (slots) that store elements
- Each bucket can be in one of three states:
- Empty (no element ever occupied it)
- Active (currently contains an element)
- Dummy (element was removed but collision chain must be preserved)
How Elements are Stored
- Hash Calculation: When adding an element, Python first calculates its hash value using
hash() - Index Determination: The hash value is converted to an array index using
hash_value % table_size - Collision Handling: If the bucket is occupied, Python uses open addressing (typically linear probing) to find the next available slot
Example 1: Simple Integer Storage
python
numbers = {10, 20, 30}
# Simplified hash table representation:
# Index | Value
# ------|------
# 0 | None
# 1 | 10
# 2 | 20
# 3 | 30
# 4 | None
Here, with table size=5:
- hash(10) % 5 = 0 → stored at index 1 (if 0 was occupied)
- hash(20) % 5 = 0 → linear probe to next available (index 2)
- hash(30) % 5 = 0 → linear probe to next available (index 3)
Example 2: String Storage with Collisions
python
fruits = {'apple', 'banana', 'avocado'}
# Hash values might be:
# hash('apple') → 12345 → index 0 (assuming table size=3: 12345%3=0)
# hash('banana') → 54321 → index 0 → collision → probe to index 1
# hash('avocado') → 98765 → index 2
Example 3: Hash Table Resizing
When the set grows beyond its load factor (typically ~60% full), Python resizes the table:
python
s = {1, 2, 3, 4} # Initial table size might be 8
# When adding 5th element, Python may resize to table size 16
s.add(5)
# All elements get rehashed and stored in new positions
Key Implementation Details
- Open Addressing: Python uses probing (usually linear) to handle collisions
- Table Sizes: Always powers of 2 for efficient modulo operations
- Load Factor: When ~2/3 full, the table resizes (typically doubles)
- Deleted Markers: Removed elements leave dummy markers to maintain probe chains
Hash Table Implementation for Your Set: {22, 44, 66, 3, 6, 88, 90, 45, 60}
Let me explain how Python would store your set {22, 44, 66, 3, 6, 88, 90, 45, 60} in memory using a hash table.
Initial Setup
- Initial Table Size: Python starts with a small hash table (typically size 8 for small sets)
- Load Factor: When ~60% full, the table resizes (your set with 9 elements would need at least size 16)
- Hash Function: For integers,
hash(x) = x(except -1 which is special)
Step-by-Step Insertion Process
Phase 1: Initial Insertions (Table Size = 8)
Let’s see what happens as we add elements (note: actual implementation may vary):
- Insert 22:
- hash(22) = 22
- index = 22 % 8 = 6
- Table: [None, None, None, None, None, None, 22, None]
- Insert 44:
- hash(44) = 44
- index = 44 % 8 = 4
- Table: [None, None, None, None, 44, None, 22, None]
- Insert 66:
- hash(66) = 66
- index = 66 % 8 = 2
- Table: [None, None, 66, None, 44, None, 22, None]
- Insert 3:
- hash(3) = 3
- index = 3 % 8 = 3
- Table: [None, None, 66, 3, 44, None, 22, None]
At this point, the table is 4/8 = 50% full (still below load factor)
- Insert 6:
- hash(6) = 6
- index = 6 % 8 = 6 (collision with 22)
- Linear probing: try index 7
- Table: [None, None, 66, 3, 44, None, 22, 6]
Now 5/8 = 62.5% full → triggers resize!
Phase 2: Resize to Table Size = 16
Python creates a new table (size 16) and rehashes all elements:
- Rehash 22: 22 % 16 = 6 → [None]*6 + [22] + [None]*9
- Rehash 44: 44 % 16 = 12 → index 12
- Rehash 66: 66 % 16 = 2 → index 2
- Rehash 3: 3 % 16 = 3 → index 3
- Rehash 6: 6 % 16 = 6 → index 6 (now no collision with 22)
Current table (size 16):
[None, None, 66, 3, None, None, 6, None, None, None, None, None, 44, None, None, 22]
Phase 3: Continue Insertions
- Insert 88:
- hash(88) = 88
- index = 88 % 16 = 8
- Table: [.., 88, ..] at index 8
- Insert 90:
- hash(90) = 90
- index = 90 % 16 = 10
- Table: [.., 90, ..] at index 10
- Insert 45:
- hash(45) = 45
- index = 45 % 16 = 13
- Table: [.., 45, ..] at index 13
- Insert 60:
- hash(60) = 60
- index = 60 % 16 = 12 (collision with 44)
- Linear probing: try 13 (occupied by 45), then 14
- Table: [.., 60, ..] at index 14
Final Hash Table Structure (Conceptual)
| Index | Value |
|---|---|
| 0 | None |
| 1 | None |
| 2 | 66 |
| 3 | 3 |
| 4 | None |
| 5 | None |
| 6 | 6 |
| 7 | None |
| 8 | 88 |
| 9 | None |
| 10 | 90 |
| 11 | None |
| 12 | 44 |
| 13 | 45 |
| 14 | 60 |
| 15 | 22 |
Key Observations
- Load Management: The table resized from 8 to 16 when it got too full
- Collision Handling: Used linear probing (try next slot) when collisions occurred
- Order Preservation: Note how 22 moved from index 6 to 15 after resize – this is why sets were unordered in Python < 3.7
- Efficiency: The sparse table (9/16 full) ensures fast lookups with minimal collisions
Lookup Example
To check if 45 is in the set:
- Calculate hash(45) = 45
- Compute index = 45 % 16 = 13
- Check table[13] == 45 → found!
To check if 60 is in the set:
- hash(60) = 60 → index 12
- table[12] is 44 (not 60) → collision
- Probe next index (13) → 45
- Probe next index (14) → 60 → found!
This hash table implementation gives sets their O(1) average time complexity for operations.
Set Operations in Python with Examples
Python’s set provides powerful methods for mathematical set operations. Below are detailed explanations with three examples each for:
- Union (
|orunion()) - Intersection (
&orintersection()) - Difference (
-ordifference()) - Intersection Update (
intersection_update()) - Difference Update (
difference_update()) - Symmetric Difference (
^orsymmetric_difference()) - Symmetric Difference Update (
symmetric_difference_update())
1. Union (| or union())
Returns a new set containing all elements from both sets (no duplicates).
Examples:
python
A = {1, 2, 3}
B = {3, 4, 5}
# Method 1: Using | operator
print(A | B) # Output: {1, 2, 3, 4, 5}
# Method 2: Using union()
print(A.union(B)) # Output: {1, 2, 3, 4, 5}
# Union with multiple sets
C = {5, 6, 7}
print(A.union(B, C)) # Output: {1, 2, 3, 4, 5, 6, 7}
2. Intersection (& or intersection())
Returns a new set containing only common elements between sets.
Examples:
python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
# Method 1: Using & operator
print(A & B) # Output: {3, 4}
# Method 2: Using intersection()
print(A.intersection(B)) # Output: {3, 4}
# Intersection with multiple sets
C = {4, 5, 6}
print(A.intersection(B, C)) # Output: {4}
3. Difference (- or difference())
Returns a new set with elements in the first set but not in the second.
Examples:
python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}
# Method 1: Using - operator
print(A - B) # Output: {1, 2, 3}
# Method 2: Using difference()
print(A.difference(B)) # Output: {1, 2, 3}
# Difference with multiple sets
C = {3, 5, 8}
print(A.difference(B, C)) # Output: {1, 2}
4. Intersection Update (intersection_update())
Updates the set in-place, keeping only common elements.
Examples:
python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
# Updates A to keep only common elements
A.intersection_update(B)
print(A) # Output: {3, 4}
# Works with multiple sets
A = {1, 2, 3, 4}
C = {4, 5, 6}
A.intersection_update(B, C)
print(A) # Output: {4}
# Equivalent to &= operator
A = {1, 2, 3, 4}
A &= B
print(A) # Output: {3, 4}
5. Difference Update (difference_update())
Updates the set in-place, removing elements found in the other set(s).
Examples:
python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}
# Updates A by removing elements in B
A.difference_update(B)
print(A) # Output: {1, 2, 3}
# Works with multiple sets
A = {1, 2, 3, 4, 5}
C = {3, 5, 8}
A.difference_update(B, C)
print(A) # Output: {1, 2}
# Equivalent to -= operator
A = {1, 2, 3, 4, 5}
A -= B
print(A) # Output: {1, 2, 3}
6. Symmetric Difference (^ or symmetric_difference())
Returns a new set with elements in either set but not in both.
Examples:
python
A = {1, 2, 3}
B = {3, 4, 5}
# Method 1: Using ^ operator
print(A ^ B) # Output: {1, 2, 4, 5}
# Method 2: Using symmetric_difference()
print(A.symmetric_difference(B)) # Output: {1, 2, 4, 5}
# Symmetric difference with multiple sets (requires chaining)
C = {5, 6, 7}
print(A ^ B ^ C) # Output: {1, 2, 3, 4, 6, 7}
7. Symmetric Difference Update (symmetric_difference_update())
Updates the set in-place, keeping elements in either set but not in both.
Examples:
python
A = {1, 2, 3}
B = {3, 4, 5}
# Updates A to keep symmetric difference
A.symmetric_difference_update(B)
print(A) # Output: {1, 2, 4, 5}
# Equivalent to ^= operator
A = {1, 2, 3}
A ^= B
print(A) # Output: {1, 2, 4, 5}
# Works with any iterable
A = {1, 2, 3}
A.symmetric_difference_update([3, 4, 5])
print(A) # Output: {1, 2, 4, 5}
Summary Table
| Operation | Method | Operator | In-Place Update | ||
|---|---|---|---|---|---|
| Union | A.union(B) | `A | B` | A.update(B) (`A | = B`) |
| Intersection | A.intersection(B) | A & B | A.intersection_update(B) (A &= B) | ||
| Difference | A.difference(B) | A - B | A.difference_update(B) (A -= B) | ||
| Symmetric Difference | A.symmetric_difference(B) | A ^ B | A.symmetric_difference_update(B) (A ^= B) |
These methods are highly optimized and useful for data filtering, deduplication, and mathematical set operations. 🚀
Set Methods in Python: Adding and Deleting Elements
Sets in Python are unordered collections of unique elements. They provide several methods to add and delete elements. Here’s an explanation of these methods with examples:
Adding Elements to a Set
1. add() method
Adds a single element to the set.
python
# Example 1: Adding a number
numbers = {1, 2, 3}
numbers.add(4)
print(numbers) # Output: {1, 2, 3, 4}
# Example 2: Adding a string
fruits = {'apple', 'banana'}
fruits.add('orange')
print(fruits) # Output: {'apple', 'banana', 'orange'}
# Example 3: Adding a duplicate (has no effect)
colors = {'red', 'green'}
colors.add('red')
print(colors) # Output: {'red', 'green'}
2. update() method
Adds multiple elements to the set (from iterables like lists, tuples, other sets).
python
# Example 1: Adding from a list
numbers = {1, 2}
numbers.update([3, 4, 5])
print(numbers) # Output: {1, 2, 3, 4, 5}
# Example 2: Adding from another set
set1 = {'a', 'b'}
set2 = {'c', 'd'}
set1.update(set2)
print(set1) # Output: {'a', 'b', 'c', 'd'}
# Example 3: Adding from multiple iterables
letters = {'x'}
letters.update(['y', 'z'], ('w',), {'v'})
print(letters) # Output: {'x', 'y', 'z', 'w', 'v'}
Deleting Elements from a Set
1. remove() method
Removes a specified element from the set. Raises KeyError if element doesn’t exist.
python
# Example 1: Removing an existing element
fruits = {'apple', 'banana', 'cherry'}
fruits.remove('banana')
print(fruits) # Output: {'apple', 'cherry'}
# Example 2: Trying to remove non-existent element (raises error)
colors = {'red', 'blue'}
# colors.remove('green') # Raises KeyError: 'green'
# Example 3: Removing after checking existence
numbers = {1, 2, 3}
if 2 in numbers:
numbers.remove(2)
print(numbers) # Output: {1, 3}
2. discard() method
Removes a specified element if it exists, but doesn’t raise an error if it doesn’t.
python
# Example 1: Discarding an existing element
animals = {'cat', 'dog', 'bird'}
animals.discard('dog')
print(animals) # Output: {'cat', 'bird'}
# Example 2: Discarding non-existent element (no error)
colors = {'red', 'blue'}
colors.discard('green') # No error raised
print(colors) # Output: {'red', 'blue'}
# Example 3: Safe removal without checking
numbers = {10, 20, 30}
numbers.discard(20)
numbers.discard(40) # No error
print(numbers) # Output: {10, 30}
3. pop() method
Removes and returns an arbitrary element from the set. Raises KeyError if set is empty.
python
# Example 1: Popping from a set
letters = {'a', 'b', 'c'}
popped = letters.pop()
print(f"Popped: {popped}, Remaining: {letters}") # Output varies (order not guaranteed)
# Example 2: Popping from a single-element set
single = {42}
value = single.pop()
print(value) # Output: 42
print(single) # Output: set()
# Example 3: Trying to pop from empty set
empty_set = set()
# empty_set.pop() # Raises KeyError: 'pop from an empty set'
1. copy() Method
Creates a shallow copy of the set. This is useful when you want to duplicate a set without modifying the original.
Examples:
Example 1: Basic copy
python
original = {1, 2, 3}
new_set = original.copy()
print("Original:", original) # Output: {1, 2, 3}
print("Copy:", new_set) # Output: {1, 2, 3}
# Modifying the copy doesn't affect original
new_set.add(4)
print("Original after modification:", original) # Output: {1, 2, 3}
print("Modified copy:", new_set) # Output: {1, 2, 3, 4}
Example 2: Copy with nested elements (shallow copy limitation)
python
original = {1, 2, [3, 4]} # This will raise TypeError as lists are unhashable
# Sets can only contain hashable (immutable) elements
Example 3: Comparing copy methods
python
colors = {'red', 'green', 'blue'}
# Method 1: copy() method
set1 = colors.copy()
# Method 2: set constructor
set2 = set(colors)
# Method 3: slicing (not applicable to sets)
# set3 = colors[:] # This would raise TypeError
print(set1 == set2 == colors) # Output: True
2. clear() Method
Removes all elements from the set, leaving it empty.
Examples:
Example 1: Basic clear operation
python
numbers = {10, 20, 30, 40}
print("Before clear:", numbers) # Output: {40, 10, 20, 30}
numbers.clear()
print("After clear:", numbers) # Output: set()
Example 2: Clearing an already empty set
python
empty_set = set() empty_set.clear() print(empty_set) # Output: set() (no error raised)
Example 3: Difference between clear() and creating new set
python
# Original set with multiple references
set_a = {1, 2, 3}
set_b = set_a
# Clearing the set
set_a.clear()
print("set_a after clear:", set_a) # Output: set()
print("set_b after clear:", set_b) # Output: set() (both references are cleared)
# Alternative approach with new set
set_x = {4, 5, 6}
set_y = set_x
set_x = set() # Creating new empty set
print("set_x after new assignment:", set_x) # Output: set()
print("set_y after new assignment:", set_y) # Output: {4, 5, 6} (set_y remains unchanged)
Key Differences:
copy()creates a new set object with the same elementsclear()empties the existing set object- After
clear(), all references to the set will see it as empty - With
set(), only the specific reference is changed to a new empty set
These methods are particularly useful when you need to:
- Preserve the original set while working with a copy (
copy()) - Quickly remove all elements from a set without destroying the set object itself (
clear())
Scrambled Words
word_set = {"listen","listen", "silent", "enlist", "tinsel", "inlets",
"heart", "heart", "earth", "hater", "herat", "rathe",
"team", "team", "meat", "mate", "tame", "meta",
"stare", "stare", "tears", "rates", "tares", "aster",
"angel", "angel", "glean", "genal", "angle", "galen",
"duster", "duster", "rudest", "rusted", "strude", "durest"}
result = set()
print("Scrambled word pairs:")
for word1 in word_set:
for word2 in word_set:
if (word1 != word2 and sorted(word1) == sorted(word2)):
pair = tuple(sorted((word1, word2)))
result.add(pair)
for pair in result:
print(pair)
Explanation of the Program: Jaccard Similarity Calculation
import re
sentence_a = ('Patience is the key to success in life,'
' and without it, opportunities may slip away.')
sentence_b = ("Life gives opportunities to those who are patient—"
"success truly comes to those who wait with perseverance.")
# Extract words from both sentences
tokens_a = re.findall(r'\w+', sentence_a.lower())
tokens_b = re.findall(r'\w+', sentence_b.lower())
# Convert to sets for comparison
unique_words_a = set(tokens_a)
unique_words_b = set(tokens_b)
# Calculate intersection and union
shared_words = unique_words_a & unique_words_b
all_unique_words = unique_words_a | unique_words_b
# Compute similarity score
similarity_score = len(shared_words) / len(all_unique_words)
print(f"Text Similarity Score: {similarity_score:.2f}")
Text Similarity Comparison Program
1. Purpose
This program calculates how similar two sentences are by comparing their word content using the Jaccard similarity coefficient. The score ranges from 0 (completely different) to 1 (identical).
2. Key Components
- Input Sentencespythonsentence_a = (‘Patience is the key to success in life…’) sentence_b = (“Life gives opportunities to those who are patient…”)
- Two sentences with similar themes (patience/success) but different wording.
- Word Extractionpythontokens_a = re.findall(r’\w+’, sentence_a.lower()) tokens_b = re.findall(r’\w+’, sentence_b.lower())
re.findall(r'\w+', ...)extracts all words (ignoring punctuation)..lower()ensures case-insensitive comparison (e.g., “Life” = “life”).
- Convert to Setspythonunique_words_a = set(tokens_a) unique_words_b = set(tokens_b)
- Sets automatically remove duplicate words and enable fast comparisons.
- Calculate Shared and Unique Wordspythonshared_words = unique_words_a & unique_words_b # Intersection all_unique_words = unique_words_a | unique_words_b # Union
- Shared words: Words appearing in both sentences (e.g., “patience”, “life”).
- All unique words: Combined vocabulary of both sentences.
- Jaccard Similarity Formulapythonsimilarity_score = len(shared_words) / len(all_unique_words)
- Formula:Similarity=Number of Shared WordsTotal Unique WordsSimilarity=Total Unique WordsNumber of Shared Words
- Example: If 8 words overlap out of 20 total unique words →
8/20 = 0.40.
- Outputpythonprint(f”Text Similarity Score: {similarity_score:.2f}”)
- Prints the score rounded to 2 decimal places (e.g.,
0.40).
- Prints the score rounded to 2 decimal places (e.g.,
Example Walkthrough
For these sentences:
- Sentence A: “Patience is the key to success in life…”
- Sentence B: “Life gives opportunities to those who are patient…”
- Extracted Words:
tokens_a= [“patience”, “is”, “the”, “key”, “to”, “success”, “in”, “life”, …]tokens_b= [“life”, “gives”, “opportunities”, “to”, “those”, “who”, “are”, “patient”, …]
- Sets:
unique_words_a= {“patience”, “is”, “the”, “key”, “to”, “success”, “life”, …}unique_words_b= {“life”, “to”, “patient”, “opportunities”, …}
- Comparisons:
shared_words= {“life”, “to”}all_unique_words= {“patience”, “is”, “the”, “key”, “success”, “life”, “to”, “patient”, …}
- Score Calculation:
- If 2 shared words and 10 total unique words →
2/10 = 0.20.
- If 2 shared words and 10 total unique words →
Why This Matters
- Applications:
- Plagiarism detection
- Search engines (ranking similar documents)
- Chatbots (matching user queries to responses)
- Advantages:
- Simple and interpretable (0–1 scale).
- Ignores word order, focusing on vocabulary overlap.