Sets in Python - How They're Different From Tuples, Lists, and Dictionaries
On Day 7 of 30 Days of Python, I move on to sets. A set stores unordered, unindexed, and unique values only. These are the operations we can perform with them.
Sets store their items by hash value rather than by index, which is why they are unordered. If you print the same set in two separate runs of a script, the items may come out in a different order, and that's completely normal; for the same reason, we can't use the traditional method of accessing items by index here. Interestingly, every value in a set must be unique; hence, when we merge two sets (or build a set from a list), any duplicates are removed automatically.
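To make those two properties concrete, here is a minimal sketch. The variable names are made up for illustration; it only shows that duplicates collapse when a set is built and that indexing a set raises a TypeError.

```python
# Hypothetical example values; "Apple" appears twice on purpose.
fruits = {"Apple", "Banana", "Apple", "Pear"}
unique_count = len(fruits)  # duplicates are stored only once

# Sets have no indices, so fruits[0] raises a TypeError.
try:
    fruits[0]
    indexing_works = True
except TypeError:
    indexing_works = False

print(unique_count)    # 3
print(indexing_works)  # False
```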
This is also the topic that introduces some operations I hadn't come across before, which I found very similar to working with a MySQL database from the CLI: union, intersection, difference, symmetric difference, subset, superset, and disjoint sets.
Creating a Set
Again, as with lists and tuples, you can create a set with initial values or as an empty set.
lst = set()
Update: Using lst = {} won't work, as this creates an empty dictionary, not a set.
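A quick way to see the difference is to check the type of each object. This is only a small sketch with made-up variable names:

```python
empty_dict = {}                  # empty braces create a dictionary
empty_set = set()                # set() creates an empty set
prefilled = {"Apple", "Banana"}  # braces with values create a set

print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set
print(type(prefilled).__name__)   # set
```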
Accessing Items In a Set
As sets don't have indices, we find an item in a set by using a membership operator or a loop.
lst = {"Apple", "Banana", "Pear"}
print("Apple" in lst) # True
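The loop approach mentioned above could look something like this. It is a minimal sketch with a hypothetical set of fruit names; the loop order is not guaranteed, so sorted() is used when a predictable order is needed.

```python
fruits = {"Apple", "Banana", "Pear"}

# Iteration visits every item exactly once, in no guaranteed order.
for fruit in fruits:
    print(fruit)

# sorted() returns a list, which gives a predictable order.
ordered = sorted(fruits)
print(ordered)  # ['Apple', 'Banana', 'Pear']
```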
Adding Items To a Set
There are a couple of methods we could use here:
lst = set() # {} would create a dictionary, which has no .add()
lst.add("Item1") # Adds a single item
lst.update(["Item2", "Item3"]) # Adds multiple items
Removing Items from a Set
We can also use a couple of functions here:
lst = {"Apple", "Banana", "Pear"}
lst.remove("Apple") # Removes a specific item; raises KeyError if it is missing
lst.pop() # Removes and returns an arbitrary item
lst.discard("Plum") # Removes the item if present; no error if it is missing
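The practical difference between remove() and discard() only shows up when the item is missing. A small sketch, using made-up values:

```python
fruits = {"Apple", "Banana", "Pear"}

fruits.discard("Plum")  # "Plum" is absent, so nothing happens

try:
    fruits.remove("Plum")  # remove() raises KeyError for a missing item
    removed = True
except KeyError:
    removed = False

print(removed)      # False
print(len(fruits))  # 3
```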
The .pop() method removes the last item from a list, but on a set it removes an arbitrary item. The reason is that sets don't use indices to store data and aren't ordered. Python is free to arrange the items however it likes, which also means that when using sets you shouldn't rely on the order in which the items are presented. I'll provide an example to explain.
lst = {"Apple", "Banana", "Pear"}
print(lst) # e.g. {'Pear', 'Apple', 'Banana'}
# Hash values (illustrative): 3452 867 8758
print(lst) # Same order within one run; another run may give {'Banana', 'Pear', 'Apple'}
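Because the order is arbitrary, the only safe assumption about .pop() is that it hands back some item and takes it out of the set. A minimal sketch with hypothetical names:

```python
fruits = {"Apple", "Banana", "Pear"}
original = set(fruits)  # keep a copy for comparison

taken = fruits.pop()  # removes and returns an arbitrary item

print(taken in original)  # True
print(taken in fruits)    # False
print(len(fruits))        # 2
```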
We can also obtain the hash value used to store an item:
lst = {"Apple", "Banana", "Pear"}
for item in lst:
    print(hash(item))
# Example output (string hashes change on every run):
# Banana - -5329514218518754947
# Pear - -5730524034900827218
# Apple - -2970818454656169705
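Since sets rely on hashing, only hashable values can be stored in them. The sketch below (names are illustrative) shows that equal values hash the same within one run, that a list is rejected, and that a tuple of strings is accepted.

```python
# Equal values always produce the same hash within one run of the program.
same_hash = hash("Apple") == hash("Apple")

# Lists are mutable and not hashable, so they cannot be set items.
try:
    bad = {["Apple", "Banana"]}
    list_allowed = True
except TypeError:
    list_allowed = False

# A tuple of hashable items is itself hashable and can be stored.
pairs = {("Apple", "Banana")}

print(same_hash)     # True
print(list_allowed)  # False
print(len(pairs))    # 1
```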
Deleting a Set
You can delete a set using del, or just clear its contents using .clear().
lst = {"Apple", "Banana", "Pear"}
lst.clear()
print(lst) # set()
del lst # Delete the set
Changing List Type
lst = {"Apple", "Banana", "Pear"}
lst = list(lst) # change to a normal list

lst = ["Apple", "Apple", "Banana", "Pear"]
lst = set(lst) # change to a set
print(lst) # {"Apple", "Banana", "Pear"} (order may vary) - Removes duplicates to create new set

lst = ("Apple", "Apple", "Banana", "Pear")
lst = set(lst) # a tuple converts the same way, also dropping duplicates
I want to highlight here that converting a normal list or a tuple to a set removes any duplicate data.
Joining Sets
There are two methods we can use here: union() returns a new set, while update() adds the items to the existing set in place:
lst1 = {"Apple", "Banana", "Pear"}
lst2 = {"Mango", "Cherry", "Grapes"}
lst3 = lst1.union(lst2) # new set; lst1 is unchanged
lst1.update(lst2) # adds lst2's items to lst1 in place and returns None
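One thing worth checking for yourself is what each method returns. In this sketch (variable names are made up), union() gives back a new set, the | operator does the same, and update() returns None while changing the set it was called on.

```python
set_a = {"Apple", "Banana", "Pear"}
set_b = {"Mango", "Cherry", "Grapes"}

combined = set_a.union(set_b)          # new set; set_a is unchanged
same = (set_a | set_b) == combined     # | is the operator form of union()
size_before = len(set_a)

result = set_a.update(set_b)           # changes set_a in place

print(len(combined))      # 6
print(same)               # True
print(size_before)        # 3
print(result)             # None
print(set_a == combined)  # True
```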
Intersection in Sets
Since this is a new function for me, I'll give it a brief rundown. The intersection() method returns the items that appear in both sets.
lst1 = {"Apple", "Banana", "Pear"}
lst2 = {"Apple", "Mango", "Cherry", "Grapes"}
print(lst1.intersection(lst2)) # {"Apple"}
# If several items are shared, all of them are returned
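Here is a small sketch of the multiple-item case, using hypothetical sets. It also shows the & operator, which gives the same result as intersection().

```python
set_a = {"Apple", "Banana", "Pear"}
set_b = {"Apple", "Pear", "Mango"}

common = set_a.intersection(set_b)
same = common == (set_a & set_b)  # & is the operator form of intersection()

print(sorted(common))  # ['Apple', 'Pear']
print(same)            # True
```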
Subsets and Super Sets
My initial thought when I came across these functions was that they return a boolean based on the amount of data inside each set. While partially true, it's more to do with the contents. issubset() checks whether all the contents of one set are in another; issuperset() is the same check in the opposite direction, asking whether the selected set contains everything in the other. Neither requires the sets to differ in size, so two equal sets are both subsets and supersets of each other.
lst1 = {"Apple", "Banana", "Pear"}
lst2 = {"Apple", "Banana", "Pear", "Grapes"}
print(lst1.issubset(lst2)) # True - every item in lst1 is also in lst2
print(lst2.issuperset(lst1)) # True - lst2 contains every item in lst1
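The equal-size case is the one that tends to surprise, so here is a minimal sketch with made-up sets. It also uses the < and > operators, which test for a proper (strictly smaller or larger) subset or superset.

```python
small = {"Apple", "Banana"}
same = {"Apple", "Banana"}
large = {"Apple", "Banana", "Pear"}

equal_is_subset = small.issubset(same)      # equal sets count as subsets
equal_is_superset = small.issuperset(same)  # and as supersets
proper_of_equal = small < same              # < requires a strictly smaller set
proper_of_large = small < large
large_over_small = large > small

print(equal_is_subset)    # True
print(equal_is_superset)  # True
print(proper_of_equal)    # False
print(proper_of_large)    # True
print(large_over_small)   # True
```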
Difference Between Two Sets
difference() returns the items that are in the first set but not in the second.
lst1 = {"Apple", "Banana", "Pear", "Kiwi"}
lst2 = {"Apple", "Banana", "Pear", "Grapes"}
print(lst1.difference(lst2)) # {"Kiwi"}
print(lst1.symmetric_difference(lst2)) # {"Grapes", "Kiwi"}
Note: difference() compares in one direction, returning the items of the first set that are missing from the second, whereas symmetric_difference() compares in both directions and returns the items that appear in exactly one of the two sets.
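The direction matters for difference() but not for symmetric_difference(). A short sketch, reusing the same fruit values under hypothetical names, with the - and ^ operators as the shorthand forms:

```python
set_a = {"Apple", "Banana", "Pear", "Kiwi"}
set_b = {"Apple", "Banana", "Pear", "Grapes"}

only_a = set_a - set_b   # same as set_a.difference(set_b)
only_b = set_b - set_a   # swapping the order changes the result
either = set_a ^ set_b   # same as set_a.symmetric_difference(set_b)

print(only_a)          # {'Kiwi'}
print(only_b)          # {'Grapes'}
print(sorted(either))  # ['Grapes', 'Kiwi']
```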
We can also use isdisjoint() to get a boolean telling us whether two sets share any items. Either way the sets can still be joined with union(), since duplicates are simply dropped.
It returns True if the two sets have no items in common and False if they share at least one:
lst1 = {"Apple", "Banana", "Pear", "Kiwi"}
lst2 = {"Apple", "Banana", "Pear", "Grapes"}
print(lst1.isdisjoint(lst2)) # False
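For contrast, here is a minimal sketch of a case that returns True. The set names and contents are made up; the last line shows that being disjoint is the same as having an empty intersection.

```python
citrus = {"Orange", "Lemon"}
berries = {"Strawberry", "Blueberry"}
mixed = {"Lemon", "Blueberry"}

no_overlap = citrus.isdisjoint(berries)     # nothing in common
overlap = citrus.isdisjoint(mixed)          # "Lemon" is shared
empty_common = len(citrus & berries) == 0   # disjoint means empty intersection

print(no_overlap)    # True
print(overlap)       # False
print(empty_common)  # True
```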
Summary
So far, I'd say that revising the topic has helped me understand what each function does. Previously, I didn't understand functions like symmetric_difference() and difference(), or issubset() and issuperset(), because I couldn't picture where they would be used.
Looking back, I still don't know where I would use these functions or what they would be useful for, but I do now understand them; it's just a matter of finding the right place for them.