# Understanding sets in Python

As I learn more about Python’s different data types, I am surprised by how few people use (or even know about) *sets*. At my job, I often take some data and transform it. Once it is transformed, I have to analyze how the data may have changed, and *sets* are great for such comparisons.

In this post, I will cover how to create *sets* and show some examples of how to use them.

## What is a *set*?

A *set* is an unordered collection of unique items in Python. Sets are somewhat like *lists*, but they contain only unique items and don’t maintain order. They also support a number of helpful operations of their own.

**Creating sets**

There are two ways in which you can create *sets*:

    # You can create sets using {}
    In [1]: before = {'My', 'algorithm', 'is', 'the'}

    In [2]: type(before)
    Out[2]: set

    # You can also create sets using set()
    In [3]: after = set(['best', 'compression', 'algorithm', 'in', 'the', 'world'])

    In [4]: type(after)
    Out[4]: set
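
One gotcha with the brace syntax is worth a short sketch: empty braces create a dictionary, not a set, so an empty set has to come from `set()`. The variable names below are only illustrative.

```python
# Empty braces create a dict, not a set.
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# set() accepts any iterable; duplicate items collapse automatically.
letters = set("hello")
print(len(letters))  # 4, because 'l' appears twice in "hello"
```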

## *Set* operations

Great, so now we know the two ways to create *sets*. What can we actually do with them? It turns out that we can do a lot of useful comparisons with them. For example, *sets* can be used to find elements that are in one *set* but not the other, are in both *sets*, or are in one of the *sets* but not both.

**Union**

Get the union of both *sets*:

    In [5]: before.union(after)
    Out[5]: {'My', 'algorithm', 'best', 'compression', 'in', 'is', 'the', 'world'}

    # You can also use '|' operator
    In [6]: before | after
    Out[6]: {'My', 'algorithm', 'best', 'compression', 'in', 'is', 'the', 'world'}

As you can see, the result of union is another *set* which contains every element found in either *set*, each appearing only once. Also notice how the resultant *set* is not ordered.
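
The method and operator forms are not fully interchangeable. As a small sketch (the `extra_words` list is a made-up example), the method form accepts any iterable, while the operator form needs a set on both sides:

```python
before = {'My', 'algorithm', 'is', 'the'}
extra_words = ['the', 'best']  # hypothetical input: a list, not a set

# The method form accepts any iterable.
combined = before.union(extra_words)
print(sorted(combined))  # ['My', 'algorithm', 'best', 'is', 'the']

# The operator form requires both operands to be sets.
operator_failed = False
try:
    before | extra_words
except TypeError as exc:
    operator_failed = True
    print("TypeError:", exc)
```

The same applies to `intersection`, `difference`, and `symmetric_difference` versus `&`, `-`, and `^`.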

**Intersection**

Get the intersection of both *sets*:

    In [7]: before.intersection(after)
    Out[7]: {'algorithm', 'the'}

    # You can also use '&' operator
    In [8]: before & after
    Out[8]: {'algorithm', 'the'}

As you would expect, the result of intersection is another *set* which contains elements that appear in both *sets*.

**Difference**

Get elements that are in one *set* but not the other:

    # Get elements that are in 'before' set but not in 'after' set
    In [9]: before.difference(after)
    Out[9]: {'My', 'is'}

    # Get elements that are in 'after' set but not in 'before' set
    In [10]: after.difference(before)
    Out[10]: {'best', 'compression', 'in', 'world'}

    # You can also use '-' operator
    In [11]: before - after
    Out[11]: {'My', 'is'}

    In [12]: after - before
    Out[12]: {'best', 'compression', 'in', 'world'}

**Symmetric difference**

Get elements that are in either *set* but not in both:

    In [13]: before.symmetric_difference(after)
    Out[13]: {'My', 'best', 'compression', 'in', 'is', 'world'}

    # You can also use '^' operator
    In [14]: before ^ after
    Out[14]: {'My', 'best', 'compression', 'in', 'is', 'world'}
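
To tie these operations back to the data-comparison use case from the introduction, here is a minimal sketch. The record IDs and variable names are invented for illustration; real data would come from wherever the transformation runs.

```python
# Hypothetical record IDs before and after a transformation step.
ids_before = {101, 102, 103, 104}
ids_after = {103, 104, 105}

dropped = ids_before - ids_after    # present before, missing after
added = ids_after - ids_before      # new after the transformation
unchanged = ids_before & ids_after  # present in both
changed = ids_before ^ ids_after    # everything that was added or dropped

print(sorted(dropped))    # [101, 102]
print(sorted(added))      # [105]
print(sorted(unchanged))  # [103, 104]
print(sorted(changed))    # [101, 102, 105]
```

The results are passed through `sorted()` only so that the printed order is predictable.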

**Subsets and supersets**

What if you want to check if a *set* is a subset or a superset of another *set*? In other words, you would like to check if all the items in a *set* are present in another *set* and vice versa.

    # Create two sets
    In [15]: set1 = {1, 2, 3, 4, 5}

    In [16]: set2 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

    # Check if set1 is subset of set2
    In [17]: set1.issubset(set2)
    Out[17]: True

    # Check if set2 is superset of set1
    In [18]: set2.issuperset(set1)
    Out[18]: True
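
As with the other operations, these checks also have operator forms. The sketch below reuses `set1` and `set2` and additionally shows proper subsets and `isdisjoint()`, which the examples above do not cover:

```python
set1 = {1, 2, 3, 4, 5}
set2 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

print(set1 <= set2)  # True: same check as set1.issubset(set2)
print(set2 >= set1)  # True: same check as set2.issuperset(set1)

# A set is a subset of itself, but not a *proper* subset.
print(set1 <= set1)  # True
print(set1 < set1)   # False
print(set1 < set2)   # True

# isdisjoint() reports whether two sets share no items at all.
print(set1.isdisjoint({11, 12}))  # True
print(set1.isdisjoint(set2))      # False
```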

**Using sets to get unique items**

A very common use of *sets* is to get unique items in a *list*. To do so, all you have to do is create a *set* from the *list*.

    # List with duplicate items
    In [19]: old = [1, 3, 3, 5, 4, 4]

    # Create a set from list using set()
    In [20]: new = set(old)

    # Set contains only unique items. As you can see, they are unordered.
    In [21]: new
    Out[21]: {1, 3, 4, 5}

    # To get a list, use list()
    In [22]: list(new)
    Out[22]: [1, 3, 4, 5]
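
Because a set does not keep order, the round trip through `set()` can shuffle the items. If the original order matters, one common alternative (a swapped-in technique, not a set operation) is `dict.fromkeys()`, since dictionary keys are unique and keep insertion order in Python 3.7 and later:

```python
old = [1, 3, 3, 5, 4, 4]

# Dictionary keys are unique and preserve insertion order (Python 3.7+).
deduped_in_order = list(dict.fromkeys(old))
print(deduped_in_order)  # [1, 3, 5, 4]

# If any consistent order will do, sorting the set gives a predictable result.
deduped_sorted = sorted(set(old))
print(deduped_sorted)  # [1, 3, 4, 5]
```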

## *Set* comprehension

I covered this briefly in one of my previous posts, but it deserves another mention since it is not widely known or used. Just like you have *list comprehension* in Python, you also have *set comprehension*. The only difference is in syntax. Instead of using brackets *[]*, you use braces *{}*.

    In [23]: new
    Out[23]: {1, 3, 4, 5}

    # Add 1 to each item in 'new'
    In [24]: {x+1 for x in new}
    Out[24]: {2, 4, 5, 6}
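
A set comprehension also removes duplicates that only appear after the expression is applied, and it accepts an `if` filter just like a list comprehension. A small sketch with a made-up word list:

```python
# Hypothetical input with the same words in different casing.
words = ['My', 'algorithm', 'is', 'the', 'best', 'THE', 'Best']

# Lowercasing makes 'the'/'THE' and 'best'/'Best' collide, so each appears once.
normalized = {word.lower() for word in words}
print(sorted(normalized))  # ['algorithm', 'best', 'is', 'my', 'the']

# An optional condition filters items before they are added.
long_words = {word.lower() for word in words if len(word) > 3}
print(sorted(long_words))  # ['algorithm', 'best']
```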

I hope you found this post helpful. There are certainly more ways you can use sets, but these are some of the most important ones. Let me know if you think I missed anything.


This post mainly describes what you can do with sets in the context of set theory in mathematics. Another important aspect of sets, one that helps determine whether to use a set or some other data structure, is their performance characteristics.

For example, “element in collection” is an O(N) operation in collections like lists and tuples because you may have to look through every item to see if “element” is present. By contrast, sets perform this operation in O(1) time on average! However large the set is, it takes roughly the same amount of time to determine if an element is present in it. This can be the difference between a usable algorithm and an unusable one.
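
A rough way to see this difference is to time a lookup for a value that is absent, which forces the list to scan every item. This is only a sketch: the size `n`, the number of repetitions, and the variable names are arbitrary choices, and the timings will differ from machine to machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # absent value: the list has to check every item

# Time 100 membership tests against each collection.
list_seconds = timeit.timeit(lambda: missing in as_list, number=100)
set_seconds = timeit.timeit(lambda: missing in as_set, number=100)

print(f"list: {list_seconds:.6f} s for 100 lookups")
print(f"set:  {set_seconds:.6f} s for 100 lookups")
```

Increasing `n` should slow the list lookups roughly in proportion while leaving the set lookups about the same.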

A set achieves this by working out where any item belongs from its hash value. This implies several features:

1. Items must be hashable to be in a set (which is the same attribute needed to be a key in a dictionary)

2. You can pre-determine where an object would have to be if it was in a set by calculating its hash (this is why “in” is an O(1) operation for sets)

3. You lose order (unless you set up some kind of linked list between the added elements, as is done for OrderedDict).
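
The hashability requirement in the first point can be sketched as follows. The sample values are illustrative only:

```python
# Immutable values such as strings, numbers, and tuples of them are hashable.
points = {(0, 0), (1, 2)}
points.add((1, 2))   # already present, so nothing changes
print(len(points))   # 2

# Lists are mutable and unhashable, so they cannot be set items.
list_rejected = False
try:
    {[1, 2], [3, 4]}
except TypeError as exc:
    list_rejected = True
    print("TypeError:", exc)

# frozenset is the immutable, hashable variant, which allows sets of sets.
groups = {frozenset({1, 2}), frozenset({2, 1})}
print(len(groups))   # 1, because both frozensets are equal
```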

Hi Jason,

You are absolutely correct. If you need to search for an item, sets are definitely faster than lists.

I intended this post to highlight high-level usage of sets. Maybe in one of my future posts, I can compare lists with sets.