Understanding sets in Python

As I learn more about Python’s data types, I am surprised by how few people use (or even know about) sets. At my job, I often take some data and transform it. Once it is transformed, I have to analyze how the data may have changed, and sets are great for such comparisons.

In this post, I will cover how to create sets and show some examples of how to use them.

What is a set?
A set is an unordered collection of unique items in Python. Sets are somewhat like lists, but they contain only unique items and do not maintain order. They also support a number of helpful operations of their own.

Creating sets
There are two common ways to create a set:

# You can create sets using {}
In [1]: before = {'My', 'algorithm', 'is', 'the'}
In [2]: type(before)
Out[2]: set

# You can also create sets using set()
In [3]: after = set(['best', 'compression', 'algorithm', 'in', 'the', 'world'])
In [4]: type(after)
Out[4]: set
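
One pitfall worth a quick sketch: empty braces create a dictionary, not a set, so an empty set has to come from set(). The snippet below also shows that set() accepts any iterable, not just a list. The variable names are only illustrative.

```python
# {} creates an empty dict, not an empty set
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# set() accepts any iterable; a string is split into its characters
letters = set("hello")
print(len(letters))        # 4, because 'l' appears twice
```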

Set operations
Great, so now we know two ways to create sets. What can we actually do with them? It turns out that sets support a lot of useful comparisons. For example, they can be used to find elements that are in one set but not the other, that are in both sets, or that are in exactly one of the two sets.

Union
Get the union of both sets:

In [5]: before.union(after)
Out[5]: {'My', 'algorithm', 'best', 'compression', 'in', 'is', 'the', 'world'}

# You can also use '|' operator
In [6]: before | after
Out[6]: {'My', 'algorithm', 'best', 'compression', 'in', 'is', 'the', 'world'}

As you can see, the result of a union is another set that contains the unique elements from both sets. Keep in mind that the resulting set has no guaranteed order, so the order in which the elements are displayed should not be relied upon.

Intersection
Get the intersection of both sets:

In [7]: before.intersection(after)
Out[7]: {'algorithm', 'the'}

# You can also use '&' operator
In [8]: before & after
Out[8]: {'algorithm', 'the'}

As you would expect, the result of intersection is another set which contains elements that appear in both sets.

Difference
Get elements that are in one set but not the other:

# Get elements that are in 'before' set but not in 'after' set
In [9]: before.difference(after)
Out[9]: {'My', 'is'}

# Get elements that are in 'after' set but not in 'before' set
In [10]: after.difference(before)
Out[10]: {'best', 'compression', 'in', 'world'}

# You can also use '-' operator
In [11]: before - after
Out[11]: {'My', 'is'}
In [12]: after - before
Out[12]: {'best', 'compression', 'in', 'world'}

Symmetric difference
Get elements that are in either set but not in both:

In [13]: before.symmetric_difference(after)
Out[13]: {'My', 'best', 'compression', 'in', 'is', 'world'}

# You can also use '^' operator
In [14]: before ^ after
Out[14]: {'My', 'best', 'compression', 'in', 'is', 'world'}
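
One difference between the method and operator forms is worth sketching. As far as I know, the methods (union, intersection, difference, symmetric_difference) accept any iterable as their argument, while the operators require both operands to be sets. The list named words below is made up for illustration.

```python
before = {'My', 'algorithm', 'is', 'the'}
words = ['the', 'best', 'algorithm']  # a plain list, not a set

# Methods accept any iterable as the argument
common = before.intersection(words)
print(common == {'algorithm', 'the'})  # True

# Operators require both operands to be sets
operator_error = None
try:
    before & words
except TypeError as exc:
    operator_error = exc
    print("TypeError:", exc)
```

If you prefer the operator form, convert the other collection first, for example before & set(words).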

Subsets and supersets
What if you want to check whether a set is a subset or a superset of another set? A set is a subset of another set if all of its items are present in the other set, and it is a superset if it contains all of the other set's items.

# Create two sets
In [15]: set1 = {1, 2, 3, 4, 5}
In [16]: set2 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

# Check if set1 is subset of set2
In [17]: set1.issubset(set2)
Out[17]: True

# Check if set2 is superset of set1
In [18]: set2.issuperset(set1)
Out[18]: True
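
As with the other operations, there are operator forms for these checks. Here is a small sketch using the same set1 and set2: <= and >= mirror issubset() and issuperset(), while < and > test for a proper subset or superset, meaning the two sets must not be equal.

```python
set1 = {1, 2, 3, 4, 5}
set2 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

print(set1 <= set2)  # True, same as set1.issubset(set2)
print(set2 >= set1)  # True, same as set2.issuperset(set1)

# A set is a subset of itself, but not a proper subset of itself
print(set1 <= set1)  # True
print(set1 < set1)   # False
print(set1 < set2)   # True
```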

Using sets to get unique items
A very common use of sets is to get unique items in a list. To do so, all you have to do is create a set from the list.

# List with duplicate items
In [19]: old = [1, 3, 3, 5, 4, 4]

# Create a set from list using set()
In [20]: new = set(old)

# The set contains only unique items. Their order is not guaranteed.
In [21]: new
Out[21]: {1, 3, 4, 5}

# To get a list back, use list() (the order of the items is not guaranteed)
In [22]: list(new)
Out[22]: [1, 3, 4, 5]
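
Because a set does not keep order, converting back to a list may not give you the items in the order you started with. If order matters, here are two options, sketched with the same old list: sorted() for ascending order, or dict.fromkeys() to keep the order of first appearance (dictionaries preserve insertion order in Python 3.7 and later).

```python
old = [1, 3, 3, 5, 4, 4]

# sorted() returns a list in ascending order, whatever the set's internal order
unique_sorted = sorted(set(old))
print(unique_sorted)  # [1, 3, 4, 5]

# dict keys are unique and keep insertion order (Python 3.7+)
unique_in_order = list(dict.fromkeys(old))
print(unique_in_order)  # [1, 3, 5, 4]
```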

Set comprehension
I covered this briefly in one of my previous posts, but it deserves to be mentioned again since it is not widely known or used. Just as Python has list comprehensions, it also has set comprehensions. The only difference is in the syntax: instead of square brackets [], you use curly braces {}.

In [23]: new
Out[23]: {1, 3, 4, 5}

# Add 1 to each item in 'new'
In [24]: {x+1 for x in new}
Out[24]: {2, 4, 5, 6}
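
A set comprehension also removes duplicates that only appear after the transformation, and it can take a condition just like a list comprehension. A small sketch, with a made-up list of words:

```python
words = ['My', 'my', 'Algorithm', 'algorithm', 'is', 'IS']

# Lowercasing makes some words identical; the set keeps one copy of each
normalized = {w.lower() for w in words}
print(normalized == {'my', 'algorithm', 'is'})  # True

# An optional condition filters items before they are added
long_words = {w.lower() for w in words if len(w) > 2}
print(long_words == {'algorithm'})  # True
```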

I hope you found this post helpful. There are certainly more ways to use sets, but these are some of the most important ones. Let me know if you think I missed anything.


Join the Conversation

2 Comments

  1. This post mainly describes what you can do with sets in the context of set theory in mathematics. Another important aspect of sets, one that helps determine whether to use a set or some other data structure, is their performance characteristics.

    For example, “element in collection” is an O(N) operation for collections like lists and tuples because you have to look through each item to see if “element” is present. By contrast, sets perform this operation in O(1) time on average! Regardless of how large the set is, it takes roughly the same amount of time to determine whether an element is present in it. This can be the difference between a usable algorithm and an unusable one.

    It does this by pre-calculating where any item will be based on its hash value. This implies several features:

    1. Items must be hashable to be in a set (which is the same attribute needed to be a key in a dictionary)
    2. You can pre-determine where an object would have to be if it was in a set by calculating its hash (this is why “in” is an O(1) operation for sets)
    3. You lose order (unless you set up some kind of linked list between the added elements like is done for OrderedDict).
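
The hashability requirement in the first point can be seen in a short sketch. The names points and groups are made up for illustration: tuples of hashable items can be set members, lists cannot, and frozenset is the hashable counterpart of set.

```python
# Tuples of hashable items are hashable, so they can be set members
points = {(0, 0), (1, 2)}
print((1, 2) in points)  # True

# Lists are mutable and unhashable, so adding one raises TypeError
list_error = None
try:
    points.add([3, 4])
except TypeError as exc:
    list_error = exc
    print("TypeError:", exc)

# frozenset is the hashable, immutable counterpart of set
groups = {frozenset({1, 2}), frozenset({2, 1})}
print(len(groups))  # 1, because both frozensets are equal
```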

    1. Hi Jason,

      You are absolutely correct. If you need to search for an item, sets are definitely faster than lists.

      I intended this post to highlight high-level usage of sets. Maybe in one of my future posts, I can compare lists with sets.
