In this article I will introduce you to the basic collection types in Python 3. Collections are types that let you group elements together and retrieve them later.

We will cover the sequence types (tuples and lists), followed by sets and dictionaries.

Tuples

A tuple is an ordered sequence of zero or more elements. The elements in a tuple can be of arbitrary type, which means you can mix strings, numbers and other kinds of objects.

Empty tuples can be created using a pair of parentheses () or by calling the tuple() function without any arguments. If you call the tuple() function with an iterable as its argument, it converts that iterable to a tuple. This is useful if you want to "freeze" a list to protect it from mutation (where elements are added, deleted or replaced).

>>> t1 = ()
>>> t2 = tuple()
>>> type(t1)
<class 'tuple'>
>>> type(t2)
<class 'tuple'>
>>> t3 = tuple(['spam', 'spam', 'spam', 'egg', 'spam'])
>>> t3
('spam', 'spam', 'spam', 'egg', 'spam')
>>> type(t3)
<class 'tuple'>
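The "freezing" idea can be sketched as follows. The names menu and frozen are made up for this illustration. Keep in mind that the tuple only protects its own slots: a mutable object stored inside a tuple can still be changed.

```python
# A list that may be changed later by other code.
menu = ['spam', 'egg']

# Take an immutable snapshot of the current contents.
frozen = tuple(menu)

# Changing the list afterwards does not affect the snapshot.
menu.append('bacon')

print(menu)    # ['spam', 'egg', 'bacon']
print(frozen)  # ('spam', 'egg')
```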

Tuples can be created by surrounding comma-separated elements with parentheses (). The parentheses can be omitted in most places, but not when you pass the tuple as an argument to a function: there you have to keep the parentheses, otherwise each element is treated as a separate argument.

>>> t3 = ('spam', 'egg', 'sausage')
>>> t4 = 'egg', 'bacon', 'spam'
>>> type(t3)
<class 'tuple'>
>>> type(t4)
<class 'tuple'>
>>> t4
('egg', 'bacon', 'spam')
>>> mixed_tuple = (2, 'one', 3.14)
>>> mixed_tuple
(2, 'one', 3.14)
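Here is a small sketch of why the parentheses matter in a function call. The function count_args is a hypothetical helper written only for this example. The sketch also shows a related detail: a tuple with a single element needs a trailing comma, because parentheses alone do not create a tuple.

```python
# Hypothetical helper: returns how many positional arguments it received.
def count_args(*args):
    return len(args)

# Without parentheses the values are passed as three separate arguments.
separate = count_args('egg', 'bacon', 'spam')

# With parentheses a single tuple is passed.
single = count_args(('egg', 'bacon', 'spam'))

print(separate, single)  # 3 1

# Parentheses around one value are just grouping; the comma makes the tuple.
not_a_tuple = ('spam')
one_element = ('spam',)

print(type(not_a_tuple))  # <class 'str'>
print(type(one_element))  # <class 'tuple'>
```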

Because tuples are sequence types, you can iterate over the elements, access particular elements by their index, slice them or see if a given item is in the elements.

>>> for e in t4:
...     print(e)
...
egg
bacon
spam
>>> 'spam' in t3
True
>>> t3[2]
'sausage'
>>> t3[::-1]
('sausage', 'egg', 'spam')
>>> t3[4:5]
()

Tuples are immutable, which means that you cannot add, delete or replace elements. If you try, you will get an error.

>>> del t3[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object doesn't support item deletion
>>> t3[0] = 'bacon'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

However, you can concatenate tuples. This does not alter the original tuples; it creates a new tuple containing the elements of both.

>>> t5 = t3 + t4
>>> t5
('spam', 'egg', 'sausage', 'egg', 'bacon', 'spam')

Lists

A list, just like a tuple, is an ordered sequence of zero or more elements. These elements can be of arbitrary type, which means you can mix numbers with strings and other kinds of objects.

Empty lists can be created using a set of square brackets [] or by calling the list() function without any parameters. If you call the list() function with an iterable as argument then this iterable will be converted to a list. This is useful if you have a tuple and want to create a mutable collection of its items (for example you want to add or remove an element).

>>> l3 = list(('egg', 'sausage', 'spam'))  # <-- the argument is a tuple
>>> l3
['egg', 'sausage', 'spam']
>>> type(l3)
<class 'list'>

Lists can be created by writing their elements as a comma-separated sequence inside square brackets.

>>> l4 = ['egg', 'spam', 'sausage']
>>> l3 == l4
False
>>> mixed_list = [3.14, 'one', 2]
>>> mixed_list
[3.14, 'one', 2]

Because lists, like tuples, are sequence types, you can iterate over the elements, access particular elements by their index, slice them or see if a given item is in the elements.

>>> for l in l3:
...     print(l)
...
egg
sausage
spam
>>> l3[1]
'sausage'
>>> l3[::-1]
['spam', 'sausage', 'egg']
>>> l3[3:5]
[]

Unlike tuples, lists are mutable. This means you can add new elements to a list, and remove or replace existing ones, without getting an error.

Removing elements can be done in different ways, depending on whether you know the index or the value of the element and whether you want to do something with the removed element.

Lists can be extended with other lists. To do this you can either use the extend() method or the += operator; both modify the list in place. Note that extend() accepts any iterable, so passing a string adds its characters one by one. Concatenation works just like with tuples: the + operator creates a new list.

>>> l3
['egg', 'sausage', 'spam']
>>> l3[2] = 'bacon'
>>> l3
['egg', 'sausage', 'bacon']
>>> l3 += ['spam']
>>> l3
['egg', 'sausage', 'bacon', 'spam']
>>> del l3[3]
>>> l3
['egg', 'sausage', 'bacon']
>>> l3.append('spam')
>>> l3
['egg', 'sausage', 'bacon', 'spam']
>>> l3.extend('baked beans')
>>> l3
['egg', 'sausage', 'bacon', 'spam', 'b', 'a', 'k', 'e', 'd', ' ', 'b', 'e', 'a', 'n', 's']
>>> l3 = l3[:4]
>>> l3
['egg', 'sausage', 'bacon', 'spam']
>>> l3.extend(['baked beans'])
>>> l3
['egg', 'sausage', 'bacon', 'spam', 'baked beans']
>>> l3.pop(3)  # <-- works with index
'spam'
>>> l3
['egg', 'sausage', 'bacon', 'baked beans']
>>> l3.insert(0, 'spam')
>>> l3
['spam', 'egg', 'sausage', 'bacon', 'baked beans']
>>> l3.append('spam')
>>> l3
['spam', 'egg', 'sausage', 'bacon', 'baked beans', 'spam']
>>> l3.remove('spam')  # <-- works with value, removes first occurrence
>>> l3
['egg', 'sausage', 'bacon', 'baked beans', 'spam']
>>> l4
['egg', 'spam', 'sausage']
>>> l3 + l4
['egg', 'sausage', 'bacon', 'baked beans', 'spam', 'egg', 'spam', 'sausage']
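The difference between changing a list in place and creating a new one becomes visible when two names refer to the same object. The following sketch uses made-up names (breakfast, alias, order) and contrasts lists with tuples, where += cannot modify the object and rebinds the name instead.

```python
# Lists: += extends the existing list object in place.
breakfast = ['egg', 'sausage']
alias = breakfast               # a second name for the same list
breakfast += ['bacon']
print(alias)                    # ['egg', 'sausage', 'bacon']
print(alias is breakfast)       # True

# + builds a new list and leaves the operands untouched.
combined = breakfast + ['spam']
print(combined is breakfast)    # False
print(breakfast)                # ['egg', 'sausage', 'bacon']

# Tuples: += cannot change the tuple, so the name is rebound to a new one.
order = ('egg', 'sausage')
order_alias = order
order += ('bacon',)
print(order_alias)              # ('egg', 'sausage')
print(order)                    # ('egg', 'sausage', 'bacon')
```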

Sets

A set is an unordered collection of unique elements. Unordered means that you cannot access elements by index and that the elements are stored in an order decided by the interpreter. Unique means that each element of the set appears only once. The elements of a set have to be hashable. A hashable object has a __hash__() method which returns the same value throughout the object's lifetime. Do not let this frighten you: as a rule of thumb, Python's built-in immutable types (numbers, strings, and tuples that contain only hashable elements) are hashable, while the mutable built-in types (lists, dictionaries and sets themselves) are not.
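A short sketch of what hashability means in practice. The variable names are chosen for this illustration only; frozenset is the built-in immutable counterpart of set and is shown here as one way to put a set inside another set.

```python
# Immutable built-ins are hashable and can be stored in a set.
valid = {1, 'one', 3.14, ('egg', 'spam')}
print(len(valid))                    # 4

# A list is mutable, so it cannot be a set element.
try:
    invalid = {['egg', 'spam']}
except TypeError as error:
    list_error = str(error)
    print(list_error)

# A tuple is only hashable if everything inside it is hashable.
try:
    hash(('egg', ['spam']))
    tuple_with_list_hashable = True
except TypeError:
    tuple_with_list_hashable = False
print(tuple_with_list_hashable)      # False

# frozenset is immutable and hashable, so it can be nested in a set.
nested = {frozenset({'egg', 'spam'})}
print(frozenset({'spam', 'egg'}) in nested)  # True
```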

Creating an empty set is done by calling the set() function without arguments. There is no literal for an empty set: as you will see shortly, sets with values can be created using curly brackets {}, but an empty pair of curly brackets gives you an empty dictionary instead of an empty set.

>>> s1 = set()
>>> type(s1)
<class 'set'>
>>> s2 = {}
>>> type(s2)
<class 'dict'>

If you provide an iterable to the set() function as an argument, that iterable will be converted to a set. This is useful if you want to remove duplicates from a list with one line of code. Keep in mind that the original order of the elements is not preserved.

>>> s3 = set([1,1,2,3,4,5,5,6,7,7,7,8,9,9,1,2,3,4,5])
>>> s3
{1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> type(s3)
<class 'set'>
>>> mixed_set = set(mixed_list)
>>> mixed_set
{2, 3.14, 'one'}
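If the order of the elements matters, converting to a set is not enough. One possible approach, assuming Python 3.7 or newer (where dictionaries keep insertion order), is to use dict.fromkeys() instead. The variable names below are illustrative.

```python
numbers = [3, 1, 3, 2, 1]

# A set removes duplicates but makes no promise about the order.
unique_any_order = set(numbers)

# dict.fromkeys() keeps the first occurrence of each element in order
# (relies on insertion-ordered dictionaries, Python 3.7+).
unique_in_order = list(dict.fromkeys(numbers))

print(unique_any_order == {1, 2, 3})  # True
print(unique_in_order)                # [3, 1, 2]
```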

Sets can be created by putting comma-separated values inside curly brackets {}.

>>> s4 = {'Spam', 'egg', 'sausage', 'Spam'}
>>> s4
{'egg', 'sausage', 'Spam'}

Although sets cannot be indexed or sliced, they can be iterated over just like lists or tuples. However, do not make assumptions about the order of the elements. Sets also support the in membership operator, just like lists.

>>> for s in s4:
...     print(s)
...
egg
sausage
Spam

Sets are mutable objects, so you can add or remove elements. However, the operations are quite different from what you have seen so far: sets represent mathematical sets, so the usual set operations (union, difference, intersection, symmetric difference) are available too.

>>> s4.add('baked beans')
>>> s4
{'egg', 'baked beans', 'sausage', 'Spam'}
>>> s4.remove('Spam')
>>> s4
{'egg', 'baked beans', 'sausage'}
>>> s5 = {'Spam', 'bacon', 'egg'}
>>> s5
{'egg', 'Spam', 'bacon'}
>>> s4 = {'Spam', 'egg', 'sausage', 'Spam'}
>>> s4
{'egg', 'sausage', 'Spam'}
>>> s4.union(s5)
{'sausage', 'bacon', 'egg', 'Spam'}
>>> s4.intersection(s5)
{'egg', 'Spam'}
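The two remaining operations, difference and symmetric difference, can be sketched in the same way. The example reuses the values of s4 and s5 and also shows the operator forms of the set methods. Results are compared rather than printed because the display order of a set is not guaranteed.

```python
s4 = {'Spam', 'egg', 'sausage'}
s5 = {'Spam', 'bacon', 'egg'}

# Elements of s4 that are not in s5.
only_in_s4 = s4.difference(s5)
print(only_in_s4)                                  # {'sausage'}

# Elements that are in exactly one of the two sets.
in_exactly_one = s4.symmetric_difference(s5)
print(in_exactly_one == {'sausage', 'bacon'})      # True

# Each method has an operator equivalent.
print(s4 - s5 == only_in_s4)                       # True
print(s4 ^ s5 == in_exactly_one)                   # True
print(s4 | s5 == s4.union(s5))                     # True
print(s4 & s5 == s4.intersection(s5))              # True

# remove() raises KeyError for a missing element, discard() does not.
s4.discard('baked beans')
try:
    s4.remove('baked beans')
    remove_failed = False
except KeyError:
    remove_failed = True
print(remove_failed)                               # True
```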

A good use case for sets is finding the distinct characters in a string. Because a string is an iterable, you can pass it to the set() function as an argument. The resulting set will contain every character of the string exactly once.

>>> text = 'Spam, Spam, Spam, Spam, Spam, Spam, baked beans, Spam, Spam, Spam and Spam'
>>> characters = set(text)
>>> characters
{'e', 'n', 'b', 's', ',', ' ', 'p', 'm', 'a', 'd', 'S', 'k'}
>>> len(characters)
12

The basic solution contains all characters, including punctuation and spaces. If you are interested only in word characters and case does not matter, you have to prepare your string a bit before you create a set from it.

>>> import re
>>> set(re.sub(r'\W', '', text).lower())
{'e', 'n', 'b', 's', 'm', 'p', 'a', 'd', 'k'}

The re module contains functions that work with regular expressions. The sub() function replaces every match of the given regular expression (first argument) with the replacement (second argument) in the provided string (third argument). Here \W matches any non-word character, and the pattern is written as a raw string (the r prefix) so that the backslash reaches the regular expression engine unchanged.

Dictionaries

A dictionary is a collection of zero or more key-value pairs where the keys have to be hashable and unique. This means that if you add a key-value pair whose key already exists, the old value is overwritten. Keys are separated from their values with a colon (:). Before Python 3.7 the order of the pairs was arbitrary, which is what the outputs in this section show; since Python 3.7 dictionaries preserve insertion order.

Empty dictionaries can be created by calling the dict() function with no arguments or simply by writing a pair of empty curly brackets {}. If you pass an iterable to the dict() function, it creates a new dictionary from the elements of that iterable, provided every element is itself an iterable of length 2: the first item becomes the key and the second the value.

>>> d = {}
>>> type(d)
<class 'dict'>
>>> d2 = dict()
>>> type(d2)
<class 'dict'>
>>> iterables = ['ab', 'cd', 'de']
>>> d3 = dict(iterables)
>>> d3
{'d': 'e', 'c': 'd', 'a': 'b'}
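Two-character strings are a slightly unusual input for dict(). A more typical case, sketched here with made-up data, is a list of (key, value) tuples or two parallel lists combined with zip(). The last part shows what happens when an element does not have exactly two items.

```python
# A list of (key, value) tuples.
pairs = [('one', 1), ('two', 2), ('three', 3)]
from_pairs = dict(pairs)
print(from_pairs['two'])           # 2

# Two parallel lists can be paired up with zip().
keys = ['egg', 'spam']
values = [1, 10]
from_zip = dict(zip(keys, values))
print(from_zip == {'egg': 1, 'spam': 10})  # True

# Elements that are not of length 2 are rejected.
try:
    dict(['abc'])
    wrong_length_accepted = True
except ValueError:
    wrong_length_accepted = False
print(wrong_length_accepted)       # False
```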

Dictionaries can be created by putting comma-separated key-value pairs inside curly brackets.

>>> d4 = {'one': 1, 'two': 2, 'three': 3}
>>> d4
{'two': 2, 'one': 1, 'three': 3}
>>> type(d4)
<class 'dict'>
>>> mixed_dict = {'one': 1, 1: 'one', 3.14: 'pi'}
>>> mixed_dict
{3.14: 'pi', 1: 'one', 'one': 1}

Dictionaries cannot be sliced or indexed by position; their values are accessed through their keys. Looking up a key that does not exist raises a KeyError.

>>> d4[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 1
>>> d4['one']
1
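If a missing key is an expected situation rather than an error, the get() method is one way to avoid the KeyError. A minimal sketch, reusing the values of d4:

```python
d4 = {'one': 1, 'two': 2, 'three': 3}

# get() returns None instead of raising KeyError for a missing key.
print(d4.get('four'))      # None

# A second argument supplies a different default value.
print(d4.get('four', 0))   # 0

# For existing keys get() behaves like normal key access.
print(d4.get('one'))       # 1
```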

Dictionaries can be iterated over and support the in membership operator.

>>> for key in d4:
...     print(key, d4[key])
...
two 2
one 1
three 3
>>> for key, value in d4.items():
...     print(key, ': ', value)
...
two :  2
one :  1
three :  3
>>> 'four' in d4
False
>>> 'three' in d4
True

Because dictionaries are mutable objects you can easily add or remove elements. The easiest way for adding new elements is to reference the key and assign the value.

>>> d4['four'] = 4
>>> d4
{'two': 2, 'four': 4, 'one': 1, 'three': 3}
>>> del d4['one']
>>> d4
{'two': 2, 'four': 4, 'three': 3}
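Two related details, sketched with illustrative values: assigning to a key that already exists overwrites its value instead of adding a second pair, and pop() removes a key while handing back its value, similar to pop() on lists.

```python
d4 = {'two': 2, 'three': 3}

# Assigning to an existing key replaces the value; the size stays the same.
d4['two'] = 22
print(len(d4))                   # 2
print(d4['two'])                 # 22

# pop() removes the key and returns the value that belonged to it.
removed = d4.pop('three')
print(removed)                   # 3
print(d4)                        # {'two': 22}

# With a default, pop() does not raise KeyError for a missing key.
missing = d4.pop('four', None)
print(missing)                   # None
```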

An example use case for dictionaries is counting the occurrences of words in a string or a list of strings. This kind of counting comes up in data science, for example. The approach shown here is a very basic one, but it makes sure you understand dictionaries.

>>> word_counts = {}
>>> for word in text.split():
...     if word in word_counts:
...         word_counts[word] += 1
...     else:
...         word_counts[word] = 1
...
>>> word_counts
{'baked': 1, 'Spam': 2, 'beans,': 1, 'Spam,': 8, 'and': 1}

This solution is quite basic, just like the first example in the sets section. Let's prepare our string a bit more here too and revisit the results.

>>> word_counts = {}
>>> for word in re.sub(r'\W', ' ', text).lower().split():
...     if word in word_counts:
...         word_counts[word] += 1
...     else:
...         word_counts[word] = 1
...
>>> word_counts
{'baked': 1, 'beans': 1, 'spam': 10, 'and': 1}
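Once the manual version is clear, the same counting can be written more compactly. One option from the standard library is collections.Counter, a dictionary subclass made for counting hashable items. The sketch below repeats the text from the sets section so that it runs on its own.

```python
import re
from collections import Counter

text = ('Spam, Spam, Spam, Spam, Spam, Spam, baked beans, '
        'Spam, Spam, Spam and Spam')

# Replace non-word characters with spaces, lowercase, split into words,
# and let Counter do the counting.
words = re.sub(r'\W', ' ', text).lower().split()
word_counts = Counter(words)

print(word_counts['spam'])         # 10
print(word_counts.most_common(1))  # [('spam', 10)]

# A Counter returns 0 for missing keys instead of raising KeyError.
print(word_counts['egg'])          # 0
```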