Handy data types in the standard library

[](this doesn't explain how dict.setdefault and collections.defaultdict work because they're not as simple as the things that are here and i don't actually use them that much)

Now we know how to use lists, tuples and dictionaries. They are commonly used data types in Python, and there's nothing wrong with them. In this chapter we'll learn more data types that make some things easier. You can always do everything with lists and dictionaries, but these data types can do a lot of the work for you.

If it looks like a duck and quacks like a duck, it must be a duck.

Many things in this tutorial are not really of some particular type, but they behave as if they were. For example, we'll learn about many classes that behave like dictionaries. They are not dictionaries, but we can use them just as if they were dictionaries. This programming style is known as duck-typing.
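As a small sketch of duck-typing, the function below only needs something it can loop over, so it does not care whether it gets a list, a tuple, a string or a dictionary. The name count_items is made up for this example and is not part of the standard library.

```python
def count_items(iterable):
    """Count the items by looping, so anything iterable works."""
    total = 0
    for item in iterable:
        total += 1
    return total


print(count_items([1, 2, 3]))          # a list
print(count_items((1, 2)))             # a tuple
print(count_items('hello'))            # a string
print(count_items({'a': 1, 'b': 2}))   # a dictionary, loops over its keys
```

The function never checks the type of its argument. It only requires that a for loop works with it.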

Sets

Let's say we have a program that keeps track of people's names. We can store the names in a list, and adding a new name is as easy as appending to that list. Lists remember their order and it's possible to add the same thing multiple times.

>>> names = ['wub_wub', 'theelous3', 'RubyPinch', 'go|dfish', 'Nitori']
>>> names.append('Akuli')
>>> names.append('Akuli')
>>> names
['wub_wub', 'theelous3', 'RubyPinch', 'go|dfish', 'Nitori', 'Akuli', 'Akuli']
>>>

This is usually what we need, but sometimes it's not. Sometimes we just want to store a bunch of things. We don't need to have the same thing twice and we don't care about the order.

This is where sets come in. They are like lists without order or duplicates, or keys of dictionaries without the values. We can create a set with curly braces just like a dictionary, but without the : and the values.

>>> names = {'wub_wub', 'theelous3', 'RubyPinch', 'go|dfish', 'Nitori'}
>>> names
{'RubyPinch', 'theelous3', 'go|dfish', 'wub_wub', 'Nitori'}
>>> type(names)
<class 'set'>
>>> 'wub_wub' in names
True
>>>

We can also convert anything iterable to a set by calling the class.

>>> set('hello')
{'o', 'e', 'h', 'l'}
>>> type(set('hello'))
<class 'set'>
>>>

When we did set('hello') we lost one l and the set ended up in a different order because sets don't contain duplicates or keep track of their order.

Note that {} is a dictionary because dictionaries are used more often than sets, so we need set() if we want to create an empty set.

>>> type({'a', 'b'})
<class 'set'>
>>> type({'a'})
<class 'set'>
>>> type({})
<class 'dict'>
>>> type(set())     # set() is an empty set
<class 'set'>
>>>

Sets have a remove method just like lists have, but they have an add method instead of append.

>>> names = {'theelous3', 'wub_wub'}
>>> names.add('Akuli')
>>> names
{'wub_wub', 'Akuli', 'theelous3'}
>>> names.remove('theelous3')
>>> names
{'wub_wub', 'Akuli'}
>>>
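The sketch below shows a couple of details about these methods that are easy to trip over: remove raises a KeyError if the item is not in the set, discard removes the item only if it is there, and adding an item that is already in the set changes nothing.

```python
names = {'theelous3', 'wub_wub'}

# remove raises KeyError when the item is not in the set
try:
    names.remove('Akuli')
except KeyError:
    print("Akuli was not in the set")

# discard does nothing when the item is missing
names.discard('Akuli')

# adding an item that is already there leaves the set unchanged
names.add('wub_wub')
print(len(names))
```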

That's the boring part. Now let's have a look at some really handy things we can do with sets:

>>> a = {'RubyPinch', 'theelous3', 'go|dfish'}
>>> b = {'theelous3', 'Nitori'}
>>> a & b      # names in a and b
{'theelous3'}
>>> a | b      # names in a, b or both
{'Nitori', 'theelous3', 'go|dfish', 'RubyPinch'}
>>> a ^ b      # names in a or b, but not both
{'RubyPinch', 'Nitori', 'go|dfish'}
>>> a - b      # names in a but not in b
{'go|dfish', 'RubyPinch'}
>>>
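Each of these operators also has a method with a descriptive name. Here is a short sketch using the same sets as above. The variable names on the left are made up for this example.

```python
a = {'RubyPinch', 'theelous3', 'go|dfish'}
b = {'theelous3', 'Nitori'}

both = a.intersection(b)                # same as a & b
either = a.union(b)                     # same as a | b
only_one = a.symmetric_difference(b)    # same as a ^ b
only_a = a.difference(b)                # same as a - b

# unlike the operators, the methods accept any iterable, such as a list
also_both = a.intersection(['theelous3', 'Nitori'])
print(also_both)
```

The operators are shorter to write, but the methods can be handy when the other thing is not a set.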

Named tuples

It can be tempting to make a class that just contains a bunch of data and that's it.

class Website:

    def __init__(self, url, founding_year, free_to_use):
        self.url = url
        self.founding_year = founding_year
        self.free_to_use = free_to_use


github = Website('https://github.com/', 2008, True)

You should avoid making classes like this. This class has only one method, so it doesn't really need to be a class. We could just use a tuple instead:

github = ('https://github.com/', 2008, True)

The problem with this is that code like website[1] > 2010 doesn't tell the reader much, whereas website.founding_year > 2010 makes it obvious what we are comparing.

In cases like this, collections.namedtuple is handy:

>>> import collections
>>> Website = collections.namedtuple('Website', 'url founding_year free_to_use')
>>> github = Website('https://github.com/', 2008, True)
>>> github[1]
2008
>>> for thing in github:
...     print(thing)
...
https://github.com/
2008
True
>>> github.founding_year
2008
>>> github
Website(url='https://github.com/', founding_year=2008, free_to_use=True)
>>>

As you can see, our github behaves like a tuple, but things like github.founding_year also work and github looks nice when we have a look at it on the >>> prompt.
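Here is a sketch of a few more things named tuples can do. The field names can also be given as a list of strings, the values can be passed as keyword arguments, and the result is immutable just like a regular tuple. The _replace method returns a new named tuple instead of changing the old one.

```python
import collections

Website = collections.namedtuple('Website', ['url', 'founding_year', 'free_to_use'])
github = Website(url='https://github.com/', founding_year=2008, free_to_use=True)

# unpacking works like with any other tuple
url, year, free = github

# named tuples are immutable like regular tuples
try:
    github.founding_year = 2009
except AttributeError:
    print("can't change a named tuple")

# _replace returns a new named tuple with some fields changed
newer = github._replace(founding_year=2009)
print(newer.founding_year)
print(github.founding_year)   # the original is unchanged
```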

Deques

To understand deques, we need to first learn about a list method I haven't talked about earlier. It's called pop and it works like this:

>>> names = ['wub_wub', 'theelous3', 'Nitori', 'RubyPinch', 'go|dfish']
>>> names
['wub_wub', 'theelous3', 'Nitori', 'RubyPinch', 'go|dfish']
>>> names.pop()
'go|dfish'
>>> names
['wub_wub', 'theelous3', 'Nitori', 'RubyPinch']
>>> names.pop()
'RubyPinch'
>>> names
['wub_wub', 'theelous3', 'Nitori']
>>>

The list shortens from the end by one when we pop from it, and we also get the removed item back. So we can add an item to the end of a list using append, and we can remove an item from the end using pop.

It's also possible to do these things at the beginning of a list, but lists were not designed to be used that way and it would be slow if our list were big. The collections.deque class makes appending and popping from both ends easy and fast. It works just like a list, but it also has appendleft and popleft methods.

>>> import collections
>>> names = collections.deque(['theelous3', 'Nitori', 'RubyPinch'])
>>> names
deque(['theelous3', 'Nitori', 'RubyPinch'])
>>> names.appendleft('wub_wub')
>>> names.append('go|dfish')
>>> names
deque(['wub_wub', 'theelous3', 'Nitori', 'RubyPinch', 'go|dfish'])
>>> names.popleft()
'wub_wub'
>>> names.pop()
'go|dfish'
>>> names
deque(['theelous3', 'Nitori', 'RubyPinch'])
>>>

The deque behaves a lot like lists do, and we can do list(names) if we need a list instead of a deque for some reason.

Deques are often used as queues. This means that items are always added to one end and popped from the other end, so the item that was added first also comes out first.
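A minimal sketch of using a deque as a queue could look like this. The task strings and the handled list are made up for this example.

```python
import collections

queue = collections.deque()

# new items go to the right end
queue.append('first task')
queue.append('second task')
queue.append('third task')

# items are handled from the left end, oldest first
handled = []
while queue:
    handled.append(queue.popleft())

print(handled)
```

An empty deque is falsy just like an empty list, so the while loop stops when everything has been popped.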

Counting things

Back in the dictionary chapter we learned to count how many times each word appears in a sentence like this:

sentence = input("Enter a sentence: ")
counts = {}
for word in sentence.split():
    if word in counts:
        counts[word] += 1
    else:
        counts[word] = 1

This code works just fine, but there are easier ways to do this. For example, we could use the get method. the_dict.get('hi', 'hello') gives us the_dict['hi'] if 'hi' is in the dictionary, and 'hello' if it's not.

>>> the_dict = {'hi': 'this is working'}
>>> the_dict.get('hi', 'lol its not there')
'this is working'
>>> the_dict.get('hello', 'lol its not there')
'lol its not there'
>>>

So we could write code like this instead:

sentence = input("Enter a sentence: ")
counts = {}
for word in sentence.split():
    counts[word] = counts.get(word, 0) + 1

Counting things like this is actually so common that there's a class just for that. It's called collections.Counter and it works like this:

>>> import collections
>>> words = ['hello', 'there', 'this', 'test', 'is', 'a', 'hello', 'test']
>>> counts = collections.Counter(words)
>>> counts
Counter({'test': 2, 'hello': 2, 'is': 1, 'this': 1, 'there': 1, 'a': 1})
>>>

Now counts is a Counter object. It behaves a lot like a dictionary, and everything that works with a dictionary should also work with a counter. We can also convert the counter to a dictionary by doing dict(the_counter) if something doesn't work with a counter.

>>> for word, count in counts.items():
...     print(word, count)
...
test 2
is 1
this 1
there 1
a 1
hello 2
>>>
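Counters also do a couple of things that regular dictionaries don't. The sketch below shows two of them: looking up a missing key gives 0 instead of a KeyError, and the most_common method gives the items with the biggest counts first.

```python
import collections

words = ['hello', 'there', 'this', 'test', 'is', 'a', 'hello', 'test']
counts = collections.Counter(words)

# a missing word gives 0 instead of a KeyError
print(counts['hello'])
print(counts['missing'])

# a list of (word, count) tuples for the two most common words
top_two = counts.most_common(2)
print(top_two)
```

Looking up a missing word does not add it to the counter.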

Combining dictionaries

We can add together strings, lists, tuples and sets easily.

>>> "hello" + "world"
'helloworld'
>>> [1, 2, 3] + [4, 5]
[1, 2, 3, 4, 5]
>>> (1, 2, 3) + (4, 5)
(1, 2, 3, 4, 5)
>>> {1, 2, 3} | {4, 5}
{1, 2, 3, 4, 5}
>>>

But how about dictionaries? They can't be added together with +.

>>> {'a': 1, 'b': 2} + {'c': 3}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict' and 'dict'
>>>

Dictionaries have an update method that adds everything from another dictionary into it. So we can merge dictionaries like this:

>>> merged = {}
>>> merged.update({'a': 1, 'b': 2})
>>> merged.update({'c': 3})
>>> merged
{'c': 3, 'b': 2, 'a': 1}
>>>

Or we can write a function like this:

>>> def merge_dicts(dictlist):
...     result = {}
...     for dictionary in dictlist:
...         result.update(dictionary)
...     return result
...
>>> merge_dicts([{'a': 1, 'b': 2}, {'c': 3}])
{'c': 3, 'b': 2, 'a': 1}
>>>

Just like counting things, merging dictionaries is needed so often that there's a class just for it in the collections module. It's called ChainMap:

>>> import collections
>>> merged = collections.ChainMap({'a': 1, 'b': 2}, {'c': 3})
>>> merged
ChainMap({'b': 2, 'a': 1}, {'c': 3})
>>>

Our merged is kind of like the Counter object we created earlier. It's not a dictionary, but it behaves like a dictionary.

>>> for key, value in merged.items():
...     print(key, value)
...
c 3
b 2
a 1
>>> dict(merged)
{'c': 3, 'b': 2, 'a': 1}
>>>
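ChainMap and update behave differently when the same key is in more than one dictionary. Here is a sketch of the difference. The defaults and settings dictionaries are made up for this example.

```python
import collections

defaults = {'color': 'red', 'size': 10}
settings = {'color': 'blue'}

# ChainMap looks in the first dictionary first
chained = collections.ChainMap(settings, defaults)
print(chained['color'])   # found in settings
print(chained['size'])    # not in settings, so found in defaults

# update works the other way around: the last update wins
merged = {}
merged.update(settings)
merged.update(defaults)
print(merged['color'])

# a ChainMap does not copy the dictionaries, so changes show up in it
defaults['size'] = 20
print(chained['size'])
```

So a ChainMap is a good fit when we want to look things up from several dictionaries without copying them, and update is better when we want one new dictionary.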

Starting with Python 3.5 it's possible to merge dictionaries like this. Don't do this unless you are sure that no one will need to run your code on Python versions older than 3.5.

>>> first = {'a': 1, 'b': 2}
>>> second = {'c': 3, 'd': 4}
>>> {**first, **second}
{'d': 4, 'c': 3, 'a': 1, 'b': 2}
>>>
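If the same key is in both dictionaries, the value from the dictionary that comes last is used, and the original dictionaries are not changed. A short sketch with made-up values:

```python
first = {'a': 1, 'b': 2}
second = {'b': 20, 'c': 3}

# 'b' is in both dictionaries, so the value from second wins
merged = {**first, **second}
print(merged['b'])

# first and second are still the same as before
print(first)
print(second)
```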

Summary

  • Duck typing means requiring some behavior instead of some type. For example, instead of making a function that takes a list we could make a function that takes anything iterable.
  • Sets and the collections module are handy. Use them.

If you have trouble with this tutorial please tell me about it and I'll make this tutorial better. If you like this tutorial, please give it a star.

You may use this tutorial freely at your own risk. See LICENSE.
