Skip to content

Latest commit

 

History

History
831 lines (653 loc) · 19.9 KB

python-list.rst

File metadata and controls

831 lines (653 loc) · 19.9 KB

List

The list is a common data structure which we use to store objects. Most of the time, programmers concern about getting, setting, searching, filtering, and sorting. Furthermore, sometimes, we waltz ourself into common pitfalls of the memory management. Thus, the main goal of this cheat sheet is to collect some common operations and pitfalls.

From Scratch

There are so many ways that we can manipulate lists in Python. Before we start to learn those versatile manipulations, the following snippet shows the most common operations of lists.

>>> a = [1, 2, 3, 4, 5]
>>> # contains
>>> 2 in a
True
>>> # positive index
>>> a[0]
1
>>> # negative index
>>> a[-1]
5
>>> # slicing list[start:end:step]
>>> a[1:]
[2, 3, 4, 5]
>>> a[1:-1]
[2, 3, 4]
>>> a[1:-1:2]
[2, 4]
>>> # reverse
>>> a[::-1]
[5, 4, 3, 2, 1]
>>> a[:0:-1]
[5, 4, 3, 2]
>>> # set an item
>>> a[0] = 0
>>> a
[0, 2, 3, 4, 5]
>>> # append items to list
>>> a.append(6)
>>> a
[0, 2, 3, 4, 5, 6]
>>> a.extend([7, 8, 9])
>>> a
[0, 2, 3, 4, 5, 6, 7, 8, 9]
>>> # delete an item
>>> del a[-1]
>>> a
[0, 2, 3, 4, 5, 6, 7, 8]
>>> # list comprehension
>>> b = [x for x in range(3)]
>>> b
[0, 1, 2]
>>> # add two lists
>>> a + b
[0, 2, 3, 4, 5, 6, 7, 8, 0, 1, 2]

Initialize

Generally speaking, we can create a list through * operator if the item in the list expression is an immutable object.

>>> a = [None] * 3
>>> a
[None, None, None]
>>> a[0] = "foo"
>>> a
['foo', None, None]

However, if the item in the list expression is a mutable object, the * operator will copy the reference of the item N times. In order to avoid this pitfall, we should use a list comprehension to initialize a list.

>>> a = [[]] * 3
>>> b = [[] for _ in range(3)]
>>> a[0].append("Hello")
>>> a
[['Hello'], ['Hello'], ['Hello']]
>>> b[0].append("Python")
>>> b
[['Python'], [], []]

Copy

Assigning a list to a variable is a common pitfall. This assignment does not copy the list to the variable. The variable only refers to the list and increase the reference count of the list.

import sys
>>> a = [1, 2, 3]
>>> sys.getrefcount(a)
2
>>> b = a
>>> sys.getrefcount(a)
3
>>> b[2] = 123456  # a[2] = 123456
>>> b
[1, 2, 123456]
>>> a
[1, 2, 123456]

There are two types of copy. The first one is called shallow copy (non-recursive copy) and the second one is called deep copy (recursive copy). Most of the time, it is sufficient for us to copy a list by shallow copy. However, if a list is nested, we have to use a deep copy.

>>> # shallow copy
>>> a = [1, 2]
>>> b = list(a)
>>> b[0] = 123
>>> a
[1, 2]
>>> b
[123, 2]
>>> a = [[1], [2]]
>>> b = list(a)
>>> b[0][0] = 123
>>> a
[[123], [2]]
>>> b
[[123], [2]]
>>> # deep copy
>>> import copy
>>> a = [[1], [2]]
>>> b = copy.deepcopy(a)
>>> b[0][0] = 123
>>> a
[[1], [2]]
>>> b
[[123], [2]]

Using slice

Sometimes, our data may concatenate as a large segment such as packets. In this case, we will represent the range of data by using slice objects as explaining variables instead of using slicing expressions.

>>> icmp = (
...     b"080062988e2100005bff49c20005767c"
...     b"08090a0b0c0d0e0f1011121314151617"
...     b"18191a1b1c1d1e1f2021222324252627"
...     b"28292a2b2c2d2e2f3031323334353637"
... )
>>> head = slice(0, 32)
>>> data = slice(32, len(icmp))
>>> icmp[head]
b'080062988e2100005bff49c20005767c'

List Comprehensions

List comprehensions which was proposed in PEP 202 provides a graceful way to create a new list based on another list, sequence, or some object which is iterable. In addition, we can use this expression to substitute map and filter sometimes.

>>> [x for x in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> [(lambda x: x**2)(i) for i in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> [x for x in range(10) if x > 5]
[6, 7, 8, 9]
>>> [x if x > 5 else 0 for x in range(10)]
[0, 0, 0, 0, 0, 0, 6, 7, 8, 9]
>>> [x + 1 if x < 5 else x + 2 if x > 5 else x + 5 for x in range(10)]
[1, 2, 3, 4, 5, 10, 8, 9, 10, 11]
>>> [(x, y) for x in range(3) for y in range(2)]
[(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]

Unpacking

Sometimes, we want to unpack our list to variables in order to make our code become more readable. In this case, we assign N elements to N variables as following example.

>>> arr = [1, 2, 3]
>>> a, b, c = arr
>>> a, b, c
(1, 2, 3)

Based on PEP 3132, we can use a single asterisk to unpack N elements to the number of variables which is less than N in Python 3.

>>> arr = [1, 2, 3, 4, 5]
>>> a, b, *c, d = arr
>>> a, b, d
(1, 2, 5)
>>> c
[3, 4]

Using enumerate

enumerate is a built-in function. It helps us to acquire indexes (or a count) and elements at the same time without using range(len(list)). Further information can be found on Looping Techniques.

>>> for i, v in enumerate(range(3)):
...     print(i, v)
...
0 0
1 1
2 2
>>> for i, v in enumerate(range(3), 1): # start = 1
...     print(i, v)
...
1 0
2 1
3 2

Zip Lists

zip enables us to iterate over items contained in multiple lists at a time. Iteration stops whenever one of the lists is exhausted. As a result, the length of the iteration is the same as the shortest list. If this behavior is not desired, we can use itertools.zip_longest in Python 3 or itertools.izip_longest in Python 2.

>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> list(zip(a, b))
[(1, 4), (2, 5), (3, 6)]
>>> c = [1]
>>> list(zip(a, b, c))
[(1, 4, 1)]
>>> from itertools import zip_longest
>>> list(zip_longest(a, b, c))
[(1, 4, 1), (2, 5, None), (3, 6, None)]

Filter Items

filter is a built-in function to assist us to remove unnecessary items. In Python 2, filter returns a list. However, in Python 3, filter returns an iterable object. Note that list comprehension or generator expression provides a more concise way to remove items.

>>> [x for x in range(5) if x > 1]
[2, 3, 4]
>>> l = ['1', '2', 3, 'Hello', 4]
>>> f = lambda x: isinstance(x, int)
>>> filter(f, l)
<filter object at 0x10bee2198>
>>> list(filter(f, l))
[3, 4]
>>> list((i for i in l if f(i)))
[3, 4]

Stacks

There is no need for an additional data structure, stack, in Python because the list provides append and pop methods which enable us use a list as a stack.

>>> stack = []
>>> stack.append(1)
>>> stack.append(2)
>>> stack.append(3)
>>> stack
[1, 2, 3]
>>> stack.pop()
3
>>> stack.pop()
2
>>> stack
[1]

in Operation

We can implement the __contains__ method to make a class do in operations. It is a common way for a programmer to emulate a membership test operations for custom classes.

class Stack:

    def __init__(self):
        self.__list = []

    def push(self, val):
        self.__list.append(val)

    def pop(self):
        return self.__list.pop()

    def __contains__(self, item):
        return True if item in self.__list else False

stack = Stack()
stack.push(1)
print(1 in stack)
print(0 in stack)

Example

python stack.py
True
False

Accessing Items

Making custom classes perform get and set operations like lists is simple. We can implement a __getitem__ method and a __setitem__ method to enable a class to retrieve and overwrite data by index. In addition, if we want to use the function, len, to calculate the number of elements, we can implement a __len__ method.

class Stack:

    def __init__(self):
        self.__list = []

    def push(self, val):
        self.__list.append(val)

    def pop(self):
        return self.__list.pop()

    def __repr__(self):
        return "{}".format(self.__list)

    def __len__(self):
        return len(self.__list)

    def __getitem__(self, idx):
        return self.__list[idx]

    def __setitem__(self, idx, val):
        self.__list[idx] = val


stack = Stack()
stack.push(1)
stack.push(2)
print("stack:", stack)

stack[0] = 3
print("stack:", stack)
print("num items:", len(stack))

Example

$ python stack.py
stack: [1, 2]
stack: [3, 2]
num items: 2

Delegating Iterations

If a custom container class holds a list and we want iterations to work on the container, we can implement a __iter__ method to delegate iterations to the list. Note that the method, __iter__, should return an iterator object, so we cannot return the list directly; otherwise, Python raises a TypeError.

class Stack:

    def __init__(self):
        self.__list = []

    def push(self, val):
        self.__list.append(val)

    def pop(self):
        return self.__list.pop()

    def __iter__(self):
        return iter(self.__list)

stack = Stack()
stack.push(1)
stack.push(2)
for s in stack:
    print(s)

Example

$ python stack.py
1
2

Sorting

Python list provides a built-in list.sort method which sorts a list in-place without using extra memory. Moreover, the return value of list.sort is None in order to avoid confusion with sorted and the function can only be used for list.

>>> l = [5, 4, 3, 2, 1]
>>> l.sort()
>>> l
[1, 2, 3, 4, 5]
>>> l.sort(reverse=True)
>>> l
[5, 4, 3, 2, 1]

The sorted function does not modify any iterable object in-place. Instead, it returns a new sorted list. Using sorted is safer than list.sort if some list's elements are read-only or immutable. Besides, another difference between list.sort and sorted is that sorted accepts any iterable object.

>>> l = [5, 4, 3, 2, 1]
>>> new = sorted(l)
>>> new
[1, 2, 3, 4, 5]
>>> l
[5, 4, 3, 2, 1]
>>> d = {3: 'andy', 2: 'david', 1: 'amy'}
>>> sorted(d)  # sort iterable
[1, 2, 3]

To sort a list with its elements are tuples, using operator.itemgetter is helpful because it assigns a key function to the sorted key parameter. Note that the key should be comparable; otherwise, it will raise a TypeError.

>>> from operator import itemgetter
>>> l = [('andy', 10), ('david', 8), ('amy', 3)]
>>> l.sort(key=itemgetter(1))
>>> l
[('amy', 3), ('david', 8), ('andy', 10)]

operator.itemgetter is useful because the function returns a getter method which can be applied to other objects with a method __getitem__. For example, sorting a list with its elements are dictionary can be achieved by using operator.itemgetter due to all elements have __getitem__.

>>> from pprint import pprint
>>> from operator import itemgetter
>>> l = [
...     {'name': 'andy', 'age': 10},
...     {'name': 'david', 'age': 8},
...     {'name': 'amy', 'age': 3},
... ]
>>> l.sort(key=itemgetter('age'))
>>> pprint(l)
[{'age': 3, 'name': 'amy'},
 {'age': 8, 'name': 'david'},
 {'age': 10, 'name': 'andy'}]

If it is necessary to sort a list with its elements are neither comparable nor having __getitem__ method, assigning a customized key function is feasible.

>>> class Node(object):
...     def __init__(self, val):
...         self.val = val
...     def __repr__(self):
...         return f"Node({self.val})"
...
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort(key=lambda x: x.val)
>>> nodes
[Node(1), Node(2), Node(3)]
>>> nodes.sort(key=lambda x: x.val, reverse=True)
>>> nodes
[Node(3), Node(2), Node(1)]

The above snippet can be simplified by using operator.attrgetter. The function returns an attribute getter based on the attribute's name. Note that the attribute should be comparable; otherwise, sorted or list.sort will raise TypeError.

>>> from operator import attrgetter
>>> class Node(object):
...     def __init__(self, val):
...         self.val = val
...     def __repr__(self):
...         return f"Node({self.val})"
...
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort(key=attrgetter('val'))
>>> nodes
[Node(1), Node(2), Node(3)]

If an object has __lt__ method, it means that the object is comparable and sorted or list.sort is not necessary to input a key function to its key parameter. A list or an iterable sequence can be sorted directly.

>>> class Node(object):
...     def __init__(self, val):
...         self.val = val
...     def __repr__(self):
...         return f"Node({self.val})"
...     def __lt__(self, other):
...         return self.val - other.val < 0
...
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort()
>>> nodes
[Node(1), Node(2), Node(3)]

If an object does not have __lt__ method, it is likely to patch the method after a declaration of the object's class. In other words, after the patching, the object becomes comparable.

>>> class Node(object):
...     def __init__(self, val):
...         self.val = val
...     def __repr__(self):
...         return f"Node({self.val})"
...
>>> Node.__lt__ = lambda s, o: s.val < o.val
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort()
>>> nodes
[Node(1), Node(2), Node(3)]

Note that sorted or list.sort in Python3 does not support cmp parameter which is an ONLY valid argument in Python2. If it is necessary to use an old comparison function, e.g., some legacy code, functools.cmp_to_key is useful since it converts a comparison function to a key function.

>>> from functools import cmp_to_key
>>> class Node(object):
...     def __init__(self, val):
...         self.val = val
...     def __repr__(self):
...         return f"Node({self.val})"
...
>>> nodes = [Node(3), Node(2), Node(1)]
>>> nodes.sort(key=cmp_to_key(lambda x,y: x.val - y.val))
>>> nodes
[Node(1), Node(2), Node(3)]

Sorted List

import bisect

class Foo(object):
    def __init__(self, k):
        self.k = k

    def __eq__(self, rhs):
        return self.k == rhs.k

    def __ne__(self, rhs):
        return self.k != rhs.k

    def __lt__(self, rhs):
        return self.k < rhs.k

    def __gt__(self, rhs):
        return self.k > rhs.k

    def __le__(self, rhs):
        return self.k <= rhs.k

    def __ge__(self, rhs):
        return self.k >= rhs.k

    def __repr__(self):
        return f"Foo({self.k})"

    def __str__(self):
        return self.__repr__()

foo = [Foo(1), Foo(3), Foo(2), Foo(0)]
bar = []
for x in foo:
    bisect.insort(bar, x)

print(bar) # [Foo(0), Foo(1), Foo(2), Foo(3)]

New a List

# new a list with size = 3

>>> [0] * 3
[0, 0, 0]

# new a 2d list with size 3x3

>>> [[0] * 3 for _ in range(3)]
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]

Note that we should avoid creating a multi-dimension list via the following snippet because all objects in the list point to the same address.

>>> a = [[0] * 3] * 3
>>> a
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> a[1][1] = 2
>>> a
[[0, 2, 0], [0, 2, 0], [0, 2, 0]]

Circular Buffer

>>> from collections import deque
>>> d = deque(maxlen=8)
>>> for x in range(9):
...     d.append(x)
...
>>> d
deque([1, 2, 3, 4, 5, 6, 7, 8], maxlen=8)
>>> from collections import deque
>>> def tail(path, n=10):
...     with open(path) as f:
...         return deque(f, n)
...
>>> tail("/etc/hosts")

Chunk

>>> def chunk(lst, n):
...     for i in range(0, len(lst), n):
...         yield lst[i:i+n]
...
>>> a = [1, 2, 3, 4, 5, 6, 7, 8]
>>> list(chunk(a, 3))
[[1, 2, 3], [4, 5, 6], [7, 8]]

Groupby

>>> import itertools
>>> s = "AAABBCCCCC"
>>> for k, v in itertools.groupby(s):
...     print(k, list(v))
...
A ['A', 'A', 'A']
B ['B', 'B']
C ['C', 'C', 'C', 'C', 'C']

# group by key

>>> x = [('gp1', 'a'), ('gp2', 'b'), ('gp2', 'c')]
>>> for k, v in itertools.groupby(x, lambda x: x[0]):
...     print(k, list(v))
...
gp1 [('gp1', 'a')]
gp2 [('gp2', 'b'), ('gp2', 'c')]

Binary Search

>>> def binary_search(arr, x, lo=0, hi=None):
...     if not hi: hi = len(arr)
...     pos = bisect_left(arr, x, lo, hi)
...     return pos if pos != hi and arr[pos] == x else -1
...
>>> a = [1, 1, 1, 2, 3]
>>> binary_search(a, 1)
0
>>> binary_search(a, 2)
3

Lower Bound

>>> import bisect
>>> a = [1,2,3,3,4,5]
>>> bisect.bisect_left(a, 3)
2
>>> bisect.bisect_left(a, 3.5)
4

Upper Bound

>>> import bisect
>>> a = [1,2,3,3,4,5]
>>> bisect.bisect_right(a, 3)
4
>>> bisect.bisect_right(a, 3.5)
4

Lexicographically Order

# python compare lists lexicographically

>>> a = [(1,2), (1,1), (1,0), (2,1)]
>>> a.sort()
>>> a
[(1, 0), (1, 1), (1, 2), (2, 1)]

Trie

>>> from functools import reduce
>>> from collections import defaultdict
>>> Trie = lambda: defaultdict(Trie)
>>> prefixes = ['abc', 'de', 'g']
>>> trie = Trie()
>>> end = True
>>> for p in prefixes:
...     reduce(dict.__getitem__, p, trie)[end] = p
...

# search prefix

>>> def find(trie, word):
...     curr = trie
...     for c in word:
...         if c not in curr:
...             return False
...         curr = curr[c]
...     return True
...
>>> find(trie, "abcdef")
False
>>> find(trie, "abc")
True
>>> find(trie, "ab")
True

# search word

>>> def find(trie, p):
...     curr = trie
...     for c in p:
...         if c not in curr or True in curr:
...             break
...         curr = curr[c]
...     return True if True in curr else False
...
>>> find(trie, "abcdef")
True
>>> find(trie, "abc")
True
>>> find(trie, "ab")
False