-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
/
pep-0201.txt
299 lines (221 loc) · 9.31 KB
/
pep-0201.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
PEP: 201
Title: Lockstep Iteration
Version: $Revision$
Last-Modified: $Date$
Author: barry@python.org (Barry Warsaw)
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 13-Jul-2000
Python-Version: 2.0
Post-History: 27-Jul-2000
Introduction
============
This PEP describes the 'lockstep iteration' proposal. This PEP tracks
the status and ownership of this feature, slated for introduction in
Python 2.0. It contains a description of the feature and outlines
changes necessary to support the feature. This PEP summarizes
discussions held in mailing list forums, and provides URLs for further
information, where appropriate. The CVS revision history of this file
contains the definitive historical record.
Motivation
==========
Standard for-loops in Python iterate over every element in a sequence
until the sequence is exhausted [1]_. However, for-loops iterate over
only a single sequence, and it is often desirable to loop over more
than one sequence in a lock-step fashion. In other words, in a way
such that the i-th iteration through the loop returns an object
containing the i-th element from each sequence.
The common idioms used to accomplish this are unintuitive. This PEP
proposes a standard way of performing such iterations by introducing a
new builtin function called ``zip``.
While the primary motivation for zip() comes from lock-step iteration,
by implementing zip() as a built-in function, it has additional
utility in contexts other than for-loops.
Lockstep For-Loops
==================
Lockstep for-loops are non-nested iterations over two or more
sequences, such that at each pass through the loop, one element from
each sequence is taken to compose the target. This behavior can
already be accomplished in Python through the use of the map() built-
in function::
>>> a = (1, 2, 3)
>>> b = (4, 5, 6)
>>> for i in map(None, a, b): print i
...
(1, 4)
(2, 5)
(3, 6)
>>> map(None, a, b)
[(1, 4), (2, 5), (3, 6)]
The for-loop simply iterates over this list as normal.
While the map() idiom is a common one in Python, it has several
disadvantages:
* It is non-obvious to programmers without a functional programming
background.
* The use of the magic ``None`` first argument is non-obvious.
* It has arbitrary, often unintended, and inflexible semantics when
the lists are not of the same length: the shorter sequences are
padded with ``None``::
>>> c = (4, 5, 6, 7)
>>> map(None, a, c)
[(1, 4), (2, 5), (3, 6), (None, 7)]
For these reasons, several proposals were floated in the Python 2.0
beta time frame for syntactic support of lockstep for-loops. Here are
two suggestions::
for x in seq1, y in seq2:
# stuff
::
for x, y in seq1, seq2:
# stuff
Neither of these forms would work, since they both already mean
something in Python and changing the meanings would break existing
code. All other suggestions for new syntax suffered the same problem,
or were in conflict with other another proposed feature called 'list
comprehensions' (see PEP 202).
The Proposed Solution
=====================
The proposed solution is to introduce a new built-in sequence
generator function, available in the ``__builtin__`` module. This
function is to be called ``zip`` and has the following signature::
zip(seqa, [seqb, [...]])
``zip()`` takes one or more sequences and weaves their elements
together, just as ``map(None, ...)`` does with sequences of equal
length. The weaving stops when the shortest sequence is exhausted.
Return Value
============
``zip()`` returns a real Python list, the same way ``map()`` does.
Examples
========
Here are some examples, based on the reference implementation below::
>>> a = (1, 2, 3, 4)
>>> b = (5, 6, 7, 8)
>>> c = (9, 10, 11)
>>> d = (12, 13)
>>> zip(a, b)
[(1, 5), (2, 6), (3, 7), (4, 8)]
>>> zip(a, d)
[(1, 12), (2, 13)]
>>> zip(a, b, c, d)
[(1, 5, 9, 12), (2, 6, 10, 13)]
Note that when the sequences are of the same length, ``zip()`` is
reversible::
>>> a = (1, 2, 3)
>>> b = (4, 5, 6)
>>> x = zip(a, b)
>>> y = zip(*x) # alternatively, apply(zip, x)
>>> z = zip(*y) # alternatively, apply(zip, y)
>>> x
[(1, 4), (2, 5), (3, 6)]
>>> y
[(1, 2, 3), (4, 5, 6)]
>>> z
[(1, 4), (2, 5), (3, 6)]
>>> x == z
1
It is not possible to reverse zip this way when the sequences are not
all the same length.
Reference Implementation
========================
Here is a reference implementation, in Python of the zip() built-in
function. This will be replaced with a C implementation after final
approval::
def zip(*args):
if not args:
raise TypeError('zip() expects one or more sequence arguments')
ret = []
i = 0
try:
while 1:
item = []
for s in args:
item.append(s[i])
ret.append(tuple(item))
i = i + 1
except IndexError:
return ret
BDFL Pronouncements
===================
Note: the BDFL refers to Guido van Rossum, Python's Benevolent
Dictator For Life.
* The function's name. An earlier version of this PEP included an
open issue listing 20+ proposed alternative names to ``zip()``. In
the face of no overwhelmingly better choice, the BDFL strongly
prefers ``zip()`` due to its Haskell [2]_ heritage. See version 1.7
of this PEP for the list of alternatives.
* ``zip()`` shall be a built-in function.
* Optional padding. An earlier version of this PEP proposed an
optional ``pad`` keyword argument, which would be used when the
argument sequences were not the same length. This is similar
behavior to the ``map(None, ...)`` semantics except that the user
would be able to specify pad object. This has been rejected by the
BDFL in favor of always truncating to the shortest sequence, because
of the KISS principle. If there's a true need, it is easier to add
later. If it is not needed, it would still be impossible to delete
it in the future.
* Lazy evaluation. An earlier version of this PEP proposed that
``zip()`` return a built-in object that performed lazy evaluation
using ``__getitem__()`` protocol. This has been strongly rejected
by the BDFL in favor of returning a real Python list. If lazy
evaluation is desired in the future, the BDFL suggests an ``xzip()``
function be added.
* ``zip()`` with no arguments. the BDFL strongly prefers this raise a
TypeError exception.
* ``zip()`` with one argument. the BDFL strongly prefers that this
return a list of 1-tuples.
* Inner and outer container control. An earlier version of this PEP
contains a rather lengthy discussion on a feature that some people
wanted, namely the ability to control what the inner and outer
container types were (they are tuples and list respectively in this
version of the PEP). Given the simplified API and implementation,
this elaboration is rejected. For a more detailed analysis, see
version 1.7 of this PEP.
Subsequent Change to ``zip()``
==============================
In Python 2.4, zip() with no arguments was modified to return an empty
list rather than raising a TypeError exception. The rationale for the
original behavior was that the absence of arguments was thought to
indicate a programming error. However, that thinking did not
anticipate the use of zip() with the ``*`` operator for unpacking
variable length argument lists. For example, the inverse of zip could
be defined as: ``unzip = lambda s: zip(*s)``. That transformation
also defines a matrix transpose or an equivalent row/column swap for
tables defined as lists of tuples. The latter transformation is
commonly used when reading data files with records as rows and fields
as columns. For example, the code::
date, rain, high, low = zip(*csv.reader(file("weather.csv")))
rearranges columnar data so that each field is collected into
individual tuples for straightforward looping and summarization::
print "Total rainfall", sum(rain)
Using ``zip(*args)`` is more easily coded if ``zip(*[])`` is handled
as an allowable case rather than an exception. This is especially
helpful when data is either built up from or recursed down to a null
case with no records.
Seeing this possibility, the BDFL agreed (with some misgivings) to
have the behavior changed for Py2.4.
Other Changes
=============
* The ``xzip()`` function discussed above was implemented in Py2.3 in
the ``itertools`` module as ``itertools.izip()``. This function
provides lazy behavior, consuming single elements and producing a
single tuple on each pass. The "just-in-time" style saves memory
and runs faster than its list based counterpart, ``zip()``.
* The ``itertools`` module also added ``itertools.repeat()`` and
``itertools.chain()``. These tools can be used together to pad
sequences with ``None`` (to match the behavior of ``map(None,
seqn)``)::
zip(firstseq, chain(secondseq, repeat(None)))
References
==========
.. [1] http://docs.python.org/reference/compound_stmts.html#for
.. [2] http://www.haskell.org/onlinereport/standard-prelude.html#$vzip
Greg Wilson's questionaire on proposed syntax to some CS grad students
http://www.python.org/pipermail/python-dev/2000-July/013139.html
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
End: