Commit f476ec3
Detect source encoding to properly interpret string literals
Buffers now always hold unicode source, whereas before they could hold bytes if that was what was passed in to the constructor. This is possible because we determine the encoding and then use it to decode() the bytes. As a side effect, Buffer.__init__ can now raise UnicodeDecodeError if the input is badly encoded.
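The detect-then-decode step described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: `detect_encoding` and `decode_source` are hypothetical names, the detection follows the PEP 263 convention of a coding declaration on the first or second line, and the UTF-8 default is an assumption.

```python
import re

# PEP 263: a coding declaration may appear on line 1 or 2 of the source.
_CODING_RE = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def detect_encoding(source, default="utf-8"):
    """Return the declared encoding of a bytes source, else `default` (assumed)."""
    for line in source.splitlines()[:2]:
        match = _CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return default


def decode_source(source, default="utf-8"):
    """Return (text, encoding); bytes input is decoded, str input passes through."""
    if isinstance(source, bytes):
        encoding = detect_encoding(source, default)
        # May raise UnicodeDecodeError if the bytes do not match the encoding.
        return source.decode(encoding), encoding
    return source, default
```

With this sketch, a latin-1 source that declares its encoding decodes cleanly, while the same bytes without a declaration fail under the assumed UTF-8 default, which is the kind of UnicodeDecodeError the message warns about.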
The Buffer encoding is then used by the lexer to produce a strdata token of the correct type for string literals. For unicode literals, escaping happens much as before via _replace_escape(). For bytes, there's a different code path that calls encode() using the Buffer's encoding followed by a special escaping function that ensures the value's not accidentally promoted to unicode.
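To make the two code paths concrete, here is a minimal Python 3 sketch of turning the body of a string literal into either a unicode or a bytes value. It is not the lexer's implementation: `string_value` is a hypothetical helper, only a handful of escapes are handled, and it encodes the text between escapes piecewise rather than encoding the whole body first.

```python
import re

# Only a few escapes are handled, to keep the sketch short.
_SIMPLE = {"n": 10, "t": 9, "r": 13, "\\": 92}
_ESCAPE_RE = re.compile(r"\\(x[0-9a-fA-F]{2}|[ntr\\])")


def _escape_code(esc):
    """Map an escape body such as 'n' or 'xe9' to its integer code."""
    if esc[0] == "x":
        return int(esc[1:], 16)
    return _SIMPLE[esc]


def string_value(body, is_bytes, encoding):
    """Return bytes (encoded with `encoding`) or str for a literal body."""
    parts = []
    pos = 0
    for match in _ESCAPE_RE.finditer(body):
        literal = body[pos:match.start()]
        code = _escape_code(match.group(1))
        if is_bytes:
            # Literal text is encoded; the escape becomes exactly one raw
            # byte, so the value is never promoted to unicode.
            parts.append(literal.encode(encoding))
            parts.append(bytes([code]))
        else:
            parts.append(literal)
            parts.append(chr(code))
        pos = match.end()
    tail = body[pos:]
    parts.append(tail.encode(encoding) if is_bytes else tail)
    return (b"" if is_bytes else "").join(parts)
```

The point of the bytes branch is that a non-ASCII character written directly in the source is encoded with the source encoding, whereas an escape such as `\xe9` always yields a single byte regardless of that encoding.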
The parser behavior for multi-string literals (e.g. "foo" "bar") also had to change. When any of the literals is unicode, the result is unicode. When all the literals are bytes, the resulting value is also bytes.
1 parent cd2f7ac commit f476ec3
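The multi-string literal rule from the commit message can be sketched as below. `concat_literals` is a hypothetical name, and decoding bytes parts with the source encoding when mixing them with unicode parts is an assumption; the message only states what the result type is.

```python
def concat_literals(values, encoding):
    """Join adjacent literal values: bytes only if every part is bytes."""
    if all(isinstance(value, bytes) for value in values):
        return b"".join(values)
    # At least one part is unicode, so the result is unicode.
    # Assumption: bytes parts are decoded with the source encoding.
    return "".join(
        value.decode(encoding) if isinstance(value, bytes) else value
        for value in values
    )
```

For example, two bytes parts stay bytes, while a bytes part followed by a unicode part yields a unicode result.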
File tree
7 files changed, +250 -63 lines changed
- pythonparser
- test