Fix logic for quoting special characters #276

Merged
perlpunk merged 2 commits into release/5.2 from perlpunk/fix-unicode on Mar 31, 2019

perlpunk
Member

@perlpunk perlpunk commented Mar 17, 2019

Fixes #275

Currently, on systems where sys.maxunicode <= 0xffff (typically Python 2.7 on Windows/macOS), some strings with special characters are dumped as plain scalars when allow_unicode=True is used. That output doesn't round-trip.

>>> import yaml
>>> string = "\tpart1\tpart2"
>>> print(yaml.dump(string, allow_unicode=True))

        part1   part2
...
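
The round-trip property at stake here can be checked with a small helper. This is only a sketch: `roundtrips` is a hypothetical helper name, not part of PyYAML, and the PyYAML part runs only if the package is installed. On an affected setup (sys.maxunicode <= 0xffff with allow_unicode=True) the check would be expected to report a failure for the string from the example above.

```python
def roundtrips(value, dump, load):
    """Return True if load(dump(value)) gives back the original value.

    A parse error while loading also counts as a failed round trip.
    """
    try:
        return load(dump(value)) == value
    except Exception:
        return False


try:
    import yaml  # PyYAML; optional for this sketch
except ImportError:
    yaml = None

if yaml is not None:
    string = "\tpart1\tpart2"
    ok = roundtrips(
        string,
        lambda s: yaml.safe_dump(s, allow_unicode=True),
        yaml.safe_load,
    )
    print("round-trips:", ok)
```

The dump and load callables are passed in so the same helper can be pointed at any serializer.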

@perlpunk
Member Author

While this fixes issue #275, I'm still not sure if the logic is correct now for systems where sys.maxunicode < 0xffff.
Will add an example later.

@perlpunk
Member Author

perlpunk commented Mar 21, 2019

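# coding=utf-8
# Python 2 script: byte strings are decoded to unicode explicitly.
import yaml

string1 = "part1\tpart2".decode('utf8')  # contains a tab
string2 = "ü".decode('utf8')             # printable non-ASCII character
string3 = "part1\rpart2"                 # contains a carriage return (byte str, not decoded)
string4 = "😀".decode('utf8')            # code point above 0xFFFF
data = [string1, string2, string3, string4]
print(data)
# Dump to YAML, then load it again to check that the data round-trips.
res = yaml.safe_dump(data,
                     default_flow_style=False, explicit_start=True, canonical=False,
                     allow_unicode=True, encoding='utf-8', width=float("inf"))
res = res.decode('utf8')
print(res)
newdata = yaml.safe_load(res)
print(newdata)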

Tested on a system with sys.maxunicode > 0xffff.

Output with my fix no. 1:

[u'part1\tpart2', u'\xfc', 'part1\rpart2', u'\U0001f600']
---
- "part1\tpart2"
- ü
- "part1\rpart2"
- "\U0001F600"

['part1\tpart2', u'\xfc', 'part1\rpart2', u'\U0001f600']

Output if I leave out has_ucs4 from the condition:

[u'part1\tpart2', u'\xfc', 'part1\rpart2', u'\U0001f600']
---
- "part1\tpart2"
- ü
- "part1\rpart2"
- 😀

['part1\tpart2', u'\xfc', 'part1\rpart2', u'\U0001f600']

I think we don't need has_ucs4 here, but I'm not a Unicode expert, especially not in Python. If sys.maxunicode <= 0xffff, then (u'\U00010000' <= ch < u'\U0010ffff') can't actually be true, so I'm not sure what has_ucs4 was supposed to do in this if condition.
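
For reference, a quick way to see which kind of build is in use. This assumes `has_ucs4` is derived as `sys.maxunicode > 0xffff`, which matches how the flag is used in this thread but is not quoted here. On Python 3.3 and later, `sys.maxunicode` is always 0x10ffff, so narrow builds only occur on older interpreters such as some Python 2.7 builds.

```python
import sys

# Assumed definition of the flag discussed in this thread.
has_ucs4 = sys.maxunicode > 0xffff

print(hex(sys.maxunicode), has_ucs4)

# On a wide build a code point above 0xFFFF is a single character;
# on a narrow build it is stored as a surrogate pair of length 2.
print(len(u'\U0001f600'))
```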

I also tried it out on a Mac with Python 2.7.

On systems with `sys.maxunicode <= 0xffff` the comparison
(u'\U00010000' <= ch < u'\U0010ffff') can't be true anyway, I think.
@perlpunk
Member Author

Pushed my second fix

@perlpunk
Member Author

@peterkmurphy Do you remember what the purpose of has_ucs4 was in your original PR?

or ((not has_ucs4) or (u'\U00010000' <= ch < u'\U0010ffff'))) and ch != u'\uFEFF':

@peterkmurphy
Contributor

peterkmurphy commented Mar 22, 2019 via email

@perlpunk
Member Author

Thanks @peterkmurphy
The purpose of the PR is clear; my question was more about what has_ucs4 was supposed to do in the if statement.
The problem is that on systems where (not has_ucs4) is true, basically the whole condition became true, even for characters like tabs. As a result, a string like "\tstring" was dumped as a plain string, which didn't round-trip. See #275.
I fixed it by taking has_ucs4 out, and I can't think of a case where it would be needed.
Here's the current statement from emitter.py:

    if (ch == u'\x85' or u'\xA0' <= ch <= u'\uD7FF'
            or u'\uE000' <= ch <= u'\uFFFD'
            or ((not has_ucs4) or (u'\U00010000' <= ch < u'\U0010ffff'))) and ch != u'\uFEFF':
        unicode_characters = True
        if not self.allow_unicode:
            special_characters = True
    else:
        special_characters = True
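
To make the effect of the `not has_ucs4` term concrete, here is a standalone sketch of just the condition quoted above, next to a variant with the term removed as described in this thread. The function names are made up for illustration, and the sketch models only this one condition, not the surrounding emitter logic. It is written for Python 3, so `has_ucs4` is passed in as a parameter to simulate both kinds of build.

```python
def printable_with_flag(ch, has_ucs4):
    """The condition as quoted above, including the `not has_ucs4` term."""
    return (ch == '\x85' or '\xA0' <= ch <= '\uD7FF'
            or '\uE000' <= ch <= '\uFFFD'
            or ((not has_ucs4) or ('\U00010000' <= ch < '\U0010ffff'))) and ch != '\uFEFF'


def printable_without_flag(ch):
    """The same condition with the `not has_ucs4` term taken out."""
    return (ch == '\x85' or '\xA0' <= ch <= '\uD7FF'
            or '\uE000' <= ch <= '\uFFFD'
            or ('\U00010000' <= ch < '\U0010ffff')) and ch != '\uFEFF'


# Characters taken from the test script earlier in this thread.
for ch in ['\t', '\r', '\xfc', '\U0001f600']:
    print(repr(ch),
          printable_with_flag(ch, has_ucs4=False),
          printable_with_flag(ch, has_ucs4=True),
          printable_without_flag(ch))
```

With `has_ucs4=False`, the tab and the carriage return are classified as printable unicode characters, so with allow_unicode=True they are not marked as special characters. Without the flag they fall through to the `else` branch on every build.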

@peterkmurphy
Contributor

peterkmurphy commented Mar 23, 2019 via email

@perlpunk
Member Author

ok thanks @peterkmurphy

@perlpunk perlpunk changed the base branch from master to release/5.2 March 31, 2019 14:14
@perlpunk perlpunk merged commit 60ca52d into release/5.2 Mar 31, 2019
perlpunk added a commit that referenced this pull request Nov 18, 2019
* Fix logic for quoting special characters

* Remove has_ucs4 from condition

on systems with `sys.maxunicode <= 0xffff` the comparison
(u'\U00010000' <= ch < u'\U0010ffff') can't be true anyway I think
@perlpunk perlpunk deleted the perlpunk/fix-unicode branch December 2, 2019 22:49
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Dec 15, 2019
5.2:
* Repair incompatibilities introduced with 5.1. The default Loader was changed,
  but several methods like add_constructor still used the old default
  yaml/pyyaml#279 -- A more flexible fix for custom tag constructors
  yaml/pyyaml#287 -- Change default loader for yaml.add_constructor
  yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver
* Make FullLoader safer by removing python/object/apply from the default FullLoader
  yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
* Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff
  yaml/pyyaml#276 -- Fix logic for quoting special characters
* Other PRs:
  yaml/pyyaml#280 -- Update CHANGES for 5.1
asherf added a commit to asherf/pants that referenced this pull request Apr 28, 2020
https://github.com/yaml/pyyaml/blob/d0d660d035905d9c49fc0f8dafb579d2cc68c0c8/CHANGES#L7

5.3.1 (2020-03-18)

* yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor

5.3 (2020-01-06)

* yaml/pyyaml#290 -- Use `is` instead of equality for comparing with `None`
* yaml/pyyaml#270 -- fix typos and stylistic nit
* yaml/pyyaml#309 -- Fix up small typo
* yaml/pyyaml#161 -- Fix handling of __slots__
* yaml/pyyaml#358 -- Allow calling add_multi_constructor with None
* yaml/pyyaml#285 -- Add use of safe_load() function in README
* yaml/pyyaml#351 -- Fix reader for Unicode code points over 0xFFFF
* yaml/pyyaml#360 -- Enable certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#359 -- Use full_load in yaml-highlight example
* yaml/pyyaml#244 -- Document that PyYAML is implemented with Cython
* yaml/pyyaml#329 -- Fix for Python 3.10
* yaml/pyyaml#310 -- increase size of index, line, and column fields
* yaml/pyyaml#260 -- remove some unused imports
* yaml/pyyaml#163 -- Create timezone-aware datetimes when parsed as such
* yaml/pyyaml#363 -- Add tests for timezone

5.2 (2019-12-02)
------------------

* Repair incompatibilities introduced with 5.1. The default Loader was changed,
  but several methods like add_constructor still used the old default
  yaml/pyyaml#279 -- A more flexible fix for custom tag constructors
  yaml/pyyaml#287 -- Change default loader for yaml.add_constructor
  yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver
* Make FullLoader safer by removing python/object/apply from the default FullLoader
  yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
* Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff
  yaml/pyyaml#276 -- Fix logic for quoting special characters
* Other PRs:
  yaml/pyyaml#280 -- Update CHANGES for 5.1
asherf added a commit to asherf/pants that referenced this pull request Apr 29, 2020
Eric-Arellano pushed a commit to pantsbuild/pants that referenced this pull request May 1, 2020
mtremer pushed a commit to ipfire/ipfire-2.x that referenced this pull request Feb 14, 2022
- Update from 3.13 to 6.0
- Update of rootfile
- Changelog
6.0 (2021-10-13)
* yaml/pyyaml#327 -- Change README format to Markdown
* yaml/pyyaml#483 -- Add a test for YAML 1.1 types
* yaml/pyyaml#497 -- fix float resolver to ignore `.` and `._`
* yaml/pyyaml#550 -- drop Python 2.7
* yaml/pyyaml#553 -- Fix spelling of “hexadecimal”
* yaml/pyyaml#556 -- fix representation of Enum subclasses
* yaml/pyyaml#557 -- fix libyaml extension compiler warnings
* yaml/pyyaml#560 -- fix ResourceWarning on leaked file descriptors
* yaml/pyyaml#561 -- always require `Loader` arg to `yaml.load()`
* yaml/pyyaml#564 -- remove remaining direct distutils usage
5.4.1 (2021-01-20)
* yaml/pyyaml#480 -- Fix stub compat with older pyyaml versions that may unwittingly load it
5.4 (2021-01-19)
* yaml/pyyaml#407 -- Build modernization, remove distutils, fix metadata, build wheels, CI to GHA
* yaml/pyyaml#472 -- Fix for CVE-2020-14343, moves arbitrary python tags to UnsafeLoader
* yaml/pyyaml#441 -- Fix memory leak in implicit resolver setup
* yaml/pyyaml#392 -- Fix py2 copy support for timezone objects
* yaml/pyyaml#378 -- Fix compatibility with Jython
5.3.1 (2020-03-18)
* yaml/pyyaml#386 -- Prevents arbitrary code execution during python/object/new constructor
5.3 (2020-01-06)
* yaml/pyyaml#290 -- Use `is` instead of equality for comparing with `None`
* yaml/pyyaml#270 -- Fix typos and stylistic nit
* yaml/pyyaml#309 -- Fix up small typo
* yaml/pyyaml#161 -- Fix handling of __slots__
* yaml/pyyaml#358 -- Allow calling add_multi_constructor with None
* yaml/pyyaml#285 -- Add use of safe_load() function in README
* yaml/pyyaml#351 -- Fix reader for Unicode code points over 0xFFFF
* yaml/pyyaml#360 -- Enable certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#359 -- Use full_load in yaml-highlight example
* yaml/pyyaml#244 -- Document that PyYAML is implemented with Cython
* yaml/pyyaml#329 -- Fix for Python 3.10
* yaml/pyyaml#310 -- Increase size of index, line, and column fields
* yaml/pyyaml#260 -- Remove some unused imports
* yaml/pyyaml#163 -- Create timezone-aware datetimes when parsed as such
* yaml/pyyaml#363 -- Add tests for timezone
5.2 (2019-12-02)
* Repair incompatibilities introduced with 5.1. The default Loader was changed,
  but several methods like add_constructor still used the old default
  yaml/pyyaml#279 -- A more flexible fix for custom tag constructors
  yaml/pyyaml#287 -- Change default loader for yaml.add_constructor
  yaml/pyyaml#305 -- Change default loader for add_implicit_resolver, add_path_resolver
* Make FullLoader safer by removing python/object/apply from the default FullLoader
  yaml/pyyaml#347 -- Move constructor for object/apply to UnsafeConstructor
* Fix bug introduced in 5.1 where quoting went wrong on systems with sys.maxunicode <= 0xffff
  yaml/pyyaml#276 -- Fix logic for quoting special characters
* Other PRs:
  yaml/pyyaml#280 -- Update CHANGES for 5.1
5.1.2 (2019-07-30)
* Re-release of 5.1 with regenerated Cython sources to build properly for Python 3.8b2+
5.1.1 (2019-06-05)
* Re-release of 5.1 with regenerated Cython sources to build properly for Python 3.8b1
5.1 (2019-03-13)
* yaml/pyyaml#35 -- Some modernization of the test running
* yaml/pyyaml#42 -- Install tox in a virtualenv
* yaml/pyyaml#45 -- Allow colon in a plain scalar in a flow context
* yaml/pyyaml#48 -- Fix typos
* yaml/pyyaml#55 -- Improve RepresenterError creation
* yaml/pyyaml#59 -- Resolves #57, update readme issues link
* yaml/pyyaml#60 -- Document and test Python 3.6 support
* yaml/pyyaml#61 -- Use Travis CI built in pip cache support
* yaml/pyyaml#62 -- Remove tox workaround for Travis CI
* yaml/pyyaml#63 -- Adding support to Unicode characters over codepoint 0xffff
* yaml/pyyaml#75 -- add 3.12 changelog
* yaml/pyyaml#76 -- Fallback to Pure Python if Compilation fails
* yaml/pyyaml#84 -- Drop unsupported Python 3.3
* yaml/pyyaml#102 -- Include license file in the generated wheel package
* yaml/pyyaml#105 -- Removed Python 2.6 & 3.3 support
* yaml/pyyaml#111 -- Remove commented out Psyco code
* yaml/pyyaml#129 -- Remove call to `ord` in lib3 emitter code
* yaml/pyyaml#149 -- Test on Python 3.7-dev
* yaml/pyyaml#158 -- Support escaped slash in double quotes "\/"
* yaml/pyyaml#175 -- Updated link to pypi in release announcement
* yaml/pyyaml#181 -- Import Hashable from collections.abc
* yaml/pyyaml#194 -- Reverting yaml/pyyaml#74
* yaml/pyyaml#195 -- Build libyaml on travis
* yaml/pyyaml#196 -- Force cython when building sdist
* yaml/pyyaml#254 -- Allow to turn off sorting keys in Dumper (2)
* yaml/pyyaml#256 -- Make default_flow_style=False
* yaml/pyyaml#257 -- Deprecate yaml.load and add FullLoader and UnsafeLoader classes
* yaml/pyyaml#261 -- Skip certain unicode tests when maxunicode not > 0xffff
* yaml/pyyaml#263 -- Windows Appveyor build

Signed-off-by: Adolf Belka <adolf.belka@ipfire.org>

Reviewed-by: Peter Müller <peter.mueller@ipfire.org>