bpo-37760: Avoid cluttering work tree with downloaded Unicode files. #15128

gnprice · 2019-08-05T04:00:49Z

https://bugs.python.org/issue37760

gnprice · 2019-08-09T00:15:31Z

@benjaminp, might I persuade you to take a look at this patch? The Git history suggests you might run this script as much as anybody, so you may have a direct interest in making that more convenient 🙂

Then #15171 makes the resulting diffs a bit nicer, too.

benjaminp

I actually don't mind the git status clutter because then I remember to clean the data files out when I'm done. If there are partial downloads, git clean -f will put you back in a working state.

This script does trigger my distaste for build tools that write to the source tree. Of course, doing that is its raison d'être, so I can hardly ask that it stop doing that completely.

benjaminp · 2019-08-13T04:46:41Z

Tools/unicode/makeunicodedata.py

        else:
            url = ('http://www.unicode.org/Public/%s/ucd/'+template) % (version, '')
+        if not os.path.exists(DATA_DIR):
+            os.mkdir(DATA_DIR)


os.makedirs has a nice exist_ok parameter now.

Oh neat! I have written little helpers with that behavior more times than I would like. :-) Update pushed.
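For readers following along, here is a minimal sketch of the `exist_ok` behavior mentioned above. The helper name `ensure_dir` and the temporary directory are illustrative assumptions; the actual patch operates on the script's `DATA_DIR`.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` (and any missing parents); do nothing if it already exists.

    `ensure_dir` is a hypothetical helper name used only for this sketch.
    """
    # exist_ok=True replaces the "if not os.path.exists(...): os.mkdir(...)" dance.
    os.makedirs(path, exist_ok=True)
    return path


# Demonstrate in a throwaway directory rather than the real work tree.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "data")  # stand-in for DATA_DIR
    ensure_dir(data_dir)
    ensure_dir(data_dir)  # second call does not raise
    print(os.path.isdir(data_dir))
```

Besides being shorter, this avoids the small race between the existence check and the `mkdir` call in the original two-line version.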

gnprice · 2019-08-13T07:32:49Z

Thanks for the review (and the merges!)

> I actually don't mind the git status clutter because then I remember to clean the data files out when I'm done. If there are partial downloads, git clean -f will put you back in a working state.

Ah, fair enough. I guess that means the main situation where the clutter is annoying might be when doing development work on this script... as I've been doing recently. 😉 That means rerunning the script a lot, and it's much faster when it doesn't have to download the files fresh. So I want to keep them around, and I'd like that not to become clutter.

To keep good support for your use case too, where you want the data to disappear between the occasional times you run the script, perhaps I'll add it to make clean? That way it gets handled the same as the many other random build products.

> This script does trigger my distaste for build tools that write to the source tree. Of course, doing that is its raison d'être, so I can hardly ask that it stop doing that completely.

Yeah, I'd love for the whole build to go into some tidy set-aside place. I'd be glad to adjust this to follow such a thing, or to go to an existing somewhat-better place if you can suggest one.

benjaminp · 2019-08-14T02:40:28Z

> Thanks for the review (and the merges!)
>
> > I actually don't mind the git status clutter because then I remember to clean the data files out when I'm done. If there are partial downloads, git clean -f will put you back in a working state.
>
> Ah, fair enough. I guess that means the main situation where the clutter is annoying might be when doing development work on this script... as I've been doing recently. 😉 That means rerunning the script a lot, and it's much faster when it doesn't have to download the files fresh. So I want to keep them around, and I'd like that not to become clutter.
>
> To keep good support for your use case too, where you want the data to disappear between the occasional times you run the script, perhaps I'll add it to make clean? That way it gets handled the same as the many other random build products.

The problem with the make clean change is that it breaks when doing out-of-tree builds.

I don't feel strongly about this, so I could merge the original PR without the Makefile change.

> > This script does trigger my distaste for build tools that write to the source tree. Of course, doing that is its raison d'être, so I can hardly ask that it stop doing that completely.
>
> Yeah, I'd love for the whole build to go into some tidy set-aside place. I'd be glad to adjust this to follow such a thing, or to go to an existing somewhat-better place if you can suggest one.

The only other thing I can think of is adding a script flag to control the cache location.

gnprice · 2019-08-14T04:28:38Z

> The problem with the make clean change is that it breaks when doing out-of-tree builds.

Ah, out-of-tree builds -- that does sound nice 🙂 and I've been meaning to go try those.

> I don't feel strongly about this, so I could merge the original PR without the Makefile change.

Cool -- I pushed a revert.

> The only other thing I can think of is adding a script flag to control the cache location.

Sure, that'd be reasonable. Alternatively, perhaps it could notice if you've set up an out-of-tree build (that's done at configure time, right?) and place its output automatically in an appropriate spot in the build tree? ... Or maybe the right way to implement that is a script flag like you described, and then a make rule that passes that flag.

I'll probably have a clearer sense of the right design there later, after I've played with out-of-tree builds a bit.
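To make the "script flag" idea concrete, here is a rough sketch of what such an option could look like. Everything in it is an assumption for illustration: the flag name `--data-dir`, the default value, and the helpers `parse_args` and `local_path` are hypothetical and are not part of the merged patch or of makeunicodedata.py.

```python
import argparse
import os

# Assumed default; the real script's cache directory constant may differ.
DEFAULT_DATA_DIR = "data"


def parse_args(argv=None):
    """Parse a hypothetical --data-dir flag controlling the download cache."""
    parser = argparse.ArgumentParser(
        description="Sketch: choose where downloaded Unicode files are cached."
    )
    parser.add_argument(
        "--data-dir",
        default=DEFAULT_DATA_DIR,
        help="directory for cached downloads (hypothetical flag)",
    )
    return parser.parse_args(argv)


def local_path(data_dir, filename):
    """Return where a downloaded file would live inside the cache directory."""
    return os.path.join(data_dir, filename)


# A make rule in an out-of-tree build could pass the build directory here.
args = parse_args(["--data-dir", "build/unicode-cache"])
print(local_path(args.data_dir, "UnicodeData-12.1.0.txt"))
```

With a flag like this, the default keeps today's in-tree behavior, while a Makefile rule could point the cache at the build tree, which is the second design floated in the comment above.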

benjaminp · 2019-08-15T01:20:25Z

> > The only other thing I can think of is adding a script flag to control the cache location.
>
> Sure, that'd be reasonable. Alternatively, perhaps it could notice if you've set up an out-of-tree build (that's done at configure time, right?) and place its output automatically in an appropriate spot in the build tree? ... Or maybe the right way to implement that is a script flag like you described, and then a make rule that passes that flag.

Yeah, I think everything else that knows the builddir finds it out by parsing the Makefile (!) or being invoked by it.

gnprice · 2019-08-15T03:01:12Z

> by parsing the Makefile (!)

Yikes! I can imagine situations where that's locally the least-bad option -- but yeah, here I'd opt for having the Makefile invoke it.

gnprice · 2019-08-15T03:01:32Z

(Thanks for the review and the merge!)

…(GH-15128)

python/cpython#15128
commit 3e4498d35c34aeaf4a9c3d57509b0d3277048ac6
Author: Greg Price <gnprice@gmail.com>
Date: Wed Aug 14 18:18:53 2019 -0700

bpo-37760: Avoid cluttering work tree with downloaded Unicode files. (GH-15128)

* Clean up and reduce visual clutter in the makeunicodedata scripts

  python/cpython#7558
  commit faa2948654d15a859bc4317e00730ff213295764
  Author: Stefan Behnel <stefan_ml@behnel.de>
  Date: Sat Jun 1 21:49:03 2019 +0200

  Clean up and reduce visual clutter in the makeunicode.py script. (GH-7558)

* bpo-37760: Factor out the basic UCD parsing logic of makeunicodedata. (GH-15130)

  python/cpython#15130
  commit ef2af1ad44be0542a47270d5173a0b920c3a450d
  Author: Greg Price <gnprice@gmail.com>
  Date: Mon Aug 12 22:20:56 2019 -0700

  bpo-37760: Factor out the basic UCD parsing logic of makeunicodedata. (GH-15130)

  There were 10 copies of this, and almost as many distinct versions of exactly how it was written. They're all implementing the same standard. Pull them out to the top, so the more interesting logic that remains becomes easier to read.

  ~~~

  I removed the type hints from UcdFile class to apply the same patch to both python 2 and 3

* bpo-37760: Constant-fold some old options in makeunicodedata. (GH-15129)

  python/cpython#15129
  commit 99d208efed97e02d813e8166925b998bbd0d3993 (HEAD)
  Author: Greg Price <gnprice@gmail.com>
  Date: Mon Aug 12 22:59:30 2019 -0700

  bpo-37760: Constant-fold some old options in makeunicodedata. (GH-15129)

  The `expand` option was introduced in 2000 in commit fad27aee1. It appears to have been always set since it was committed, and what it does is tell the code to do something essential. So, just always do that, and cut the option.

  Also cut the `linebreakprops` option, which isn't consulted anymore.

* bpo-37760: Factor out standard range-expanding logic in makeunicodedata. (GH-15248)

  python/cpython#15248
  commit c03e698c344dfc557555b6b07a3ee2702e45f6ee (HEAD)
  Author: Greg Price <gnprice@gmail.com>
  Date: Tue Aug 13 19:28:38 2019 -0700

  bpo-37760: Factor out standard range-expanding logic in makeunicodedata. (GH-15248)

  Much like the lower-level logic in commit ef2af1ad4, we had 4 copies of this logic, written in a couple of different ways. They're all implementing the same standard, so write it just once.

* bpo-37760: Avoid cluttering work tree with downloaded Unicode files. (GH-15128)

  python/cpython#15128
  commit 3e4498d35c34aeaf4a9c3d57509b0d3277048ac6
  Author: Greg Price <gnprice@gmail.com>
  Date: Wed Aug 14 18:18:53 2019 -0700

  bpo-37760: Avoid cluttering work tree with downloaded Unicode files. (GH-15128)

* Convert from length-18 lists to namedtuple, in makeunicodedata. (GH-15265)

  Adapted from: python/cpython#15265
  commit a65678c5c90002c5e40fa82746de07e6217df625
  Author: Greg Price <gnprice@gmail.com>
  Date: Thu Sep 12 02:23:43 2019 -0700

  bpo-37760: Convert from length-18 lists to a dataclass, in makeunicodedata. (GH-15265)

  Now the fields have names! Much easier to keep straight as a reader than the elements of an 18-tuple.

  Runs about 10-15% slower: from 10.8s to 12.3s, on my laptop. Fortunately that's perfectly fine for this maintenance script.

  ~~~

  The original patch uses dataclasses but I use namedtuple here so that it works on both python 2 and 3.

* closes bpo-39926: Update Unicode to 13.0.0. (GH-18910)

  Fixes #34

  Adapted from: python/cpython#18910
  commit 051b9d08d1e6a8b1022a2bd9166be51c0b152698
  Author: Benjamin Peterson <benjamin@python.org>
  Date: Tue Mar 10 20:41:34 2020 -0700

  closes bpo-39926: Update Unicode to 13.0.0. (GH-18910)

* Update some www.unicode.org URLs to use HTTPS. (GH-18912)

  Adapted from: python/cpython#18912
  commit 51796e5d2632e6ada81ca677b4153f4ccd490702
  Author: Benjamin Peterson <benjamin@python.org>
  Date: Tue Mar 10 21:10:59 2020 -0700

  Update some www.unicode.org URLs to use HTTPS. (GH-18912)

* Update checksum test for Unicode 13; extend test to all of Unicode

  This commit combines the following two upstream patches:

  python/cpython#18913
  commit c77aa2d60b420747886f4258cf159bdbb7354100
  Author: Benjamin Peterson <benjamin@python.org>
  Date: Tue Mar 10 21:18:33 2020 -0700

  bpo-39926: Update unicodedata checksum tests for Unicode 13.0 update. (GH-18913)

  I forget these tests required the cpu resource.

  python/cpython#15125
  commit 6954be815a16fad11d1d66be576865bbbeb2b97d
  Author: Greg Price <gnprice@gmail.com>
  Date: Thu Sep 12 02:25:25 2019 -0700

  closes bpo-37758: Extend unicodedata checksum tests to cover all of Unicode. (GH-15125)

  Unicode has grown since Python first gained support for it, when Unicode itself was still rather new.

  This pair of test cases was added in commit 6a20ee7de back in 2000, and they haven't needed to change much since then. But do change them to look beyond the Basic Multilingual Plane (range(0x10000)) and cover all 17 planes of Unicode's final form.

  This adds about 5 seconds to the test suite's runtime. Mark the tests as CPU-using accordingly.

* test_unicodedata2: add unichr for 'narrow' python builds

* Update multibuild to latest 'devel' branch

* Build and run tests on Python 3.8

* .travis.yml: remove implicit job or else is rejected with "Build config did not create any jobs"

  travis-ci/travis-ci#8536

* test_unicodedata2: do not import test.support.requires_resource

  import fails for some reason on some older 2.7 versions, see https://travis-ci.org/github/mikekap/unicodedata2/jobs/663493029

  It should not make any difference without this.

bpo-37760: Avoid cluttering work tree with downloaded Unicode files.

4f9a51d

the-knights-who-say-ni added the CLA signed label Aug 5, 2019

bedevere-bot added the awaiting review label Aug 5, 2019

benjaminp reviewed Aug 13, 2019

View reviewed changes

Use spiffy os.makedirs(..., exist_ok=True).

f8d11c6

Add to make clean.

7268d81

Revert "Add to make clean."

de76a66

This reverts commit 7268d81.

benjaminp added the skip news label Aug 15, 2019

benjaminp merged commit 3e4498d into python:master Aug 15, 2019

bedevere-bot removed the awaiting review label Aug 15, 2019

gnprice deleted the pr-makeud-gitignore branch August 15, 2019 02:59

lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019

bpo-37760: Avoid cluttering work tree with downloaded Unicode files. (p…

0540e82

…ythonGH-15128)

DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020

bpo-37760: Avoid cluttering work tree with downloaded Unicode files. (p…

88dc9fd

…ythonGH-15128)

websurfer5 pushed a commit to websurfer5/cpython that referenced this pull request Jul 20, 2020

bpo-37760: Avoid cluttering work tree with downloaded Unicode files. (p…

995fbd9

…ythonGH-15128)
