-
Notifications
You must be signed in to change notification settings - Fork 62
Do not distribute any data files #199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
5771238 to
3767634
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #199 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 14 14
Lines 885 885
Branches 225 225
=========================================
Hits 885 885 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
8590c51 to
6472455
Compare
- lists of emojis may require licenses in your jurisdiction. - please check with local laws. - do not distribute unicode data files. - do not distribute universal declaration of human rights. - delete them from git. - delete from sdist. Closes #198
6472455 to
8fb879e
Compare
Merging this PR will degrade performance by 99.97%
Performance Changes
Comparing Footnotes
|
|
the benchmarks were modified to do the full UDHR text corpus and so are slower by design, it is the new base, no actual performance impact |
| - name: Prepare sdist and source-dir | ||
| - name: Build wheel | ||
| shell: bash | ||
| run: | | ||
| python -Im pip install build | ||
| python -Im build | ||
| python -Im build --wheel | ||
|
|
||
| mkdir source-dir | ||
| tar -xzvf dist/wcwidth-*.tar.gz -C source-dir --strip-components=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well to me this is absurd you have to deal with licensing simply to run the tests
The Consortium’s software and data files are generally licensed under the OSI-approved Unicode License v3, a free, open source, highly permissive license based on the MIT License. The primary difference between the MIT License and the Unicode License is that the Unicode License expressly covers data and data files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make a PR to change CI to use sdist again if you approve, I chose bdist because after removing data files, sdist and bdist packages have no functional difference for tests, except that bdist is faster, I changed this to reduce test time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make a PR to change CI to use sdist again if you approve
That would be good.
I chose bdist because after removing data files, sdist and bdist packages have no functional difference for tests, except that bdist is faster, I changed this to reduce test time
The key idea of #100 is to prepare the sdist once; then all later steps run in the source directory unpacked from the sdist and no longer use files from git. Indeed it's a bit slower, but I think it's worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our CI does exercise with the data files, running update-tables.py --fetch-only prior to python3.14 tests https://github.com/jquast/wcwidth/actions/runs/21465737084/job/61827437773#step:6:28
Downstream dependencies can do the same. Maybe someone who really cares about it can work with all of the many popular packaging systems to package up unicode data files, they are useful to have just like /usr/share/unicode/, maybe some of them already do.
I had some great trouble with a fetch of only a single data file last week, because of cloudflare, because my ISP uses CGNAT and the whole web is terribly hostile for my IP. I also expect CI will have trouble from time to time, so I will also add an "ok if fetch fails" into CI for fetching data files.
I don't wish the wcwidth license to change, most downstreams would trigger a re-evaluation before accepting version changes, Even if its also another MIT license, the cost is still incurred for review and categorization of metadata and so on. And then there are also quarterly SOX2 happening in private all over the world.
Closes #198