Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use proper file download library in ICU4X #414

Closed
sffc opened this issue Dec 8, 2020 · 9 comments · Fixed by #704 or #1744
Closed

Use proper file download library in ICU4X #414

sffc opened this issue Dec 8, 2020 · 9 comments · Fixed by #704 or #1744
Assignees
Labels
C-data-infra Component: provider, datagen, fallback, adapters S-tiny Size: Less than an hour (trivial fixes) T-enhancement Type: Nice-to-have but not required
Milestone

Comments

@sffc
Copy link
Member

sffc commented Dec 8, 2020

Right now ICU4X uses a mechanism I cobbled together to download and cache CLDR JSON data. We should look for a common solution to this problem on crates.io so that we don't have to maintain file downloader code in ICU4X.

@sffc sffc added T-enhancement Type: Nice-to-have but not required C-data-infra Component: provider, datagen, fallback, adapters labels Dec 8, 2020
@sffc sffc added this to the 2020 Q4 milestone Dec 8, 2020
@sffc sffc self-assigned this Dec 8, 2020
@sffc sffc added backlog good first issue Good for newcomers help wanted Issue needs an assignee labels Jan 7, 2021
@sffc sffc removed this from the 2020 Q4 milestone Jan 7, 2021
@gregtatum
Copy link
Member

I'm interested in this, but I'd like to finish #107 and start #272 first. If someone else wants to take it before then that's fine.

@sffc sffc removed their assignment Jan 8, 2021
@nordzilla
Copy link
Member

I just wanted to highlight a poor experience that I had with the current file download system, as motivation to add this functionality.

I apparently had a partially downloaded CLDR archive stored in my cache. This caused unintended behavior (deleting files) when trying to generate new test data.

The eventual fix was for me to delete the cached archive to force a re-download. In addition to using a more standard tool for this, I think it would also be nice to run some sort of checksum on the downloaded archive to ensure that the data was downloaded correctly and completely.

@sffc
Copy link
Member Author

sffc commented Jan 12, 2021

FYI, the code for this (which should be replaced) is here:

https://github.com/unicode-org/icu4x/blob/master/components/provider_cldr/src/download/io_util.rs

@sffc
Copy link
Member Author

sffc commented Mar 27, 2021

@nordzilla @gregtatum Any interest in taking this backlog issue for the April cleanup?

@gregtatum
Copy link
Member

I'm probably at capacity already for April.

@sffc sffc self-assigned this May 4, 2021
@sffc sffc added S-small Size: One afternoon (small bug fix or enhancement) and removed backlog good first issue Good for newcomers help wanted Issue needs an assignee labels May 4, 2021
@sffc sffc added this to the 2021-Q2-m1 milestone May 4, 2021
@sffc
Copy link
Member Author

sffc commented May 4, 2021

I'm going to do this by not actually downloading the full CLDR JSON zip file. Instead, I'll download each file one at a time. Then we don't have to touch the system cache directory at all, which seems good.

The file listing is available via the GitHub API:

https://api.github.com/repos/unicode-org/cldr-json/contents/cldr-json/cldr-cal-buddhist-full/main?ref=39.0.0

@sffc
Copy link
Member Author

sffc commented May 5, 2021

I got the new downloader working in #704, but I'd like to discuss before I delete the old one.

@sffc sffc added the discuss-priority Discuss at the next ICU4X meeting label May 5, 2021
@sffc
Copy link
Member Author

sffc commented May 7, 2021

2021-05-07: Remove the old flaky download utility (+1 from @Manishearth , @nordzilla , @mihnita ). Open new issue to introduce a new, better download utility for 1.0.

@sffc sffc added S-tiny Size: Less than an hour (trivial fixes) and removed discuss-priority Discuss at the next ICU4X meeting S-small Size: One afternoon (small bug fix or enhancement) labels May 7, 2021
@sffc sffc linked a pull request May 13, 2021 that will close this issue
@sffc sffc closed this as completed May 13, 2021
@sffc
Copy link
Member Author

sffc commented May 16, 2021

Sorry, not quite fixed yet. I still need to remove the old tool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-data-infra Component: provider, datagen, fallback, adapters S-tiny Size: Less than an hour (trivial fixes) T-enhancement Type: Nice-to-have but not required
Projects
None yet
3 participants