Closed
Description
Is your feature request related to a problem? Please describe
The current "1 TB" corpus packaged with big5 is actually only around 880 GiB, because CloudFront limits the size of a single file that can be downloaded.
The workload should be updated to download corpora in multiple parts to work around this limitation.
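
As a rough sketch of what "download in parts" could look like, the snippet below plans the byte ranges for a corpus that must be split under a per-file size cap, then joins the downloaded parts back into a single file. The names `plan_parts` and `join_parts` are hypothetical and not part of the workload. The size cap is a caller-supplied parameter, since the exact CloudFront limit is not stated here.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Tuple


def plan_parts(total_size: int, max_part_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into inclusive (start, end) byte ranges.

    max_part_size is an assumed per-file cap chosen by the caller.
    """
    if total_size < 0 or max_part_size <= 0:
        raise ValueError("total_size must be >= 0 and max_part_size > 0")
    return [
        (start, min(start + max_part_size, total_size) - 1)
        for start in range(0, total_size, max_part_size)
    ]


def join_parts(part_paths: Iterable[Path], dest: Path) -> int:
    """Concatenate already-downloaded parts, in order, into dest.

    Returns the number of bytes written.
    """
    with open(dest, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # Stream each part so large files never sit fully in memory.
                shutil.copyfileobj(src, out)
        return out.tell()
```

For example, `plan_parts(10, 4)` yields three ranges covering bytes 0-3, 4-7 and 8-9. Each range would map to one separately hosted part file, and `join_parts` reassembles them once all parts are on disk.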