Update big5 to use the full 1 TiB corpus #680

Closed
@gkamat

Description

Is your feature request related to a problem? Please describe

The current "1 TB" corpus packaged with big5 is actually only around 880 GiB, a consequence of the CloudFront limit on the size of a file that can be downloaded.

The workload should be updated to download corpora in parts to work around this limitation.
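
A minimal sketch of what "download in parts" could look like on the client side, assuming the corpus is published as consecutive, fixed-size pieces that are concatenated after download. The function names `plan_parts` and `join_parts` are hypothetical and not part of the workload or of OpenSearch Benchmark, and the part size is a parameter rather than the actual CloudFront limit.

```python
import shutil
from pathlib import Path


def plan_parts(total_size: int, max_part_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover total_size bytes.

    Every part is at most max_part_size bytes; only the last part may be shorter.
    """
    if total_size < 0 or max_part_size <= 0:
        raise ValueError("total_size must be >= 0 and max_part_size must be > 0")
    return [
        (offset, min(max_part_size, total_size - offset))
        for offset in range(0, total_size, max_part_size)
    ]


def join_parts(part_paths: list[str], dest: str) -> int:
    """Concatenate already-downloaded parts, in order, into dest.

    Returns the size of the reassembled file in bytes.
    """
    with open(dest, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # Stream each part so a large corpus never has to fit in memory.
                shutil.copyfileobj(src, out)
    return Path(dest).stat().st_size
```

With this approach, each part stays under whatever per-file limit applies, and the reassembled size can be checked against the expected corpus size before the workload starts ingesting.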

Activity

Metadata

Assignees

Labels

enhancement: New feature or request

Type

No type

Projects

  • Status

    ✅ Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
