This repository was archived by the owner on Oct 31, 2023. It is now read-only.
Is this odd, Python-specific format for encoding texts intentional? #5
Open
Description
The dataset format seems to have changed so that it is no longer compatible with the LASER repository, and one of the changes looks like an unintentional side effect. Namely, the article texts are encoded as the repr representation of a Python byte string, which makes the dataset very Python-specific and hard to parse, even in Python.
Please have a look at this LASER issue, facebookresearch/LASER#39, for examples of what the encoding looks like.
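To illustrate the problem, here is a minimal sketch of how such a field can arise and how it can be turned back into text in Python. It assumes each text field is exactly the `repr()` of a `bytes` object holding UTF-8 encoded text, as described above; the helper `decode_repr_bytes` is hypothetical and not part of either repository.

```python
import ast


def decode_repr_bytes(field: str) -> str:
    """Turn a field like "b'Caf\\xc3\\xa9'" back into the text "Café".

    Hypothetical helper: assumes the field is the repr() of a Python
    bytes object containing UTF-8 encoded text.
    """
    # literal_eval only accepts Python literals, so unlike eval() it
    # cannot run arbitrary code hidden in the data.
    value = ast.literal_eval(field)
    if not isinstance(value, bytes):
        raise ValueError(f"expected a bytes literal, got {type(value).__name__}")
    return value.decode("utf-8")


# Writing side: encoding text to bytes and storing repr() of the result
# produces the Python-specific form the issue describes.
original = "Café au lait"
field = repr(original.encode("utf-8"))
print(field)                     # b'Caf\xc3\xa9 au lait'
print(decode_repr_bytes(field))  # Café au lait
```

Naively stripping the leading `b'` and trailing `'` does not work, because the escape sequences (and the quote character, which `repr()` switches to `"` when the text contains `'`) follow Python's literal syntax. A parser in any other language would have to reimplement those rules, which is what makes the format hard to consume outside Python.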