make concatenation of AMR files consistent in preprocessing #79

jgroschwitz · 2019-12-05T14:42:00Z

Each AMR dataset (e.g. the dev set of LDC2017T10) consists of multiple files. In preprocessing, these files are concatenated in an arbitrary order (determined by the OS). This order is not always consistent across runs: I just had a case where the files were concatenated in one order during a first preprocessing run and in a different order during another run, which led to problems.

The concatenation should instead be consistent over different runs, e.g. by lexical ordering of filenames.

jgroschwitz · 2020-01-16T14:10:27Z

Also, while at it, make sure to exclude hidden files from the iteration.
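
For illustration, here is a minimal Python sketch of what both requests could look like together: sort the files by name before concatenating, and skip hidden files (names starting with a dot). This is not the repository's actual preprocessing code. The function name `concatenate_amr_files` and its parameters are hypothetical, and it assumes all files of a dataset sit directly in one directory.

```python
from pathlib import Path
import shutil


def concatenate_amr_files(input_dir, output_file):
    """Concatenate the visible files in input_dir into output_file.

    Hypothetical helper: files are ordered by filename so the result
    does not depend on the order in which the OS lists the directory.
    Returns the filenames in the order they were written.
    """
    input_dir = Path(input_dir)
    output_file = Path(output_file)

    candidates = [
        p for p in input_dir.iterdir()
        if p.is_file()
        and not p.name.startswith(".")             # skip hidden files
        and p.resolve() != output_file.resolve()   # never read our own output
    ]
    # Plain string comparison on the filename: deterministic and
    # independent of the locale or filesystem.
    files = sorted(candidates, key=lambda p: p.name)

    with open(output_file, "wb") as out:
        for path in files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)  # byte-for-byte copy, like `cat`

    return [p.name for p in files]
```

Returning the list of filenames makes it easy to log the order that was used, so two runs can be compared if the outputs ever differ again.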

jgroschwitz added the bug label Dec 5, 2019

jgroschwitz self-assigned this Dec 5, 2019
