ocr.nf cannot find FoLiA-hocr output files #30

@peterdekker


ocr.nf expects the output files of FoLiA-hocr to be named after the basename of the original file followed by *.folia.xml: https://github.com/LanguageMachines/PICCL/blob/master/ocr.nf#L229

However, since the fix for issue LanguageMachines/foliautils#21, FoLiA-hocr prepends "id-" to filenames starting with a number: LanguageMachines/foliautils@6af7fa4
As a result, ocr.nf can no longer find the files written by FoLiA-hocr.

Not all files get the "id-" prefix from FoLiA-hocr, only the ones starting with a number. One solution would be to make ocr.nf look for a broader output pattern. Alternatively, FoLiA-hocr could add the "id-" prefix to all files, so that ocr.nf can always look for this prefix.
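
To make the "broader output pattern" idea concrete, here is a minimal Python sketch of the matching rule. It is only an illustration, not the actual ocr.nf code (which is Nextflow). The function names is_output_for and find_outputs are hypothetical, and the rule that only names starting with a digit get the "id-" prefix is taken from the description above rather than checked against the FoLiA-hocr source.

```python
import os

SUFFIX = ".folia.xml"  # output suffix that ocr.nf looks for, per this issue
PREFIX = "id-"         # prefix FoLiA-hocr adds, per this issue


def is_output_for(filename, basename):
    """Return True if filename looks like FoLiA-hocr output for basename.

    Assumption: the "id-" prefix is only added when the name starts
    with a digit, as described in this issue.
    """
    if not filename.endswith(SUFFIX):
        return False
    if filename.startswith(basename):
        return True
    return basename[:1].isdigit() and filename.startswith(PREFIX + basename)


def find_outputs(outputdir, basename):
    """List the files in outputdir that match basename, with or without prefix."""
    return sorted(f for f in os.listdir(outputdir) if is_output_for(f, basename))
```

Accepting both forms would keep ocr.nf working with FoLiA-hocr versions from before and after the foliautils change, at the cost of a slightly looser match.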

@kosloot @proycon
