-
Notifications
You must be signed in to change notification settings - Fork 5
Closed
Labels
Description
I am trying to tokenize a bunch of txt-files and store them as folia.xml-files.
The first file works fine, but after that the kernel crashes.
A little bit more info:
- I'm working with the latest ucto version (0.6.4);
- I've tried this in both VSCode an Colab. In both cases it crashes;
- I've tried it withPython 3.11.3 and 3.8.10. In both cases it crashes;
- It doesn't seem to have anything to do with the input txt-files: even if the txt-files are exactly the same it will work for the first file and crash at the second file;
- When running the code out of a notebook I get this error: Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell; - When running the code from the command line I get this error: Segmentation fault.
import ucto
configurationfile_ucto = "tokconfig-nld-historical"
tokenizer = ucto.Tokenizer(configurationfile_ucto, foliaoutput = True)
for f in list_with_paths_to_exact_same_files:
tokenizer.tokenize(f, output_path)
Am I doing something wrong, or is there a bug here?