Description
When the fast_langdetect model file (lid.176.bin) is corrupted or inaccessible, the library automatically falls back to low-memory mode. This behavior is too aggressive because it treats every error as if it were a MemoryError.
This makes debugging very difficult and causes significant slowdowns, since the fallback mode downloads and reloads the model every time.
In my case (using GPTSoVITS), the issue was caused by incorrect file permissions. Instead of failing fast with a clear error, the library only emitted an INFO log and silently retried in low-memory mode. The result was a slowdown from ~0.3s to ~19s per call.
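A pre-flight check along these lines would have surfaced the permission problem right away. This is only a sketch: `check_model_file` is a hypothetical helper, not part of fast_langdetect, and it only tests that the path exists and is readable by the current user (it does not detect corruption).

```python
import os


def check_model_file(path):
    """Raise a specific error if the model file is missing or unreadable.

    Hypothetical helper for illustration; not a fast_langdetect API.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Model file not found: {path}")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"Model file is not readable: {path}")
    return path
```

Calling it on the model path before the first detect call turns a silent 19s fallback into an immediate, named exception.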
Why this is a problem
- Hides real issues: Permission or corruption errors should raise exceptions, not be handled as if memory were insufficient.
- Increases debugging time: Only an info log is printed, leaving no clear indication of the root cause.
- Performance regression: Automatic downloading/reloading makes execution dramatically slower.
Suggested Fix
Only fall back to low-memory mode when a MemoryError is caught. For all other exceptions (e.g., I/O errors, permission issues, corruption), raise the error explicitly so users can debug properly. The fix can be found in PR20.
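The suggested behavior can be sketched roughly as follows. This is not the library's actual code: `load_with_fallback`, `load_full`, and `load_low_memory` are hypothetical names standing in for whatever loaders the library uses internally.

```python
import logging

logger = logging.getLogger(__name__)


def load_with_fallback(load_full, load_low_memory):
    """Try the full model first; fall back only when memory runs out.

    Both arguments are hypothetical zero-argument callables that return
    a loaded model. They stand in for the library's real loaders.
    """
    try:
        return load_full()
    except MemoryError:
        # Only a genuine out-of-memory condition justifies the fallback.
        logger.info("Not enough memory for the full model; using the low-memory model")
        return load_low_memory()
    # Any other exception (PermissionError, OSError, an error from a
    # corrupted file, ...) is not caught here and reaches the caller.
```

With this shape, a PermissionError raised by the full loader reaches the user unchanged, while a MemoryError still triggers the low-memory path.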
Steps to Reproduce
Test script:
import logging
import resource

# Enable debug logging before importing the library so the fallback message is visible.
logging.basicConfig(level=logging.DEBUG, format='%(levelname)s:%(name)s:%(message)s')

import fast_langdetect

# Replace the default detector with one that uses the cache path under test.
fast_langdetect.infer._default_detector = fast_langdetect.infer.LangDetector(fast_langdetect.infer.LangDetectConfig(cache_dir="/tmp/fasttext-langdetect/lid.176.bin"))

# Cap the process address space at 1000 MiB.
limit = 1000 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

print('=== Testing with debug logging ===')
print('Testing low_memory=False with corrupted model:')
result = fast_langdetect.detect('Hello world', low_memory=False)
print('Result:', result)
Result with the corrupted model:
(GPTSoVITS) hekh@ljz-MS-7D76:~/code/fast-langdetect$ /home/hekh/miniconda3/envs/GPTSoVITS/bin/python /home/hekh/code/fast-langdetect/test.py
=== Testing with debug logging ===
Testing low_memory=False with corrupted model:
INFO:fast_langdetect.infer:fast-langdetect: Downloading model from https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): dl.fbaipublicfiles.com:443
DEBUG:urllib3.connectionpool:https://dl.fbaipublicfiles.com:443 "GET /fasttext/supervised-models/lid.176.bin HTTP/1.1" 200 131266198
INFO:fast_langdetect.infer:fast-langdetect: Falling back to low-memory model...
Result: {'lang': 'en', 'score': 0.1682594120502472}