
Low-memory model fallback is too aggressive, masking real errors and slowing down execution #19

@JackyHe398

Description

When the fast_langdetect model file (lid.176.bin) is corrupted or inaccessible, the library automatically falls back to low-memory mode. This behavior is too aggressive because it treats every error as if it were a MemoryError.

This makes debugging very difficult and causes significant slowdowns, because the fallback path downloads and reloads the model on every call.

In my case (using GPTSoVITS), the issue was caused by incorrect file permissions. Instead of failing fast with a clear error, the library only emitted an INFO log and silently retried in low-memory mode. The result was a slowdown from ~0.3s to ~19s per call.
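Because the root cause here was file permissions, a pre-flight check on the model file would surface this kind of problem directly. The sketch below uses only the standard library. `check_model_file` is a hypothetical helper, not part of fast_langdetect, and the path passed in is whatever model location your setup uses.

```python
import os


def check_model_file(path: str) -> None:
    """Raise a descriptive error if the model file at `path` looks unusable.

    Hypothetical helper (not part of fast_langdetect): it only checks that the
    path exists, is not a directory, is readable, and is non-empty.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(f"model file not found: {path}")
    if os.path.isdir(path):
        raise IsADirectoryError(f"expected a model file, got a directory: {path}")
    # os.access checks the real UID, so a process running as root will usually pass.
    if not os.access(path, os.R_OK):
        raise PermissionError(f"model file is not readable by this process: {path}")
    if os.path.getsize(path) == 0:
        # An empty file is one obvious sign of a truncated or corrupted download.
        raise ValueError(f"model file is empty: {path}")
```

Calling this before constructing a detector turns a silent fallback into an immediate, specific exception. It cannot detect every kind of corruption; a non-empty but damaged file will still pass.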

Why this is a problem

  • Hides real issues: Permission or corruption errors should raise exceptions, not be handled as if memory were insufficient.

  • Increases debugging time: Only an INFO log is printed, with no clear indication of the root cause.

  • Performance regression: Automatic downloading/reloading makes execution dramatically slower.
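To quantify a regression like the ~0.3s vs ~19s per call reported above, it helps to time calls directly. This is a minimal sketch using `time.perf_counter`. `mean_call_seconds` is a hypothetical helper; in practice you might pass it something like `lambda: fast_langdetect.detect('Hello world', low_memory=False)`.

```python
import time
from typing import Callable


def mean_call_seconds(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the mean wall-clock duration, in seconds, of `repeats` calls to `fn`."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / repeats
```

The first call may include one-time model loading. Timing it separately from later calls shows whether the cost is paid once or on every call, which is the distinction that matters here.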

Suggested Fix

Only fall back to low-memory mode when a MemoryError is caught. For all other exceptions (e.g., I/O errors, permission issues, corruption), let the error propagate so users can debug properly. The fix is in PR #20.
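The suggested fix amounts to narrowing the `except` clause. The sketch below is not the code from PR #20 and does not use fast_langdetect's internals. `load_full_model` and `load_low_memory_model` are hypothetical loader callables standing in for the real model-loading logic, and the logger name is illustrative.

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger("langdetect_sketch")  # illustrative logger name

M = TypeVar("M")


def load_with_fallback(
    load_full_model: Callable[[], M],
    load_low_memory_model: Callable[[], M],
) -> M:
    """Load the full model, falling back only when memory runs out.

    Hypothetical sketch: any error other than MemoryError (permission problems,
    missing or corrupted files, ...) propagates to the caller unchanged.
    """
    try:
        return load_full_model()
    except MemoryError:
        # Only a genuine out-of-memory condition justifies the slower fallback.
        logger.warning("Not enough memory for the full model; using the low-memory model")
        return load_low_memory_model()
```

If `except MemoryError:` were replaced with a broad `except Exception:`, a `PermissionError` raised by `load_full_model` would be swallowed and the fallback taken, which matches the behavior described in this issue. Logging the fallback at WARNING rather than INFO is a design choice: the fallback carries a large performance cost, so it arguably deserves more visibility.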

Steps to Reproduce

Test script:

```python
import logging
import resource  # Unix-only

# Show DEBUG-level log output, including fast_langdetect's INFO messages.
logging.basicConfig(level=logging.DEBUG, format='%(levelname)s:%(name)s:%(message)s')

import fast_langdetect

# Replace the default detector with one that uses a custom cache_dir.
fast_langdetect.infer._default_detector = fast_langdetect.infer.LangDetector(
    fast_langdetect.infer.LangDetectConfig(cache_dir="/tmp/fasttext-langdetect/lid.176.bin")
)

# Cap the process address space at 1000 MiB.
limit = 1000 * 1024 * 1024
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

print('=== Testing with debug logging ===')
print('Testing low_memory=False with corrupted model:')
result = fast_langdetect.detect('Hello world', low_memory=False)
print('Result:', result)
```

Output with the corrupted model:

```
(GPTSoVITS) hekh@ljz-MS-7D76:~/code/fast-langdetect$ /home/hekh/miniconda3/envs/GPTSoVITS/bin/python /home/hekh/code/fast-langdetect/test.py
=== Testing with debug logging ===
Testing low_memory=False with corrupted model:
INFO:fast_langdetect.infer:fast-langdetect: Downloading model from https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): dl.fbaipublicfiles.com:443
DEBUG:urllib3.connectionpool:https://dl.fbaipublicfiles.com:443 "GET /fasttext/supervised-models/lid.176.bin HTTP/1.1" 200 131266198
INFO:fast_langdetect.infer:fast-langdetect: Falling back to low-memory model...
Result: {'lang': 'en', 'score': 0.1682594120502472}
```
