
Conversation

sudoskys (Member) commented Sep 15, 2025

from fast_langdetect import detect

# Lite model (offline, smaller, faster) — never falls back
print(detect("Hello", model='lite', k=1))          # -> [{'lang': 'en', 'score': ...}]

# Full model (downloaded to cache, higher accuracy) — never falls back
print(detect("Hello", model='full', k=1))          # -> [{'lang': 'en', 'score': ...}]

# Auto mode: try full, fallback to lite only on MemoryError
print(detect("Hello", model='auto', k=1))          # -> [{'lang': 'en', 'score': ...}]

# Multilingual: top 3 candidates (always a list)
print(detect("Hello 世界 こんにちは", model='auto', k=3))

Changes

  • feature_test/__init__.py
    • Remove detect_multilingual and the related low_memory and allow_fallback parameters.
    • Unify on the new interface detect(text, model='full'/'auto', k=N), which prints the returned candidate list.
    • Retain and demonstrate detect_language (which returns a two-letter uppercase language code).
    • Add a DetectError-catching example (see the sketch after this list) showing that the full model raises errors in I/O failure scenarios such as running offline, while the lite model remains usable offline.
    • Use the new LangDetectConfig and LangDetector constructor signatures.
  • feature_test/lingua_t.py
    • Remove imports of and calls to detect_multilingual.
    • Introduce detect and use detect(sentence, model='full', k=5) to get multiple candidates, keeping the same functionality as the old example.
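
A minimal sketch of what the updated error-handling part of feature_test could look like, assuming detect_language and DetectError are importable from the package top level as in the examples above (treat the exact import path as an assumption):

from fast_langdetect import detect, detect_language, DetectError

# detect_language returns a two-letter uppercase code, e.g. "EN"
print(detect_language("Hello, world!"))

try:
    # The full model needs a one-time download; offline, loading it can fail.
    print(detect("Hello", model='full', k=1))
except DetectError as e:
    # The lite model ships with the package, so it keeps working offline.
    print(f"full model unavailable: {e}")
    print(detect("Hello", model='lite', k=1))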

sudoskys (Member, Author)

Keep it simple and stupid.

sudoskys (Member, Author) commented Sep 15, 2025

@JackyHe398 What do you think of this change?

I think it would be intuitive for different kinds of models to correspond to different functions. The caller can then recombine these methods on their own, avoiding the need for hidden internal control strategies.

JackyHe398 (Contributor)

Provide a default option setting and allow override

I am wondering whether it would be better to provide a default option in the config, like

LangDetectConfig(model = "auto")
LangDetectConfig(model = "full")
LangDetectConfig(model = "lite")

so that we can use the model neatly

fast_langdetect.infer._default_detector = fast_langdetect.infer.LangDetector(fast_langdetect.infer.LangDetectConfig(model = "lite"))  # set default to lite
print(detect("Hello"))   # default model selection, default k value
print(detect("Bonjour"))
print(detect("こんにちは"))
print(detect("Ciallo"))
print(detect("Hello", model = "full")) # override the model to full
print(detect("Hello", model = "auto")) # override the model to auto

instead of

print(detect("Hello"), mode = "lite")
print(detect("Bonjour"), mode = "lite")
print(detect("こんにちは"), mode = "lite")
print(detect("Ciallo"), mode = "lite")
print(detect("Hello", model = "full"))
print(detect("Hello"))

Error Handling

I see you using

except Exception as e:
        raise DetectError(f"Failed to load model using temporary file: {e}") from e

to wrap everything into DetectError in a few places.

It's very likely that the error is not caused by the model or by fast_langdetect. Most of the time the error message is much more descriptive if the original MemoryError, TypeError, ValueError, etc. is kept. The traceback is already enough to find the root of the error; you don't need to point it out by wrapping everything into DetectError. Instead, reserve it for cases where the model has failed or the error originates inside your library.

For example, in infer.py#class LangDetectConfig#def _get_model:

try:
    <...>
except MemoryError as e:
    if not low_memory and fallback_on_memory_error: # instead of low_memory is not True
        logger.info("fast-langdetect: Falling back to low-memory model...")
        return self._get_model(low_memory=True, fallback_on_memory_error=False)
    raise # preserve original error and traceback

Long story short: only raise DetectError for fast-langdetect–specific failures. Let standard Python exceptions (ValueError, TypeError, FileNotFoundError, MemoryError, etc.) propagate.

Optional Polish

Rename DetectError → FastLangdetectError and add narrower subclasses like ModelLoadError.
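
A rough sketch of that proposal (the class names are the suggestion above, not an existing API):

class FastLangdetectError(Exception):
    """Base class for errors raised by fast-langdetect itself."""

class ModelLoadError(FastLangdetectError):
    """Raised when a FastText model cannot be downloaded or loaded."""

# Optionally keep the old name as an alias for backward compatibility.
DetectError = FastLangdetectError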

sudoskys (Member, Author)

@JackyHe398 Now it should look much better. However, I don't recommend changing the global settings, as this may cause cross-thread side effects or invisible changes.
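
A small sketch of the per-instance alternative, assuming LangDetector exposes a detect method mirroring the module-level one and that LangDetectConfig accepts the model option discussed above:

from fast_langdetect.infer import LangDetectConfig, LangDetector

# A local detector with its own defaults; nothing global is mutated,
# so other threads and modules keep the library's default behaviour.
detector = LangDetector(LangDetectConfig(model="lite"))
print(detector.detect("Hello", k=1))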

@sudoskys sudoskys merged commit 9a828e4 into main Sep 17, 2025

Development

Successfully merging this pull request may close these issues:

docs(Minor): Memory usage lower than expected.
