
Conversation

sudoskys (Member) commented Sep 15, 2025

from fast_langdetect import detect

# Lite model (offline, smaller, faster) — never falls back
print(detect("Hello", model='lite', k=1))          # -> [{'lang': 'en', 'score': ...}]

# Full model (downloaded to cache, higher accuracy) — never falls back
print(detect("Hello", model='full', k=1))          # -> [{'lang': 'en', 'score': ...}]

# Auto mode: try full, fallback to lite only on MemoryError
print(detect("Hello", model='auto', k=1))          # -> [{'lang': 'en', 'score': ...}]

# Multilingual: top 3 candidates (always a list)
print(detect("Hello 世界 こんにちは", model='auto', k=3))

Changes

  • feature_test/__init__.py
    • Remove detect_multilingual and the related low_memory and allow_fallback parameters.
    • Unify on the new interface detect(text, model='full'/'auto', k=N), which prints the returned candidate list.
    • Retain and demonstrate detect_language (which returns a two-letter uppercase language code).
    • Add a DetectError-catching example (see the sketch after this list) showing that the full model raises errors in I/O failure scenarios such as running offline, while the lite model remains usable offline.
    • Use the new LangDetectConfig and LangDetector constructor signatures.
  • feature_test/lingua_t.py
    • Remove imports of and calls to detect_multilingual.
    • Introduce detect and use detect(sentence, model='full', k=5) to get multiple candidates, keeping the same functionality as the old example.
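
A minimal sketch of what the updated error-handling part of feature_test could look like, assuming detect_language and DetectError are importable from the package top level as in the examples above (treat the exact import path as an assumption):

from fast_langdetect import detect, detect_language, DetectError

# detect_language returns a two-letter uppercase code, e.g. "EN"
print(detect_language("Hello, world!"))

try:
    # The full model needs a one-time download; offline, loading it can fail.
    print(detect("Hello", model='full', k=1))
except DetectError as e:
    # The lite model ships with the package, so it keeps working offline.
    print(f"full model unavailable: {e}")
    print(detect("Hello", model='lite', k=1))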

sudoskys (Member, Author)

Keep it simple and stupid.

sudoskys (Member, Author) commented Sep 15, 2025

@JackyHe398 What do you think of this change?

I think it would be intuitive for different kinds of models to correspond to different functions. The caller can then recombine these methods on their own, avoiding the need for hidden internal control strategies.

JackyHe398 (Contributor)

Provide a default option setting and allow override

I am wondering whether it would be better to provide a default option in the config, like

LangDetectConfig(model = "auto")
LangDetectConfig(model = "full")
LangDetectConfig(model = "lite")

so that we can use the model neatly

fast_langdetect.infer._default_detector = fast_langdetect.infer.LangDetector(fast_langdetect.infer.LangDetectConfig(model = "lite"))  # set default to lite
print(detect("Hello"))   # default model selection, default k value
print(detect("Bonjour"))
print(detect("こんにちは"))
print(detect("Ciallo"))
print(detect("Hello", model = "full")) # override the model to full
print(detect("Hello", model = "auto")) # override the model to auto

instead of

print(detect("Hello"), mode = "lite")
print(detect("Bonjour"), mode = "lite")
print(detect("こんにちは"), mode = "lite")
print(detect("Ciallo"), mode = "lite")
print(detect("Hello", model = "full"))
print(detect("Hello"))

Error Handling

I see you using

except Exception as e:
        raise DetectError(f"Failed to load model using temporary file: {e}") from e

to wrap everything into DetectError in a few places.

It's very likely that the error is not caused by the model or by fast_langdetect. Most of the time the error message is much more descriptive if the original MemoryError, TypeError, ValueError, etc. is kept. The traceback is already enough to find the root of the error; you don't need to point it out by wrapping everything into DetectError. Instead, reserve it for cases where the model has failed or the error originates inside your library.

For example, in infer.py#class LangDetectConfig#def _get_model:

try:
    <...>
except MemoryError as e:
    if not low_memory and fallback_on_memory_error: # instead of low_memory is not True
        logger.info("fast-langdetect: Falling back to low-memory model...")
        return self._get_model(low_memory=True, fallback_on_memory_error=False)
    raise # preserve original error and traceback

Long story short: only raise DetectError for fast-langdetect–specific failures. Let standard Python exceptions (ValueError, TypeError, FileNotFoundError, MemoryError, etc.) propagate.

Optional Polish

Rename DetectError → FastLangdetectError and add narrower subclasses like ModelLoadError.
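
A rough sketch of that proposal (the class names are the suggestion above, not an existing API):

class FastLangdetectError(Exception):
    """Base class for errors raised by fast-langdetect itself."""

class ModelLoadError(FastLangdetectError):
    """Raised when a FastText model cannot be downloaded or loaded."""

# Optionally keep the old name as an alias for backward compatibility.
DetectError = FastLangdetectError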

sudoskys (Member, Author)

@JackyHe398 Now it should look much better. However, I don't recommend changing the global settings, as this may cause cross-thread side effects or invisible changes.
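
A small sketch of the per-instance alternative, assuming LangDetector exposes a detect method mirroring the module-level one and that LangDetectConfig accepts the model option discussed above:

from fast_langdetect.infer import LangDetectConfig, LangDetector

# A local detector with its own defaults; nothing global is mutated,
# so other threads and modules keep the library's default behaviour.
detector = LangDetector(LangDetectConfig(model="lite"))
print(detector.detect("Hello", k=1))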

@sudoskys sudoskys merged commit 9a828e4 into main Sep 17, 2025

Development

Successfully merging this pull request may close these issues:

docs(Minor): Memory usage lower than expected.
