-
Notifications
You must be signed in to change notification settings - Fork 9
♻️ refactor(tests/infer): Unify detect functions, make the API more intuitive and avoid implicit fallbacks #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ctory behavior comments for clarity
keep simple and stupid |
…just parameters for accuracy
@JackyHe398 What do you think of this change? I think it would be intuitive to use different kinds of models corresponding to different func. |
Provide default option setting and allow overriddeI am wondering if it will be better if it provide an option in config as default like LangDetectConfig(model = "auto")
LangDetectConfig(model = "full")
LangDetectConfig(model = "lite") so that we can use the model neatly fast_langdetect.infer._default_detector = fast_langdetect.infer.LangDetector(fast_langdetect.infer.LangDetectConfig(model = "lite") # set default to lite
print(detect("Hello")) # default model selection, default k value
print(detect("Bonjour"))
print(detect("こんにちは"))
print(detect("Ciallo"))
print(detect("Hello", model = "full")) # override the model to full
print(detect("Hello", model = "auto")) # override the model to auto instead of print(detect("Hello"), mode = "lite")
print(detect("Bonjour"), mode = "lite")
print(detect("こんにちは"), mode = "lite")
print(detect("Ciallo"), mode = "lite")
print(detect("Hello", model = "full"))
print(detect("Hello")) Error HandlingI saw you trying to use except Exception as e:
raise DetectError(f"Failed to load model using temporary file: {e}") from e to wrap everything into DetectError for a few times. It's very likely that the error is not because of the model or fast_langdetect. Most of the time the error message can be much more descriptive like using MemoryError, TypeError, ValueError, etcetc. The traceback message is already enough to find the root of error, you don't need to tell them by wrapping the error into DetectError. Instead, you may leave it only when the model has failed or other error only occur inside your lib. for example in infer.py#class LanDetectConfig#def _get_model: try:
<...>
except MemoryError as e:
if not low_memory and fallback_on_memory_error: # instead of low_memory is not True
logger.info("fast-langdetect: Falling back to low-memory model...")
return self._get_model(low_memory=True, fallback_on_memory_error=False)
raise # preserve original error and traceback Long to short, only raise DetectError for fast-langdetect–specific failures. Let standard Python exceptions (ValueError, TypeError, FileNotFoundError, MemoryError, etc.) propagate. Optional PolishRename |
…or handling in detection methods
@JackyHe398 Now it should look much better. However, I don't recommend changing the global settings as this may cause cross-thread or invisible changes. |
…essions to avoid inflated peak RSS readings
Changes