[BUG] Can't initialize detector after disable/re-enable AD plugin #132
Comments
Root cause
The detector's first initialization completes, and then we store model checkpoints in the checkpoint index.
Solution
To be safe, we can delete the model checkpoint document from the index when an AD job is stopped automatically. A new model will then be built when the detector is restarted.
Another solution would be to clear model checkpoints regardless of where the request is received or where the model is hosted. I will take a deeper look and see if that is feasible.
By design, model data should be cleared with an update, correct?
Yes, when a user stops a detector, AD deletes the checkpoint automatically. But when we stop a detector because of an exception, we don't delete the checkpoint. I'm working on the job runner to add checkpoint deletion when an AD job is stopped. Can you help check the model run part? When restoring a model from a checkpoint, if the model dimension doesn't match the current detector's feature count, we should delete the checkpoint and build a new model.
@ylwu-amzn The dimension check cannot be relied on. For example, the features start as sum(x) and avg(y), and after the update they become max(s) and min(t). While the dimension is the same, the previous model is not usable for the new features.
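To make the point above concrete, here is a minimal sketch in Python. The function `dimension_matches` and the variable names are hypothetical, not the plugin's code, and the model dimension is simplified to the number of features. It only illustrates that a size comparison accepts a stale model when the features are replaced one for one.

```python
def dimension_matches(model_dimension, features):
    """Hypothetical check: accept a checkpoint when the sizes agree."""
    return model_dimension == len(features)


# Feature definitions before and after the detector update (from the example above).
old_features = ["sum(x)", "avg(y)"]
new_features = ["max(s)", "min(t)"]

# Assumption for illustration: the model dimension equals the feature count.
model_dimension = len(old_features)

# Same count, different features: the size check wrongly accepts the old model.
stale_model_accepted = dimension_matches(model_dimension, new_features)

# The check only helps when the count itself changes.
mismatch_detected = not dimension_matches(model_dimension, new_features + ["avg(z)"])
```

The size check can therefore reject some stale checkpoints, but it can never confirm that a checkpoint is safe to reuse.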
Yes, agreed. We can't trust the case where the dimension equals the feature count, but at least we can check the not-equal case. Ideally, we can store the detector configuration in the model checkpoint and check whether it equals the current detector configuration. If the detector interval, window delay, indices, time field, filter, or feature definitions changed, we should not trust the model checkpoint.
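One possible way to sketch this comparison in Python: store a fingerprint of the model-relevant configuration next to the model, and compare it on restore. The field names, `config_fingerprint`, and `checkpoint_is_usable` are assumptions made for illustration; they are not the plugin's actual schema or API.

```python
import hashlib
import json

# Hypothetical names for the fields listed above that affect the trained model.
MODEL_RELEVANT_FIELDS = (
    "interval",
    "window_delay",
    "indices",
    "time_field",
    "filter_query",
    "features",
)


def config_fingerprint(detector):
    """Hash only the fields that should invalidate a checkpoint when changed."""
    relevant = {field: detector.get(field) for field in MODEL_RELEVANT_FIELDS}
    canonical = json.dumps(relevant, sort_keys=True)  # stable key order
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def checkpoint_is_usable(checkpoint, detector):
    """Trust the checkpoint only if it was built from the same configuration."""
    return checkpoint.get("config_fingerprint") == config_fingerprint(detector)


detector = {
    "name": "demo-detector",
    "interval": "1m",
    "window_delay": "1m",
    "indices": ["logs-*"],
    "time_field": "timestamp",
    "filter_query": {"match_all": {}},
    "features": ["sum(x)", "avg(y)"],
}
checkpoint = {"model": "<serialized model>", "config_fingerprint": config_fingerprint(detector)}

# Same feature count, different definitions: the fingerprint no longer matches.
updated = dict(detector, features=["max(s)", "min(t)"])
# A change to a field outside the list (here, the name) keeps the checkpoint valid.
renamed = dict(detector, name="renamed-detector")
```

In this sketch, `checkpoint_is_usable(checkpoint, updated)` is false while `checkpoint_is_usable(checkpoint, renamed)` is true, which is the behavior the size check cannot provide.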
Size check is not the right solution. Model versioning was the initial design, but the current design replaced it with relying on updates to prevent model/config mismatch.
As this problem occurs only in some edge cases, I plan to fix it with a long-term solution rather than a workaround like the feature size check. Here are some options.
Option 1: delete checkpoints for every update
When a detector is updated, whether or not it is running, the current checkpoint is deleted.
Pros:
Cons:
Option 2: delete checkpoints when stopping the detector
When a detector is stopped, whether from the REST API or from an EndRunException, the current checkpoint is deleted.
Pros:
Cons:
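Option 2 could be sketched roughly as follows in Python. `CheckpointStore` is a hypothetical in-memory stand-in for the checkpoint index, and `stop_detector` is an illustrative name, not the plugin's job runner. The idea is that both stop paths go through one function that always deletes the checkpoint.

```python
class CheckpointStore:
    """Hypothetical in-memory stand-in for the checkpoint index."""

    def __init__(self):
        self._docs = {}

    def save(self, detector_id, model):
        self._docs[detector_id] = model

    def delete(self, detector_id):
        # Deleting a missing checkpoint is a no-op, so stop is safe to repeat.
        self._docs.pop(detector_id, None)

    def exists(self, detector_id):
        return detector_id in self._docs


def stop_detector(store, detector_id, reason):
    """Single stop path: the checkpoint is deleted whatever the reason."""
    store.delete(detector_id)
    return {"detector_id": detector_id, "stopped_by": reason}


store = CheckpointStore()
store.save("detector-a", "<model a>")
store.save("detector-b", "<model b>")

# One detector is stopped by the user, the other by an exception in the job.
result_a = stop_detector(store, "detector-a", reason="rest_api")
result_b = stop_detector(store, "detector-b", reason="end_run_exception")
```

After both calls neither checkpoint remains, so a restarted detector has to build a new model.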
Option 3: store detector configuration in checkpoint and compare with latest detector configuration
We store both the model and the detector configuration in the checkpoint. We compare the latest detector configuration with the checkpoint; if they do not match, throw
Pros:
Cons:
Thanks for the redesigns. The first two options are finicky and lead to poor customer experiences. The third option is better in all regards and is the closest to the original design, except for some minor details.
The effort is manageable and can be done in steps.
Describe the bug
After disabling the AD plugin, all detectors stopped. After re-enabling the AD plugin and starting a detector, the detector state became initialization failure.
Exception:
To Reproduce
Steps to reproduce the behavior:
1. Create a detector with two features and a 1-minute interval. Start the detector and wait until its state becomes running.
2. Disable the AD plugin.