DoBatch preference to 4xx if error #4783
LGTM!
if i.failed5xx.Load() > i.failed4xx.Load() {
	return i.err5xx.Load()
}

return i.err4xx.Load()
How about fetching the two values into local variables, so you don't make 4x atomic calls?
Bryan, I am not sure I get your idea. They are four different variables: two for the number of errors and two for the actual errors. Currently we can only return the last known error; the change basically holds one error for each family type.
Signed-off-by: Daniel Blando <ddeluigg@amazon.com>
Signed-off-by: Daniel Blando <ddeluigg@amazon.com>
What this PR does:
After #4388, we started returning the error from the family that failed most often. This PR improves the logic to also prioritize 4xx when the number of 4xx errors equals the number of 5xx errors. The reasoning is that when both 4xx and 5xx occur, the customer was likely close to their limits, so the 4xx is more relevant than the 5xx. The change also makes responses more consistent.
For example, with three replicas:
Before:
2xx, 5xx, 4xx -> returns 4xx
2xx, 4xx, 5xx -> returns 5xx
5xx, 4xx, 5xx -> returns 5xx
4xx, 5xx, 4xx -> returns 4xx
After change:
2xx, 5xx, 4xx -> returns 4xx
2xx, 4xx, 5xx -> returns 4xx
5xx, 4xx, 5xx -> returns 5xx
4xx, 5xx, 4xx -> returns 4xx
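The Before/After rows can be reproduced with a small toy model. This is not the Cortex code: responses are plain strings, and the hypothetical `pickBefore`/`pickAfter` helpers encode the two rules as read from the table above (previously a tie went to whichever error arrived last; now a tie goes to 4xx).

```go
package main

import (
	"fmt"
	"strings"
)

// count tallies 4xx and 5xx responses and remembers the last error family seen.
func count(responses []string) (n4, n5 int, last string) {
	for _, r := range responses {
		switch r {
		case "4xx":
			n4++
			last = r
		case "5xx":
			n5++
			last = r
		}
	}
	return n4, n5, last
}

// pickBefore: the family with more failures wins; a tie keeps the last error.
func pickBefore(responses []string) string {
	n4, n5, last := count(responses)
	switch {
	case n4 > n5:
		return "4xx"
	case n5 > n4:
		return "5xx"
	default:
		return last // "" when there were no errors
	}
}

// pickAfter: 5xx only when it strictly outnumbers 4xx; a tie prefers 4xx.
func pickAfter(responses []string) string {
	n4, n5, _ := count(responses)
	if n4+n5 == 0 {
		return ""
	}
	if n5 > n4 {
		return "5xx"
	}
	return "4xx"
}

func main() {
	cases := [][]string{
		{"2xx", "5xx", "4xx"},
		{"2xx", "4xx", "5xx"},
		{"5xx", "4xx", "5xx"},
		{"4xx", "5xx", "4xx"},
	}
	for _, c := range cases {
		fmt.Printf("%s -> before %s, after %s\n",
			strings.Join(c, ", "), pickBefore(c), pickAfter(c))
	}
}
```

Only the second row changes: with one 4xx and one 5xx, the old rule depended on arrival order, while the new rule always returns 4xx.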
Checklist
CHANGELOG.md updated; the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]