renaming metrics #1224
Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
are we removing all
Before, we had only data nodes. But in the model serving framework we also introduced ML nodes. So having this
LGTM, just could you check why all the builds are failing? Will approve after CI is passing.
It's failing here: we applied a node-level vs cluster-level distinction in the stats, and now it's treating all of them as cluster-level stats.
@@ -148,6 +149,10 @@ protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient cli
    }

    MLStatsInput createMlStatsInputFromRequestParams(RestRequest request) {

        Set<String> mlNodeStatNames = EnumSet.allOf(MLNodeLevelStat.class).stream()
Should we construct a new Set for every request?
We can construct the set once, during class initialization. Let me do that.
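A minimal sketch of that idea, not the code from this PR. It assumes the stat names are the lowercased enum constants; the class name, the `isNodeLevelStat` helper, and the trimmed enum are hypothetical stand-ins.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class StatNamesSketch {
    // Trimmed stand-in for the real MLNodeLevelStat enum (assumption).
    enum MLNodeLevelStat { ML_JVM_HEAP_USAGE, ML_EXECUTING_TASK_COUNT, ML_REQUEST_COUNT }

    // Built once when the class is initialized, then shared by every request.
    static final Set<String> ML_NODE_STAT_NAMES = Collections.unmodifiableSet(
        EnumSet.allOf(MLNodeLevelStat.class)
            .stream()
            .map(stat -> stat.name().toLowerCase(Locale.ROOT))
            .collect(Collectors.toSet())
    );

    // Hypothetical helper: the per-request path only reads the shared set.
    static boolean isNodeLevelStat(String requestedName) {
        return ML_NODE_STAT_NAMES.contains(requestedName);
    }

    public static void main(String[] args) {
        System.out.println(isNodeLevelStat("ml_request_count"));
        System.out.println(isNodeLevelStat("unknown_stat"));
    }
}
```

Because the set is wrapped as unmodifiable and never changes after initialization, concurrent requests can read it without synchronization.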
@@ -392,9 +391,6 @@ private void indexRemoteModel(MLRegisterModelInput registerModelInput, MLTask ml
        String taskId = mlTask.getTaskId();
        FunctionName functionName = mlTask.getFunctionName();
        try (ThreadContext.StoredContext context = client.threadPool().getThreadContext().stashContext()) {
            mlStats.getStat(MLNodeLevelStat.ML_REQUEST_COUNT).increment();
            mlStats.createCounterStatIfAbsent(functionName, REGISTER, ML_ACTION_REQUEST_COUNT).increment();
This line tracks the number of register requests at the function level. If we remove it, can we still track that?
Yeah, because we are already tracking this in the parent function, registerMLModel.
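The reasoning in this thread can be sketched as follows. This is an illustration only: the counter and both method bodies are hypothetical stand-ins, and only the method names come from the discussion. If the parent counts the request, the child must not, or each remote registration would be counted twice.

```java
import java.util.concurrent.atomic.AtomicLong;

class RegisterCountSketch {
    // Hypothetical stand-in for a counter stat such as ML_REQUEST_COUNT.
    static final AtomicLong requestCount = new AtomicLong();

    // Parent entry point: the only place where the request is counted.
    static void registerMLModel(boolean remote) {
        requestCount.incrementAndGet();
        if (remote) {
            indexRemoteModel();
        }
    }

    // Child step: does no counting of its own, so one request adds exactly 1.
    static void indexRemoteModel() {
        // Indexing work would go here.
    }

    public static void main(String[] args) {
        registerMLModel(true);
        registerMLModel(false);
        System.out.println(requestCount.get());
    }
}
```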
@@ -11,7 +11,8 @@
 */
public enum MLNodeLevelStat {
    ML_JVM_HEAP_USAGE,
    ML_EXECUTING_TASK_COUNT,
    ML_EXECUTING_TASK_COUNT, // How many tasks are executing currently. If any task starts, then it will be 1, if the task finished then it
"then it will be 1" -> "then it will increase by 1"
"will get back to 0" -> "will decrease by 1"
?
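The wording matters because several tasks can run at once, so the value is a running count rather than a 0/1 flag. A minimal sketch of those semantics follows; the counter and the `runTask` wrapper are hypothetical and not taken from the plugin.

```java
import java.util.concurrent.atomic.AtomicLong;

class ExecutingTaskGaugeSketch {
    // Hypothetical stand-in for the ML_EXECUTING_TASK_COUNT stat.
    static final AtomicLong executingTaskCount = new AtomicLong();

    // Increase by 1 when a task starts, decrease by 1 when it ends,
    // including when the task throws.
    static void runTask(Runnable task) {
        executingTaskCount.incrementAndGet();
        try {
            task.run();
        } finally {
            executingTaskCount.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        runTask(() -> System.out.println("during: " + executingTaskCount.get()));
        System.out.println("after: " + executingTaskCount.get());
    }
}
```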
@@ -234,12 +233,6 @@ private void registerModel(MLRegisterModelInput registerModelInput, ActionListen
                throw new IllegalArgumentException("URL can't match trusted url regex");
            }
        }
        // mlStats.getStat(MLNodeLevelStat.ML_NODE_EXECUTING_TASK_COUNT).increment();
        mlStats.getStat(MLNodeLevelStat.ML_NODE_TOTAL_REQUEST_COUNT).increment();
Why remove this line?
We are already counting this in the registerMLModel function in the MLModelManager class.
* renaming metrics
* updating tests
* updating test cases
* removing the ML_NODE checking for node level stats
* updating constructing new set
* spotless Apply
* updating ML_NODE_TOTAL_MODEL_COUNT to ML_DEPLOYED_MODEL_COUNT
* fixing metrics count
* spotless
* fixing executing task
* updating comment

Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
(cherry picked from commit 86eb953)
Description
We renamed the node-level metrics:
ml_node_executing_task_count --> ml_executing_task_count
ml_node_total_model_count --> ml_deployed_model_count
ml_node_total_failure_count --> ml_failure_count
ml_node_total_circuit_breaker_trigger_count --> ml_circuit_breaker_trigger_count
ml_node_total_request_count --> ml_request_count
ml_node_jvm_heap_usage --> ml_jvm_heap_usage
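For anyone whose dashboards or scripts still read the old keys, the renames listed above can be expressed as a lookup table. This is only a sketch for client-side code; the class and the `currentName` helper are hypothetical and are not part of this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MetricRenameSketch {
    // Old node-level metric name -> new name, as listed in the description.
    static final Map<String, String> RENAMED = new LinkedHashMap<>();
    static {
        RENAMED.put("ml_node_executing_task_count", "ml_executing_task_count");
        RENAMED.put("ml_node_total_model_count", "ml_deployed_model_count");
        RENAMED.put("ml_node_total_failure_count", "ml_failure_count");
        RENAMED.put("ml_node_total_circuit_breaker_trigger_count", "ml_circuit_breaker_trigger_count");
        RENAMED.put("ml_node_total_request_count", "ml_request_count");
        RENAMED.put("ml_node_jvm_heap_usage", "ml_jvm_heap_usage");
    }

    // Hypothetical helper: returns the new name, or the input unchanged
    // when it is not one of the renamed metrics.
    static String currentName(String name) {
        return RENAMED.getOrDefault(name, name);
    }

    public static void main(String[] args) {
        System.out.println(currentName("ml_node_total_model_count"));
    }
}
```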
Issues Resolved
[List any issues this PR will resolve]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.