Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Support display datacache metircs by http service. #40220

Merged
merged 1 commit into from
Jan 30, 2024

Conversation

GavinMar
Copy link
Contributor

@GavinMar GavinMar commented Jan 29, 2024

Why I'm doing:
We need some more metrics to evaluate the datacache performance. But now it is difficult because we only expose the datacache memory tracker to trace the memory usage.

What I'm doing:

  • Update the starcache version which returns more metrics from the api.
  • Expose more detailed cache metrics by http service to help observe datacache states.

With this PR, we can get detailed datacache metrics by the following URL:

http://${BE_HOST}:${BE_WEBSERVER_PORT}/api/datacache/stat

The response looks like below:

{
"status": "NORMAL",
"mem_quota_bytes": 5368709120,
"mem_used_bytes": 4237890114,
"disk_quota_bytes": 214748364800,
"disk_used_bytes": 0,
"mem_used_rate": 0.79,
"disk_used_rate": 0,
"disk_spaces": "/home/disk1/datacache:214748364800",
"meta_used_bytes": 3123454,
"hit_count": 114438,
"miss_count": 1126,
"hit_rate": 0.99,
"hit_bytes": 29809132182,
"miss_bytes": 268505666,
"hit_count_last_minute": 114061,
"miss_count_last_minute": 1126,
"hit_bytes_last_minute": 29713540072,
"miss_bytes_last_minute": 268505666,
"read_mem_bytes": 29809132182,
"read_disk_bytes": 0,
"write_bytes": 4237890114,
"write_success_count": 16268,
"write_fail_count": 20856,
"remove_bytes": 0,
"remove_success_count": 0,
"remove_fail_count": 0,
"current_reading_count": 2,
"current_writing_count": 0,
"current_removing_count": 0
}

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.2
    • 3.1
    • 3.0
    • 2.5

@GavinMar GavinMar requested review from a team as code owners January 29, 2024 05:29
@github-actions github-actions bot added the 3.2 label Jan 29, 2024
ExecEnv* _exec_env;
};

} // namespace starrocks
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code snippet provided is a header file for what appears to be part of an HTTP server component in the StarRocks system, and as such, it does not contain the function implementations which would allow for more in-depth bug identification. Most bugs typically reside in the implementation details rather than declarations.

However, if we were to speculate about potential issues based on the header file and common pitfalls:

  • The use of bare pointers like ExecEnv* exec_env; could lead to memory management issues, such as leaks or double deletion, if proper ownership semantics are not followed in the implementation.
  • Since DataCacheAction inherits from HttpHandler, there may be thread-safety concerns if the handler's methods are called concurrently, but without sufficient context and without seeing the rest of the code (particularly the method implementations), it's hard to say whether this will be a problem.

To provide a more accurate analysis, we would need the corresponding source file (.cpp) that includes the implementations of the handle, _handle, _handle_stat, and _handle_error functions.

If you have specific concerns or known problems you want to discuss, please share more context or a portion of the implementation code, so the analysis can be more precise.

});
}

} // namespace starrocks
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most risky bug in this code is:
Possible capture of this pointer by a lambda which can be invoked after the DataCacheAction object has been destroyed.

You can modify the code like this:

void DataCacheAction::_handle_stat(HttpRequest* req, BlockCache* cache) {
    auto block_cache_shared = _exec_env->block_cache_shared(); // Presuming that such a method returns a shared_ptr.
    _handle(req, [block_cache_shared](rapidjson::Document& root) {
        // Now use block_cache_shared instead of directly using the member variable.
        // Implement the rest of the code with the understanding that
        // the shared_ptr ensures the cache object remains alive as long as it's needed by the lambda.
    });
}

This change assumes there is a way to retrieve a shared_ptr to the block cache from the execution environment (as alluded to by _exec_env->block_cache_shared()), preventing the cache and any associated resources from being destroyed while still in use by any lambdas capturing the shared pointer.

@GavinMar GavinMar force-pushed the cache_metrics branch 2 times, most recently from 01df2bb to 6762f2b Compare January 29, 2024 10:19
@@ -168,8 +168,8 @@ void BlockCache::record_read_cache(size_t size, int64_t lateny_us) {
_kv_cache->record_read_cache(size, lateny_us);
}

const DataCacheMetrics BlockCache::cache_metrics() const {
return _kv_cache->cache_metrics();
const DataCacheMetrics BlockCache::cache_metrics(int level) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use enum instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use enum instead?

It is mainly to be consistent with the dependent library interface, and we can replace it after the library is updated.

_handle_error(req, strings::Substitute("Not support GET method: '$0'", req->uri()));
}
} else {
_handle_error(req,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually if the method is not allowed, it should be a HTTP 405 error. not a HTTP 200 with response body error: not supported ....

Signed-off-by: Gavin <yangguansuo@starrocks.com>
Copy link

Quality Gate Passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

Copy link

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

[BE Incremental Coverage Report]

fail : 9 / 112 (08.04%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/block_cache/block_cache.cpp 0 2 00.00% [171, 172]
🔵 be/src/runtime/exec_env.h 0 1 00.00% [326]
🔵 be/src/http/action/datacache_action.cpp 0 98 00.00% [37, 38, 39, 40, 41, 43, 44, 46, 47, 50, 53, 54, 55, 56, 58, 59, 60, 65, 66, 67, 70, 71, 72, 74, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 100, 101, 102, 103, 105, 106, 107, 108, 110, 111, 113, 114, 115, 116, 117, 118, 120, 122, 123, 125, 126, 128, 129, 131, 132, 133, 135, 136, 138, 139, 140, 141, 142, 143, 145, 146, 148, 149, 150, 152, 153, 154, 156, 157, 158, 159, 160, 161, 163, 164, 165, 166, 167, 168]
🔵 be/src/block_cache/starcache_wrapper.cpp 3 5 60.00% [109, 110]
🔵 be/src/http/action/datacache_action.h 2 2 100.00% []
🔵 be/src/runtime/exec_env.cpp 1 1 100.00% []
🔵 be/src/service/service_be/http_service.cpp 3 3 100.00% []

@Youngwb Youngwb merged commit b054af8 into StarRocks:main Jan 30, 2024
42 of 44 checks passed
Copy link

@Mergifyio backport branch-3.2

@github-actions github-actions bot removed the 3.2 label Jan 30, 2024
Copy link
Contributor

mergify bot commented Jan 30, 2024

backport branch-3.2

✅ Backports have been created

mergify bot pushed a commit that referenced this pull request Jan 30, 2024
)

Signed-off-by: Gavin <yangguansuo@starrocks.com>
(cherry picked from commit b054af8)

# Conflicts:
#	be/src/block_cache/cachelib_wrapper.cpp
#	be/src/block_cache/cachelib_wrapper.h
#	build.sh
#	run-be-ut.sh
kevincai pushed a commit that referenced this pull request Jan 31, 2024
…port #40220)  (#40375)

Signed-off-by: Gavin <yangguansuo@starrocks.com>
liubotao pushed a commit to liubotao/starrocks that referenced this pull request Jan 31, 2024
…rRocks#40220)

Signed-off-by: Gavin <yangguansuo@starrocks.com>
Signed-off-by: liubotao <316945435@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants