-
Notifications
You must be signed in to change notification settings - Fork 74
feat(HGraph): support use tau-mng #1338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: HeHuMing <hehuming434@gmail.com>
Reviewer's GuideIntroduce TAU-based edge pruning alongside existing alpha-based heuristic by templating the edge-selection logic, extending HGraph and Pyramid implementations to choose the mode at runtime, enhancing parameter parsing to include tau, adding comprehensive unit tests for TAU-MNG behavior, and updating benchmark configurations. ER diagram for new TAU parameter in HGraph and PyramiderDiagram
HGRAPH {
float alpha
float tau
string select_edge_param
}
PYRAMID {
float alpha
float tau
string select_edge_param
}
HGRAPH_PARAMETER {
float alpha
float tau
string selectedgeparam
}
PYRAMID_PARAMETERS {
float alpha
float tau
string selectedgeparam
}
HGRAPH_PARAMETER ||--o| HGRAPH : configures
PYRAMID_PARAMETERS ||--o| PYRAMID : configures
Class diagram for updated edge selection logic (TAU and ALPHA)classDiagram
class HGraph {
+float alpha
+float tau
+string select_edge_param
}
class Pyramid {
+float alpha
+float tau
+string select_edge_param
}
class HGraphParameter {
+float alpha
+float tau
+string selectedgeparam
}
class PyramidParameters {
+float alpha
+float tau
+string selectedgeparam
}
class EdgeSelectionParam {
<<enum>>
ALPHA
TAU
}
HGraphParameter <|-- HGraph
PyramidParameters <|-- Pyramid
HGraph --> EdgeSelectionParam
Pyramid --> EdgeSelectionParam
HGraphParameter --> EdgeSelectionParam
PyramidParameters --> EdgeSelectionParam
Class diagram for templated edge selection functionsclassDiagram
class PruningStrategy {
+select_edges_by_heuristic<Param>(edges, max_size, flatten, allocator, param_value)
+mutually_connect_new_element<Param>(cur_c, top_candidates, graph, flatten, neighbors_mutexes, allocator, param_value)
}
class EdgeSelectionParam {
<<enum>>
ALPHA
TAU
}
PruningStrategy --> EdgeSelectionParam
Flow diagram for runtime edge selection (ALPHA vs TAU)flowchart TD
A["Parse parameters (alpha/tau)"] --> B["Set select_edge_param"]
B --> C{select_edge_param}
C -- "alpha" --> D["mutually_connect_new_element<ALPHA>"]
C -- "tau" --> E["mutually_connect_new_element<TAU>"]
D --> F["select_edges_by_heuristic<ALPHA>"]
E --> G["select_edges_by_heuristic<TAU>"]
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Summary of ChangesHello @HeHuMing, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request integrates the TAU-mng edge pruning algorithm into the HGRAPH and Pyramid index structures. The primary goal is to enhance the Query Per Second (QPS) performance by providing an alternative, more efficient edge selection strategy during graph construction. The changes involve introducing a new 'tau' parameter, modifying core graph building functions to support conditional edge selection, and updating parameter handling across the system. This allows for flexible configuration and optimization of the graph index. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `src/algorithm/pyramid.cpp:406-414` </location>
<code_context>
- neighbors_mutex_,
- allocator_,
- alpha_);
+ if (select_edge_param == "alpha") {
+ mutually_connect_new_element<EdgeSelectionParam::ALPHA>(inner_id,
+ result,
+ this->bottom_graph_,
+ flatten_codes,
+ neighbors_mutex_,
+ allocator_,
+ alpha_);
+ } else if (select_edge_param == "tau") {
+ mutually_connect_new_element<EdgeSelectionParam::TAU>(inner_id,
+ result,
</code_context>
<issue_to_address>
**issue (bug_risk):** Incorrect template parameter used for 'tau' branch in mutually_connect_new_element.
Using EdgeSelectionParam::ALPHA instead of EdgeSelectionParam::TAU in the 'tau' branch may cause the function to apply the wrong edge selection logic.
</issue_to_address>
### Comment 2
<location> `src/impl/pruning_strategy_test.cpp:299-263` </location>
<code_context>
+ REQUIRE(kept == std::vector<InnerIdType>{1, 2});
+ }
+
+ SECTION("Tau-MNG with very small tau (approaching original heuristic)") {
+ auto edges = std::make_shared<StandardHeap<true, false>>(allocator.get(), -1);
+ edges->Push(d01, 1);
+ edges->Push(d02, 2);
+ edges->Push(d03, 3);
+ edges->Push(d04, 4);
+
+ // τ-MNG with very small tau=0.1 should behave similarly to original heuristic
+ // For tau=0.1, 3*tau=0.3
+ // This will be very close to original heuristic behavior
+ select_edges_by_heuristic<EdgeSelectionParam::TAU>(
+ edges, 3, flatten, allocator.get(), 0.1F);
+
+ REQUIRE(edges->Size() == 1);
+ std::vector<InnerIdType> kept;
+ while (!edges->Empty()) {
+ kept.push_back(edges->Top().second);
+ edges->Pop();
+ }
+ std::sort(kept.begin(), kept.end());
+ REQUIRE(kept == std::vector<InnerIdType>{1});
+ }
+
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for negative tau values.
Adding a test for negative tau will verify that the function correctly handles invalid input, either by reverting to a default behavior or raising an appropriate error.
Suggested implementation:
```cpp
SECTION("Tau-MNG with very small tau (approaching original heuristic)") {
auto edges = std::make_shared<StandardHeap<true, false>>(allocator.get(), -1);
edges->Push(d01, 1);
edges->Push(d02, 2);
edges->Push(d03, 3);
edges->Push(d04, 4);
// τ-MNG with very small tau=0.1 should behave similarly to original heuristic
// For tau=0.1, 3*tau=0.3
// This will be very close to original heuristic behavior
select_edges_by_heuristic<EdgeSelectionParam::TAU>(
edges, 3, flatten, allocator.get(), 0.1F);
REQUIRE(edges->Size() == 1);
std::vector<InnerIdType> kept;
while (!edges->Empty()) {
kept.push_back(edges->Top().second);
edges->Pop();
}
std::sort(kept.begin(), kept.end());
REQUIRE(kept == std::vector<InnerIdType>{1});
}
SECTION("Tau-MNG with negative tau value") {
auto edges = std::make_shared<StandardHeap<true, false>>(allocator.get(), -1);
edges->Push(d01, 1);
edges->Push(d02, 2);
edges->Push(d03, 3);
edges->Push(d04, 4);
// Expect the function to throw or handle negative tau gracefully
// If the function is supposed to throw, use REQUIRE_THROWS
// If it should fallback, check the output
// Here, we assume it should throw an exception for invalid tau
REQUIRE_THROWS(select_edges_by_heuristic<EdgeSelectionParam::TAU>(
edges, 3, flatten, allocator.get(), -0.5F));
}
```
If `select_edges_by_heuristic` does not currently throw for negative tau, you may need to update its implementation to check for invalid tau values and throw an appropriate exception (e.g., `std::invalid_argument`). Adjust the test accordingly if the intended behavior is to fallback to a default value instead.
</issue_to_address>
### Comment 3
<location> `src/impl/pruning_strategy_test.cpp:322-296` </location>
<code_context>
+ SECTION("Tau-MNG with short edge forced connection") {
</code_context>
<issue_to_address>
**suggestion (testing):** Test case for max_size less than number of forced short edges.
Please add a test where short edges outnumber max_size to confirm the forced connection logic handles this scenario correctly and its behavior is documented.
</issue_to_address>
### Comment 4
<location> `src/impl/pruning_strategy_test.cpp:351-360` </location>
<code_context>
+ SECTION("Mutual connection with Tau-MNG algorithm") {
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for mutual connection with tau=0 and tau < 0.
Please include tests for tau=0 and tau<0 to cover fallback and error handling in mutual connection scenarios.
Suggested implementation:
```cpp
SECTION("Mutual connection with Tau-MNG algorithm") {
auto graph_param = std::make_shared<GraphDataCellParameter>();
graph_param->io_parameter_ = std::make_shared<MemoryIOParameter>();
graph_param->max_degree_ = 4;
auto graph = GraphInterface::MakeInstance(graph_param, common_param);
auto candidates = std::make_shared<StandardHeap<true, false>>(allocator.get(), -1);
candidates->Push(d01, 1);
candidates->Push(d02, 2);
candidates->Push(d03, 3);
candidates->Push(d04, 4);
}
SECTION("Mutual connection with Tau-MNG algorithm, tau=0 (fallback)") {
auto graph_param = std::make_shared<GraphDataCellParameter>();
graph_param->io_parameter_ = std::make_shared<MemoryIOParameter>();
graph_param->max_degree_ = 4;
graph_param->tau_ = 0; // tau=0
auto graph = GraphInterface::MakeInstance(graph_param, common_param);
auto candidates = std::make_shared<StandardHeap<true, false>>(allocator.get(), -1);
candidates->Push(d01, 1);
candidates->Push(d02, 2);
// Add your mutual connection logic here, e.g.:
// auto edges = PruningStrategy::MutualConnection(graph, candidates, ...);
// REQUIRE(edges->Size() == expected_size);
// Add more assertions as needed to verify fallback behavior
}
SECTION("Mutual connection with Tau-MNG algorithm, tau<0 (error handling)") {
auto graph_param = std::make_shared<GraphDataCellParameter>();
graph_param->io_parameter_ = std::make_shared<MemoryIOParameter>();
graph_param->max_degree_ = 4;
graph_param->tau_ = -1; // tau<0
auto graph = GraphInterface::MakeInstance(graph_param, common_param);
auto candidates = std::make_shared<StandardHeap<true, false>>(allocator.get(), -1);
candidates->Push(d01, 1);
candidates->Push(d02, 2);
// Add your mutual connection logic here, e.g.:
// auto edges = PruningStrategy::MutualConnection(graph, candidates, ...);
// REQUIRE(edges->Size() == expected_size);
// Add more assertions as needed to verify error handling or expected behavior
}
```
You will need to fill in the mutual connection logic and assertions for the new test sections, using the same approach as in your existing tests. Adjust the expected results and error handling checks according to your implementation of tau=0 and tau<0 in the mutual connection algorithm.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| if (select_edge_param == "alpha") { | ||
| mutually_connect_new_element<EdgeSelectionParam::ALPHA>(inner_id, | ||
| results, | ||
| node->graph_, | ||
| flatten_interface_ptr_, | ||
| empty_mutex, | ||
| allocator_, | ||
| alpha_); | ||
| } else if (select_edge_param == "tau") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Incorrect template parameter used for 'tau' branch in mutually_connect_new_element.
Using EdgeSelectionParam::ALPHA instead of EdgeSelectionParam::TAU in the 'tau' branch may cause the function to apply the wrong edge selection logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for the TAU-mng edge pruning algorithm as an alternative to the existing alpha-based pruning. The changes are well-structured, refactoring the pruning strategy into a templated function and adding comprehensive unit tests for the new logic. However, I've identified a critical bug in HGraph and a high-severity bug in Pyramid due to copy-paste errors, which would prevent the new algorithm from functioning correctly. I've also included some medium-severity suggestions to make the new parameter handling more robust by enforcing mutual exclusivity between alpha and tau. Please address the identified bugs to ensure the feature works as intended.
| if (select_edge_param == "alpha") { | ||
| mutually_connect_new_element<EdgeSelectionParam::ALPHA>(inner_id, | ||
| result, | ||
| this->bottom_graph_, | ||
| flatten_codes, | ||
| neighbors_mutex_, | ||
| allocator_, | ||
| alpha_); | ||
| } else if (select_edge_param == "tau") { | ||
| mutually_connect_new_element<EdgeSelectionParam::TAU>(inner_id, | ||
| result, | ||
| this->bottom_graph_, | ||
| flatten_codes, | ||
| neighbors_mutex_, | ||
| allocator_, | ||
| tau_); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There appears to be a copy-paste error in this block. The mutually_connect_new_element function is called with this->bottom_graph_ for both alpha and tau strategies inside the loop over route_graphs_. It should be using route_graphs_[j] to connect the new element to the correct hierarchical level, as was done in the original code. This will cause incorrect graph construction for all but the bottom layer.
Additionally, this block of code is almost identical to the one for bottom_graph_ above (lines 1034-1050). You could consider refactoring this logic into a helper function to avoid duplication and prevent such errors in the future.
if (select_edge_param == "alpha") {
mutually_connect_new_element<EdgeSelectionParam::ALPHA>(inner_id,
result,
route_graphs_[j],
flatten_codes,
neighbors_mutex_,
allocator_,
alpha_);
} else if (select_edge_param == "tau") {
mutually_connect_new_element<EdgeSelectionParam::TAU>(inner_id,
result,
route_graphs_[j],
flatten_codes,
neighbors_mutex_,
allocator_,
tau_);
}| mutually_connect_new_element<EdgeSelectionParam::ALPHA>(inner_id, | ||
| results, | ||
| node->graph_, | ||
| flatten_interface_ptr_, | ||
| empty_mutex, | ||
| allocator_, | ||
| tau_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a copy-paste error here. Inside the else if (select_edge_param == "tau") block, you are calling mutually_connect_new_element<EdgeSelectionParam::ALPHA>. This should be mutually_connect_new_element<EdgeSelectionParam::TAU> to correctly apply the TAU-mng pruning strategy. With the current code, the tau parameter will not be used, and the alpha strategy will be applied instead.
| mutually_connect_new_element<EdgeSelectionParam::ALPHA>(inner_id, | |
| results, | |
| node->graph_, | |
| flatten_interface_ptr_, | |
| empty_mutex, | |
| allocator_, | |
| tau_); | |
| mutually_connect_new_element<EdgeSelectionParam::TAU>(inner_id, | |
| results, | |
| node->graph_, | |
| flatten_interface_ptr_, | |
| empty_mutex, | |
| allocator_, | |
| tau_); |
| if (json.Contains(ALPHA_KEY)) { | ||
| this->alpha = json[ALPHA_KEY].GetFloat(); | ||
| this->selectedgeparam = "alpha"; | ||
| } else if (json.Contains(TAU_KEY)) { | ||
| this->tau = json[TAU_KEY].GetFloat(); | ||
| this->selectedgeparam = "tau"; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current logic gives precedence to alpha if both alpha and tau are present in the configuration. While this is a valid approach, it might be confusing for users who might provide both parameters by mistake. To improve clarity and prevent misconfiguration, consider adding a check to ensure that alpha and tau are mutually exclusive, and throw an exception if both are defined.
| if (json.Contains(ALPHA_KEY)) { | |
| this->alpha = json[ALPHA_KEY].GetFloat(); | |
| this->selectedgeparam = "alpha"; | |
| } else if (json.Contains(TAU_KEY)) { | |
| this->tau = json[TAU_KEY].GetFloat(); | |
| this->selectedgeparam = "tau"; | |
| } | |
| if (json.Contains(ALPHA_KEY) && json.Contains(TAU_KEY)) { | |
| throw VsagException(ErrorType::INVALID_ARGUMENT, "alpha and tau are mutually exclusive, please provide only one."); | |
| } | |
| if (json.Contains(ALPHA_KEY)) { | |
| this->alpha = json[ALPHA_KEY].GetFloat(); | |
| this->selectedgeparam = "alpha"; | |
| } else if (json.Contains(TAU_KEY)) { | |
| this->tau = json[TAU_KEY].GetFloat(); | |
| this->selectedgeparam = "tau"; | |
| } |
| if (json.Contains(ALPHA_KEY)) { | ||
| this->alpha = json[ALPHA_KEY].GetFloat(); | ||
| this->selectedgeparam = "alpha"; | ||
| } else if (json.Contains(TAU_KEY)) { | ||
| this->tau = json[TAU_KEY].GetFloat(); | ||
| this->selectedgeparam = "tau"; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to hgraph_parameter.cpp, the current logic gives precedence to alpha if both alpha and tau are present. This could lead to user confusion. It would be more robust to enforce mutual exclusivity by throwing an error if both parameters are provided in the configuration.
if (json.Contains(ALPHA_KEY) && json.Contains(TAU_KEY)) {
throw VsagException(ErrorType::INVALID_ARGUMENT, "alpha and tau are mutually exclusive, please provide only one.");
}
if (json.Contains(ALPHA_KEY)) {
this->alpha = json[ALPHA_KEY].GetFloat();
this->selectedgeparam = "alpha";
} else if (json.Contains(TAU_KEY)) {
this->tau = json[TAU_KEY].GetFloat();
this->selectedgeparam = "tau";
}
The introduction of TAU-mng edge pruning algorithm has improved QPS.
Summary by Sourcery
Introduce TAU-MNG support alongside the existing alpha-based pruning in HGraph and Pyramid by templating edge selection logic, adding a 'tau' parameter and runtime selection, and covering it with new unit tests.
New Features:
Enhancements:
Documentation:
Tests:
Chores: