-
Notifications
You must be signed in to change notification settings - Fork 18
Adds a method to get all related records from case graph #1460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis update extends the storage utility interface by adding a new method for bulk retrieval of record IDs, along with its implementation in the dummy storage utility. The DAG class now includes a method that performs a breadth-first search to locate connected records. Test classes and JSON files have been modified to support an additional relationship outcome parameter, allowing enhanced validation of case purges. Existing functionalities remain unaffected while the new additions expand data retrieval and testing capabilities. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant StorageUtility
participant InternalMethod
Client->>StorageUtility: getBulkIdsForIndex(metaField, matchingValues)
StorageUtility->>InternalMethod: getIDsForValues(metaField, matchingValues)
InternalMethod-->>StorageUtility: List of IDs
StorageUtility-->>Client: Returns Vector<Integer>
sequenceDiagram
participant Client
participant DAG
participant Helper
Client->>DAG: findConnectedRecords(recordIds)
loop For each record in queue
DAG->>Helper: enqueueUnvisitedNeighbors(currentRecord, edges, queue, visited)
Helper-->>DAG: Add unvisited neighbors
end
DAG-->>Client: Returns set of connected records
sequenceDiagram
participant Runner as TestRunner
participant CaseTest as CasePurgeTest
participant Storage
Runner->>CaseTest: executeTest()
CaseTest->>Storage: getFullCaseGraph(ownerIds)
Storage-->>CaseTest: fullCaseGraph
CaseTest->>CaseTest: Populate relation outcomes and compare with graph
CaseTest-->>Runner: Assert test result
Possibly related PRs
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
ctsims
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor feedback on code structure / naming, once addressed everything else is good.
| return roots; | ||
| } | ||
|
|
||
| public Set<I> getRelatedRecords(Set<I> recordIds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These methods should have clearer definitions of what they refer to, mostly this public one. "Related records" is a fairly weak description of what is / isn't included in the DAG walk (Is it all subgraphs which include one of the included nodes?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return visited; | ||
| } | ||
|
|
||
| private void addNeighbors(Hashtable<I, Vector<Edge<I, E>>> edges, I current, LinkedList<I> queue, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Add" is likely to be a misleading name for this method. As stated it sounds like it would be adding neighbors to the DAG itself.
I'd probably just rename this to enqueueNeighbors or accumulateNeighbors, but it should be clear that the method itself is stateless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
b8c13f1 to
24da04f
Compare
ctsims
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I'd like an update to the vocab still, but can come after merge
| } | ||
|
|
||
| /** | ||
| * Performs a breadth-first search (BFS) to find all related records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More nitpicking:
1 - the BFS part of this is (from my perspective) an implementation detail that I don't think you want. We should be able to change the implementation of the search in the future and stating in the docstring that it's BFS locks us in (or 'should' lock us in) to that behavior in the future.
2 - "Record" is a downstream implementation detail here (IE: The use of the DAG holds records in your case, but that's not fundamentally part of the data type), and shouldn't be used in the descriptions or the variable names. All of the other methods use the shared vocabulary of the DAG (Node, Edge, etc), and this method should do the same.
3 - The big thing this docstring needs to communicate isn't how it's doing it, just a more precise description of what the outcomes and concepts are. Words like "Related" aren't really appropriate for the root datatype, so it's ambiguous what the method returns. I think you just want a specific sentence here like "Return all nodes which are reachable from any of the set of input nodes." I'm not sure if "reachable" in this context is actually quite right since you're traversing edges both directions, there might be another word that formalizes edge traversal in either direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes a lot of sense, corrected here - 6d9ea68
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
src/test/resources/case_relationship_tests.json (1)
563-594: Consider simplifying redundant related_cases entries.In this test case, all four
related_casesarrays contain identical elements. This redundancy could be simplified to reduce file size and improve readability, especially for large test cases like this one."relation_outcome": [ { "related_cases": [ "a", "b", "c", "d" ] } - , - { - "related_cases": [ - "a", - "b", - "c", - "d" - ] - }, - { - "related_cases": [ - "a", - "b", - "c", - "d" - ] - }, - { - "related_cases": [ - "a", - "b", - "c", - "d" - ] - } ]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
src/main/java/org/javarosa/core/services/storage/IStorageUtilityIndexed.java(1 hunks)src/main/java/org/javarosa/core/services/storage/util/DummyIndexedStorageUtility.java(1 hunks)src/main/java/org/javarosa/core/util/DAG.java(2 hunks)src/test/java/org/commcare/test/utilities/CasePurgeTest.java(6 hunks)src/test/resources/case_relationship_tests.json(35 hunks)
🔇 Additional comments (17)
src/main/java/org/javarosa/core/services/storage/util/DummyIndexedStorageUtility.java (1)
370-375: Implementation looks correct and consistent with existing patterns.The implementation of
getBulkIdsForIndexfollows the same approach as the existinggetBulkRecordsForIndexmethod, delegating to the existinggetIDsForValuesmethod and then converting the result to a Vector. This maintains consistency in the implementation pattern.src/test/resources/case_relationship_tests.json (2)
26-41: Structured and consistent data addition for test cases.The new
relation_outcomefield provides a detailed specification of related cases, which will be valuable for testing the new functionality. The structure is consistent with the rest of the file and follows a clear pattern.
59-71: Good test coverage for bidirectional relationships.The test case includes both "d to b" and "b to d" relationships, which thoroughly tests the bidirectional traversal of the case graph. This will help ensure that the
findConnectedRecordsmethod correctly handles bidirectional relationships.src/main/java/org/javarosa/core/services/storage/IStorageUtilityIndexed.java (1)
283-290: JavaDoc is clear and consistent with existing patterns.The method signature and documentation are well-structured and follow the existing patterns in the interface. The documentation clearly explains the purpose of the method and its parameters.
src/main/java/org/javarosa/core/util/DAG.java (4)
6-7: Import additions are appropriate.The addition of the import statements for
LinkedListandSetis appropriate for the new functionality.
143-164: Consider revising method name and JavaDoc to avoid implementation details.The current method name and documentation include implementation details like "BFS" and domain-specific terminology like "records" that aren't part of the DAG abstraction.
Consider:
- /** - * Performs a breadth-first search (BFS) to find all related records - * starting from the given set of record IDs by traversing both - * forward and inverse edges. - * - * @param recordIds The set of starting record IDs. - * @return A set containing all reachable records. - */ - public Set<I> findConnectedRecords(Set<I> recordIds) { + /** + * Finds all nodes reachable from the given set of starting nodes + * by traversing both forward and inverse edges in the graph. + * + * @param startNodes The set of starting nodes. + * @return A set containing all reachable nodes. + */ + public Set<I> findReachableNodes(Set<I> startNodes) {This change removes the implementation detail (BFS) and uses the appropriate DAG terminology (nodes vs. records).
166-177: Method name could be more precise.The helper method's name could be more precise to match the DAG's domain vocabulary.
- private void enqueueUnvisitedNeighbors(Hashtable<I, Vector<Edge<I, E>>> edges, I current, LinkedList<I> queue, + private void enqueueUnvisitedNodes(Hashtable<I, Vector<Edge<I, E>>> edges, I current, LinkedList<I> queue,Also, remember to update the corresponding method call in
findConnectedRecordsif you make this change.
156-158: Efficient cycle detection in BFS implementation.The early check for already visited nodes is an efficient way to avoid processing the same node multiple times and prevent potential infinite loops in case of cycles.
src/test/java/org/commcare/test/utilities/CasePurgeTest.java (9)
3-4: Good addition of the static import.The static import of
getFullCaseGraphfromCasePurgeFilteris appropriate here since it's used directly in the code multiple times.
11-11: Appropriate imports for new functionality.The imports for
DAG,HashMap, and the adjustedSetimport align well with the new functionality being added to the test class.Also applies to: 23-23, 26-26
59-64: Proper extension of the parameter array size and JSON keys.The parameter array has been correctly updated to accommodate the new
relation_outcomeparameter, and the JSON keys array has been extended accordingly.
84-84: Good data structure choice for relation outcomes.Using a
HashMap<String, HashSet<String>>to store relation outcomes is an appropriate choice as it allows for efficient lookups of related cases by case ID.
87-90: Constructor signature properly updated.The constructor signature has been correctly updated to include the new
relationOutcomesparameter, and the parameter is passed to thecreateTestObjectsFromParametersmethod.
93-96: Method signature and parameter handling correctly updated.The
createTestObjectsFromParametersmethod signature has been properly updated to include the new parameter, and the method correctly checks ifrelationOutcomesis null before attempting to process it.Also applies to: 114-117
166-166: Appropriate use of the imported graph utility.The code correctly uses the statically imported
getFullCaseGraphmethod to generate the case graph for the purge filter.
169-170: Good comments explaining graph recreation.The comment clearly explains why the graph needs to be recreated after purging cases, which is helpful for maintaining code clarity.
180-180:Details
✅ Verification successful
Verify usage of DAG.findConnectedRecords method.
The code uses the
findConnectedRecordsmethod from theDAGclass to find connected records starting from a single input case. Make sure this is the correct method to use for finding all related cases in the graph.
🏁 Script executed:
#!/bin/bash # Looking for the implementation of findConnectedRecords in DAG class to confirm its behavior rg -A 10 "findConnectedRecords" --type javaLength of output: 2159
Usage of DAG.findConnectedRecords is Correct
- The
findConnectedRecordsmethod inDAG.javais implemented to traverse all connected nodes using both direct and inverse edges, ensuring that it returns the complete connected component for a given set of record IDs.- The test in
CasePurgeTest.javapasses the expected input (a set containing the initial case) and validates the outcome via assertions, confirming that the method behaves as intended.- No changes to the invocation or underlying logic are necessary.
Product Description
Required for dimagi/commcare-android#2955
Safety Assurance
Automated test coverage
PR adds a few tests to verify basic functionality
Special deploy instructions
Rollback instructions
Review
Duplicate PR
Automatically duplicate this PR as defined in contributing.md.
Summary by CodeRabbit
New Features
Tests