@shubham1g5
Contributor

@shubham1g5 shubham1g5 commented Feb 19, 2025

Product Description

Required for dimagi/commcare-android#2955

Safety Assurance

Automated test coverage

PR adds a few tests to verify basic functionality

Special deploy instructions

  • This PR can be deployed after merge with no further considerations.

Rollback instructions

  • This PR can be reverted after deploy with no further considerations.

Review

  • The set of people pinged as reviewers is appropriate for the level of risk of the change.

Duplicate PR

Automatically duplicate this PR as defined in contributing.md.

Summary by CodeRabbit

  • New Features

    • Introduced bulk record retrieval to streamline the fetching of multiple record identifiers.
    • Enhanced connectivity tracking to efficiently discover and link related records.
  • Tests

    • Updated testing scenarios to incorporate relationship outcomes, improving the validation of case purging workflows and ensuring data integrity.

@coderabbitai

coderabbitai bot commented Feb 19, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This update extends the storage utility interface by adding a new method for bulk retrieval of record IDs, along with its implementation in the dummy storage utility. The DAG class now includes a method that performs a breadth-first search to locate connected records. Test classes and JSON files have been modified to support an additional relationship outcome parameter, allowing enhanced validation of case purges. Existing functionalities remain unaffected while the new additions expand data retrieval and testing capabilities.

Changes

| File(s) | Change Summary |
|---|---|
| src/main/java/org/javarosa/.../IStorageUtilityIndexed.java, src/main/java/org/javarosa/.../DummyIndexedStorageUtility.java | Added new method getBulkIdsForIndex(String metaFieldName, Collection<String> matchingValues) for bulk retrieval of record IDs; implementation in DummyIndexedStorageUtility calls getIDsForValues and returns a Vector<Integer>. |
| src/main/java/org/javarosa/.../DAG.java | Introduced findConnectedRecords(Set<I> recordIds) using BFS to gather connected records; added helper method enqueueUnvisitedNeighbors and imported necessary classes (LinkedList, Set). |
| src/test/java/org/commcare/test/utilities/CasePurgeTest.java, src/test/resources/case_relationship_tests.json | Updated the test class to include a new relation outcome parameter in the constructor and parameter parsing; added populateRelationOutcomes method and modified test logic to validate case purge relationships with the new JSON key relation_outcome. |
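
As a rough illustration of the bulk ID lookup summarized above, the sketch below uses a hypothetical in-memory store. Only the method name getBulkIdsForIndex, its two parameters, the delegation to getIDsForValues, and the Vector<Integer> return type come from the summary; the class TinyIndexedStore, its record layout, and the exact shape of getIDsForValues are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Vector;

// Hypothetical in-memory store (not from the PR); each record is a map of
// meta field name -> value, keyed by an auto-assigned integer ID.
class TinyIndexedStore {
    private final Map<Integer, Map<String, String>> records = new LinkedHashMap<>();
    private int nextId = 1;

    // Stores a copy of the record's meta data and returns its new ID.
    int add(Map<String, String> metaData) {
        int id = nextId++;
        records.put(id, new HashMap<>(metaData));
        return id;
    }

    // Assumed shape of the existing lookup: IDs of all records whose value
    // for metaFieldName is one of matchingValues, in insertion order.
    private List<Integer> getIDsForValues(String metaFieldName, Collection<String> matchingValues) {
        List<Integer> ids = new ArrayList<>();
        for (Map.Entry<Integer, Map<String, String>> entry : records.entrySet()) {
            String value = entry.getValue().get(metaFieldName);
            if (value != null && matchingValues.contains(value)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    // Bulk variant: delegates to the lookup above and copies the result into a Vector.
    Vector<Integer> getBulkIdsForIndex(String metaFieldName, Collection<String> matchingValues) {
        return new Vector<>(getIDsForValues(metaFieldName, matchingValues));
    }
}
```

The point of the bulk method in this sketch is that a caller passes all values of interest in one call and gets only IDs back, without loading the records themselves.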

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant StorageUtility
    participant InternalMethod
    Client->>StorageUtility: getBulkIdsForIndex(metaField, matchingValues)
    StorageUtility->>InternalMethod: getIDsForValues(metaField, matchingValues)
    InternalMethod-->>StorageUtility: List of IDs
    StorageUtility-->>Client: Returns Vector<Integer>
sequenceDiagram
    participant Client
    participant DAG
    participant Helper
    Client->>DAG: findConnectedRecords(recordIds)
    loop For each record in queue
        DAG->>Helper: enqueueUnvisitedNeighbors(currentRecord, edges, queue, visited)
        Helper-->>DAG: Add unvisited neighbors
    end
    DAG-->>Client: Returns set of connected records
sequenceDiagram
    participant Runner as TestRunner
    participant CaseTest as CasePurgeTest
    participant Storage
    Runner->>CaseTest: executeTest()
    CaseTest->>Storage: getFullCaseGraph(ownerIds)
    Storage-->>CaseTest: fullCaseGraph
    CaseTest->>CaseTest: Populate relation outcomes and compare with graph
    CaseTest-->>Runner: Assert test result

Possibly related PRs

Poem

I'm a bunny in the code garden bright,
Hopping as new methods take flight.
Bulk IDs and connected records dance along,
While tests ensure the data sings its song.
With floppy ears and a joyful beat, I code in leaps so neat! 🐇💻


@shubham1g5 shubham1g5 marked this pull request as ready for review February 27, 2025 15:17
Member

@ctsims ctsims left a comment

Very minor feedback on code structure / naming; once that's addressed, everything else is good.

return roots;
}

public Set<I> getRelatedRecords(Set<I> recordIds) {
Member

These methods should have clearer definitions of what they refer to, especially this public one. "Related records" is a fairly weak description of what is and isn't included in the DAG walk (is it all subgraphs that include one of the input nodes?).

Contributor Author

return visited;
}

private void addNeighbors(Hashtable<I, Vector<Edge<I, E>>> edges, I current, LinkedList<I> queue,
Member

"Add" is likely to be a misleading name for this method. As stated it sounds like it would be adding neighbors to the DAG itself.

I'd probably just rename this to enqueueNeighbors or accumulateNeighbors, but it should be clear that the method itself is stateless.

Contributor Author

@shubham1g5 shubham1g5 force-pushed the cacheExpirationChanges branch from b8c13f1 to 24da04f Compare March 3, 2025 07:39
@shubham1g5 shubham1g5 requested a review from ctsims March 3, 2025 07:40
ctsims previously approved these changes Mar 3, 2025
Member

@ctsims ctsims left a comment

Note: I'd still like an update to the vocab, but that can come after merge.

}

/**
* Performs a breadth-first search (BFS) to find all related records
Member

More nitpicking:

1 - The BFS part of this is (from my perspective) an implementation detail that I don't think you want in the docstring. We should be able to change the implementation of the search in the future, and stating in the docstring that it's BFS locks us in (or 'should' lock us in) to that behavior.
2 - "Record" is a downstream implementation detail here (i.e., the DAG happens to hold records in your case, but that's not fundamentally part of the data type) and shouldn't be used in the descriptions or the variable names. All of the other methods use the shared vocabulary of the DAG (Node, Edge, etc.), and this method should do the same.
3 - The big thing this docstring needs to communicate isn't how it's doing it, but a more precise description of what the outcomes and concepts are. Words like "Related" aren't really appropriate for the root datatype, so it's ambiguous what the method returns. I think you just want a specific sentence here like "Return all nodes which are reachable from any of the set of input nodes." I'm not sure "reachable" is actually quite right in this context, since you're traversing edges in both directions; there might be another word that formalizes edge traversal in either direction.
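
To make point 3 concrete, here is a minimal sketch of "every node connected to any input node when edges may be followed in either direction", which for a directed graph is the union of the weakly connected components containing the input nodes. The names SimpleGraph, addEdge, and findConnectedNodes are hypothetical and chosen for illustration; this is not the DAG class from the PR, and the queue-based traversal is just one possible implementation behind the stated contract.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a directed graph; not the DAG class from this PR.
class SimpleGraph<I> {
    private final Map<I, Set<I>> forward = new HashMap<>(); // source -> targets
    private final Map<I, Set<I>> inverse = new HashMap<>(); // target -> sources

    void addEdge(I source, I target) {
        forward.computeIfAbsent(source, k -> new HashSet<>()).add(target);
        inverse.computeIfAbsent(target, k -> new HashSet<>()).add(source);
    }

    /**
     * Returns the start nodes plus every node connected to any of them
     * when edges are followed in either direction.
     */
    Set<I> findConnectedNodes(Set<I> startNodes) {
        Set<I> visited = new HashSet<>();
        Deque<I> pending = new ArrayDeque<>(startNodes);
        while (!pending.isEmpty()) {
            I current = pending.poll();
            if (!visited.add(current)) {
                continue; // already processed via another path
            }
            enqueueUnvisitedNeighbors(forward, current, pending, visited);
            enqueueUnvisitedNeighbors(inverse, current, pending, visited);
        }
        return visited;
    }

    // Stateless helper: reads the edge map and appends to the queue;
    // it never modifies the graph itself.
    private static <N> void enqueueUnvisitedNeighbors(Map<N, Set<N>> edges, N current,
                                                      Deque<N> pending, Set<N> visited) {
        for (N neighbor : edges.getOrDefault(current, Collections.emptySet())) {
            if (!visited.contains(neighbor)) {
                pending.add(neighbor);
            }
        }
    }
}
```

With edges a→b and c→b, starting from {a} yields {a, b, c}: c is picked up through the inverse edge at b even though no directed path leads from a to c, which is why plain "reachable" undersells what the method returns.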

Contributor Author

that makes a lot of sense, corrected here - 6d9ea68

Base automatically changed from cacheAndIndexModifications to master March 4, 2025 02:51
@shubham1g5 shubham1g5 dismissed ctsims’s stale review March 4, 2025 02:51

The base branch was changed.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/test/resources/case_relationship_tests.json (1)

563-594: Consider simplifying redundant related_cases entries.

In this test case, all four related_cases arrays contain identical elements. This redundancy could be simplified to reduce file size and improve readability, especially for large test cases like this one.

"relation_outcome": [
    {
        "related_cases": [
            "a",
            "b",
            "c",
            "d"
        ]
    }
-   ,
-   {
-       "related_cases": [
-           "a",
-           "b",
-           "c",
-           "d"
-       ]
-   },
-   {
-       "related_cases": [
-           "a",
-           "b",
-           "c",
-           "d"
-       ]
-   },
-   {
-       "related_cases": [
-           "a",
-           "b",
-           "c",
-           "d"
-       ]
-   }
]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f5aee21 and 4f97f68.

📒 Files selected for processing (5)
  • src/main/java/org/javarosa/core/services/storage/IStorageUtilityIndexed.java (1 hunks)
  • src/main/java/org/javarosa/core/services/storage/util/DummyIndexedStorageUtility.java (1 hunks)
  • src/main/java/org/javarosa/core/util/DAG.java (2 hunks)
  • src/test/java/org/commcare/test/utilities/CasePurgeTest.java (6 hunks)
  • src/test/resources/case_relationship_tests.json (35 hunks)
🔇 Additional comments (17)
src/main/java/org/javarosa/core/services/storage/util/DummyIndexedStorageUtility.java (1)

370-375: Implementation looks correct and consistent with existing patterns.

The implementation of getBulkIdsForIndex follows the same approach as the existing getBulkRecordsForIndex method, delegating to the existing getIDsForValues method and then converting the result to a Vector. This maintains consistency in the implementation pattern.

src/test/resources/case_relationship_tests.json (2)

26-41: Structured and consistent data addition for test cases.

The new relation_outcome field provides a detailed specification of related cases, which will be valuable for testing the new functionality. The structure is consistent with the rest of the file and follows a clear pattern.


59-71: Good test coverage for bidirectional relationships.

The test case includes both "d to b" and "b to d" relationships, which thoroughly tests the bidirectional traversal of the case graph. This will help ensure that the findConnectedRecords method correctly handles bidirectional relationships.

src/main/java/org/javarosa/core/services/storage/IStorageUtilityIndexed.java (1)

283-290: JavaDoc is clear and consistent with existing patterns.

The method signature and documentation are well-structured and follow the existing patterns in the interface. The documentation clearly explains the purpose of the method and its parameters.

src/main/java/org/javarosa/core/util/DAG.java (4)

6-7: Import additions are appropriate.

The addition of the import statements for LinkedList and Set is appropriate for the new functionality.


143-164: Consider revising method name and JavaDoc to avoid implementation details.

The current method name and documentation include implementation details like "BFS" and domain-specific terminology like "records" that aren't part of the DAG abstraction.

Consider:

- /**
-  * Performs a breadth-first search (BFS) to find all related records
-  * starting from the given set of record IDs by traversing both
-  * forward and inverse edges.
-  *
-  * @param recordIds The set of starting record IDs.
-  * @return A set containing all reachable records.
-  */
- public Set<I> findConnectedRecords(Set<I> recordIds) {
+ /**
+  * Finds all nodes reachable from the given set of starting nodes
+  * by traversing both forward and inverse edges in the graph.
+  *
+  * @param startNodes The set of starting nodes.
+  * @return A set containing all reachable nodes.
+  */
+ public Set<I> findReachableNodes(Set<I> startNodes) {

This change removes the implementation detail (BFS) and uses the appropriate DAG terminology (nodes vs. records).


166-177: Method name could be more precise.

The helper method's name could be more precise to match the DAG's domain vocabulary.

- private void enqueueUnvisitedNeighbors(Hashtable<I, Vector<Edge<I, E>>> edges, I current, LinkedList<I> queue,
+ private void enqueueUnvisitedNodes(Hashtable<I, Vector<Edge<I, E>>> edges, I current, LinkedList<I> queue,

Also, remember to update the corresponding method call in findConnectedRecords if you make this change.


156-158: Efficient cycle detection in BFS implementation.

The early check for already visited nodes is an efficient way to avoid processing the same node multiple times and prevent potential infinite loops in case of cycles.

src/test/java/org/commcare/test/utilities/CasePurgeTest.java (9)

3-4: Good addition of the static import.

The static import of getFullCaseGraph from CasePurgeFilter is appropriate here since it's used directly in the code multiple times.


11-11: Appropriate imports for new functionality.

The imports for DAG, HashMap, and the adjusted Set import align well with the new functionality being added to the test class.

Also applies to: 23-23, 26-26


59-64: Proper extension of the parameter array size and JSON keys.

The parameter array has been correctly updated to accommodate the new relation_outcome parameter, and the JSON keys array has been extended accordingly.


84-84: Good data structure choice for relation outcomes.

Using a HashMap<String, HashSet<String>> to store relation outcomes is an appropriate choice as it allows for efficient lookups of related cases by case ID.


87-90: Constructor signature properly updated.

The constructor signature has been correctly updated to include the new relationOutcomes parameter, and the parameter is passed to the createTestObjectsFromParameters method.


93-96: Method signature and parameter handling correctly updated.

The createTestObjectsFromParameters method signature has been properly updated to include the new parameter, and the method correctly checks if relationOutcomes is null before attempting to process it.

Also applies to: 114-117


166-166: Appropriate use of the imported graph utility.

The code correctly uses the statically imported getFullCaseGraph method to generate the case graph for the purge filter.


169-170: Good comments explaining graph recreation.

The comment clearly explains why the graph needs to be recreated after purging cases, which is helpful for maintaining code clarity.


180-180: Verify usage of DAG.findConnectedRecords method.

✅ Verification successful

The code uses the findConnectedRecords method from the DAG class to find connected records starting from a single input case. Make sure this is the correct method to use for finding all related cases in the graph.


🏁 Script executed:

#!/bin/bash
# Looking for the implementation of findConnectedRecords in DAG class to confirm its behavior

rg -A 10 "findConnectedRecords" --type java

Length of output: 2159


Usage of DAG.findConnectedRecords is Correct

  • The findConnectedRecords method in DAG.java is implemented to traverse all connected nodes using both direct and inverse edges, ensuring that it returns the complete connected component for a given set of record IDs.
  • The test in CasePurgeTest.java passes the expected input (a set containing the initial case) and validates the outcome via assertions, confirming that the method behaves as intended.
  • No changes to the invocation or underlying logic are necessary.
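
A check of this kind might look roughly like the following. The expected-outcome map (case ID to the set of case IDs it should end up connected to, itself included) mirrors the HashMap<String, HashSet<String>> mentioned earlier in this review, but RelationOutcomeCheck, connectedTo, and mismatches are hypothetical names, and the undirected adjacency map is an assumed stand-in for the real case graph rather than the test's actual data structures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not the CasePurgeTest code.
class RelationOutcomeCheck {
    // Collects every case connected to `start` (including `start` itself)
    // in an adjacency map that lists neighbors in both directions.
    static Set<String> connectedTo(String start, Map<String, Set<String>> adjacency) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(start);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            if (seen.add(current)) {
                pending.addAll(adjacency.getOrDefault(current, Collections.emptySet()));
            }
        }
        return seen;
    }

    // Returns the case IDs whose connected set differs from the expected outcome.
    static List<String> mismatches(Map<String, Set<String>> expected,
                                   Map<String, Set<String>> adjacency) {
        List<String> failures = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : expected.entrySet()) {
            if (!connectedTo(entry.getKey(), adjacency).equals(entry.getValue())) {
                failures.add(entry.getKey());
            }
        }
        return failures;
    }
}
```

A test built on this sketch would assert that mismatches returns an empty list; reporting the offending case IDs rather than a bare boolean makes a failing fixture easier to track down.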

@shubham1g5 shubham1g5 merged commit 876d215 into master Mar 4, 2025
3 checks passed
@shubham1g5 shubham1g5 deleted the cacheExpirationChanges branch March 4, 2025 11:16

4 participants