@milanmajchrak
dspace-api/src/main/java/org/dspace/authorize/ResourcePolicyServiceImpl.java (resolved)
dspace-api/src/main/java/org/dspace/authorize/ResourcePolicyServiceImpl.java (outdated, resolved)
dspace-oai/src/main/java/org/dspace/event/OAIIndexEventConsumer.java (outdated, resolved)
dspace-oai/src/main/java/org/dspace/event/OAIIndexEventConsumer.java (outdated, resolved)
dspace-oai/src/main/java/org/dspace/event/OAIIndexEventConsumer.java (outdated, resolved)
dspace-oai/src/main/java/org/dspace/event/OAIIndexEventConsumer.java (outdated, resolved)
dspace-api/src/main/java/org/dspace/authorize/ResourcePolicyServiceImpl.java (outdated, resolved)
milanmajchrak left a comment:
Also, you have a merge conflict.
@milanmajchrak I added tests for these scenarios.
milanmajchrak left a comment:
Please remove the duplicated code.
dspace-api/src/main/java/org/dspace/authorize/ResourcePolicyServiceImpl.java (outdated, resolved)
dspace-api/src/main/java/org/dspace/authorize/ResourcePolicyServiceImpl.java (outdated, resolved)
dspace-api/src/main/java/org/dspace/authorize/ResourcePolicyServiceImpl.java (resolved)
@coderabbitai review
✅ Actions performed: Review triggered.
Walkthrough
The changes refactor event logging and indexing functionality across multiple modules. In the API, event logging for modifications now uses a dedicated method. A new consumer class processes OAI indexing events, and a revised XOAI component handles caching and Solr indexing operations. In the server webapp, reindexing logic has been removed from several repositories, and the dedicated SolrOAIReindexer class was deleted. A return statement was removed from a method, which affects the resource policy creation flow. Lastly, the configuration file has been updated to include the xoai event consumer and its filters.
Sequence Diagram(s)
sequenceDiagram
participant ED as Event Dispatcher
participant OEC as OAIIndexEventConsumer
participant XOAI as XOAI Component
participant Solr as Solr Server
participant Cache as Cache Services
ED->>OEC: Dispatch Event (Add/Modify/Delete)
OEC->>OEC: Collect and filter events
OEC->>XOAI: Trigger index update for Items
XOAI->>Solr: deleteItemByQuery (for each Item)
XOAI->>Solr: Re-index Items
XOAI->>Cache: Commit changes & clear caches
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
dspace-oai/src/main/java/org/dspace/event/OAIIndexEventConsumer.java (1)
157-169: ⚠️ Potential issue
Remove duplicated code block to avoid compile or runtime issues.
These lines appear to repeat the exception-handling and finalization logic already included above (lines 141-149). Keeping both copies may lead to compiler errors or unexpected behavior.
-            throw e;
-        } finally {
-            if (Objects.nonNull(anonymousContext)) {
-                anonymousContext.complete();
-            }
-        }
-    }
-
-    public void finish(Context ctx) throws Exception {
-        // No-op
-    }
-}
+    // Remove the duplicated lines (157-169) if they are artifacts from a merge or snippet error
🧹 Nitpick comments (14)
.github/workflows/trigger-builds.yml (1)
19-29: Enhance Shell Script Robustness in Build Triggering
The current script block successfully authenticates with the GitHub CLI, fetches remote branches, and iterates over branches matching the customer/ pattern to trigger their builds.
Points to consider:
- Error Handling: If no matching branches are found, the loop silently does nothing. It could be useful to add a check for an empty branch list or log a message indicating that no customer branches were detected.
- Quoting Variables: To prevent potential issues with branch names containing special characters or spaces, consider quoting the branch variable when used in commands.
Below is a suggested diff snippet that incorporates these improvements:
- git fetch --prune origin  # Ensure remote refs are fetched
- BRANCHES=$(git ls-remote --heads origin | awk -F'/' '{print $3"/"$4}' | grep '^customer/')
- for branch in $(echo "$BRANCHES" | sed -e 's/[\[\]"]//g' -e 's/,/\n/g'); do
-   echo "Triggering build for branch: $branch"
-   gh workflow run build.yml --ref $branch
- done
+ set -e  # Exit on any error
+ git fetch --prune origin  # Ensure remote refs are fetched
+ # grep exits with 1 when nothing matches; "|| true" keeps set -e from aborting before the empty check
+ BRANCHES=$(git ls-remote --heads origin | awk -F'/' '{print $3"/"$4}' | grep '^customer/' || true)
+ if [ -z "$BRANCHES" ]; then
+   echo "No customer branches found to trigger."
+ else
+   for branch in $(echo "$BRANCHES" | sed -e 's/[\[\]"]//g' -e 's/,/\n/g'); do
+     echo "Triggering build for branch: $branch"
+     gh workflow run build.yml --ref "$branch"
+   done
+ fi
These changes will make the script more robust and clearer in its intent.
dspace-api/src/main/resources/org/dspace/storage/rdbms/sqlmigration/h2/V7.6_2023.09.28__enforce_group_or_eperson_for_resourcepolicy.sql (1)
9-9: Cleaning up invalid resource policies.
The SQL statement removes resource policies that don't have an associated EPerson or Group, which is a good preparatory step before enforcing the constraint. Consider adding a comment explaining the significance of this cleanup for future maintainers.
-DELETE FROM ResourcePolicy WHERE eperson_id is null and epersongroup_id is null;
+-- Remove orphaned resource policies with no associated user or group
+DELETE FROM ResourcePolicy WHERE eperson_id is null and epersongroup_id is null;
dspace/config/clarin-dspace.cfg (1)
308-308: Uncommented metadata editing configuration
The line for allowing metadata editing has been uncommented but left empty. This appears to be preparation for future configuration.
Consider adding a comment explaining the intended use of this configuration and why it's currently empty.
dspace-oai/src/main/java/org/dspace/event/OAIIndexEventConsumer.java (3)
36-38: Consider removing or documenting this empty initialize() method.
This method is currently empty and does not appear to be overridden. If it's not required by an interface contract or used by downstream subclasses, you can remove it to reduce boilerplate code.
47-100: Refactor nested ifs to improve maintainability.
The large block of conditional checks in the consume method makes the code difficult to read and maintain. Consider extracting smaller logic units (e.g., separate methods for handling items, bundles, and communities) or using a strategy pattern to reduce the complexity.
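A minimal sketch of that restructuring, assuming simplified stand-ins: the int codes play the role of org.dspace.core.Constants, and SimpleEvent, its itemIds list, and the handle* helpers are hypothetical names, not the PR's code. One switch delegates to small helpers instead of nesting ifs.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConsumeDispatchSketch {
    // Hypothetical subject-type codes standing in for org.dspace.core.Constants.
    static final int BITSTREAM = 0;
    static final int BUNDLE = 1;
    static final int ITEM = 2;
    static final int COLLECTION = 3;
    static final int COMMUNITY = 4;

    // Hypothetical, simplified event: the subject type plus the IDs of the
    // items the subject resolves to (the real consumer looks these up itself).
    static class SimpleEvent {
        final int subjectType;
        final List<String> itemIds;

        SimpleEvent(int subjectType, List<String> itemIds) {
            this.subjectType = subjectType;
            this.itemIds = itemIds;
        }
    }

    private final Set<String> itemsToUpdate = new HashSet<>();

    // One switch replaces the nested if/else chain; each branch delegates
    // to a small helper that can hold its own lookup logic.
    public void consume(SimpleEvent event) {
        switch (event.subjectType) {
            case ITEM:
                handleItemChanges(event);
                break;
            case BUNDLE:
            case BITSTREAM:
                handleContentChanges(event);
                break;
            case COLLECTION:
            case COMMUNITY:
                handleContainerChanges(event);
                break;
            default:
                // Other subject types do not trigger reindexing in this sketch.
                break;
        }
    }

    private void handleItemChanges(SimpleEvent event) {
        itemsToUpdate.addAll(event.itemIds);
    }

    private void handleContentChanges(SimpleEvent event) {
        // A bundle or bitstream change reindexes the owning item(s).
        itemsToUpdate.addAll(event.itemIds);
    }

    private void handleContainerChanges(SimpleEvent event) {
        // A collection or community change reindexes the affected items.
        itemsToUpdate.addAll(event.itemIds);
    }

    public Set<String> getItemsToUpdate() {
        return Collections.unmodifiableSet(itemsToUpdate);
    }
}
```

In the real consumer, each helper would carry the lookup specific to its subject type, so the branches stop duplicating one another.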
114-150: Consider verifying the new application context creation logic.
Creating a new AnnotationConfigApplicationContext for indexing each time may introduce unnecessary overhead. If performance or resource usage becomes a concern, consider reusing an existing context, or deferring initialization until truly needed.
dspace-oai/src/main/java/org/dspace/xoai/app/XOAI.java (3)
112-118: Consider lazy initialization for the Spring application context.
Performing the new AnnotationConfigApplicationContext in a static initializer or instance initializer might increase startup overhead. If performance or memory usage is a concern, consider lazy-loading these beans on demand.
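One way to defer and share the expensive setup is a small thread-safe lazy holder. This is a plain-Java sketch. The Object built by the supplier stands in for the AnnotationConfigApplicationContext, and the class and field names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyInitSketch {
    // Counts how many times the expensive factory actually ran.
    static final AtomicInteger CREATIONS = new AtomicInteger();

    // Thread-safe lazy holder using double-checked locking on a volatile field.
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> factory;
        private volatile T value;

        Lazy(Supplier<T> factory) {
            this.factory = factory;
        }

        @Override
        public T get() {
            T result = value;
            if (result == null) {
                synchronized (this) {
                    result = value;
                    if (result == null) {
                        result = factory.get();
                        value = result;
                    }
                }
            }
            return result;
        }
    }

    // Hypothetical stand-in for the Spring context: built on first use, then reused.
    static final Lazy<Object> CONTEXT = new Lazy<>(() -> {
        CREATIONS.incrementAndGet();
        return new Object();
    });
}
```

A shared context like this would then be closed once (for example at shutdown) rather than after every indexing call.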
734-744: Centralize or rename isTest() logic for clarity.
Hardcoding "jdbc:h2:mem:test" for detection can be brittle. Consider a configuration property or a dedicated environment check for test mode, especially if future test setups differ.
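Here is a sketch of the configuration-driven alternative. It assumes a hypothetical boolean property oai.indexing.test-mode, read from a java.util.Properties stand-in; DSpace itself would read it through its configuration service. The existing H2 URL check remains only as a fallback when the flag is absent.

```java
import java.util.Properties;

public class TestModeSketch {
    // Hypothetical property name; not an existing DSpace setting.
    static final String TEST_MODE_KEY = "oai.indexing.test-mode";

    static boolean isTest(Properties config) {
        String flag = config.getProperty(TEST_MODE_KEY);
        if (flag != null) {
            // An explicit flag always wins over the URL heuristic.
            return Boolean.parseBoolean(flag.trim());
        }
        // Fallback: the previous in-memory H2 URL heuristic (prefix match tolerates URL options).
        return config.getProperty("db.url", "").startsWith("jdbc:h2:mem:test");
    }
}
```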
755-774: Include the original exception when re-throwing for better stack traces.
Currently, the code discards the original exception cause when throwing a new RuntimeException. Consider retaining it:
- throw new RuntimeException("Cannot reindex the item with ID: " + item.getID() + " because: "
-         + e.getMessage());
+ throw new RuntimeException("Cannot reindex the item with ID: " + item.getID(), e);
This allows upstream handlers to accurately capture the root cause.
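A few lines of plain Java show the difference. Here, indexItem is a hypothetical stand-in that always fails, so the wrapping exception can be inspected.

```java
import java.io.IOException;

public class ExceptionCauseSketch {
    // Hypothetical stand-in for the per-item Solr call; always fails here.
    static void indexItem(String itemId) throws IOException {
        throw new IOException("Solr unavailable");
    }

    static void reindex(String itemId) {
        try {
            indexItem(itemId);
        } catch (IOException e) {
            // Passing e as the cause keeps its stack trace ("Caused by: ...") in the logs.
            throw new RuntimeException("Cannot reindex the item with ID: " + itemId, e);
        }
    }
}
```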
dspace-server-webapp/src/test/java/org/dspace/app/sword2/Swordv2IT.java (2)
93-104: Configuration overrides for SWORDv2 in the @Before method.
Setting the SWORDv2 properties under test conditions is helpful to align integration tests with the local environment. Consider verifying that these config overrides do not inadvertently persist in other test classes, especially if parallel test execution is enabled.
189-229: depositItemWithEmbargo - Tests item upload with embargo.
This test ensures that a zipped embargoed item is accepted and returns HTTP 201 with the correct ATOM entry content type. Optionally, consider verifying the embargo policy details in the resulting item, but this is a good start.
dspace-server-webapp/src/test/java/org/dspace/app/rest/ResourcePolicyRestRepositoryIT.java (1)
1048-1106: Consider extending negative tests.
This new test method for creating a policy with a group appears correct. As a minor improvement, consider adding negative or edge-case scenarios (e.g., invalid group references).
dspace-api/src/main/java/org/dspace/authorize/ResourcePolicyServiceImpl.java (2)
306-312: Consider refactoring for DRY.
Repeatedly checking if (r.getdSpaceObject() instanceof Item) could be consolidated into a helper method (e.g., addEventModify(context, dSpaceObject)). This reduces duplication and centralizes event logic.
374-377: Repetitive item-event check.
We could reuse the same helper functionality here to streamline event firing for Item objects.
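Here is a minimal sketch of that consolidation. It uses simplified stand-ins for DSpaceObject, Item, ResourcePolicy and Context. The real Context.addEvent takes an org.dspace.event.Event, but here events are plain strings. The instanceof check lives in one helper, and each call site shrinks to a single forEach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class AddEventModifySketch {
    // Simplified stand-ins for the DSpace domain types.
    static class DSpaceObject {
        private final UUID id = UUID.randomUUID();

        UUID getID() {
            return id;
        }
    }

    static class Item extends DSpaceObject { }

    static class Collection extends DSpaceObject { }

    static class ResourcePolicy {
        private final DSpaceObject dso;

        ResourcePolicy(DSpaceObject dso) {
            this.dso = dso;
        }

        DSpaceObject getdSpaceObject() {
            return dso;
        }
    }

    static class Context {
        final List<String> events = new ArrayList<>();

        void addEvent(String event) {
            events.add(event);
        }
    }

    // The single place that decides whether a MODIFY event is fired.
    static void addEventModify(Context context, DSpaceObject dso) {
        if (dso instanceof Item) {
            // No cast needed: getID() is declared on the base type.
            context.addEvent("MODIFY ITEM " + dso.getID());
        }
    }

    // Call sites reduce to one line each.
    static void onPoliciesChanged(Context context, List<ResourcePolicy> policies) {
        policies.forEach(r -> addEventModify(context, r.getdSpaceObject()));
    }
}
```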
🛑 Comments failed to post (2)
dspace-server-webapp/pom.xml (1)
483-489: 💡 Verification agent
🧩 Analysis chain
Centralized json-smart version management.
Good approach to define a specific version of json-smart as a direct dependency after excluding it from multiple other dependencies. This resolves version conflicts between nimbus-jose-jwt and json-path.
However, it's worth verifying that 2.5.1 is the latest secure version.
🏁 Script executed:
#!/bin/bash
# Check if 2.5.1 is the latest version of json-smart and if there are any security advisories

# Check Maven Central for the latest version
echo "Latest json-smart version in Maven Central:"
curl -s "https://search.maven.org/solrsearch/select?q=g:net.minidev+AND+a:json-smart&rows=20&wt=json" | jq -r '.response.docs[0].latestVersion'

# Check for CVEs related to json-smart
echo -e "\nChecking for security vulnerabilities in json-smart:"
curl -s "https://services.nvd.nist.gov/rest/json/cves/2.0?keywordSearch=json-smart" | jq '.vulnerabilities[] | {cve: .cve.id, description: .cve.descriptions[0].value}'

Length of output: 1263
Action Required: Update json-smart Dependency Version
The current dependency version (2.5.1) in dspace-server-webapp/pom.xml is outdated and vulnerable. Verification indicates that:
- Maven Central now lists 2.5.2 as the latest version.
- Security advisories (e.g., CVE-2024-57699) confirm that versions up to 2.5.1 are susceptible to a DoS vulnerability.
Recommendations:
- Update the dependency version to 2.5.2.
- Verify that the new version resolves the identified security issue without breaking dependency exclusions.
Diff Snippet Update:
<!-- Specify the version of json-smart we want to use.
     This resolves version conflicts and addresses a known security vulnerability. -->
<dependency>
    <groupId>net.minidev</groupId>
    <artifactId>json-smart</artifactId>
    <version>2.5.2</version>
</dependency>
dspace-api/src/main/java/org/dspace/content/packager/AbstractMETSIngester.java (1)
765-773: ⚠️ Potential issue
Improved policy handling during bitstream crosswalk operations.
This change preserves bitstream policies during the crosswalk process by storing them before the operation and reapplying them afterward. This ensures that authorization settings aren't lost when metadata is updated through the crosswalk operation.
Great enhancement that fixes a potential issue where bitstream permissions could be lost during package ingestion. The policies are now properly preserved and restored after the crosswalk operation.
@milanmajchrak I made some changes based on the CodeRabbit review. The warning at the beginning is interesting; it also reviewed code that was not part of my PR.
Actionable comments posted: 0
🧹 Nitpick comments (5)
dspace-oai/src/main/java/org/dspace/event/OAIIndexEventConsumer.java (2)
66-84: Restructure the multi-branch logic for better readability.
Having many conditions (item, bundle, collection, community, bitstream) makes the code harder to follow. Consider extracting each case into helper methods or switching to a more centralized approach (e.g., a switch statement or polymorphic dispatch) for clarity.
85-114: Streamline repeated checks in event handling logic.
The code checks various subject types (Collections, Communities, Bitstreams, Bundles, Items) and performs different indexing actions. It can be refactored into smaller, reusable methods (e.g., handleCollectionChanges(...), handleBitstreamChanges(...), handleItemChanges(...)) to enhance maintainability.
dspace-oai/src/main/java/org/dspace/xoai/app/XOAI.java (3)
86-89: Consider using constructor injection instead of direct field injection.
Spring recommends using constructor or setter injection to make dependencies more explicit. Field injection can reduce testability and clarity.
735-739: Potential performance concerns with per-item deleteByQuery.
Calling deleteByQuery for each item might be expensive if reindexing a large number of items. Consider batching the deletion queries or using a single query to delete multiple items if you anticipate large collections.
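Here is a sketch of the batching idea: it builds one delete query per batch of IDs instead of one per item. The item.id field name is an assumption, so check the OAI Solr schema. Keep the batch size below Solr's maxBooleanClauses limit (1024 by default).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class BatchDeleteQuerySketch {
    // Assumed field name; verify against the OAI Solr core's schema.
    static final String ID_FIELD = "item.id";

    // Splits the IDs into batches and builds one OR query per batch.
    static List<String> buildDeleteQueries(List<String> itemIds, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < itemIds.size(); start += batchSize) {
            List<String> batch = itemIds.subList(start, Math.min(start + batchSize, itemIds.size()));
            // Quoting each ID keeps characters such as '-' in UUIDs literal.
            String terms = batch.stream()
                    .map(id -> "\"" + id + "\"")
                    .collect(Collectors.joining(" OR "));
            queries.add(ID_FIELD + ":(" + terms + ")");
        }
        return queries;
    }
}
```

Each resulting string could then go to SolrJ's SolrClient.deleteByQuery(String), followed by a single commit for the whole run.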
741-755: Stop-on-error approach may lead to partial indexing.
An error in reindexing a single item causes a runtime exception, halting the entire batch. This can leave the index partially updated. Evaluate whether a best-effort approach (continuing with other items) or proper rollback of already-reindexed items is more suitable.
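If a best-effort approach is preferred, the loop can record failures and keep going. In this sketch, ItemIndexer is a hypothetical callback that stands in for the deleteItemByQuery plus add(index(item)) pair. The caller then decides, from the returned failures, whether to commit, log, or rethrow.

```java
import java.util.ArrayList;
import java.util.List;

public class BestEffortReindexSketch {
    // One failed item and the reason it failed.
    static final class Failure {
        final String itemId;
        final Exception error;

        Failure(String itemId, Exception error) {
            this.itemId = itemId;
            this.error = error;
        }
    }

    // Hypothetical stand-in for "delete old Solr doc, then add the new one".
    interface ItemIndexer {
        void reindex(String itemId) throws Exception;
    }

    // Processes every item; failures are collected instead of aborting the batch.
    static List<Failure> reindexAll(List<String> itemIds, ItemIndexer indexer) {
        List<Failure> failures = new ArrayList<>();
        for (String id : itemIds) {
            try {
                indexer.reindex(id);
            } catch (Exception e) {
                failures.add(new Failure(id, e));
            }
        }
        return failures;
    }
}
```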
Do you want me to propose a revised approach for handling partial failures or open a new issue to discuss batch reindexing with error handling?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dspace-oai/src/main/java/org/dspace/event/OAIIndexEventConsumer.java (1 hunks)
dspace-oai/src/main/java/org/dspace/xoai/app/XOAI.java (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: dspace-dependencies / docker-build (linux/amd64, ubuntu-latest, true)
- GitHub Check: Run Integration Tests
- GitHub Check: Run Unit Tests
🔇 Additional comments (3)
dspace-oai/src/main/java/org/dspace/event/OAIIndexEventConsumer.java (2)
37-45: Class-level documentation is concise but sufficient.
The docstring above the class provides a decent overview. Make sure to keep it updated if the class responsibilities evolve.
123-163: Validate the new Context usage in the end method.
Opening a new anonymous Context and creating a Spring ApplicationContext each time might be resource-intensive. Confirm whether this approach is required for isolation or if you can reuse an existing context or bean. If you must create a new context, ensure it's always released properly on exceptions (which looks correct here).
dspace-oai/src/main/java/org/dspace/xoai/app/XOAI.java (1)
113-116: Check for potential duplicate initialization.
This instance initializer creates a new AnnotationConfigApplicationContext each time an XOAI object is instantiated. Verify if multiple application contexts are truly needed or if you can avoid re-initializing Spring context objects repeatedly.
        itemsToUpdate.addAll(((Bundle)subject).getItems());
    }
} else if (event.getSubjectType() == Constants.ITEM) {
    //any event reindex this item
Your comments are not consistent. In some places you add a space at the beginning of the comment, and in others you don't. Some start with an uppercase letter, and others don't.
I suggest starting with an uppercase letter and adding a space before the message.
Actionable comments posted: 1
🧹 Nitpick comments (4)
dspace-oai/src/main/java/org/dspace/event/OAIIndexEventConsumer.java (4)
43-43: Consider using a final logger.
It's a common best practice in many Java projects to declare loggers as private static final Logger LOGGER = Logger.getLogger(...). This ensures the reference is not reassigned and makes the usage consistent.
- private static Logger log = Logger.getLogger(OAIIndexEventConsumer.class);
+ private static final Logger log = Logger.getLogger(OAIIndexEventConsumer.class);
48-48: Initialize the set at declaration or in the constructor.
Currently, itemsToUpdate is declared as null and is lazily initialized inside the consume method. Consider initializing the set right away to reduce null checks and simplify code flow.
- private Set<Item> itemsToUpdate = null;
+ private Set<Item> itemsToUpdate = new HashSet<>();
68-70: Refactor repeated subject-type checks into a switch or single conditional block.
Multiple if or || conditions for subject types can reduce readability. Consider using a more structured approach (e.g., a switch statement) or a mapping of subject types to actions to lower complexity.
150-158: Ensure the AnnotationConfigApplicationContext is needed at runtime.
Creating a new Spring application context on each call to end might be resource-intensive. If possible, reuse a single application context or a shared bean configuration. Otherwise, just confirm that the overhead is acceptable and that no memory leaks occur when calling close().
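Whichever lifetime is chosen, try-with-resources guarantees that close() runs even when indexing throws. Spring's ConfigurableApplicationContext, which AnnotationConfigApplicationContext implements, is Closeable, so it fits this pattern. FakeAppContext below is a hypothetical stand-in so that the sketch runs without Spring.

```java
public class CloseOnErrorSketch {
    // Hypothetical stand-in for AnnotationConfigApplicationContext.
    static final class FakeAppContext implements AutoCloseable {
        boolean closed;

        @Override
        public void close() {
            closed = true;
        }
    }

    // close() runs whether or not the indexing step throws.
    static void indexWith(FakeAppContext context, boolean fail) {
        try (FakeAppContext ctx = context) {
            if (fail) {
                throw new IllegalStateException("indexing failed");
            }
            // indexer.indexItems(filtered) would run here.
        }
    }
}
```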
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
dspace-oai/src/main/java/org/dspace/event/OAIIndexEventConsumer.java (1 hunks)
dspace/config/clarin-dspace.cfg (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- dspace/config/clarin-dspace.cfg
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Run Integration Tests
- GitHub Check: dspace-dependencies / docker-build (linux/amd64, ubuntu-latest, true)
- GitHub Check: Run Unit Tests
🔇 Additional comments (1)
dspace-oai/src/main/java/org/dspace/event/OAIIndexEventConsumer.java (1)
127-165:
❓ Verification inconclusive
Validate concurrency and transaction boundaries in the end method.
Using a brand-new Context for indexing without read-only mode can allow changes to be committed. If concurrent events arrive, you might need to ensure that partial indexing operations do not conflict. Consider verifying that simultaneous end calls do not produce data race conditions or inconsistent index states.
In case you want to confirm concurrency handling, you can run a stress test:
🏁 Script executed:
#!/bin/bash
# A simple concurrency check script
# Intended to call the same end-processing method across multiple simultaneous events
# and log if any concurrency-related exceptions occur (as written, it only echoes placeholder text).
# NOTE: The actual invocation might vary based on project build system and environment.
# Replace "DummyScriptCall" with your method invocation or unit test command
for i in {1..10}; do
  (echo "Simulated event $i" && echo "DummyScriptCall") &
done
wait

Length of output: 1038
Concurrency and Transaction Boundary Validation in the end Method
Based on the provided simulation output, where ten concurrent placeholder invocations ran without observable exceptions, there is no immediate sign of concurrency-related errors from creating a new Context per invocation. However, note the following:
- Context Isolation: Each call to end instantiates its own Context (anonymousContext). This appears to help isolate operations, reducing the risk of data races when multiple events occur simultaneously.
- Simulation Limitations: The script only echoed placeholder text and never invoked end, so its output does not simulate real-world transaction scenarios or the effects of multiple threads modifying persistent state.
- Further Testing Required: To ensure that partial indexing operations do not conflict under production conditions, consider more extensive testing that involves actual database transactions and evaluates the thread safety of the XOAI#indexItems method.
Please verify that these behaviors remain robust in production-like environments to avoid any hidden concurrency or transactional issues.
…DSpace into solr-reindexing-by-events
for (Item item : items) {
    try {
        deleteItemByQuery(item);
        solrServerResolver.getServer().add(this.index(item));
Can we do better here?
What happens if .add throws?
Solr keeps changes in memory (transaction log) for performance. Without commit(), those changes aren't written to the actual index files. Queries won't reflect deletions (or any updates) until a commit or auto-commit happens. The commit is also called in the index method and in another place in the code.
If an exception occurs while indexing the item or adding it to the Solr server, the exception is logged, and no further items are processed. I also added this information to the code as comments.
 * The indexing is done using the XOAI indexer after all relevant items are collected.
 *
 * Class is copied from UFAL/CLARIN-DSPACE (https://github.com/ufal/clarin-dspace) and modified by
 * @author Michaela Paurikova (michaela.paurikova at dataquest.sk)
XOAI indexer = new XOAI(anonymousContext, false, false);
AnnotationConfigApplicationContext applicationContext = new AnnotationConfigApplicationContext(
        new Class[] { BasicConfiguration.class });
applicationContext.getAutowireCapableBeanFactory()
Is this necessary because we are in dspace-oai?
…DSpace into solr-reindexing-by-events
List<ResourcePolicy> resourcePolicies = find(c, group);
for (ResourcePolicy r : resourcePolicies) {
    addEventModify(c, r.getdSpaceObject());
}
Minor comment.
Consider using the shorter form:
find(c, group).forEach(r -> addEventModify(c, r.getdSpaceObject()));
public void addEventModify(Context context, DSpaceObject dso) {
    if (dso instanceof Item) {
        Item item = (Item) dso;
Here, casting dso to Item is not needed; you can simply use:
context.addEvent(new Event(Event.MODIFY, -1, null,
Constants.ITEM, dso.getID(), ""));
| } | ||
|
|
||
| Set<Item> filtered = new HashSet<Item>(itemsToUpdate.size()); | ||
| for (Item item : itemsToUpdate) { |
There was a problem hiding this comment.
Could be replaced with one line, I think:
Set<Item> filtered = itemsToUpdate.stream().filter(item -> item.getHandle() != null).collect(Collectors.toSet());
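For comparison, here is a runnable version of the one-liner with a simplified Item stand-in that carries only a handle. Items whose handle is null are dropped, as in the loop being replaced.

```java
import java.util.Set;
import java.util.stream.Collectors;

public class FilterByHandleSketch {
    // Simplified stand-in for org.dspace.content.Item; only the handle matters here.
    static final class Item {
        private final String handle;

        Item(String handle) {
            this.handle = handle;
        }

        String getHandle() {
            return handle;
        }
    }

    // Keeps only items that have a handle, same as the original for-loop.
    static Set<Item> withHandles(Set<Item> itemsToUpdate) {
        return itemsToUpdate.stream()
                .filter(item -> item.getHandle() != null)
                .collect(Collectors.toSet());
    }
}
```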
    indexer.indexItems(filtered);
    applicationContext.close();
} catch (Exception e) {
    itemsToUpdate = null;
I'd move (itemsToUpdate = null) to the finally block.
Thus, the entire catch block could be removed, so there would be just:
try {
...
} finally {
itemsToUpdate = null;
...
}
Similarly, line 149 (itemsToUpdate = null) can be removed.
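Here is a simplified sketch of the suggested shape. The reset sits in finally, so it runs on both success and failure, and the catch block becomes unnecessary. In the real end method, it would sit alongside the existing anonymousContext.complete() call. Indexer and the String IDs are hypothetical stand-ins.

```java
import java.util.HashSet;
import java.util.Set;

public class ResetInFinallySketch {
    // Hypothetical stand-in for XOAI#indexItems.
    interface Indexer {
        void indexItems(Set<String> items) throws Exception;
    }

    private Set<String> itemsToUpdate = null;

    void add(String itemId) {
        if (itemsToUpdate == null) {
            itemsToUpdate = new HashSet<>();
        }
        itemsToUpdate.add(itemId);
    }

    public void end(Indexer indexer) throws Exception {
        if (itemsToUpdate == null) {
            return;
        }
        try {
            indexer.indexItems(itemsToUpdate);
        } finally {
            // Runs on success and on failure, so no catch block is needed.
            itemsToUpdate = null;
        }
    }

    Set<String> pending() {
        return itemsToUpdate;
    }
}
```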
DSpaceObject subject = event.getSubject(ctx);
DSpaceObject object = event.getObject(ctx);

int et = event.getEventType();
I'd move this line further down, to the place where "et" is actually needed.
ItemService itemService = ContentServiceFactory.getInstance().getItemService();

// Collect Items, Collections, Communities that need indexing.
This comment is slightly confusing.
Either change it to
(1)
// Collect Items that need indexing.
or
(2)
use a more generic set of DSpaceObjects:
// Collect Items, Collections, Communities that need indexing.
private Set<DSpaceObject> objectsToUpdate = null;
With (2), you'd avoid the need for a few casts further down in the code.
| } | ||
|
|
||
| public void addEventModify(Context context, DSpaceObject dso) { | ||
| if (dso instanceof Item) { |
There was a problem hiding this comment.
Here, I'd prefer if (Objects.nonNull(object) && event.getObjectType() == Constants.ITEM)
You also use this check elsewhere in this pull request (OAIIndexEventConsumer).
Problem description
Modify the reindexing of an item by event.
Summary by CodeRabbit
New Features
Refactor
Chores