Add cursor-based ItemReader for MongoDB #4323
Related to #3824
Awesome! Thank you for contributing this cursor-based implementation! I will plan this feature for Spring Batch 5.1. Just curious, have you run a benchmark to compare the performance of this reader with the paging one?
As you know, pagination-based item readers perform poorly because of the number of documents that must be skipped. That number depends on the total number of documents and the batch size, so it is hard to put a single figure on the improvement from a cursor-based item reader. In my case, when benchmarking with a batch size of 10k on 0-3.5 million documents, the elapsed time was reduced by about 20-40%. Results may differ under other test conditions; I am attaching the scatter chart from my test.
 * @param batchSize size the batch size to apply to the cursor
 */
public void setBatchSize(Integer batchSize) {
    this.batchSize = batchSize;
Nitpick: should we handle the case where batchSize is 0 or negative?
As I checked, the spring-data-mongodb Javadoc says:
Use {@literal 0 (zero)} for no limit. A negative limit closes the cursor after returning a single batch indicating to the server that the client will not ask for a subsequent one.
@hmhuan I'm sorry for the delay in responding.
Users would expect this code to be implemented using spring-data-mongodb, given that MongoCursorItemReader explicitly takes MongoOperations as an argument. Therefore, if setBatchSize handled 0 or negative values itself, I think it would reduce predictability.
The type of the parameter should be the same as the one expected by the Query API. The setter should add the Javadoc tag @see Query#cursorBatchSize(int) so the user knows that the value will be set on that field, and therefore knows the (default) values to use. I will take care of this change when merging the PR.
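For illustration only, the setter shape suggested in this comment might look roughly like the sketch below. This is not the merged code: the class name CursorReaderSketch and the getter are hypothetical stand-ins so the snippet compiles on its own, and the Javadoc tag assumes Query means spring-data-mongodb's org.springframework.data.mongodb.core.query.Query.

```java
// Hypothetical stand-in for the reader under review; only the setter shape matters here.
class CursorReaderSketch {

    // Primitive int, matching the parameter type of Query#cursorBatchSize(int).
    private int batchSize;

    /**
     * The size of batches to use when iterating over results.
     * The value is stored unchanged; 0 and negative values are not special-cased
     * here, so their meaning is whatever the Query API defines.
     * @param batchSize the batch size to apply to the cursor
     * @see Query#cursorBatchSize(int)
     */
    public void setBatchSize(int batchSize) {
        this.batchSize = batchSize;
    }

    // Hypothetical accessor, added only so the sketch can be exercised.
    int getBatchSize() {
        return this.batchSize;
    }
}
```

Keeping the setter a plain pass-through matches the point made earlier in this thread: the reader does not reinterpret values that spring-data-mongodb already documents.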
- Update parameter types
- Update tests
- Update Javadocs
- Fix code formatting
The MongoItemReader currently uses pagination with MongoDB's skip operation, which can cause unnecessary document scans when requesting next pages.
Therefore, I have implemented a new cursor-based MongoDB ItemReader that accesses the next documents without skipping.
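To make the skip cost concrete, here is a rough back-of-the-envelope sketch. It is not part of the PR. It assumes that each page request with skip(n) has to walk past n documents (or index entries) before returning results, and it only counts that skipped work. The class and method names are made up for illustration.

```java
// Illustrative arithmetic only; no MongoDB access is involved.
class SkipCostSketch {

    // Total number of documents skipped across all pages when reading
    // totalDocuments with skip-based paging and the given pageSize.
    // Page k (0-based) is assumed to skip k * pageSize documents.
    static long documentsSkipped(long totalDocuments, long pageSize) {
        long pages = (totalDocuments + pageSize - 1) / pageSize; // ceiling division
        long skipped = 0;
        for (long page = 0; page < pages; page++) {
            skipped += page * pageSize;
        }
        return skipped;
    }

    public static void main(String[] args) {
        // 3.5 million documents read in pages of 10k: 350 pages.
        System.out.println(documentsSkipped(3_500_000L, 10_000L)); // prints 610750000
    }
}
```

Under that assumption the skipped work grows roughly quadratically with the number of pages, while a cursor walks each document once. Real timings depend on indexes, document size, and network, so this does not predict the benchmark numbers quoted in this thread.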