-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFS] Added coordinated doc migration to RFS Worker #708
Conversation
Signed-off-by: Chris Helma <chelma+github@amazon.com>
Signed-off-by: Chris Helma <chelma+github@amazon.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #708 +/- ##
============================================
- Coverage 63.56% 63.35% -0.21%
- Complexity 1565 1579 +14
============================================
Files 216 220 +4
Lines 8414 9164 +750
Branches 758 793 +35
============================================
+ Hits 5348 5806 +458
- Misses 2666 2948 +282
- Partials 400 410 +10
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I am missing something about how document leases are different from the other leasing patterns we have, @chelma could you take a look at that comment thread?
public DocumentsRunner(GlobalState globalState, CmsClient cmsClient, String snapshotName, IndexMetadata.Factory metadataFactory, | ||
ShardMetadata.Factory shardMetadataFactory, SnapshotShardUnpacker unpacker, LuceneDocumentsReader reader, | ||
DocumentReindexer reindexer) { | ||
this.members = new DocumentsStep.SharedMembers(globalState, cmsClient, snapshotName, metadataFactory, shardMetadataFactory, unpacker, reader, reindexer); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit; instead of piping these through as parameters, have a single parameter for DocumentsStep.SharedMembers
instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, interesting - so are you thinking that the SharedMembers are a higher level abstraction rather than something just internal to the DocumentStep classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I think so - since most members are not stateful so it seems like they could be part of a higher level construction for reuse.
And for Global state, I think I would recommend just removing it, seems like it represents a 'higher' step in, but the downstream systems don't seem to need to know where they are at save for a couple of log messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Steps set the Phase and Work Item in the GlobalState. These aren't used currently but have future uses per the design.
} | ||
|
||
@Override | ||
public void run() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the logic to pickup a lease only be used once before starting the work of DocumentStep? Why are we acquiring/updating the lease twice depending on CmsEntry.Documents vs CmsEntry.DocumentsWorkItem entry types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's two different lease types for two different sets of work. The lease on a CmsEntry.Documents
is used when you want to be the single node creating all the CmsEntry.DocumentsWorkItem
in the CMS; the lease on the CmsEntry.DocumentsWorkItem
is used when you want to be the single node migrating the documents associated with a specific shard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these are different phases, shouldn't they be different steps of the state machine for processing documents, it looks like they are conflated here. If it works lets go with it, but I think we will need to rework this area in the next pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe; this was all in the state machine we created/reviewed months ago. We'll have to look at it closely to see if it's desirable to actually split this out.
// We use optimistic locking here in the unlikely event someone else is working on this task despite the | ||
// leasing system and managed to complete the task; in that case we want this update to bounce. | ||
members.cmsWorkEntry = members.cmsClient.updateDocumentsWorkItem(updatedEntry, workItem); | ||
members.cmsWorkEntry.ifPresentOrElse( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem correct, both of these log statements say 'failed', shouldn't one be success vs the other a warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! The second log statement should read "Unable to mark Documents Work Item ... as failed"; will fix. Regarding the log levels... I dunno about that one, it's tricky. I could see the case when we try and fail to mark it as failed be a warn? TBH, all the log levels could use a critical eye.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about creating a jira to revisit? Maybe meld this kind of detail check in with the instrumentation work, I suspect a refactor with common paths for logging all steps would greatly reduce the amount of bespoke log messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DocumentsStep.SetupDocumentsWorkEntries.class | ||
), | ||
|
||
// // We were able to acquire the lease, and it's on a Document Work Item setup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete or uncomment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, good catch. Will investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete, thankfully
@@ -152,6 +163,13 @@ public static void main(String[] args) throws Exception { | |||
IndexCreator_OS_2_11 indexCreator = new IndexCreator_OS_2_11(targetClient); | |||
IndexRunner indexWorker = new IndexRunner(globalState, cmsClient, snapshotName, indexMetadataFactory, indexCreator, transformer); | |||
indexWorker.run(); | |||
|
|||
ShardMetadata.Factory shardMetadataFactory = new ShardMetadataFactory_ES_7_10(repoDataProvider); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you look into how SimpleRestoreFromSnapshot_ES_7_10 can be updated to use this code instead of the ad-hoc version? Seems like a great opportunity to reuse this system
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100%, agreed. Will try to work into one of the small cleanup tasks I have next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to get this started, but seems like a refactor might be better to get this working: chelma/opensearch-migrations@main...peternied:documents-runner-test
Signed-off-by: Chris Helma <chelma+github@amazon.com>
Description
Issues Resolved
Testing
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.