
[BUG] Segment-based replication / Remote Store is not compatible with Lucene 9.12 / 10 #15902

Closed
reta opened this issue Sep 11, 2024 · 12 comments
Assignees
Labels
bug (Something isn't working) · Storage:Remote · Storage (Issues and PRs relating to data and metadata storage)

Comments

@reta
Collaborator

reta commented Sep 11, 2024

Describe the bug

I have been working on a routine Apache Lucene 9.12 (snapshot) update but ran into a showstopper:

                java.lang.WrongThreadException: Attempted access outside owning thread                                                                                                                                                                                                                                                                
                    at java.base/jdk.internal.foreign.MemorySessionImpl.wrongThread(MemorySessionImpl.java:315)                                                                                                                                                                                                                                       
                    at java.base/jdk.internal.misc.ScopedMemoryAccess$ScopedAccessError.newRuntimeException(ScopedMemoryAccess.java:113)                                                                                                                                                                                                              
                    at java.base/jdk.internal.foreign.MemorySessionImpl.checkValidState(MemorySessionImpl.java:219)                                                                                                                                                                                                                                   
                    at java.base/jdk.internal.foreign.ConfinedSession.justClose(ConfinedSession.java:83)                                                                                                                                                                                                                                              
                    at java.base/jdk.internal.foreign.MemorySessionImpl.close(MemorySessionImpl.java:242)                                                                                                                                                                                                                                             
                    at java.base/jdk.internal.foreign.MemorySessionImpl$1.close(MemorySessionImpl.java:88)                                                                                                                                                                                                                                            
                    at org.apache.lucene.store.MemorySegmentIndexInput.close(MemorySegmentIndexInput.java:514)                                                                                                                                                                                                                                        
                    at org.apache.lucene.tests.store.MockIndexInputWrapper.close(MockIndexInputWrapper.java:81)                                                                                                                                                                                                                                       
                    at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:89)                                                                                        
                    at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:131)
                    at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:81)
                    at org.opensearch.indices.replication.SegmentFileTransferHandler$1$1.close(SegmentFileTransferHandler.java:111)
                    at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:89)
                    at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:131)
                    at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:81)
                    at org.opensearch.indices.replication.SegmentFileTransferHandler$1.close(SegmentFileTransferHandler.java:165)
                    at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:89)
                    at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:131)
                    at org.opensearch.common.util.io.IOUtils.close(IOUtils.java:102)
                    at org.opensearch.indices.recovery.MultiChunkTransfer.onCompleted(MultiChunkTransfer.java:170)
                    at org.opensearch.indices.recovery.MultiChunkTransfer.handleItems(MultiChunkTransfer.java:144)
                    at org.opensearch.indices.recovery.MultiChunkTransfer$1.write(MultiChunkTransfer.java:98)
                    at org.opensearch.common.util.concurrent.AsyncIOProcessor.processList(AsyncIOProcessor.java:131)
                    at org.opensearch.common.util.concurrent.AsyncIOProcessor.drainAndProcessAndRelease(AsyncIOProcessor.java:119)
                    at org.opensearch.common.util.concurrent.AsyncIOProcessor.put(AsyncIOProcessor.java:97)
                    at org.opensearch.indices.recovery.MultiChunkTransfer.addItem(MultiChunkTransfer.java:109)
                    at org.opensearch.indices.recovery.MultiChunkTransfer.lambda$handleItems$3(MultiChunkTransfer.java:151)
                    at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82)
                    at org.opensearch.core.action.ActionListener$6.onResponse(ActionListener.java:301)
                    at org.opensearch.indices.recovery.RecoveryTarget.writeFileChunk(RecoveryTarget.java:457)
                    at org.opensearch.indices.recovery.AsyncRecoveryTarget.lambda$writeFileChunk$6(AsyncRecoveryTarget.java:170)
                    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:923)
                    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
                    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
                    at java.base/java.lang.Thread.run(Thread.java:1583)

The issue comes from the fact that Apache Lucene uses the Foreign Function & Memory (FFM) APIs, and more specifically MemorySegmentIndexInput, which is backed by a confined memory segment that is not supposed to be shared between multiple threads.
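The confinement rule can be reproduced with the FFM API alone (JDK 22+, where the API is final; it is a preview feature in JDK 21). This is a minimal standalone sketch, not OpenSearch code: a confined Arena rejects a close() attempted from any thread other than its owner, which is exactly the WrongThreadException in the trace above.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.atomic.AtomicReference;

public class ConfinedArenaDemo {
    // Returns the simple name of the exception thrown when another thread
    // tries to close a confined arena, plus the value the owner can still read.
    static String closeFromOtherThread() throws InterruptedException {
        Arena arena = Arena.ofConfined();                  // owned by this thread
        MemorySegment segment = arena.allocate(ValueLayout.JAVA_LONG);
        segment.set(ValueLayout.JAVA_LONG, 0, 42L);

        AtomicReference<String> result = new AtomicReference<>("closed");
        Thread other = new Thread(() -> {
            try {
                arena.close();                             // not the owner thread
            } catch (RuntimeException e) {
                result.set(e.getClass().getSimpleName());
            }
        });
        other.start();
        other.join();

        long v = segment.get(ValueLayout.JAVA_LONG, 0);    // owner can still read
        arena.close();                                     // owner can still close
        return result.get() + ":" + v;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(closeFromOtherThread());        // WrongThreadException:42
    }
}
```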

At this moment, I do not know the exact solution to this; more eyes / minds would certainly be beneficial.

Related component

Storage:Remote

To Reproduce

Please see #15333

Expected behavior

The tests should be passing.

Additional Details

Plugins
Standard


Host/Environment (please complete the following information):

  • JDK 21 / 22

Additional context
Please see #15333

@reta added the bug (Something isn't working), untriaged, Storage (Issues and PRs relating to data and metadata storage), and Storage:Remote labels on Sep 11, 2024
@reta
Collaborator Author

reta commented Sep 11, 2024

@andrross @mch2 @msfroh tagging you folks; this is an interesting problem that needs to be solved.

@mch2
Member

mch2 commented Sep 11, 2024

@reta Thanks for raising. From the trace it looks like this applies to any node-to-node file copy, which would include peer recovery.

@reta
Collaborator Author

reta commented Sep 11, 2024

@reta Thanks for raising. From the trace it looks like this applies to any node-to-node file copy, which would include peer recovery.

It might be; the presence of

                    at org.opensearch.indices.replication.SegmentFileTransferHandler$1$1.close(SegmentFileTransferHandler.java:111)

specifically points to segment replication, but to your point it is not limited to it. Thanks @mch2!

@mch2
Member

mch2 commented Sep 11, 2024

Yep, that's right, the transfer handler is shared between both functions; will take a look at this.

@mch2 mch2 self-assigned this Sep 11, 2024
@mch2
Member

mch2 commented Sep 12, 2024

OK, this is because we are using IOContext.READONCE to create inputs and then read/send chunks of the files asynchronously. This context now carries a requirement that the input be consumed and closed on the same thread within MemorySegmentIndexInput. There is also enforcement within MockDirectoryWrapper that segments_N files open with a READONCE context, so we will need to send those separately or re-create the input if it has more than one chunk, and open the rest of the files with a READ context.
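Under that confinement constraint, one workaround is to re-open the file per chunk on whichever thread sends it, instead of sharing a single open input across async sends. This is a generic java.io sketch of the pattern, not the actual SegmentFileTransferHandler fix; the class name, file contents, and chunk size are illustrative.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerChunkReopenDemo {
    // Each task opens the file itself, reads only its own chunk, and closes
    // the handle on the same thread -- so no handle crosses a thread boundary.
    static byte[] readChunk(Path file, long offset, int len) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[len];
            raf.seek(offset);
            raf.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("seg", ".bin");
        Files.write(file, "hello segment replication".getBytes());

        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<byte[]>> chunks = new ArrayList<>();
        int chunkSize = 5;
        long size = Files.size(file);
        for (long off = 0; off < size; off += chunkSize) {
            final long o = off;
            final int len = (int) Math.min(chunkSize, size - o);
            chunks.add(pool.submit(() -> readChunk(file, o, len)));
        }

        StringBuilder sb = new StringBuilder();
        for (Future<byte[]> f : chunks) sb.append(new String(f.get()));
        pool.shutdown();
        System.out.println(sb);   // chunks reassemble to the original content
        Files.delete(file);
    }
}
```

The cost is one extra open/close per chunk, which is why the eventual fix only re-opens when a file spans more than one chunk.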

@mch2
Member

mch2 commented Sep 12, 2024

I thought maybe we could get by with clones of the original inputs, because the clones themselves aren't closed, but creating them ensures the original is confined via buildSlice here; this prevents us from creating clones from separate threads.

@reta
Collaborator Author

reta commented Sep 12, 2024

Ok this is because we are using IOContext.READONCE to create inputs and then read/send chunks of the files async

I also suspected this, but changing to READ did not help, at least with MockDirectoryWrapper (I will post the error shortly; just far from my laptop now).

@mch2
Member

mch2 commented Sep 12, 2024

Yeah, the wrapper is still enforcing the use of READONCE only on segments_N files. I'm not sure of the reasoning there, but I figure most will transfer within a single chunk, so I've made a fix #15922 on top of your changes that will conditionally use a READONCE context for those and re-open the input per chunk; see 9df169a
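The conditional-context idea can be sketched abstractly. This is a hypothetical helper, returning the context name as a string rather than Lucene's IOContext type so it runs standalone: READONCE only for segments_N files (which MockDirectoryWrapper insists on), READ for everything else so those inputs may be re-read.

```java
public class ContextChooser {
    // Hypothetical sketch of the conditional-context idea: segments_N files
    // get READONCE (consumed and closed on one thread), all other segment
    // files get READ so chunked, re-readable access is allowed.
    static String contextFor(String fileName) {
        return fileName.startsWith("segments") ? "READONCE" : "READ";
    }

    public static void main(String[] args) {
        System.out.println(contextFor("segments_5")); // READONCE
        System.out.println(contextFor("_0.cfs"));     // READ
    }
}
```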

@mch2
Member

mch2 commented Sep 13, 2024

Checking on that draft, there are quite a few tests still breaking because we are reading segments* files with a non-READONCE context. I think in some of these we may be OK flipping those to READONCE, i.e. recovery/upload, where the files would likely be read once and not see a benefit from being cached by the OS. What I've gathered so far is these are:

IndexShard (during remote store restore flows):
  • syncSegmentsFromGivenRemoteSegmentStore - used to access specifically a local segments* file during restore; used only at recovery time
  • localDirectoryContains - used during recovery from the remote store to check whether a file exists before fetching it from remote

RemoteDirectory:
  • calculateChecksumOfChecksum - invoked during file upload to compute a checksum; should happen once per file. This one requires a bit more thought because we open inputs multiple times in the upload flow
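For the checksum case, the READONCE-friendly shape is a single streaming pass that opens, fully consumes, and closes the input on one thread. This is a stdlib sketch with CRC32, not the actual RemoteDirectory code; the class and file names are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class SinglePassChecksumDemo {
    // Opens the file once, streams it through a CRC32, and closes it on the
    // same thread -- compatible with a READONCE-style, thread-confined input.
    static long checksum(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = new CheckedInputStream(Files.newInputStream(file), crc)) {
            in.transferTo(OutputStream.nullOutputStream());
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("chk", ".bin");
        byte[] data = "segment data".getBytes();
        Files.write(file, data);

        CRC32 expected = new CRC32();
        expected.update(data);
        System.out.println(checksum(file) == expected.getValue()); // true
        Files.delete(file);
    }
}
```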

@reta
Collaborator Author

reta commented Sep 13, 2024

Thanks a lot @mch2

@ashking94
Member

Removing the untriaged label since there are conversations and activities going around already.

@reta
Collaborator Author

reta commented Oct 2, 2024

Getting there; please see #15333

Projects
Status: ✅ Done

3 participants