Skip to content

Conversation

@kumargu
Copy link
Contributor

@kumargu kumargu commented Apr 10, 2025

The major motivation to remove the interception of unix domain sockets connections is to reduce the churn of granting allow access to domain sockets across multiple places in core and more importantly across plugins. A recent stream of updates haven't had been able to resolve errors on window build (see opensearch-project/custom-codecs#235).

While this could be a temporary change: there are arguments that this change is generally safe since we can trust IPC communications. See more discussion at

At this point in time, we would like to get the windows build making some progress to be able to identify what else is broken and needs to be patched across various plugins.

If we are not comfortable removing this interception, we can come back to it in RC-2 with the right sets of plumbings needed.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Gulshan Kumar <kumargu@amazon.com>
@andrross
Copy link
Member

@kumargu What about this one?

final NetPermission permission = new NetPermission("accessUnixDomainSocket");

@reta
Copy link
Contributor

reta commented Apr 10, 2025

Strong 👎 We could just drop security manager and agent with zero churn. I would have better work to help to fix the issues, since we are very likely would add more (not less!) interceptors in the future, and such kind of changes are guarantee to happen.

@kumargu
Copy link
Contributor Author

kumargu commented Apr 10, 2025

If we are not comfortable removing this interception, we can come back to it in RC-2 with the right sets of plumbings needed.

@reta I am not saying we drop it forever. The idea was to get the builds moving forward to identify what else is broken.

Given, your other PR has worked for codec build, I can close this one.

@kumargu
Copy link
Contributor Author

kumargu commented Apr 10, 2025

We could just drop security manager and agent with zero churn.

On this: re-quoting my earlier stances: #1687 (comment) #16729 (comment)

@github-actions
Copy link
Contributor

❌ Gradle check result for d9fef15: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@kumargu kumargu closed this Apr 10, 2025
@andrross
Copy link
Member

@reta @kumargu Honestly my concern here is with the platform-specific behavior we're seeing here. This feels like very low level details leaking into the security policy. Given we're seeing this only on Windows, this will likely introduce development friction as Windows is not the common development platform. Failures will only show up later in PRs or other integration test environments. I'd like to explore whether we can infer the appropriate permissions without needing to explicitly call out accessUnixDomainSocket for one platform.

@cwperks
Copy link
Member

cwperks commented Apr 10, 2025

@jmazanec15 Just sent me a k-nn failure.

2025-04-10T18:18:30.0875008Z »  Caused by: java.lang.SecurityException: Denied OPEN (read) access to file: /proc/cpuinfo, domain: ProtectionDomain  (file:/__w/k-NN/k-NN/build/testclusters/integTest-0/distro/3.0.0-ARCHIVE/lib/lucene-core-10.1.0.jar <no signer certificates>)

I see that the k-NN repo already does make this grant.

I'm planning to take a deeper dive, but on the surface it looks like we are trying every ProtectionDomain on the call stack instead of just ensuring that at least one of the ProtectionDomain's on the call stack has the grant. Is my understanding correct?

Full stack trace
2025-04-10T18:18:30.0745441Z »  	at org.opensearch.knn.jni.JNIService.initIndex(JNIService.java:50) ~[?:?]
2025-04-10T18:18:30.0747238Z »  	at org.opensearch.knn.index.codec.nativeindex.MemOptimizedNativeIndexBuildStrategy.lambda$buildAndWriteIndex$0(MemOptimizedNativeIndexBuildStrategy.java:63) ~[?:?]
2025-04-10T18:18:30.0748969Z »  	at java.base/java.security.AccessController.doPrivileged(AccessController.java:319) ~[?:?]
2025-04-10T18:18:30.0750859Z »  	at org.opensearch.knn.index.codec.nativeindex.MemOptimizedNativeIndexBuildStrategy.buildAndWriteIndex(MemOptimizedNativeIndexBuildStrategy.java:62) ~[?:?]
2025-04-10T18:18:30.0753366Z »  	at org.opensearch.knn.index.codec.nativeindex.remote.RemoteIndexBuildStrategy.buildAndWriteIndex(RemoteIndexBuildStrategy.java:152) ~[?:?]
2025-04-10T18:18:30.0755853Z »  	at org.opensearch.knn.index.codec.nativeindex.NativeIndexWriter.buildAndWriteIndex(NativeIndexWriter.java:159) ~[?:?]
2025-04-10T18:18:30.0757378Z »  	at org.opensearch.knn.index.codec.nativeindex.NativeIndexWriter.flushIndex(NativeIndexWriter.java:105) ~[?:?]
2025-04-10T18:18:30.0759129Z »  	at org.opensearch.knn.index.codec.KNN990Codec.NativeEngines990KnnVectorsWriter.flush(NativeEngines990KnnVectorsWriter.java:128) ~[?:?]
2025-04-10T18:18:30.0761709Z »  	at org.apache.lucene.codecs.perfield.PerFieldKnnVectorsFormat$FieldsWriter.flush(PerFieldKnnVectorsFormat.java:120) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0764749Z »  	at org.apache.lucene.index.VectorValuesConsumer.flush(VectorValuesConsumer.java:76) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0767204Z »  	at org.apache.lucene.index.IndexingChain.flush(IndexingChain.java:305) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0769225Z »  	at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:456) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0771702Z »  	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:502) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0773880Z »  	at org.apache.lucene.index.DocumentsWriter.maybeFlush(DocumentsWriter.java:456) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0776468Z »  	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:649) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0778717Z »  	at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:578) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0781115Z »  	at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:382) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0783673Z »  	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:356) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0786557Z »  	at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:346) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0789243Z »  	at org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0791886Z »  	at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0794375Z »  	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:72) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0796979Z »  	at org.opensearch.index.engine.OpenSearchReaderManager.refreshIfNeeded(OpenSearchReaderManager.java:52) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0799472Z »  	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0802154Z »  	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0804640Z »  	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:443) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0807389Z »  	at org.opensearch.index.engine.InternalEngine$ExternalReaderManager.refreshIfNeeded(InternalEngine.java:423) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0809781Z »  	at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:167) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0812430Z »  	at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:240) ~[lucene-core-10.1.0.jar:10.1.0 884954006de769dc43b811267230d625886e6515 - 2024-12-17 16:15:44]
2025-04-10T18:18:30.0813833Z »  	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1805) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0815048Z »  	at org.opensearch.index.engine.InternalEngine.refresh(InternalEngine.java:1782) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0816612Z »  	at org.opensearch.index.shard.IndexShard.refresh(IndexShard.java:1421) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0818081Z »  	at org.opensearch.action.admin.indices.refresh.TransportShardRefreshAction.lambda$shardOperationOnPrimary$0(TransportShardRefreshAction.java:101) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0819408Z »  	at org.opensearch.core.action.ActionListener.completeWith(ActionListener.java:344) ~[opensearch-core-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0820754Z »  	at org.opensearch.action.admin.indices.refresh.TransportShardRefreshAction.shardOperationOnPrimary(TransportShardRefreshAction.java:100) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0822317Z »  	at org.opensearch.action.admin.indices.refresh.TransportShardRefreshAction.shardOperationOnPrimary(TransportShardRefreshAction.java:57) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0823870Z »  	at org.opensearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1333) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0826092Z »  	at org.opensearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:150) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0829035Z »  	at org.opensearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.runWithPrimaryShardReference(TransportReplicationAction.java:654) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0831286Z »  	at org.opensearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.lambda$doRun$0(TransportReplicationAction.java:547) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0832579Z »  	at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) ~[opensearch-core-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0834269Z »  	at org.opensearch.index.shard.IndexShard.lambda$wrapPrimaryOperationPermitListener$36(IndexShard.java:4203) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0836729Z »  	at org.opensearch.core.action.ActionListener$3.onResponse(ActionListener.java:132) ~[opensearch-core-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0838980Z »  	at org.opensearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:310) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0841585Z »  	at org.opensearch.index.shard.IndexShardOperationPermits.acquire(IndexShardOperationPermits.java:255) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0843925Z »  	at org.opensearch.index.shard.IndexShard.acquirePrimaryOperationPermit(IndexShard.java:4174) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0846913Z »  	at org.opensearch.action.support.replication.TransportReplicationAction.acquirePrimaryOperationPermit(TransportReplicationAction.java:1262) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0850013Z »  	at org.opensearch.action.support.replication.TransportReplicationAction$AsyncPrimaryAction.doRun(TransportReplicationAction.java:544) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0852885Z »  	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0855836Z »  	at org.opensearch.action.support.replication.TransportReplicationAction.handlePrimaryRequest(TransportReplicationAction.java:483) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0858888Z »  	at org.opensearch.wlm.WorkloadManagementTransportInterceptor$RequestHandler.messageReceived(WorkloadManagementTransportInterceptor.java:63) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0861829Z »  	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:108) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0864064Z »  	at org.opensearch.transport.TransportService$7.doRun(TransportService.java:1048) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0866590Z »  	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:975) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0868936Z »  	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-3.0.0-beta1-SNAPSHOT.jar:3.0.0-beta1-SNAPSHOT]
2025-04-10T18:18:30.0870628Z »  	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
2025-04-10T18:18:30.0871981Z »  	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
2025-04-10T18:18:30.0872999Z »  	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
2025-04-10T18:18:30.0875008Z »  Caused by: java.lang.SecurityException: Denied OPEN (read) access to file: /proc/cpuinfo, domain: ProtectionDomain  (file:/__w/k-NN/k-NN/build/testclusters/integTest-0/distro/3.0.0-ARCHIVE/lib/lucene-core-10.1.0.jar <no signer certificates>)

@reta
Copy link
Contributor

reta commented Apr 10, 2025

Failures will only show up later in PRs or other integration test environments. I'd like to explore whether we can infer the appropriate permissions without needing to explicitly call out accessUnixDomainSocket for one platform.

@andrross this is not a call for one platform but it is a permission to use UNIX domain sockets. We are seeing Windows failures here because recent JDKs do use UNIX domain sockets on Windows for loopback (and I think it is Windows 10 or above).

JDK makes this decision inside the socket impl, but because we are not part of it anymore - we have to be much more explicit / precise with permissions

@kumargu
Copy link
Contributor Author

kumargu commented Apr 10, 2025

I see that the k-NN repo already does make this grant.

that failure is from the tests, right? I don't think plugin-security.policy is referenced in the tests. At least in core, the permissions for tests are picked from test.policy or test-framework.policy

@andrross
Copy link
Member

JDK makes this decision inside the socket impl, but because we are not part of it anymore - we have to be much more explicit / precise with permissions

@reta I understand. This is the point I'm pushing on. Do we actually need to be so precise. Can our policy enable generic "socket" permissions then the validation code can apply it to whatever we deem appropriate (in this case it would be both InetSocketAddress and UnixDomainSocketAddress).

@reta
Copy link
Contributor

reta commented Apr 10, 2025

@reta I understand. This is the point I'm pushing on. Do we actually need to be so precise.

@andrross in my opinion, being precise is a must , and it does not hurt - if our goal is to build a minimal but robust foundation. If our goal is to release something that pretend to do right thing - sure, we don't need to be precise. Yes, being precise requires some work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants