
[Feature Request] [Parallel downloads] Investigate the use of virtual threads as the new I/O mechanism for segment downloads #11708

Open
kotwanikunal opened this issue Jan 2, 2024 · 5 comments

@kotwanikunal
Member

Is your feature request related to a problem? Please describe

Coming in from #11461 - we need to investigate utilizing virtual threads as the default I/O mechanism for segment downloads.

Describe the solution you'd like

#10786 talks about using virtual threads as the I/O mechanism over the current code base.
As a part of parallel downloads -

  • We can adapt reactive streams from the Java SDK as the standard async I/O mechanism
  • Adapt virtual threads as the I/O mechanism

A major benefit of the virtual thread mechanism is that the existing code constructs and APIs do not need to change - the benefits are gained at a logically lower level.
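For illustration, a minimal sketch of what this looks like on a JDK 21 runtime (the class and downloadSegmentFile method are hypothetical stand-ins, not OpenSearch code): the blocking download routine stays as-is and is simply submitted to a virtual-thread-per-task executor.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SegmentDownloadSketch {

    // Hypothetical stand-in for the existing blocking download routine;
    // its body would not need to change.
    static void downloadSegmentFile(String fileName) {
        // ... existing blocking I/O (InputStream reads, file writes, etc.) ...
    }

    public static void main(String[] args) throws Exception {
        List<String> segmentFiles = List.of("_0.cfs", "_0.cfe", "_0.si");

        // One virtual thread per file: blocking calls park the virtual thread
        // instead of tying up an OS thread for the duration of the download.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Callable<Void>> tasks = segmentFiles.stream()
                .<Callable<Void>>map(file -> () -> {
                    downloadSegmentFile(file);
                    return null;
                })
                .toList();
            executor.invokeAll(tasks); // waits for all downloads to complete
        }
    }
}
```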

The aim of this issue is to compare the overall effort and performance benefits of the two mechanisms, as well as to document the suggested path forward.

Related component

Search:Remote Search

Describe alternatives you've considered

  • Reactive Streams
  • Virtual threads

Additional context

@kotwanikunal kotwanikunal added enhancement Enhancement or improvement to existing feature or request untriaged labels Jan 2, 2024
@kartg kartg self-assigned this Jan 17, 2024
@kotwanikunal kotwanikunal assigned mch2 and unassigned kartg Jan 29, 2024
@kotwanikunal kotwanikunal self-assigned this Feb 26, 2024
@kotwanikunal
Member Author

Experiments performed

All of the experiments below were performed on a self-managed cluster with 3 r7g.4xl EC2 nodes, a load balancer, and a c5.2xl EC2 node running opensearch-benchmark as the load generator.

Recovery

  • Created an index nyc_taxis with 6 primary shards and 0 replicas initially, with a primary shard size of around 17 GB
    • The overall index size was around 120 GB.
  • Performed shard recovery across the 3 use cases by adding 2 replica shards to the index.
  • Bumped up the EBS volume throughput to 1000 MB/s and IOPS to 6000; the JVM heap was configured to 100 GB.
  • The number of concurrent recoveries was bumped up to 100 to facilitate parallel recovery of shards, and the speed limit (indices.recovery.max_bytes_per_sec) was removed.
| Approach | Shard Size (GB) | Recovery Time (seconds) | Throughput (MB/s) | Improvement |
| --- | --- | --- | --- | --- |
| File-based Blocking Threads | 17.8 | 94 | 189.9 | Baseline for recovery |
| File-based Virtual Threads | N/A | N/A | N/A | N/A |
| Part-based Blocking Threads | 17.8 | 85.5 | 208.23 | 10% over file-based |
| Part-based Virtual Threads | 17.8 | 70 | 255.17 | 34% over file-based, 22% over blocking multipart |

The virtual-threaded multipart approach is roughly 34% faster than the file-based approach and ~22% faster than the blocking multipart approach.

Replication

  • The replication scenario was tested with the above-described nyc_taxis index (6 primary shards of roughly 18 GB each and 0 replicas initially) and a new so index with 3 primary shards and 1 replica spread across 3 nodes.
  • The test involved increasing the replicas for nyc_taxis to 2, kicking off recovery, and simultaneously indexing documents on so to check for overall replication lag
    • A separate process polled _cat/segment_replication every 5 seconds while the recovery was in progress and dumped the results to a file
  • The number of concurrent recoveries was limited to 2 to test for system fairness, and the recovery speed (indices.recovery.max_bytes_per_sec) was limited to 125 MB/s

Case 1: 1s Refresh Interval for so

| Approach | Average Lag (seconds) | Average Lag (GB) | Max Lag (seconds) | Max Lag (GB) |
| --- | --- | --- | --- | --- |
| Baseline | 1.9 | 0.27 | 42 | 4.6 |
| File-based Blocking Threads | 45 | 1.4 | 132 | 4.8 |
| File-based Virtual Threads | 55 | 1.7 | 150 | 5.8 |
| Part-based Blocking Threads | 15 | 0.9 | 108 | 5.2 |
| Part-based Virtual Threads | 16 | 0.5 | 44 | 1.4 |

Case 2: 10s Refresh Interval for so

| Approach | Average Lag (seconds) | Average Lag (GB) | Max Lag (seconds) | Max Lag (GB) |
| --- | --- | --- | --- | --- |
| File-based Blocking Threads | 61.84873 | 2.0458 | 198 | 6.9 |
| File-based Virtual Threads | 71.4141 | 2.11445 | 246 | 6.3 |
| Part-based Blocking Threads | 51.16281 | 1.14736 | 168 | 3.1 |
| Part-based Virtual Threads | 41.63 | 1.44 | 126 | 4.8 |

Compared to the file-based approach, the part-based virtual-threaded approach gives better performance. The inherent fairness added by the part-based approach helps reduce the overall replication lag and improves replication performance.

The baseline case has been added for comparison purposes; the part-based virtual-threaded approach is the fairest of the approaches tested.

Security (JSM)

  • Virtual threads and the Security Manager are incompatible, as called out in JEP 444
  • Virtual threads run on top of platform (carrier) threads and need access to other threads and thread groups to park and unpark virtual threads when blocking code is encountered
  • Virtual threads can work with the OpenSearch codebase and the Security Manager with certain tweaks (a sketch of one such tweak follows below this list)
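For illustration, a minimal sketch of the kind of tweak mentioned above, assuming the usual OpenSearch pattern of wrapping permission-sensitive calls in AccessController.doPrivileged (class and method names here are hypothetical; whether this alone is sufficient under JSM is exactly what the branches linked below explore):

```java
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Sketch only: creates the virtual-thread machinery inside a privileged block,
// the usual OpenSearch pattern for code that needs permissions the caller
// (e.g. a plugin) may not have.
public final class VirtualThreadExecutors {

    private VirtualThreadExecutors() {}

    @SuppressWarnings("removal") // AccessController is deprecated for removal on recent JDKs
    public static ExecutorService newPrivilegedVirtualThreadExecutor(String namePrefix) {
        return AccessController.doPrivileged((PrivilegedAction<ExecutorService>) () -> {
            // One named virtual thread per submitted task.
            ThreadFactory factory = Thread.ofVirtual().name(namePrefix + "-", 0).factory();
            return Executors.newThreadPerTaskExecutor(factory);
        });
    }
}
```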

Path forward

The next steps for multipart download support would be as follows -

  1. Add support for platform-threaded multipart downloads as the default download mechanism for remote store operations
    1. Virtual thread support will be added for JDK 21+ runtimes and is described in step 2
    2. This will be wrapped behind a feature release flag, with the default set to file-based downloads pre-launch
  2. Add multi-release JAR support for JDK 21+ and pre-JDK 21 runtimes (a sketch follows below this list)
    1. This will ensure usage of virtual threads on JDK 21+, where support exists
    2. This has been previously implemented here:
  3. For JSM, there are 2 possible approaches -
    1. Enable the Security Manager with a passthrough just for virtual threads, as highlighted here
    2. Enable virtual threads on JDK 21+ only if the Security Manager is disabled
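For illustration, a minimal sketch of how the multi-release JAR in step 2 could be laid out (package, class, and source-set paths are hypothetical; the actual OpenSearch build wiring may differ). The same fully-qualified class is compiled twice, the JDK 21 copy is packaged under META-INF/versions/21/, and the manifest sets Multi-Release: true, so JDK 21+ runtimes pick up the virtual-thread variant while older runtimes fall back to platform threads.

```java
// File 1: src/main/java/org/opensearch/sketch/DownloadExecutorFactory.java (base, pre-JDK 21)
package org.opensearch.sketch;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class DownloadExecutorFactory {
    private DownloadExecutorFactory() {}

    public static ExecutorService create(int maxThreads) {
        // Blocking/platform threads: a bounded pool.
        return Executors.newFixedThreadPool(maxThreads);
    }
}
```

```java
// File 2: same class, compiled with JDK 21 and packaged under META-INF/versions/21/
package org.opensearch.sketch;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class DownloadExecutorFactory {
    private DownloadExecutorFactory() {}

    public static ExecutorService create(int maxThreads) {
        // Virtual threads: one per task; concurrency is bounded elsewhere (e.g. a semaphore).
        return Executors.newVirtualThreadPerTaskExecutor();
    }
}
```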

TLDR:

  • For pre-JDK 21 runtimes - we will use multipart downloads behind a feature flag, with a fallback to file-based downloads until GA
  • For JDK 21+ runtimes - we will use multipart downloads with virtual threads (documenting the behavior with the Security Manager) and a fallback to platform/blocking-thread-based multipart downloads

Code

With Security Manager: main...kotwanikunal:OpenSearch:virtual-thread-sm
Without Security Manager: main...kotwanikunal:OpenSearch:virtual-thread

cc: @Bukhtawar @andrross @mch2

@reta
Collaborator

reta commented Mar 8, 2024

@kotwanikunal thanks a lot for exploring this area. There have been discussions related to virtual threads for a while now (both in the OpenSearch and Apache Lucene communities); at the moment there are a few cautionary cases that we may want to look into more closely:

  • file system I/O (basically accessing files on disk) is not virtual-thread friendly yet, at least on some operating systems; https://openjdk.org/jeps/444 warns about that explicitly:

    > The vast majority of blocking operations in the JDK will unmount the virtual thread, freeing its carrier and the underlying OS thread to take on new work. However, some blocking operations in the JDK do not unmount the virtual thread, and thus block both its carrier and the underlying OS thread. This is because of limitations at either the OS level (e.g., many filesystem operations) or the JDK level ...

  • as you mentioned, the Security Manager is a bummer ([RFC] Replace Java Security Manager (JSM) #1687); the approaches to pass through or disable it are (in my opinion) overly simplistic - they completely defeat the security boundaries. Since the issue is not easy to solve at large, we may consider "sealing" the usage of virtual threads in core to some "trusted" code flows, making sure there is no way to hijack it from "non-trusted" ones (aka plugins)

  • as a side note, we would very likely need to reconsider a large part of the core related to resource tracking, hot threads reporting, pool configurations, .... I am not 100% sure, but I believe we may have surprises here.

@kotwanikunal
Member Author

> @kotwanikunal thanks a lot for exploring this area. There have been discussions related to virtual threads for a while now (both in the OpenSearch and Apache Lucene communities); at the moment there are a few cautionary cases that we may want to look into more closely:
>
>   • file system I/O (basically accessing files on disk) is not virtual-thread friendly yet, at least on some operating systems; https://openjdk.org/jeps/444 warns about that explicitly

Nice to hear from you @reta! :)
Given that we see gains even with the platform-threaded approach, and that virtual threads will improve over time, would it make more sense to head in this direction versus the reactive/async programming model? Virtual threads will still be opt-in, possibly behind a feature flag, to allow for some bake time.

@kotwanikunal
Member Author

>   • as a side note, we would very likely need to reconsider a large part of the core related to resource tracking, hot threads reporting, pool configurations, .... I am not 100% sure, but I believe we may have surprises here.

For the approach we are looking at, the queue is not unbounded, and we will limit job submissions at the recovery/replication event level. We can build guardrails and a framework around it for this set of operations.
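For illustration, a minimal sketch of such a guardrail (names are hypothetical): a semaphore bounds the number of in-flight part downloads for a single recovery/replication event before tasks are handed to the virtual-thread executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Sketch: bound the number of concurrently downloading parts for a single
// recovery/replication event, so submissions to the virtual-thread executor
// are never effectively unbounded.
public class BoundedPartDownloader {

    private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
    private final Semaphore inFlightParts;

    public BoundedPartDownloader(int maxInFlightParts) {
        this.inFlightParts = new Semaphore(maxInFlightParts);
    }

    // Each Runnable is a stand-in for the real blocking part-download call.
    public List<Future<?>> downloadAllParts(List<Runnable> partDownloads) throws InterruptedException {
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable downloadPart : partDownloads) {
            inFlightParts.acquire();            // back-pressure: wait for a free slot
            futures.add(executor.submit(() -> {
                try {
                    downloadPart.run();         // blocking I/O parks the virtual thread
                } finally {
                    inFlightParts.release();    // free the slot for the next part
                }
            }));
        }
        return futures;
    }
}
```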

@reta
Collaborator

reta commented Mar 9, 2024

> Nice to hear from you @reta! :)

❤️ Same, @kotwanikunal !

> Given that we see gains even with the platform-threaded approach, and that virtual threads will improve over time, would it make more sense to head in this direction versus the reactive/async programming model?

I think for downloading segments (network I/O) virtual threads are a perfect fit; for the filesystem we may need to weigh the tradeoffs. As far as I understand, the experimental implementation does not indicate where exactly the gains come from, just that in general there are some. If we could collect precise measurements here, it would help.

> For the approach we are looking at, the queue is not unbounded, and we will limit job submissions at the recovery/replication event level. We can build guardrails and a framework around it for this set of operations.

My apologies - guardrails are certainly needed, but I meant a different subject here: we need to have visibility into virtual threads regardless of where they are being used. Can we track the CPU share used by a virtual thread (as we do for regular ones)? Can we capture stacks to collect hot threads? And things like that ...
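For illustration, one partial visibility signal that exists today is the set of JFR events the JDK emits for virtual threads; a minimal sketch follows (it does not answer the CPU-share or hot-threads questions, it only shows what the runtime currently exposes):

```java
import jdk.jfr.consumer.RecordingStream;

// Sketch: stream the JDK-defined JFR events for virtual threads. A server
// process would run this on a background thread rather than blocking main.
public class VirtualThreadVisibilitySketch {
    public static void main(String[] args) {
        try (RecordingStream rs = new RecordingStream()) {
            // Emitted when a virtual thread parks while pinned to its carrier
            // (e.g. inside a synchronized block); includes a stack trace.
            rs.enable("jdk.VirtualThreadPinned").withStackTrace();
            // Emitted when the scheduler fails to submit a task for a virtual thread.
            rs.enable("jdk.VirtualThreadSubmitFailed");
            rs.onEvent("jdk.VirtualThreadPinned",
                e -> System.out.println("pinned for " + e.getDuration() + ": " + e.getStackTrace()));
            rs.onEvent("jdk.VirtualThreadSubmitFailed",
                e -> System.out.println("submit failed: " + e));
            rs.start(); // blocks while streaming events
        }
    }
}
```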
