Skip to content

High performance cloud object storage (for reading chunked multi-dimensional arrays) #10

Open
@JackKelly

Description

@JackKelly

Some notes (in no particular order) about speeding up reading many chunks of data from cloud storage.

General info about cloud storage buckets

In general, cloud storage buckets are highly distributed storage systems. They can be capable of very high bandwidth. But - crucially - they also have very high latencies of 100-200 milliseconds for first-byte-out.

Contrast this with read latencies of locally attached storage: HDD: ~5 ms; SSD: ~80 µs; DDR5 RAM: 90 ns. In other words: cloud storage latencies are two orders of magnitude higher than a local HDD; and cloud storage latencies are four orders of magnitude higher than a local SSD!

How many IOPS can we get from cloud storage to a single machine?

AWS docs say:

Your applications can easily achieve thousands of transactions per second in request performance when uploading and retrieving storage from Amazon S3. Amazon S3 automatically scales to high request rates [although the scaling takes time]. For example, your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned Amazon S3 prefix. There are no limits to the number of prefixes in a bucket. You can increase your read or write performance by using parallelization. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second...

Some data lake applications on Amazon S3 scan millions or billions of objects for queries that run over petabytes of data. These data lake applications achieve single-instance transfer rates that maximize the network interface use for their Amazon EC2 instance, which can be up to 100 Gb/s on a single instance. These applications then aggregate throughput across multiple instances to get multiple terabits per second.

(Also see this Stack Overflow discusion)

So... it sounds like we could achieve arbitrarily high IOPS by using lots of prefixes.

How fast can a single VM go? A 100 Gbps NIC could submit 20 million GET requests per second (assuming each GET request is 500 bytes). And a 100 Gbps NIC could receive 2 million 4 kB chunks per second (assuming a 20% overhead, so each request is 5 kB); or 2,000 x 4 MB chunks per second.

To hit 2 million IOPS, you'd need to evenly distribute your reads across 364 prefixes.

2 million requests per second per machine is a lot! It wasn't that long ago that it was considered very impressive if a single HTTP server could handle 10,000 RPS. I'm not even sure if 2 million RPS is actually possible! Here's a benchmark of a proxy server handling 2 million RPS, on a single machine.

I'd assume that, in order for the operating system to handle anything close to 2 million requests per second, we'd have to submit our GET requests using io_uring.

Some "tricks" that LSIO could use to improve read bandwidth from cloud storage buckets to a single VM:

  • Submit many GET requests concurrently (already planned)
  • GET large files in multiple "parts" (using the HTTP Range header, or partNumber), and download the parts in parallel (already planned)
  • Stripe data across many prefixes (like RAID 0). (LSIO could act just like a RAID controller: LSIO's user would read and write single files. Under the hood, LSIO would stripe that data over a use-configurable number of prefixes. The RAID chunk size would be user-configurable. Or, perhaps, when the user submits writes as a sequence of arbitrary-sized chunks (like Zarr) then LSIO could split those chunks across multiple S3 prefixes. And the docs could tell users that chunks which are often read together should be close together in the submitted sequence of chunks. That works for striping. And works for merging reads into sequential reads for HDDs.

Metadata

Metadata

Assignees

No one assigned

    Labels

    cloud storageenhancementNew feature or requestperformanceImprovements to runtime performancequestionFurther information is requested

    Projects

    Status

    Todo

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions