feat(csharp): Implement CloudFetch for Databricks Spark driver #2634
I don’t think we want to do this in the Spark driver, do we?
The change should be backward compatible. How can I run the test?
CurtHagenlocher left a comment
Thanks for the change! I've left some feedback.
    MemoryStream dataStream;

    // If the data is LZ4 compressed, decompress it
    if (this.isLz4Compressed)
Is it possible to leverage the Apache.Arrow.Compression assembly to do decompression? It works by passing a CompressionCodecFactory to the ArrowStreamReader constructor.
Do you have code pointers? I tried it, but it doesn't seem to work.
I don't, no. I can try to figure it out later; this doesn't need to be blocking.
CurtHagenlocher left a comment
The file artifacts/Apache.Arrow.Adbc.TetsDrivers/Apache/Debug/net8.0/.msCoverageSourceRootsMapping_Apache.Arrow.Adbc.Tests.Drivers.Apache is still present in the latest iteration. Could you please remove it from the PR?
Looks fine to me otherwise. One thing we might consider in a future change is to (configurably) fetch more than one link in parallel in order to maximize throughput.
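To make the parallel-fetch idea concrete, here is a minimal sketch of bounded-concurrency downloading. It is not code from this PR: `ParallelLinkFetcher`, `FetchAllAsync`, the `download` delegate, and the `maxParallel` setting are all hypothetical names, and the sketch assumes each link can be downloaded independently and that the results are wanted in link order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the driver.
public static class ParallelLinkFetcher
{
    // Downloads every link with at most maxParallel downloads in flight
    // and returns the payloads in the same order as the input links.
    public static async Task<IReadOnlyList<byte[]>> FetchAllAsync(
        IReadOnlyList<string> links,
        Func<string, CancellationToken, Task<byte[]>> download,
        int maxParallel,
        CancellationToken cancellationToken = default)
    {
        if (maxParallel < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxParallel));
        }

        // The semaphore caps how many downloads run at the same time.
        using var gate = new SemaphoreSlim(maxParallel);

        async Task<byte[]> FetchOneAsync(string link)
        {
            await gate.WaitAsync(cancellationToken).ConfigureAwait(false);
            try
            {
                return await download(link, cancellationToken).ConfigureAwait(false);
            }
            finally
            {
                gate.Release();
            }
        }

        // Task.WhenAll returns results in the order of the input tasks,
        // so the output order matches the link order.
        Task<byte[]>[] tasks = links.Select(link => FetchOneAsync(link)).ToArray();
        return await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```

The `download` delegate is left abstract so that the caller decides how a link is fetched (for example, by wrapping an `HttpClient` call). Note that this sketch keeps every downloaded payload in memory at once; a real reader would probably want to bound how far ahead of the consumer it fetches.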
…e#2634)
Initial implementation of adding CloudFetch feature in Databricks Spark Driver.
- Create a new CloudFetchReader to handle CloudFetch file download and decompression.
- Test cases for small and large results.
Coming changes after this:
- Adding prefetch to the downloader
- Adding renewal for expired presigned URLs
- Retries