Support read compressed files in COPY statemente #5380

Closed
Tracked by #4736
BohuTANG opened this issue May 14, 2022 · 4 comments · Fixed by #5655

@BohuTANG
Member

No description provided.

@BohuTANG BohuTANG changed the title Support read compressed files like `xxx.csv.gz` (archived file support like `tar` or `zip` is under development) Support read compressed files May 14, 2022
@BohuTANG BohuTANG added the A-storage Area: databend storage label May 14, 2022
@BohuTANG BohuTANG changed the title Support read compressed files Support read compressed files in COPY statement May 14, 2022
@Xuanwo Xuanwo moved this to 📋 Backlog in Xuanwo's Work May 15, 2022
@Xuanwo Xuanwo added this to the v0.8 milestone May 20, 2022
@Xuanwo Xuanwo moved this to 🔨 In Progress in Databend Storage Layer May 20, 2022
@Xuanwo Xuanwo added A-storage Area: databend storage and removed A-storage Area: databend storage labels May 20, 2022
@Xuanwo
Member

Xuanwo commented May 20, 2022

This issue is blocked on integration with the new format trait.

The original opendal design decompresses the IO stream on the async runtime, which the new processor does not expect.

I'm working on making the following split possible:

  • IO on async runtime
  • Decompress (which is CPU bound) on sync runtime

Work logged: https://note.xuanwo.io/#/page/opendal%20buffered%20io
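The split described above can be illustrated with a minimal Python sketch. It is not the opendal or Databend implementation (those are in Rust); the function names `read_chunks` and `decompress_gzip_stream` are hypothetical, and the in-memory chunk reader only stands in for real async IO. The idea is that reads stay on the async event loop while each CPU-bound decompression step is handed to a worker thread.

```python
import asyncio
import gzip
import zlib
from concurrent.futures import ThreadPoolExecutor


async def read_chunks(data: bytes, chunk_size: int):
    """Hypothetical stand-in for async IO: yield compressed bytes chunk by chunk."""
    for start in range(0, len(data), chunk_size):
        # Yield control to the event loop, as a real network or disk read would.
        await asyncio.sleep(0)
        yield data[start:start + chunk_size]


async def decompress_gzip_stream(compressed: bytes, chunk_size: int = 16) -> bytes:
    """Read on the event loop, decompress on a worker thread."""
    loop = asyncio.get_running_loop()
    # wbits=31 tells zlib to expect the gzip container format.
    decoder = zlib.decompressobj(wbits=31)
    out = bytearray()
    with ThreadPoolExecutor(max_workers=1) as pool:
        async for chunk in read_chunks(compressed, chunk_size):
            # CPU-bound work runs off the event loop; we only await its result.
            out += await loop.run_in_executor(pool, decoder.decompress, chunk)
        # Drain anything the decoder is still buffering.
        out += await loop.run_in_executor(pool, decoder.flush)
    return bytes(out)


if __name__ == "__main__":
    payload = b"id,name\n1,alice\n2,bob\n"
    restored = asyncio.run(decompress_gzip_stream(gzip.compress(payload)))
    print(restored == payload)
```

A streaming decoder object is used instead of a one-shot `gzip.decompress` call so that decompression can proceed as chunks arrive, which is closer to how a compressed file would be consumed during a COPY.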

@Xuanwo
Member

Xuanwo commented May 20, 2022

Had a discussion with the async-compression maintainers: Nullus157/async-compression#150


Update @ 2022-05-20

The author of async-compression does plan a refactor that would let us use the internal codec stuff. I will use a workaround for now.

@Xuanwo Xuanwo moved this from 🔨 In Progress to 🔍 In Review in Databend Storage Layer May 20, 2022
@Xuanwo Xuanwo moved this from 🔍 In Review to 🔨 In Progress in Databend Storage Layer May 20, 2022
@Xuanwo
Member

Xuanwo commented May 26, 2022

I believe we have the correct answer now: apache/opendal#289

@Xuanwo
Member

Xuanwo commented May 29, 2022

Sorry to keep you waiting, let's set sail!

In PR #5655, we got this working.

Repository owner moved this from 🔨 In Progress to 📦 Done in Databend Storage Layer May 30, 2022
@Xuanwo Xuanwo changed the title Support read compressed files in COPY statement Support read compressed files in COPY statemente Jun 9, 2022
@Xuanwo Xuanwo moved this from 📋 Backlog to 📦 Done in Xuanwo's Work Jun 20, 2022