Skip to content

Conversation

@yahoNanJing
Copy link
Contributor

@yahoNanJing yahoNanJing commented Mar 23, 2022

Which issue does this PR close?

Closes #1772.

Rationale for this change

By splitting the object store into a separate module, the object stores in datafusion-contrib no longer need to depend on the whole datafusion crate. They only need to depend on this separated module. What's more, if we hope to introduce the object stores in datafusion-contrib as some featuers in the datafusion crate, the issue of dependency cyclic can be avoided.

What changes are included in this PR?

  • Introduces a new crate datafusion-storage for object store interfaces.
  • Moves the object store in previous datasource mod into this new datafusion-storage crate.

The reason of still keeping ObjectStoreRegistry in the datafusion crate is

  1. for future introduces object store features which can make it possible to have many default object stores besides the LocalFileSystem, like HadoopFileSystem in datafusion-objectstore-hdfs, S3FileSystem in datafusion-objectstore-s3.
  2. the object stores in datafusion-contrib don't need it.

Are there any user-facing changes?

@yahoNanJing
Copy link
Contributor Author

Hi @jimexist, @matthewmturner, @yjshen, @alamb, could you help review this patch?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me -- thank you @yahoNanJing

cc @yjshen @rdettai @matthewmturner and @tustvold

@matthewmturner
Copy link
Contributor

Overall looks good. I think my only point, similar to what we discussed before, is on some of the ObjectStore functionality that is still hanging around in datafusion core i.e. ListingTable and functionality in datasource. I know the ObjectStore doesnt need them and this keeps datafusion-storage light, which of course is nice. To me they just seem logically coupled so at first was odd seeing the functionality split like that.

That being said, i have given this more thought and given how the above mentioned functionalities leverage physical plan, etc. i do think it makes sense to have datafusion wrap these functionalities together. just walking through my thought process out loud :)

thanks for this.

@alamb
Copy link
Contributor

alamb commented Mar 23, 2022

To me they just seem logically coupled so at first was odd seeing the functionality split like that.

🤔 maybe it is time for datafusion-listing-table crate

Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alamb
Copy link
Contributor

alamb commented Mar 24, 2022

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

split datafusion-datasource module

5 participants