Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is ObjectStoreSchemaProvider Still Needed? #2656

Closed
tustvold opened this issue May 30, 2022 · 2 comments · Fixed by #2665
Closed

Is ObjectStoreSchemaProvider Still Needed? #2656

tustvold opened this issue May 30, 2022 · 2 comments · Fixed by #2665
Labels
enhancement New feature or request question Further information is requested

Comments

@tustvold
Copy link
Contributor

tustvold commented May 30, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The ObjectStoreSchemaProvider is a SchemaProvider that provides the ability to register ObjectStore and ListingTable. Effectively it is a MemorySchemaProvider but which maintains its own ObjectStoreRegistry instead of using the one on RuntimeEnv.

I'm quite possibly missing something, but I'm not entirely sure why this is needed?

Describe the solution you'd like

If it isn't needed, it would be good to remove it.

Describe alternatives you've considered

If it is needed, we should keep it 😄

Thoughts @matthewmturner ?

@tustvold tustvold added enhancement New feature or request question Further information is requested labels May 30, 2022
@matthewmturner
Copy link
Contributor

FYI #1836 and #1863 is background conversation on the API design for this.

The original idea was to use this as shown below (slightly updated from how it was discussed in the above issue) - i.e. easily register multiple tables from a prefix into a schema.

let object_store = S3FileSystem::default();
let schema = ObjectStoreSchemaProvider::new();
schema.register_store(object_store);
let tables = object_store.list_dir("s3://active/schema1");
tables.iter().map(|file|  {
    let config = ListingTableConfig::new(object_store, file).infer().await?;
    let name = extract_name_from_file(&file);
    schema.register_listing_table(name, file, config);
}

Thinking on it now, i think the same could be done without the need for this abstraction, like the following:

let object_store = S3FileSystem::default();
let schema = MemorySchemaProvider::new();
let tables = object_store.list_dir("s3://active/schema1");
tables.iter().map(|file|  {
    let config = ListingTableConfig::new(object_store, file).infer().await?;
    let table = ListingTable::try_new(config)?;
    let name = extract_name_from_file(&file);
    schema.register_table(name, table)?;
}

If you agree, then yes I do think we can remove ObjectStoreSchemaProvider.

tustvold added a commit to tustvold/arrow-datafusion that referenced this issue May 31, 2022
@tustvold
Copy link
Contributor Author

Created #2665

tustvold added a commit that referenced this issue May 31, 2022
* Remove ObjectStoreSchemaProvider (#2656)

* Use custom object store

* Use file schema
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request question Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants