Add HTTP object store example #7602
ctx.runtime_env()
    .register_object_store(&base_url, Arc::new(http_store));

// register csv file with the execution context
this is cool
// register csv file with the execution context
ctx.register_csv(
    "aggregate_test_100",
    "https://github.com/apache/arrow-testing/raw/master/data/csv/aggregate_test_100.csv",
When I ran this example, even after merging up from main to pick up the update to arrow 47, I got an error:
cargo run --example query-http-csv
Running `/Users/alamb/Software/target-df2/debug/examples/query-http-csv`
Error: ObjectStore(NotFound { path: "apache/arrow-testing/raw/master/data/csv/aggregate_test_100.csv"
...
source: Some(reqwest::Error { kind: Status(404), url: Url { scheme: "https", cannot_be_a_base: false, username: "", password: None, host: Some(Domain("github.com")), port: None, path: "/apache/arrow-testing/raw/master/data/csv/aggregate_test_100.csv", query: None, fragment: None } }), status: Some(404) } })
Did it work for you?
Update: it needs an object store release (tracked by apache/arrow-rs#4858)
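For readers comparing the registered URL with the `path` in the `NotFound` error above: the path appears to be the CSV URL with the `https://github.com` prefix (the base URL the store was registered under) removed. The sketch below reproduces that split using only the standard library. `split_base_and_path` is a hypothetical helper written for illustration; it is not part of DataFusion or object_store, and it assumes a plain `scheme://host/path` URL.

```rust
/// Splits an absolute URL into its `scheme://host` prefix and the
/// remaining path without the leading slash.
/// Hypothetical helper for illustration only; not a DataFusion or
/// object_store API.
fn split_base_and_path(url: &str) -> Option<(&str, &str)> {
    // Find the end of the scheme separator, e.g. "https://".
    let scheme_end = url.find("://")? + 3;
    let rest = &url[scheme_end..];
    if rest.is_empty() {
        return None;
    }
    // The host (and optional port) ends at the first '/'.
    match rest.find('/') {
        // A '/' directly after the scheme means there is no host.
        Some(0) => None,
        Some(slash) => {
            let base_len = scheme_end + slash;
            Some((&url[..base_len], &url[base_len + 1..]))
        }
        // No path component at all.
        None => Some((url, "")),
    }
}

fn main() {
    let url = "https://github.com/apache/arrow-testing/raw/master/data/csv/aggregate_test_100.csv";
    let (base, path) = split_base_and_path(url).expect("expected an absolute URL");
    println!("base = {base}");
    println!("path = {path}");
}
```

The second value is the same string reported in the error's `path` field, which suggests the table URL was resolved against the registered store as intended and the 404 came from the request itself.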
Thank you @pka. I took the liberty of merging master into this branch, then tested with the (unreleased) version of object_store, and it works great!
I applied this patch:
diff --git a/Cargo.toml b/Cargo.toml
index 60ff770d0d..dae6e3c04f 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -74,3 +74,6 @@ opt-level = 3
overflow-checks = false
panic = 'unwind'
rpath = false
+
+[patch.crates-io]
+object_store = { git = "https://github.com/apache/arrow-rs.git" }
And then ran it like this:
cargo run --example query-http-csv
Compiling object_store v0.7.0 (https://github.com/apache/arrow-rs.git#2c9e2e9a)
Compiling parquet v47.0.0
Compiling datafusion-common v31.0.0 (/Users/alamb/Software/arrow-datafusion2/datafusion/common)
Compiling datafusion-expr v31.0.0 (/Users/alamb/Software/arrow-datafusion2/datafusion/expr)
Compiling datafusion-physical-expr v31.0.0 (/Users/alamb/Software/arrow-datafusion2/datafusion/physical-expr)
Compiling datafusion-execution v31.0.0 (/Users/alamb/Software/arrow-datafusion2/datafusion/execution)
Compiling datafusion-sql v31.0.0 (/Users/alamb/Software/arrow-datafusion2/datafusion/sql)
Compiling datafusion-physical-plan v31.0.0 (/Users/alamb/Software/arrow-datafusion2/datafusion/physical-plan)
Compiling datafusion-optimizer v31.0.0 (/Users/alamb/Software/arrow-datafusion2/datafusion/optimizer)
Compiling datafusion v31.0.0 (/Users/alamb/Software/arrow-datafusion2/datafusion/core)
Compiling datafusion-examples v31.0.0 (/Users/alamb/Software/arrow-datafusion2/datafusion-examples)
Finished dev [unoptimized + debuginfo] target(s) in 1m 00s
Running `/Users/alamb/Software/target-df2/debug/examples/query-http-csv`
+----+----+-----+
| c1 | c2 | c3 |
+----+----+-----+
| c | 2 | 1 |
| d | 5 | -40 |
| b | 1 | 29 |
| a | 1 | -85 |
| b | 5 | -82 |
+----+----+-----+
Thanks again @pka
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Which issue does this PR close?
Closes #.
Rationale for this change
There's no example for accessing files over HTTP.
What changes are included in this PR?
Adds an example for reading a CSV file over HTTP.
DRAFT: requires apache/arrow-rs#4837 to be merged
Are these changes tested?
N/A
Are there any user-facing changes?
No