-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-3731: MVP to read parquet in R library #3230
Changes from 2 commits
22d89dd
456c5d2
9e1897f
ff260c5
e936b44
7d6e64d
56adad2
75ba5c9
8ccaa51
1df3026
c67fa3d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
#' Read parquet file from disk | ||
#' | ||
#' @param files a vector of filenames | ||
#' @export | ||
read_parquet = function(files) { | ||
tables = lapply(files, function(f) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we'd probably use |
||
return (as_tibble(shared_ptr(`arrow::Table`, read_parquet_file(f)))) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we could have only one parquet reader instead of one per file. But I need to read more about the parquet api first. |
||
}) | ||
do.call('rbind', tables) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We'd probably use
|
||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -49,7 +49,8 @@ if [ "$INCLUDE_DIR" ] || [ "$LIB_DIR" ]; then | |
elif [ "$PKGCONFIG_CFLAGS" ] || [ "$PKGCONFIG_LIBS" ]; then | ||
echo "Found pkg-config cflags and libs!" | ||
PKG_CFLAGS=${PKGCONFIG_CFLAGS} | ||
PKG_LIBS=${PKGCONFIG_LIBS} | ||
#PKG_LIBS=${PKGCONFIG_LIBS} | ||
PKG_LIBS="-larrow -lparquet" # hard coded! what is the right way to do this? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this should rather be addressed by updating the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Having this seems to work for me:
This needs an update to the
|
||
elif [[ "$OSTYPE" == "darwin"* ]]; then | ||
if [ "$(command -v brew)" ]; then | ||
BREWDIR=$(brew --prefix) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
// // Licensed to the Apache Software Foundation (ASF) under one | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The cpp files must adhere to the style. You can enforce that with
|
||
// // or more contributor license agreements. See the NOTICE file | ||
// // distributed with this work for additional information | ||
// // regarding copyright ownership. The ASF licenses this file | ||
// // to you under the Apache License, Version 2.0 (the | ||
// // "License"); you may not use this file except in compliance | ||
// // with the License. You may obtain a copy of the License at | ||
// // | ||
// // http://www.apache.org/licenses/LICENSE-2.0 | ||
// // | ||
// // Unless required by applicable law or agreed to in writing, | ||
// // software distributed under the License is distributed on an | ||
// // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
// // KIND, either express or implied. See the License for the | ||
// // specific language governing permissions and limitations | ||
// // under the License. | ||
// | ||
// | ||
#include <arrow/api.h> | ||
#include <arrow/io/api.h> | ||
#include <parquet/arrow/reader.h> | ||
#include <parquet/arrow/writer.h> | ||
#include <parquet/exception.h> | ||
|
||
// [[Rcpp::export]] | ||
std::shared_ptr<arrow::Table> read_parquet_file(std::string filename) { | ||
std::shared_ptr<arrow::io::ReadableFile> infile; | ||
PARQUET_THROW_NOT_OK(arrow::io::ReadableFile::Open( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Although the name is a big hint, I'm not sure what the macro does and if it should perhaps be replaced by something that plays well with Rcpp. |
||
filename, arrow::default_memory_pool(), &infile)); | ||
|
||
std::unique_ptr<parquet::arrow::FileReader> reader; | ||
PARQUET_THROW_NOT_OK( | ||
parquet::arrow::OpenFile(infile, arrow::default_memory_pool(), &reader)); | ||
std::shared_ptr<arrow::Table> table; | ||
PARQUET_THROW_NOT_OK(reader->ReadTable(&table)); | ||
|
||
return table; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs the license headers, you can just copy from any other R file.
https://travis-ci.org/apache/arrow/jobs/470349974#L493