Commit d529de7

sgilmore10 authored and kou committed
ARROW-12730: [MATLAB] Update featherreadmex and featherwritemex to build against latest Arrow C++ APIs
**Overview**

* The MEX functions ``featherreadmex`` and ``featherwritemex`` fail to build against the latest Arrow C++ APIs. These changes allow them to build successfully.
* These changes require CMake version 3.20 or later in order to access the latest functionality exposed by [FindMatlab.cmake](https://cmake.org/cmake/help/latest/module/FindMatlab.html). We noticed that some Arrow project components, such as [Gandiva](https://arrow.apache.org/docs/developers/cpp/building.html?highlight=gandiva#cmake-version-requirements), require newer versions of CMake than the core Arrow C++ libraries. If version 3.20 is too new, we're happy to find an alternative.
* We couldn't find a way to read and write a table description for feather V1 files using the latest APIs. It looks like support for reading and writing descriptions was modified in pull request #6694. For now, we've removed support for table descriptions.

**Testing**

* Built ``featherreadmex`` and ``featherwritemex`` on Windows 10 with Visual Studio 2019
* Built ``featherreadmex`` and ``featherwritemex`` on macOS Big Sur (11.2.3) with GNU Make 3.81
* Built ``featherreadmex`` and ``featherwritemex`` on Debian 10 with GNU Make 4.2.1
* Ran all tests in ``tfeather`` and ``tfeathermex`` on all platforms in MATLAB R2021a

**Future Directions**

* We did not detect the build failures because there is no CI integration. We hope to add CI support soon and will follow up with a mailing list discussion to talk through the details.
* These changes are temporary; they give us a clean slate from which to start developing the [MATLAB Interface to Apache Arrow](https://github.com/apache/arrow/blob/master/matlab/doc/matlab_interface_for_apache_arrow_design.md).
* Eventually we would like to support the full range of data types for feather V1 and feather V2.
* In order to modernize the code, we plan to migrate to the [C++ MEX](https://www.mathworks.com/help/matlab/cpp-mex-file-applications.html) and [MATLAB Data Array](https://www.mathworks.com/help/matlab/matlab-data-array.html) APIs.
* We are going to follow up with another pull request to update the README.md to provide more detailed platform-specific development instructions.
* The MATLAB-based build system inside the ``build_support`` folder is out of date. We are not sure if we want to maintain a separate MATLAB-based build system alongside the CMake-based one. We will follow up on this in the future via the mailing list or Jira.

We acknowledge there is a lot of information in this pull request. In the future, we will work in smaller increments. We felt a bigger pull request was necessary to get back to a working state.

Thanks,
Sarah

Closes #10305 from sgilmore10/ARROW_12730

Lead-authored-by: sgilmore <sgilmore@mathworks.com>
Co-authored-by: sgilmore10 <74676073+sgilmore10@users.noreply.github.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>

1 parent cc4b9be commit d529de7

11 files changed: +191 -144 lines changed


matlab/CMakeLists.txt

Lines changed: 42 additions & 18 deletions
@@ -15,7 +15,8 @@
 # specific language governing permissions and limitations
 # under the License.

-cmake_minimum_required(VERSION 3.2)
+cmake_minimum_required(VERSION 3.20)
+
 set(CMAKE_CXX_STANDARD 11)

 set(MLARROW_VERSION "5.0.0-SNAPSHOT")
@@ -29,22 +30,45 @@ if(EXISTS "${CPP_CMAKE_MODULES}")
   set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${CPP_CMAKE_MODULES})
 endif()

-## Arrow is Required
+set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${CMAKE_SOURCE_DIR}/cmake_modules)
+
+# Arrow is Required
 find_package(Arrow REQUIRED)

-## MATLAB is required to be installed to build MEX interfaces
-set(MATLAB_ADDITIONAL_VERSIONS "R2018a=9.4")
-find_package(Matlab REQUIRED MX_LIBRARY)
-
-# Build featherread mex file based on the arrow shared library
-matlab_add_mex(NAME featherreadmex
-               SRC src/featherreadmex.cc src/feather_reader.cc src/util/handle_status.cc
-                   src/util/unicode_conversion.cc
-               LINK_TO ${ARROW_SHARED_LIB})
-target_include_directories(featherreadmex PRIVATE ${ARROW_INCLUDE_DIR})
-
-# Build featherwrite mex file based on the arrow shared library
-matlab_add_mex(NAME featherwritemex
-               SRC src/featherwritemex.cc src/feather_writer.cc src/util/handle_status.cc
-               LINK_TO ${ARROW_SHARED_LIB})
-target_include_directories(featherwritemex PRIVATE ${ARROW_INCLUDE_DIR})
+# MATLAB is Required
+find_package(Matlab REQUIRED)
+
+# Construct the absolute path to featherread's source files
+set(featherread_sources featherreadmex.cc feather_reader.cc util/handle_status.cc
+                        util/unicode_conversion.cc)
+list(TRANSFORM featherread_sources PREPEND ${CMAKE_SOURCE_DIR}/src/)
+
+# Build featherreadmex MEX binary
+matlab_add_mex(R2018a
+               NAME featherreadmex
+               SRC ${featherread_sources}
+               LINK_TO arrow_shared)
+
+# Construct the absolute path to featherwrite's source files
+set(featherwrite_sources featherwritemex.cc feather_writer.cc util/handle_status.cc
+                         util/unicode_conversion.cc)
+list(TRANSFORM featherwrite_sources PREPEND ${CMAKE_SOURCE_DIR}/src/)
+
+# Build featherwritemex MEX binary
+matlab_add_mex(R2018a
+               NAME featherwritemex
+               SRC ${featherwrite_sources}
+               LINK_TO arrow_shared)
+
+# Ensure the MEX binaries are placed in the src directory on all platforms
+if(WIN32)
+  set_target_properties(featherreadmex PROPERTIES RUNTIME_OUTPUT_DIRECTORY
+                                                  $<1:${CMAKE_SOURCE_DIR}/src>)
+  set_target_properties(featherwritemex PROPERTIES RUNTIME_OUTPUT_DIRECTORY
+                                                   $<1:${CMAKE_SOURCE_DIR}/src>)
+else()
+  set_target_properties(featherreadmex PROPERTIES LIBRARY_OUTPUT_DIRECTORY
+                                                  $<1:${CMAKE_SOURCE_DIR}/src>)
+  set_target_properties(featherwritemex PROPERTIES LIBRARY_OUTPUT_DIRECTORY
+                                                   $<1:${CMAKE_SOURCE_DIR}/src>)
+endif()

matlab/src/+mlarrow/+util/createMetadataStruct.m

Lines changed: 2 additions & 3 deletions
@@ -1,4 +1,4 @@
-function metadata = createMetadataStruct(description, numRows, numVariables)
+function metadata = createMetadataStruct(numRows, numVariables)
 % CREATEMETADATASTRUCT Helper function for creating Feather MEX metadata
 % struct.

@@ -17,8 +17,7 @@
 % implied. See the License for the specific language governing
 % permissions and limitations under the License.

-metadata = struct('Description', description, ...
-                  'NumRows', numRows, ...
+metadata = struct('NumRows', numRows, ...
                   'NumVariables', numVariables);
 end


matlab/src/+mlarrow/+util/table2mlarrow.m

Lines changed: 1 addition & 2 deletions
@@ -23,7 +23,6 @@
 %
 % Field Name Class Description
 % ------------ ------- ----------------------------------------------
-% Description char Table description (T.Properties.Description)
 % NumRows double Number of table rows (height(T))
 % NumVariables double Number of table variables (width(T))
 %
@@ -51,7 +50,7 @@
 variables = repmat(createVariableStruct('', [], [], ''), 1, width(t));

 % Struct representing table-level metadata.
-metadata = createMetadataStruct(t.Properties.Description, height(t), width(t));
+metadata = createMetadataStruct(height(t), width(t));

 % Iterate over each variable in the given table,
 % extracting the underlying array data.

matlab/src/feather_reader.cc

Lines changed: 49 additions & 39 deletions
@@ -18,16 +18,21 @@
 #include <algorithm>
 #include <cmath>

+#include "feather_reader.h"
+
+#include <arrow/array/array_base.h>
+#include <arrow/array/builder_base.h>
+#include <arrow/array/builder_primitive.h>
 #include <arrow/io/file.h>
 #include <arrow/ipc/feather.h>
+#include <arrow/result.h>
 #include <arrow/status.h>
 #include <arrow/table.h>
 #include <arrow/type.h>
-#include <arrow/util/bit-util.h>
-
+#include <arrow/type_traits.h>
+#include <arrow/util/bitmap_visit.h>
 #include <mex.h>

-#include "feather_reader.h"
 #include "matlab_traits.h"
 #include "util/handle_status.h"
 #include "util/unicode_conversion.h"
@@ -52,11 +57,11 @@ mxArray* ReadNumericVariableData(const std::shared_ptr<Array>& column) {
   mxArray* variable_data =
       mxCreateNumericMatrix(column->length(), 1, matlab_class_id, mxREAL);

-  std::shared_ptr<ArrowArrayType> integer_array =
+  auto arrow_numeric_array =
       std::static_pointer_cast<ArrowArrayType>(column);

   // Get a raw pointer to the Arrow array data.
-  const MatlabType* source = integer_array->raw_values();
+  const MatlabType* source = arrow_numeric_array->raw_values();

   // Get a mutable pointer to the MATLAB array data and std::copy the
   // Arrow array data into it.
@@ -121,8 +126,7 @@ void BitUnpackBuffer(const std::shared_ptr<Buffer>& source, int64_t length,
 // writes to a zero-initialized destination buffer.
 // Implements a fast path for the fully-valid and fully-invalid cases.
 // Returns true if the destination buffer was successfully populated.
-bool TryBitUnpackFastPath(const std::shared_ptr<Array>& array,
-                          mxLogical* destination) {
+bool TryBitUnpackFastPath(const std::shared_ptr<Array>& array, mxLogical* destination) {
   const int64_t null_count = array->null_count();
   const int64_t length = array->length();

@@ -177,32 +181,24 @@ Status FeatherReader::Open(const std::string& filename,
   *feather_reader = std::shared_ptr<FeatherReader>(new FeatherReader());

   // Open file with given filename as a ReadableFile.
-  std::shared_ptr<io::ReadableFile> readable_file(nullptr);
-
-  RETURN_NOT_OK(io::ReadableFile::Open(filename, &readable_file));
-
-  // TableReader expects a RandomAccessFile.
-  std::shared_ptr<io::RandomAccessFile> random_access_file(readable_file);
-
+  ARROW_ASSIGN_OR_RAISE(auto readable_file, io::ReadableFile::Open(filename));
+
   // Open the Feather file for reading with a TableReader.
-  RETURN_NOT_OK(ipc::feather::TableReader::Open(random_access_file,
-                                                &(*feather_reader)->table_reader_));
-
-  // Read the table metadata from the Feather file.
-  (*feather_reader)->num_rows_ = (*feather_reader)->table_reader_->num_rows();
-  (*feather_reader)->num_variables_ = (*feather_reader)->table_reader_->num_columns();
-  (*feather_reader)->description_ =
-      (*feather_reader)->table_reader_->HasDescription()
-          ? (*feather_reader)->table_reader_->GetDescription()
-          : "";
-
-  if ((*feather_reader)->num_rows_ > internal::MAX_MATLAB_SIZE ||
-      (*feather_reader)->num_variables_ > internal::MAX_MATLAB_SIZE) {
-    mexErrMsgIdAndTxt("MATLAB:arrow:SizeTooLarge",
-                      "The table size exceeds MATLAB limits: %u x %u",
-                      (*feather_reader)->num_rows_, (*feather_reader)->num_variables_);
+  ARROW_ASSIGN_OR_RAISE(auto reader, ipc::feather::Reader::Open(readable_file));
+
+  // Set the internal reader_ object.
+  (*feather_reader)->reader_ = reader;
+
+  // Check the feather file version
+  auto version = reader->version();
+  if (version == ipc::feather::kFeatherV2Version) {
+    return Status::NotImplemented("Support for Feather V2 has not been implemented.");
+  } else if (version != ipc::feather::kFeatherV1Version) {
+    return Status::Invalid("Unknown Feather format version.");
   }

+  // read the table metadata from the Feather file
+  (*feather_reader)->num_variables_ = reader->schema()->num_fields();
   return Status::OK();
 }

@@ -225,15 +221,11 @@ mxArray* FeatherReader::ReadMetadata() const {
   mxSetField(metadata, 0, "NumVariables",
              mxCreateDoubleScalar(static_cast<double>(num_variables_)));

-  // Set the description.
-  mxSetField(metadata, 0, "Description",
-             util::ConvertUTF8StringToUTF16CharMatrix(description_));
-
   return metadata;
 }

 // Read the table variables from the Feather file as a mxArray*.
-mxArray* FeatherReader::ReadVariables() const {
+mxArray* FeatherReader::ReadVariables() {
   const int32_t num_variable_fields = 4;
   const char* fieldnames[] = {"Name", "Type", "Data", "Valid"};

@@ -242,16 +234,34 @@ mxArray* FeatherReader::ReadVariables() const {
   mxArray* variables =
       mxCreateStructMatrix(1, num_variables_, num_variable_fields, fieldnames);

-  // Read all the table variables in the Feather file into memory.
+  std::shared_ptr<arrow::Table> table;
+  auto status = reader_->Read(&table);
+  if (!status.ok()) {
+    mexErrMsgIdAndTxt("MATLAB:arrow:FeatherReader::FailedToReadTable",
+                      "Failed to read arrow::Table from Feather file. Reason: %s",
+                      status.message().c_str());
+  }
+
+  // Set the number of rows
+  num_rows_ = table->num_rows();
+
+  if (num_rows_ > internal::MAX_MATLAB_SIZE ||
+      num_variables_ > internal::MAX_MATLAB_SIZE) {
+    mexErrMsgIdAndTxt("MATLAB:arrow:SizeTooLarge",
+                      "The table size exceeds MATLAB limits: %u x %u", num_rows_,
+                      num_variables_);
+  }
+
+  auto column_names = table->ColumnNames();
+
   for (int64_t i = 0; i < num_variables_; ++i) {
-    std::shared_ptr<ChunkedArray> column;
-    util::HandleStatus(table_reader_->GetColumn(i, &column));
+    auto column = table->column(i);
     if (column->num_chunks() != 1) {
       mexErrMsgIdAndTxt("MATLAB:arrow:FeatherReader::ReadVariables",
                         "Chunked columns not yet supported");
     }
     std::shared_ptr<Array> chunk = column->chunk(0);
-    const std::string column_name = table_reader_->GetColumnName(i);
+    const std::string column_name = column_names[i];

     // set the struct fields data
     mxSetField(variables, i, "Name", internal::ReadVariableName(column_name));
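
The `FeatherReader::Open` hunk above swaps out-parameter calls wrapped in `RETURN_NOT_OK` for `Result`-returning calls unwrapped with `ARROW_ASSIGN_OR_RAISE`, and then rejects any file that is not Feather V1. The following is a minimal, self-contained sketch of that control flow. `Status`, `Result`, `ParseVersion`, `OpenReader`, the version constants, and the header strings used as input are all simplified stand-ins invented for illustration; they are not Arrow's actual classes, and the real macro does more than the hand-written early return shown here.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for arrow::Status: success, or failure with a message.
class Status {
 public:
  Status() = default;
  explicit Status(std::string message) : ok_(false), message_(std::move(message)) {}
  static Status OK() { return Status(); }
  bool ok() const { return ok_; }
  const std::string& message() const { return message_; }

 private:
  bool ok_ = true;
  std::string message_;
};

// Hypothetical stand-in for arrow::Result<T>: holds either a value or an error.
template <typename T>
class Result {
 public:
  Result(T value) : value_(std::move(value)) {}
  Result(Status status) : status_(std::move(status)) {}
  bool ok() const { return status_.ok(); }
  const Status& status() const { return status_; }
  const T& value() const {
    assert(ok());  // reading the value of a failed Result is a programming error
    return value_;
  }

 private:
  Status status_;
  T value_{};
};

constexpr int kFeatherV1 = 1;  // illustrative constants, not Arrow's
constexpr int kFeatherV2 = 2;

// Illustrative "open" step that returns a Result instead of filling an out-parameter.
Result<int> ParseVersion(const std::string& header) {
  if (header == "FEA1") return kFeatherV1;
  if (header == "ARROW1") return kFeatherV2;
  return Status("unrecognized header: " + header);
}

// Mirrors the shape of the new Open(): unwrap or return early, then check the version.
Status OpenReader(const std::string& header) {
  auto maybe_version = ParseVersion(header);
  if (!maybe_version.ok()) return maybe_version.status();  // hand-written early return
  const int version = maybe_version.value();

  if (version == kFeatherV2) {
    return Status("Support for Feather V2 has not been implemented.");
  } else if (version != kFeatherV1) {
    return Status("Unknown Feather format version.");
  }
  return Status::OK();
}
```

Calling `OpenReader("FEA1")` yields an OK status, while `OpenReader("ARROW1")` returns the "not implemented" error. Returning a value-or-error object removes the need to declare the result variable before the call, which is why the `readable_file` and `random_access_file` declarations disappear in the diff.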

matlab/src/feather_reader.h

Lines changed: 2 additions & 4 deletions
@@ -23,7 +23,6 @@
 #include <arrow/ipc/feather.h>
 #include <arrow/status.h>
 #include <arrow/type.h>
-
 #include <matrix.h>

 namespace arrow {
@@ -56,7 +55,7 @@ class FeatherReader {
   /// Clients are responsible for freeing the returned mxArray memory
   /// when it is no longer needed, or passing it to MATLAB to be managed.
   /// \return variables mxArray* struct array containing table variable data
-  mxArray* ReadVariables() const;
+  mxArray* ReadVariables();

   /// \brief Initialize a FeatherReader object from a given Feather file.
   /// \param[in] filename path to a Feather file
@@ -66,12 +65,11 @@ class FeatherReader {

  private:
   FeatherReader() = default;
-  std::unique_ptr<ipc::feather::TableReader> table_reader_;
+  std::shared_ptr<ipc::feather::Reader> reader_;
   int64_t num_rows_;
   int64_t num_variables_;
   std::string description_;
 };

 } // namespace matlab
 } // namespace arrow
-
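
`ReadVariables()` loses its `const` qualifier in this header. Judging from the `feather_reader.cc` diff, the likely reason is that the method now assigns `num_rows_` after reading the whole table, rather than `Open` setting it up front. Below is a small sketch of the same situation. The `LazyReader` class and the `Demo` function are hypothetical and are not part of the MATLAB interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical reader that only learns its row count when the data is read.
class LazyReader {
 public:
  explicit LazyReader(std::vector<int> column) : column_(std::move(column)) {}

  // Non-const on purpose: it updates num_rows_ as a side effect of reading.
  // Declaring it const would not compile, since num_rows_ is not mutable.
  std::size_t ReadVariables() {
    num_rows_ = static_cast<int64_t>(column_.size());
    return column_.size();
  }

  // Pure accessor, so it can stay const.
  int64_t num_rows() const { return num_rows_; }

 private:
  std::vector<int> column_;
  int64_t num_rows_ = -1;  // -1 marks "not read yet" in this sketch
};

// Usage: the row count is unknown until ReadVariables() has run.
inline void Demo() {
  LazyReader reader(std::vector<int>{10, 20, 30});
  assert(reader.num_rows() == -1);
  reader.ReadVariables();
  assert(reader.num_rows() == 3);
}
```

One consequence of this design is that row metadata is only meaningful after the variables have been read, so the order of calls matters to callers.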
