sparrow

C++20 idiomatic APIs for the Apache Arrow Columnar Format

Introduction

sparrow is a C++ implementation of the Apache Arrow Columnar Format. It provides array structures with idiomatic APIs and convenient conversions to and from the Arrow C data interface.

sparrow requires a modern C++ compiler supporting C++20.

Installation

Package managers

We provide a package for the mamba (or conda) package manager:

mamba install -c conda-forge sparrow

Install from sources

sparrow has a few dependencies that you can install in a mamba environment:

mamba env create -f environment-dev.yml
mamba activate sparrow

You can then create a build directory, configure the project with cmake, and build and install it:

mkdir build
cd build
cmake .. \
    -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
    -DBUILD_EXAMPLES=ON \
    -DBUILD_TESTS=ON \
    -DBUILD_DOCS=ON
make install

Usage

Requirements

Compilers:

  • Clang 18 or higher
  • GCC 11.2 or higher
  • Apple Clang 16 or higher
  • MSVC 19.41 or higher

Initialize data with sparrow and extract C data structures

#include <utility>  // std::move
#include "sparrow/sparrow.hpp"
namespace sp = sparrow;

sp::primitive_array<int> ar = { 1, 3, 5, 7, 9 };
auto [arrow_array, arrow_schema] = sp::extract_arrow_structures(std::move(ar));
// Use arrow_array and arrow_schema as you need (serialization, passing them to
// a third-party library)
// ...
// You are responsible for releasing the structures at the end
arrow_array.release(&arrow_array);
arrow_schema.release(&arrow_schema);
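
The manual release calls above are easy to forget on early returns or exceptions. As a minimal sketch of one way to handle that, the scope guard below calls the release callback on scope exit. It relies only on the Arrow C data interface rule that a released structure has a null `release` callback. `arrow_array_guard`, `counting_release` and `release_calls_after_scope` are hypothetical names for illustration and are not part of sparrow. `ArrowArray` is redeclared here only to keep the sketch standalone; in real code it comes from the sparrow headers.

```cpp
#include <cassert>
#include <cstdint>

// Layout of ArrowArray as specified by the Arrow C data interface.
// Redeclared only so that this sketch compiles on its own.
struct ArrowArray
{
    std::int64_t length;
    std::int64_t null_count;
    std::int64_t offset;
    std::int64_t n_buffers;
    std::int64_t n_children;
    const void** buffers;
    ArrowArray** children;
    ArrowArray* dictionary;
    void (*release)(ArrowArray*);
    void* private_data;
};

// Hypothetical helper (not a sparrow API): releases the structure on scope exit.
class arrow_array_guard
{
public:

    explicit arrow_array_guard(ArrowArray& array)
        : p_array(&array)
    {
    }

    arrow_array_guard(const arrow_array_guard&) = delete;
    arrow_array_guard& operator=(const arrow_array_guard&) = delete;

    ~arrow_array_guard()
    {
        // A null release callback means the structure was already released
        // (or moved from), so there is nothing left to do.
        if (p_array->release != nullptr)
        {
            p_array->release(p_array);
        }
    }

private:

    ArrowArray* p_array;
};

// Stand-in producer callback for the demo: counts its calls through
// private_data and marks the structure as released, as the spec requires.
void counting_release(ArrowArray* array)
{
    ++*static_cast<int*>(array->private_data);
    array->release = nullptr;
}

// Returns how many times the callback ran after the guard left scope.
int release_calls_after_scope()
{
    int calls = 0;
    ArrowArray array{};
    array.private_data = &calls;
    array.release = &counting_release;
    {
        arrow_array_guard guard(array);
        // ... use array ...
    }
    return calls;
}
```

The same pattern would apply to `ArrowSchema`, which carries its own `release` callback.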

Initialize data with sparrow and use C data structures

#include "sparrow/sparrow.hpp"
namespace sp = sparrow;

sp::primitive_array<int> ar = { 1, 3, 5, 7, 9 };
// Caution: get_arrow_structures returns pointers, not values
auto [arrow_array, arrow_schema] = sp::get_arrow_structures(ar);
// Use arrow_array and arrow_schema as you need (serialization, passing them to
// a third-party library)
// ...
// Do NOT release the C structures at the end: the "ar" variable will do it for you

Read data from somewhere and pass it to sparrow

#include "sparrow/sparrow.hpp"
#include "third-party-lib.hpp"
namespace sp = sparrow;
namespace tpl = third_party_library;

ArrowArray array;
ArrowSchema schema;
tpl::read_arrow_structures(&array, &schema);

sp::array ar(&array, &schema);
// Use ar as you need
// ...
// You are responsible for releasing the structures at the end
array.release(&array);
schema.release(&schema);

Read data from somewhere and move it into sparrow

#include <utility>  // std::move
#include "sparrow/sparrow.hpp"
#include "third-party-lib.hpp"
namespace sp = sparrow;
namespace tpl = third_party_library;

ArrowArray array;
ArrowSchema schema;
tpl::read_arrow_structures(&array, &schema);

sp::array ar(std::move(array), std::move(schema));
// Use ar as you need
// ...
// Do NOT release the C structures at the end: the "ar" variable will do it for you
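
To illustrate why the caller must not release after a move, here is a minimal sketch of the move rule from the Arrow C data interface: the structure is shallow-copied and the source is marked as released by nulling its `release` callback, without calling it. `move_arrow_array` is a hypothetical helper written for this illustration; sparrow's actual implementation may differ. `ArrowArray` is redeclared only to keep the sketch standalone.

```cpp
#include <cassert>
#include <cstdint>

// Layout of ArrowArray as specified by the Arrow C data interface.
// Redeclared only so that this sketch compiles on its own.
struct ArrowArray
{
    std::int64_t length;
    std::int64_t null_count;
    std::int64_t offset;
    std::int64_t n_buffers;
    std::int64_t n_children;
    const void** buffers;
    ArrowArray** children;
    ArrowArray* dictionary;
    void (*release)(ArrowArray*);
    void* private_data;
};

// Hypothetical helper (not a sparrow API): transfers ownership of the
// structure to the returned value.
ArrowArray move_arrow_array(ArrowArray& source)
{
    // Shallow member-wise copy: buffers and children are not duplicated.
    ArrowArray destination = source;
    // Mark the source as released WITHOUT calling the callback, so that
    // only the destination is responsible for releasing the data.
    source.release = nullptr;
    return destination;
}
```

After such a move, only the new owner holds a non-null `release` callback, which is why releasing the original structures again would be a mistake.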

Documentation

The documentation (currently being written) can be found at https://man-group.github.io/sparrow/index.html

Acknowledgements

This development has been funded as part of a collaboration between ArcticDB, Bloomberg, and QuantStack.

License

This software is licensed under the Apache License 2.0. See the LICENSE file for details.
