polydbms/xdbc

XDBC

  • XDBC is a holistic, high-performance framework for fast and scalable data transfers across heterogeneous data systems (e.g. DBMS to dataframes). It aims to combine the generality of generic solutions with the performance of specialized connectors.
  • It decomposes data transfer into a configurable pipeline (read -> deserialize -> compress -> send/receive -> decompress -> serialize -> write) with pipeline-parallel execution and a ring-buffer memory manager to keep resource overhead low.
  • The core of the framework (xdbc-client and xdbc-server) is written in C++, with bindings for Python and Spark. It includes built-in adapters for PostgreSQL, CSV, Parquet and Pandas.
  • The project includes a lightweight heuristic optimizer implemented in Python that automatically tunes the parallelism, buffer sizes, intermediate formats and compression algorithms to the current environment.
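The pipeline idea above can be illustrated with a minimal Python sketch. It is not XDBC's implementation (the core is C++), and the names `BufferPool` and `transfer` are hypothetical. Assumptions: a fixed pool of preallocated buffers recycled by index stands in for the ring-buffer memory manager, `zlib` stands in for whichever compression algorithm is configured, an in-process queue stands in for the network, and the deserialize/serialize stages are omitted. Each stage runs in its own thread, and bounded queues plus the pool limit how far any stage can run ahead.

```python
import queue
import threading
import zlib


class BufferPool:
    """Hypothetical stand-in for a ring-buffer memory manager:
    a fixed set of preallocated buffers recycled by index."""

    def __init__(self, num_buffers, buffer_size):
        self.buffers = [bytearray(buffer_size) for _ in range(num_buffers)]
        self._free = queue.Queue()
        for i in range(num_buffers):
            self._free.put(i)

    def acquire(self):
        # Blocks while every buffer is in flight, which caps memory use.
        return self._free.get()

    def release(self, idx):
        self._free.put(idx)


def transfer(chunks, num_buffers=4, buffer_size=1024):
    """Hypothetical 4-stage pipeline: read -> compress/send -> receive/decompress -> write."""
    if any(len(c) > buffer_size for c in chunks):
        raise ValueError("chunk larger than buffer_size")
    # Server and client own separate pools, as separate processes would;
    # a single shared pool could deadlock when one side holds every buffer.
    server_pool = BufferPool(num_buffers, buffer_size)
    client_pool = BufferPool(num_buffers, buffer_size)
    read_q = queue.Queue(maxsize=num_buffers)  # server: read -> compress
    wire_q = queue.Queue(maxsize=num_buffers)  # stands in for the network
    recv_q = queue.Queue(maxsize=num_buffers)  # client: decompress -> write
    sink = []

    def read():
        for chunk in chunks:
            idx = server_pool.acquire()
            server_pool.buffers[idx][: len(chunk)] = chunk
            read_q.put((idx, len(chunk)))
        read_q.put(None)  # end-of-stream marker

    def compress_and_send():
        while (item := read_q.get()) is not None:
            idx, n = item
            payload = zlib.compress(bytes(server_pool.buffers[idx][:n]))
            server_pool.release(idx)  # the reader may now refill this buffer
            wire_q.put(payload)
        wire_q.put(None)

    def receive_and_decompress():
        while (payload := wire_q.get()) is not None:
            data = zlib.decompress(payload)
            idx = client_pool.acquire()
            client_pool.buffers[idx][: len(data)] = data
            recv_q.put((idx, len(data)))
        recv_q.put(None)

    def write():
        while (item := recv_q.get()) is not None:
            idx, n = item
            sink.append(bytes(client_pool.buffers[idx][:n]))
            client_pool.release(idx)

    stages = (read, compress_and_send, receive_and_decompress, write)
    threads = [threading.Thread(target=s) for s in stages]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink
```

For example, `transfer([b"a" * 100, b"b" * 100], num_buffers=2, buffer_size=128)` returns the two chunks in their original order. Because each stage is a single thread reading from a FIFO queue, ordering is preserved; a real system that runs several workers per stage would need sequence numbers to reassemble the output.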

Project Structure

XDBC consists of multiple repositories that together provide the cross-system functionality. For the reproducibility experiments, the following repositories are cloned and used:

  • xdbc-client: client-side module for loading data into the target system.
  • xdbc-server: server-side module for extracting data from the source system.
  • xdbc-python: Python bindings for loading data into Pandas (through pybind).
  • xdbc-spark: Spark bindings for loading data into a Spark RDD (through a custom DataSource with JNI).
  • pg_xdbc_fdw: PostgreSQL Foreign Data Wrapper for loading data into a table.
