Cylon is a fast, scalable, distributed-memory parallel runtime with a Pandas-like DataFrame.
C++, updated Jun 9, 2024
TPC-H queries in Apache Spark SQL using the native DataFrame API
Java application that uses Apache Spark to handle both batch and stream processing
mainframe - a lightweight dataframe library for C++
Apache Spark project for the Advanced Topics on Databases course
A sandbox environment designed to simulate a pseudo-distributed Hadoop cluster with integrated Apache Spark and Kafka components. It allows developers to prototype and experiment with big data workflows, test distributed computing patterns, and explore cluster behavior in a contained virtual setup.
Soundhopper project: lets users skip ahead to specified sections of a track. Built with Python and Jupyter Notebook.
API that converts NYC Department of Health coronavirus data: https://github.com/nychealth/coronavirus-data
Construct source files to match the target files in Spark using the DataFrame API
Semester assignment for ECE NTUA 3189 Advanced Topics in Database Systems
Makes columnar Spark files easier to use
Analysis of American Time Use Survey (ATUS): https://www.kaggle.com/bls/american-time-use-survey