- Clone the repo
  git clone https://github.com/imjuliengaupin/sparkler.git
⚙️ Features
- Modular and configurable to work with a locally installed, pseudo-distributed Apache Hadoop machine cluster
- Apache Spark structured event streaming with Apache Kafka
- Distributed Extract-Transform-Load (ETL) data processing with Apache Spark
- A custom Suite class (built on object-oriented abstraction) for creating independent, modular objects that share common functionality and connect to different databases, extracting data into a DataFrame object so transformations can be applied with the DataFrame API
- A custom Suite class (built on object-oriented abstraction) for creating independent, modular objects that share common functionality and read different file types, extracting their content into a DataFrame object so transformations can be applied with the DataFrame API
See the open issues for a full list of proposed features (and known issues).
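
The Suite abstraction listed above could be sketched roughly as follows. This is an illustration only, not the project's actual code: the class names `Suite`, `JdbcSuite`, and `CsvSuite`, their constructor parameters, and the `extract` method are all hypothetical. It assumes PySpark, where `spark.read.format(...).options(...).load()` returns a DataFrame, and a JDBC source would additionally need the matching driver on the classpath.

```python
from abc import ABC, abstractmethod


class Suite(ABC):
    """Hypothetical base class holding the logic shared by every source."""

    # Spark data source name, set by each subclass (e.g. "jdbc", "csv").
    source_format: str

    def __init__(self, spark):
        # In real use, `spark` would be a pyspark.sql.SparkSession.
        self.spark = spark

    @abstractmethod
    def options(self) -> dict:
        """Return the reader options specific to this source."""

    def extract(self):
        # Common functionality: every source is read through the
        # DataFrameReader, so subclasses only describe their options.
        return (
            self.spark.read.format(self.source_format)
            .options(**self.options())
            .load()
        )


class JdbcSuite(Suite):
    """Hypothetical suite for a database reachable over JDBC."""

    source_format = "jdbc"

    def __init__(self, spark, url, table):
        super().__init__(spark)
        self.url = url
        self.table = table

    def options(self):
        return {"url": self.url, "dbtable": self.table}


class CsvSuite(Suite):
    """Hypothetical suite for CSV files with a header row."""

    source_format = "csv"

    def __init__(self, spark, path):
        super().__init__(spark)
        self.path = path

    def options(self):
        return {"path": self.path, "header": "true"}
```

With a real SparkSession, `CsvSuite(spark, "events.csv").extract()` would yield a DataFrame that can then be transformed with the DataFrame API. Supporting another database or file type would only require a new subclass that supplies its own format and options.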
🔁 CI/CD
💻 Demo
If this project interests you and you would like to share your own insights, enhancements, or bug fixes, please feel free to contribute!
- Fork the project
- Create your feature branch
  git checkout -b feature/branchname
- Commit your changes
  git commit -m 'description'
- Push your feature branch
  git push origin feature/branchname
- Open a pull request
📝 License
Distributed under the MIT License. See LICENSE for more information.
