This is an ETL pipeline that pulls bitcoin exchange data from the CoinCap API and loads it into our data warehouse. For more details, check out the blog post at https://startdataengineering.com/post/data-engineering-project-to-impress-hiring-managers/
The code is available in the bitcoinMonitor repository.
You can run this data pipeline using GitHub Codespaces. Follow the instructions below.
- Create a codespace by going to the bitcoinMonitor repository, forking it, and then clicking the `Create codespaces on main` button.
- Wait for the codespace to start, then type `make up` in the terminal.
- Wait for `make up` to complete, and then wait for 30s (to give Metabase some time to set up).
- After 30s, go to the `ports` tab and click on the link exposing port `3000` to access the Metabase UI (the username and password are `sdeuser` and `sdepassword1234` respectively). See the `metabase connection settings` screenshot below for connection details.
Note: The screenshots show how to run a project on codespaces, please make sure to use the instructions above for this specific project.
The Metabase UI will look like the following:
Note: Make sure to switch off your Codespaces instance when you are done, since free usage is limited; see docs here.
To run locally, you need:
- git
- GitHub account
- Docker with at least 4GB of RAM and Docker Compose v1.27.0 or later
Clone the repo and run the following commands to start the data pipeline:
git clone https://github.com/josephmachado/bitcoinMonitor.git
cd bitcoinMonitor
make up
sleep 30 # wait for Metabase to start
make ci # run checks and tests
Go to http://localhost:3000 to see the Metabase UI.
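The fixed 30-second sleep is only a guess at how long Metabase takes to boot. A minimal sketch of polling instead is shown here. It assumes Metabase answers on a health endpoint at `/api/health` on port 3000 (verify this for your Metabase version); `metabase_is_up` and `wait_until_ready` are hypothetical helper names and are not part of the repository.

```python
import time
import urllib.request


def metabase_is_up(url="http://localhost:3000/api/health"):
    """Return True if the (assumed) health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


def wait_until_ready(probe=metabase_is_up, timeout_s=120, interval_s=2, sleep=time.sleep):
    """Call probe() until it returns True or timeout_s seconds have been slept."""
    waited = 0
    while True:
        if probe():
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Calling `wait_until_ready()` after `make up` returns `True` as soon as the probe succeeds and `False` if the timeout is reached. The `probe` and `sleep` parameters are injectable so the loop can be exercised without a running container.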
We use Python to pull, transform, and load data. Our warehouse is Postgres. We also spin up a Metabase instance for our presentation layer.
All of the components run as Docker containers.
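The pull, transform, and load flow described above can be sketched roughly as follows. This is not the repository's code: the endpoint URL, the response field names (`exchangeId`, `name`, `rank`, `volumeUsd`, `updated`), the `exchange` table, and all function names are assumptions for illustration, so check them against the CoinCap documentation and the actual source.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumed endpoint; verify against the current CoinCap API docs.
EXCHANGES_URL = "https://api.coincap.io/v2/exchanges"

# Hypothetical target table; pyformat placeholders as accepted by psycopg2.
INSERT_SQL = """
INSERT INTO exchange (exchange_id, name, rank, volume_usd, updated_utc)
VALUES (%(exchange_id)s, %(name)s, %(rank)s, %(volume_usd)s, %(updated_utc)s)
"""


def extract(url=EXCHANGES_URL):
    """Pull the raw exchange list from the API (performs a network call)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp).get("data", [])


def transform(records):
    """Convert raw API records into typed rows ready for loading."""
    rows = []
    for rec in records:
        volume = rec.get("volumeUsd")
        rows.append(
            {
                "exchange_id": rec["exchangeId"],
                "name": rec["name"],
                "rank": int(rec["rank"]),
                # Numbers are assumed to arrive as strings and may be null.
                "volume_usd": float(volume) if volume is not None else None,
                # "updated" is assumed to be epoch milliseconds.
                "updated_utc": datetime.fromtimestamp(rec["updated"] / 1000, tz=timezone.utc),
            }
        )
    return rows


def load(rows, cursor):
    """Insert rows through a DB-API cursor that accepts pyformat parameters."""
    cursor.executemany(INSERT_SQL, rows)
```

Keeping `transform` free of network and database calls means it can be unit tested with a hand-written payload, while `extract` and `load` stay thin wrappers around the API and the warehouse connection.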
Read this post for information on setting up CI/CD, IaC (Terraform), "make" commands, and automated testing.