There are more detailed setup instructions in Assignment-0 and the other directories. The 424 Detailed Slides (http://www.cs.umd.edu/~amol/cmsc724-spring2022/424-All-Slides.pdf) go through SQL and MongoDB syntax both in sufficient detail if you need (we won't cover the entire syntax in class).
-
Clone the GitHub Class Repository to get started (there are more detailed instructions in
Assignment-0README):git clone https://github.com/umddb/cmsc424-fall2021.git -
You can load the systems directly on your machines (easier on Linux or Mac), but to make things easier, we have provided a Dockerfile to create a virtual image with PostgreSQL, MongoDB, and Apache Spark pre-loaded.
- Install Docker Desktop: https://www.docker.com/products/docker-desktop
- In the top-level directory, run:
docker build -t "cmsc724" . - Confirm that the image was created successfully with
docker images - Run the docker image:
docker run --rm -ti -p 8888:8888 -p 8881:8881 -p 5432:5432 -v /Users/amol/git/cmsc724-spring2022:/data cmsc724:latest. Make sure to replace "/Users/amol/..." with the correct path for you. You may also have to fiddle with the port mappings if you already have things running on port 8888 or 5432. - The above commands mounts the local directory into
/dataon the virtual machine. - Assuming it ran successfully, you should be logged in as
rootin the docker container, and you should see the shell. - NOTE: you will be logged in as
root. - You need to start PostgreSQL Server: /etc/init.d/postgresql start
- At this point, you should be able to use psql:
psql socialnewtork - Start Jupyter:
jupyter-notebook --port=8888 --allow-root --no-browser --ip=0.0.0.0 - On your host machine, you should be able to visit the URL directly (we did the port mapping above when running Docker).
- As soon as you exit the Docker container, the machine will shut down -- so only changes you have made in the /data directory will persist.
-
For MongoDB, following needs to be done after loading:
- Start the MongoDB server:
systemctl start mongod.service - The following should be done if the collections are empty, but they should be loaded fine in the docker image already.
- Load customers (run from
/data):mongoimport --db "analytics" --collection "customers" /data/Assignment-2/sample_analytics/customers.json - Load accounts:
mongoimport --db "analytics" --collection "accounts" /data/Assignment-2/sample_analytics/accounts.json - Load transactions:
mongoimport --db "analytics" --collection "transactions" /data/Assignment-2/sample_analytics/transactions.json
- Load customers (run from
- Start the MongoDB server:
-
For Spark, see the instructions in
Assignment-3README file for setup. -
If you are having trouble installing Docker or somewhere in the steps above, you can also just install the software directly by going through the commands listed in the Dockerfile