Our assignment in this project was as follows:
- You're a data scientist at a game development company.
- Your latest mobile game has two events you're interested in tracking: buy a sword & join guild.
- Each has metadata characteristic of such events (e.g., sword type, guild name, etc.).
The commands used to carry out the following steps are recorded in the commands.txt file in this repository:
- Instrument your API server to log events to Kafka
  - The API server is instrumented in the game_api.py file in this repository.
- Use Apache Bench to generate test data for your pipeline
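As a rough illustration of the two steps above, the sketch below shows the core of event logging and test-data generation in Python. It is not the code in game_api.py. It assumes a producer object exposing a kafka-python-style send(topic, value) method (in a live server this would be a KafkaProducer; here an in-memory FakeProducer stands in so the sketch runs without a broker). The topic name events, the practice of merging the Host header into each event, and the URL http://localhost:5000/purchase_a_sword are all hypothetical. The ab flags used are -n (number of requests) and -H (extra request header).

```python
import json
import shlex
from typing import Protocol


class Producer(Protocol):
    """Anything with a kafka-python-style send(topic, value) method (assumption)."""

    def send(self, topic: str, value: bytes): ...


class FakeProducer:
    """In-memory stand-in for a Kafka producer so the sketch runs without a broker."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))


def log_event(producer: Producer, topic: str, event: dict, headers: dict) -> bytes:
    """Merge request headers into the event, serialize it as JSON, and publish it."""
    record = dict(event)    # copy so the caller's dict is not mutated
    record.update(headers)  # e.g. Host identifies which (simulated) user sent it
    payload = json.dumps(record).encode("utf-8")
    producer.send(topic, payload)
    return payload


def ab_command(url: str, host: str, requests: int = 10) -> list[str]:
    """Build an Apache Bench command sending `requests` GETs with a custom Host header."""
    return ["ab", "-n", str(requests), "-H", f"Host: {host}", url]


if __name__ == "__main__":
    producer = FakeProducer()
    # Hypothetical topic name and header values, for illustration only.
    log_event(producer, "events", {"event_type": "purchase_sword"},
              {"Host": "user1.comcast.com"})
    print(producer.sent)
    # Hypothetical route; shlex.join renders the argument list as a shell command.
    print(shlex.join(ab_command("http://localhost:5000/purchase_a_sword",
                                "user1.comcast.com")))
```

Passing the same argument list to subprocess.run would execute it against a running server; varying the Host value across runs simulates traffic from different users.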
The following steps are executed in the Project_3 Jupyter Notebook in this repository:
- Assemble a data pipeline to catch these events: use Spark streaming to filter select event types from Kafka, land them into HDFS/parquet to make them available for analysis using Presto
- Produce an analytics report where you provide a description of your pipeline and some basic analysis of the events
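The Spark streaming job itself is not reproduced here. As a dependency-free sketch of what its filtering step does to each batch, the function below decodes raw Kafka message values as JSON, drops malformed messages, keeps only a chosen set of event types, and groups the survivors by type, mirroring the idea of landing one parquet table per event type for Presto. The kept set and the event_type field name are assumptions; in Spark the same logic would be a filter on the parsed events before writing to HDFS.

```python
import json
from collections import defaultdict

# Hypothetical choice of event types to keep; "default" events are dropped.
KEEP = {"purchase_sword", "purchase_knife", "purchase_shield", "join_guild"}


def filter_events(raw_values, keep=KEEP):
    """Decode raw Kafka message values and group the selected event types.

    Each returned list plays the role of one HDFS/parquet table per event type.
    """
    tables = defaultdict(list)
    for raw in raw_values:
        try:
            event = json.loads(raw)  # accepts bytes or str
        except ValueError:
            # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses;
            # skip malformed messages rather than failing the whole batch.
            continue
        if not isinstance(event, dict):
            continue
        event_type = event.get("event_type")  # assumed field name
        if event_type in keep:
            tables[event_type].append(event)
    return dict(tables)


if __name__ == "__main__":
    batch = [
        b'{"event_type": "purchase_sword", "Host": "user1.comcast.com"}',
        b'{"event_type": "default", "Host": "user2.att.com"}',
        b'{"event_type": "join_guild", "Host": "user2.att.com"}',
    ]
    for event_type, rows in filter_events(batch).items():
        print(event_type, len(rows))
```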
We were given the following guidelines for this notebook:
Use a notebook to present your queries and findings. Remember that this notebook should be appropriate for presentation to someone else in your business who needs to act on your recommendations.
It's understood that events in this pipeline are generated events which make them hard to connect to actual business decisions. However, we'd like students to demonstrate an ability to plumb this pipeline end-to-end, which includes initially generating test data as well as submitting a notebook-based report of at least simple event analytics.
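For the "at least simple event analytics" requirement, a minimal report is a count of events per type and per user. The sketch below computes both in plain Python over a list of event dictionaries; it is a stand-in for the equivalent Presto GROUP BY ... COUNT(*) queries, not the notebook's actual code, and the field names event_type and Host as well as the sample rows are hypothetical.

```python
from collections import Counter


def summarize(events):
    """Count events by type and by Host header, like a GROUP BY ... COUNT(*) query."""
    by_type = Counter(e.get("event_type", "unknown") for e in events)
    by_host = Counter(e.get("Host", "unknown") for e in events)
    return by_type, by_host


if __name__ == "__main__":
    # Hypothetical sample rows standing in for the events landed in HDFS.
    sample = [
        {"event_type": "purchase_sword", "Host": "user1.comcast.com"},
        {"event_type": "purchase_sword", "Host": "user2.att.com"},
        {"event_type": "join_guild", "Host": "user1.comcast.com"},
    ]
    by_type, by_host = summarize(sample)
    for event_type, n in by_type.most_common():
        print(f"{event_type}: {n}")
    print("most active host:", by_host.most_common(1))
```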
We were given the following guidelines for additional options.
I did not attempt the options shown in italics, as they were outside my skillset.
There are plenty of advanced options for this project. Here are some ways to take your project further than just the basics we'll cover in class:
- Generate and filter more types of events. There are plenty of other things you might capture events for during gameplay
  - I created additional events. The final list of event types is as follows:
    - Given in assignment:
      - default
      - purchase_sword
      - purchase_knife
      - join_guild
    - Optional additions:
      - purchase_shield
      - declare_fealty
      - declare_war
- *Enhance the API to use additional HTTP verbs such as POST or DELETE as well as additionally accept parameters for events (e.g., purchase events might accept sword or item type)*
- *Connect a user-keyed storage engine such as Redis or Cassandra up to Spark so you can track user state during gameplay (e.g., user's inventory or health)*
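A small sketch of how the final list of event types could be kept in one place, so that the API server and the Spark filter agree on the names. The catalog copies the seven event types listed above; the make_event helper, the event_type field, and metadata keys such as shield_type and target_guild are hypothetical and only illustrate attaching event-specific metadata to each event.

```python
# Event catalog mirroring the list above; the metadata keys used below are hypothetical.
EVENT_TYPES = (
    # Given in assignment
    "default",
    "purchase_sword",
    "purchase_knife",
    "join_guild",
    # Optional additions
    "purchase_shield",
    "declare_fealty",
    "declare_war",
)


def make_event(event_type, **metadata):
    """Build an event dict for a known event type, attaching optional metadata."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type!r}")
    return {"event_type": event_type, **metadata}


if __name__ == "__main__":
    print(make_event("purchase_shield", shield_type="wooden"))
    print(make_event("declare_war", target_guild="example_guild"))
```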