This is a GitHub repository created to submit the fifth homework of the Algorithmic Methods for Data Mining (ADM) course for the MSc. in Data Science at the Sapienza University of Rome.

- `README.md`: A markdown file that explains the content of the repository.
- `main.ipynb`: A Jupyter Notebook containing all the relevant exercises and reports for the homework questions, the Command Line Question, and the Algorithmic Question.
- `modules/`: A folder including 4 Python modules used to solve the exercises in `main.ipynb`. The files included are:
  - `__init__.py`: An init file that allows us to import the modules into our Jupyter Notebook.
  - `data_handler.py`: A Python file including a `DataHandler` class designed to handle data cleaning and feature engineering on Kaggle's Citation Network Dataset.
  - `backend.py`: A Python file including a `Backend` class designed to build the 5 functionalities that solve the exercises from the homework.
  - `frontend.py`: A Python file including a `Frontend` class designed to visualize the 5 functionalities of the `Backend` to solve the exercises from the homework.
- `commandline.sh`: A bash script including the code to solve the Command Line Question.
- `.gitignore`: A predetermined `.gitignore` file that tells Git which files or folders to ignore in a Python project.
- `LICENSE`: A file containing an MIT permissive license.

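
To illustrate how the `modules/` folder can be imported from a notebook placed next to it, here is a minimal sketch. It builds a throwaway stand-in package in a temporary directory, so the `DataHandler` body and its `describe` method are hypothetical placeholders and do not reflect the real class in this repository; only the folder and file names are taken from the list above.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create a tiny stand-in for the modules/ folder (hypothetical contents)."""
    package = root / "modules"
    package.mkdir()
    # An empty __init__.py marks the folder as a regular Python package.
    (package / "__init__.py").write_text("")
    # Stand-in for data_handler.py; the real DataHandler API is not shown here.
    (package / "data_handler.py").write_text(
        "class DataHandler:\n"
        "    def describe(self):\n"
        "        return 'stand-in DataHandler'\n"
    )


def import_data_handler(root: Path):
    """Import DataHandler the way a notebook sitting next to modules/ would."""
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("modules.data_handler")
        return module.DataHandler
    finally:
        # Leave sys.path as we found it.
        sys.path.remove(str(root))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_package(root)
    DataHandler = import_data_handler(root)
    message = DataHandler().describe()
    print(message)
```

In a notebook located in the repository root, the equivalent import would simply be `from modules.data_handler import DataHandler`, since the notebook's working directory is already on the import path.
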
In this homework we worked with Kaggle's predefined Citation Network Dataset.
If the Notebook doesn't load through GitHub, please try the following steps:

- Try rendering the Notebook through NBViewer.
- Try downloading the Notebook and opening it on your local computer.

Author: Miguel Angel Sanchez Cortes
Email: sanchezcortes.2049495@studenti.uniroma1.it
MSc. in Data Science, Sapienza University of Rome