Big Data Analytics Project - Fall 2021 - Link to iMEdD GitHub repo
- Task 1 : Given all speeches (for all years), detect the different topics (i.e., thematic areas) and the most important keywords, and track how they change across years
- Task 2 : Given all speeches we need to detect pairwise similarities between parliament members & detect the top-k pairs with the highest degree of similarity
- Task 3 : For each member and also for each party we need to detect how the most important keywords evolve across years.
- Task 4 : Detect any significant deviation (per member, per party or in general) with respect to the speeches before and after the crisis
- Task 5 : Taking into account all speeches, detect whether they can be grouped into meaningful clusters. Check the participation of each member and of each party in each cluster.
- Task 6 : Compute the percentage of male and female positions in the parliament over the years
How to package and run a Spark application
- Run `package` from the sbt shell (IntelliJ)
- Once the `.jar` is created in the `target` folder, run this command from inside that folder:
spark-submit \
--class <Name of the main class> \
--master local[*] \
--executor-memory 8G \
--total-executor-cores 4 \
/path/to/examples.jar <add optional arguments here>
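
To avoid retyping the flags on every run, the command above could be wrapped in a small shell function. This is only a sketch: the function name `run_spark_app` and the `SPARK_SUBMIT` override variable are hypothetical and not part of the project, and the main class and jar path remain placeholders that you supply yourself.

```shell
# Sketch: wrap the spark-submit call shown above in a reusable function.
# run_spark_app and SPARK_SUBMIT are hypothetical names chosen for illustration.
run_spark_app() {
  # Require at least the main class and the jar path.
  if [ "$#" -lt 2 ]; then
    echo "usage: run_spark_app <main class> <path to jar> [app arguments...]" >&2
    return 1
  fi
  main_class="$1"
  jar_path="$2"
  shift 2
  # SPARK_SUBMIT defaults to the real spark-submit; set it to `echo`
  # to print the arguments instead of launching a job (dry run).
  ${SPARK_SUBMIT:-spark-submit} \
    --class "$main_class" \
    --master 'local[*]' \
    --executor-memory 8G \
    --total-executor-cores 4 \
    "$jar_path" "$@"
}
```

After sourcing the function, `run_spark_app <Name of the main class> /path/to/examples.jar <optional arguments>` submits the job with the same flags as above, while `SPARK_SUBMIT=echo run_spark_app ...` only prints the arguments that would be passed.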