Processing IT Recruitment Data in an HDFS Cluster with Spark, Elasticsearch, and Kibana, deployed via Docker Compose
Updated Jan 13, 2022 - Python
An easy-to-use script that automatically adds dependency files to the spark-submit command.
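A minimal sketch of what such a helper might look like: a pure-Python function that assembles a spark-submit argument list from an entry-point script, extra `--py-files` dependencies, and `--conf` settings. The function name and signature are assumptions for illustration, not the actual script from the repository above.

```python
def build_spark_submit(app, py_files=None, conf=None):
    """Assemble a spark-submit invocation as an argument list.

    app      -- the entry-point .py script
    py_files -- extra .py/.zip/.egg dependencies passed via --py-files
    conf     -- dict of Spark properties passed via --conf key=value

    Hypothetical helper; mirrors standard spark-submit flags only.
    """
    cmd = ["spark-submit"]
    if py_files:
        cmd += ["--py-files", ",".join(py_files)]
    for key, value in (conf or {}).items():
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    return cmd

# Example:
# build_spark_submit("etl.py", py_files=["utils.py"],
#                    conf={"spark.executor.memory": "2g"})
```

The list form can be handed directly to `subprocess.run`, which avoids shell-quoting issues with comma-joined file lists.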
The primary objective of this study is to explore the feasibility of using machine learning algorithms to classify health insurance plans based on their coverage for routine dental services. To achieve this, I used six classification algorithms: logistic regression (LR), decision tree (DT), random forest (RF), gradient-boosted trees (GBT), SVM, and factorization machines (FM). (Tech: PySpark, SQL, Databricks, Zeppelin notebooks, Hadoop, spark-submit)
A PySpark-based ETL pipeline that extracts transaction data from a MySQL database, cleans and transforms it, aggregates monthly sales per customer, and writes the processed data to an S3 bucket in Parquet format.
Running a Python engineering file with spark-submit.
Movie Recommendation using Apache Spark MLlib
Simple Spark environment setup on Windows.