Notes on Apache Spark (pyspark)
Updated Mar 3, 2019
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
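For example, a minimal PySpark session (a sketch, assuming a local `pip install pyspark`) shows that programming model: transformations are declared lazily on a DataFrame, and Spark parallelizes the work across the cluster only when an action runs.

```python
# Minimal sketch of Spark's programming model, assuming a local
# pyspark installation. Transformations are lazy; Spark distributes
# the computation only when an action (show/collect) is invoked.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("notes-demo").getOrCreate()

# Toy data; in practice this would be read from HDFS, S3, etc.
df = spark.createDataFrame(
    [("alice", 34), ("bob", 28), ("carol", 41)],
    ["name", "age"],
)

# The filter and aggregate are planned, then executed in parallel
# across the cluster's executors when show() triggers the job.
df.filter(F.col("age") > 30).agg(F.avg("age").alias("avg_age")).show()

spark.stop()
```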
Apache Spark™ and Scala Workshops
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Toolkit for Apache Spark ML: feature clean-up, feature-importance calculation, information-gain feature selection, distributed SMOTE, model selection and training, hyperparameter optimization, and model interpretability (see the pipeline sketch after this list)
Scalable Data Science: course sets in big data using Apache Spark on Databricks, and their mathematical, statistical, and computational foundations using SageMath
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Powerful, rapid automatic EDA and feature-engineering library with a very easy-to-use API 🌟
Big Data workshop in Spanish
Dockerizing and Consuming an Apache Livy environment
MLFlow End to End Workshop at Chandigarh University
Rails application for the Archives Unleashed Cloud.
Example applications of spark-trend-calculus
A comprehensive implementation of Bitcoin address clustering using multiple heuristic conditions for blockchain analysis and chain analysis applications.
Serene Data Integration Platform
Time series forecasting using Prophet and Apache Spark (a grouped pandas-UDF sketch follows the list)
NiFi, Data Engineering, Data Ingest, REST, ETL, Mapping, ELT, SQL, Spark, Kafka for Good
UC Davis Distributed Computing with Spark SQL (with Databricks) and Databricks Apache Spark SQL for Data Analysts (a minimal Spark SQL example follows the list)
Example applications of GDELT mass media intelligence data
Exploratory analysis of an Amazon product-reviews dataset comprising various categories and spanning 14 years
Infant Mortality Data Prediction and Analysis
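The Spark ML toolkit above exposes its own API; as a hedged stand-in, the sketch below shows a comparable workflow built from stock pyspark.ml stages. ChiSqSelector is a chi-squared selector used here only to illustrate the selection step (not information gain), distributed SMOTE is omitted because Spark ML has no built-in implementation, and all column names and data are hypothetical.

```python
# Hedged sketch of an ML pipeline with feature assembly, feature
# selection, and cross-validated hyperparameter search in pyspark.ml.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, ChiSqSelector
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import BinaryClassificationEvaluator

spark = SparkSession.builder.appName("ml-pipeline-demo").getOrCreate()

# Hypothetical training data: numeric features f1..f3 and a binary label.
train = spark.createDataFrame(
    [(1.0, 0.5, 3.1, 1.0), (0.9, 0.3, 2.8, 1.0), (1.2, 0.4, 3.5, 1.0),
     (0.8, 0.6, 2.9, 1.0), (1.1, 0.2, 3.0, 1.0), (1.3, 0.5, 3.3, 1.0),
     (0.2, 1.5, 0.4, 0.0), (0.1, 1.9, 0.2, 0.0), (0.3, 1.4, 0.5, 0.0),
     (0.2, 1.8, 0.3, 0.0), (0.1, 1.6, 0.1, 0.0), (0.4, 1.7, 0.6, 0.0)],
    ["f1", "f2", "f3", "label"],
)

assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="raw")
selector = ChiSqSelector(numTopFeatures=2, featuresCol="raw",
                         labelCol="label", outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
pipeline = Pipeline(stages=[assembler, selector, lr])

# Grid search over regularization strength, scored by area under ROC.
grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(labelCol="label"),
                    numFolds=2)
model = cv.fit(train)
model.transform(train).select("features", "prediction").show()
```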
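For the Prophet entry, the usual pattern (a sketch, assuming PySpark 3.0+ and the prophet package, formerly fbprophet) is to fit one model per series inside a grouped pandas UDF, so each group trains in parallel across the cluster. The store/ds/y column names and the toy data are hypothetical.

```python
# Hedged sketch: one Prophet model per group via applyInPandas.
import pandas as pd
from pyspark.sql import SparkSession
from prophet import Prophet  # pip install prophet

spark = SparkSession.builder.appName("prophet-demo").getOrCreate()

# One row per (store, day); Prophet expects the ds/y column convention.
sdf = spark.createDataFrame(pd.DataFrame({
    "store": ["a"] * 60 + ["b"] * 60,
    "ds": list(pd.date_range("2019-01-01", periods=60)) * 2,
    "y": [float(v) for v in range(120)],
}))

def forecast_store(pdf: pd.DataFrame) -> pd.DataFrame:
    # Fit one model on this store's history, forecast 14 days ahead.
    model = Prophet()
    model.fit(pdf[["ds", "y"]])
    future = model.make_future_dataframe(periods=14)
    out = model.predict(future)[["ds", "yhat"]]
    out["store"] = pdf["store"].iloc[0]
    return out

forecasts = sdf.groupBy("store").applyInPandas(
    forecast_store, schema="store string, ds timestamp, yhat double")
forecasts.show(5)
```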
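And for the Spark SQL courses, the core idea is that a DataFrame registered as a temporary view can be queried with plain SQL; Spark compiles the SQL and the equivalent DataFrame calls to the same physical plan. The sales table below is hypothetical.

```python
# Minimal Spark SQL sketch: register a view, query it with SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-demo").getOrCreate()

sales = spark.createDataFrame(
    [("2019-01-01", "books", 12.5), ("2019-01-01", "music", 7.0),
     ("2019-01-02", "books", 20.0)],
    ["day", "category", "revenue"],
)
sales.createOrReplaceTempView("sales")

spark.sql("""
    SELECT category, SUM(revenue) AS total
    FROM sales
    GROUP BY category
    ORDER BY total DESC
""").show()
```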
Created by Matei Zaharia
Released May 26, 2014