Spark Resources
This document gathers Spark-related resources that caught our attention.
- Top 80+ Apache Spark Interview Questions and Answers for 2023 by Shivam Arora, July 2023.
- Stateful transformations in Spark Streaming - Part 1 by Ankur Ranjan, February 2023.
- Spark Streaming - Part 2 by Ankur Ranjan, January 2023.
- Spark Streaming - Part 1 by Ankur Ranjan, January 2023.
- Syntactic Sugar in Spark Scala Codebase - Part 1 by Ankur Ranjan, September 2022.
- Just Enough Spark! Core Concepts Revisited !! by Deepak Rajak, June 2020.
- Spark Trigger Options by Sylverter Daniel, March 2019.
- The Good, Bad and Ugly: Apache Spark for Data Science Work by Robert Bennett, June 2018.
- How to Shutdown a Spark Streaming Job Gracefully by Lan Jiang, February 2017.
Blogs ▴
- Databricks Blog:
  - Introducing Apache Spark™ 3.5 by Yuanjian Li et al., September 2023.
  - Spark Connect Available in Apache Spark 3.4 by Allan Folting et al., April 2023.
  - Introducing Spark Connect - The Power of Apache Spark, Everywhere by Stefania Leone et al., July 2022.
  - Introducing Apache Spark™ 3.2 by Gengliang Wang et al., October 2021.
  - What’s New in Apache Spark™ 3.1 Release for Structured Streaming by Yuanjian Li and Bo Zhang, April 2021.
  - Introducing Apache Spark™ 3.1 by Hyukjin Kwon et al., March 2021.
  - Tuning Java Garbage Collection for Apache Spark Applications by Daoyuan Wang and Jie Huang, May 2015.
- Rust Macros: Practical Examples and Best Practices, by Aniket Bhattacharyea, July 2023.
- spark-shell – Spark Interactive Shell (Scala).
- 👍 waitingfor{code} Blog posts:
  - Spark SQL checkpoints, April 2023.
  - Generated method too long to be JIT compiled by Bartosz Konieczny, November 2022.
  - Wildcard path and partitions by Bartosz Konieczny, November 2022.
  - What's new in Apache Spark 3.3 - Data Source V2 by Bartosz Konieczny, July 2022.
  - What's new in Apache Spark 3.3 - new functions by Bartosz Konieczny, June 2022.
  - What's new in Apache Spark 3.3 - joins by Bartosz Konieczny, June 2022.
- Reconciling Spark APIs for Scala by Michal Palka, September 2022.
- MungingData Blog posts:
  - Convert streaming CSV data to Delta Lake with different latency requirements, June 2022.
  - Registering Native Spark Functions, May 2021.
  - Exploring DataFrames with summary and describe, April 2021.
  - Scala Spark vs Python PySpark: Which is better?, February 2021.
- Install Hadoop 3.2.1 on Windows 10 Step by Step Guide by Raymond, October 2021.
- Build Your Own WinUtils for Spark by Nigel Meakins, April 2021.
- CERN Database Blog:
  - Performance Comparison of 5 JDKs on Apache Spark by Luca Canali, November 2023.
  - Making histograms with Apache Spark and other SQL engines by Luca Canali, May 2022.
  - Apache Spark 3.0 Memory Monitoring Improvements by Luca Canali, August 2020.
  - SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads by Luca Canali, August 2018.
- Apache Spark Blog by Perficient:
  - Spark DataFrame: Writing into Files by G.R. Baskaran, March 2024.
  - Spark: Parser Modes by G.R. Baskaran, January 2024.
  - It’s good that Spark Security is turned off by default by David Callaghan, January 2022.
  - Take advantage of windows in your Spark data science pipeline by David Callaghan, May 2020.
  - Introduction to Apache Spark's Core API (Part I) by Anil Agrawal, December 2018.
  - How to Install Hadoop on Windows by Parixit Odedara, August 2018.
Books ▴
- Data Engineering with Scala and Spark by Eric Tome, Rupam Bhattacharjee and David Radford, January 2024.
  (Packt, ISBN 978-1-80461-258-3, 300 pages)
- Hands-on Guide on Apache Spark 3 by Alfonso A. Garcia, 2023.
  (Apress, ISBN 978-1-4842-9379-9, 404 pages)
- Modern Data Engineering with Apache Spark by Scott Haines, March 2022.
  (Apress, ISBN 978-1-4842-7451-4, 585 pages)
- Beginning Apache Spark 3 by Hien Luu, October 2021.
  (Apress, ISBN 978-1-4842-7382-1, 438 pages)
- Apache Spark : Invent the Future by Ernesto Lee, June 2021.
  (ISBN 979-8-5257-0848-8, 482 pages)
- Learning Spark (2nd Edition) by Jules S. Damji et al., July 2020.
  (O'Reilly, ISBN 978-1-492-05004-9, 399 pages)
- Spark in Action (2nd Edition) by Jean-Georges Perrin, May 2020.
  (Manning, ISBN 978-1-6172-9552-2, 576 pages)
- Apache Spark Quick Start Guide by Shrey Mehrotra and Akash Grade, January 2019.
  (Packt, ISBN 978-1-7893-4910-8, 154 pages)
- Practical Apache Spark by Subhashini Chellappan and Bharanitharan Ganesan, 2018.
  (Apress, ISBN 978-1-4842-3651-2, 280 pages)
- Framing Apache Spark in life sciences by Andrea Manconi et al., February 2023.
- Apache Spark: A Unified Engine for Big Data Processing by Matei Zaharia et al., November 2016.
  (Communications of the ACM, 59(11):56-65)
- Big data analytics on Apache Spark by Salman Salloum et al., October 2016.
  (IJDSA'16 Proceedings, 1, pages 145–164)
- Discretized streams: fault-tolerant streaming computation at scale by Matei Zaharia et al., November 2013.
  (SOSP '13 Proceedings, pages 423–438)
Projects ▴
- Spark Job Server.
- Spark-Scala3 – Compile time encoder derivation for Scala 3.
- Sparkplug – A framework for creating composable and pluggable data processing pipelines.
- doric – Type-safe columns for Spark DataFrames.
- sbt-spark-submit – an sbt plugin for spark-submit.
- Spark Configuration Optimization – online Spark configuration tool.
Tutorials ▴
- Learn Apache Spark by TutorialKart, 2023.
- Apache Spark Full Course by Edureka, 2023 (9 hours).
- Apache Spark Crash Course Mini-series by Hortonworks
- SparkBy{Examples}
- Quick introduction to Apache Spark by Melvin L, May 2016 (13 min).
- Top 5 Mistakes When Writing Spark Applications by Mark Grover and Ted Malaska (Databricks), February 2016.
- What is Apache Spark by Mike Olson (Cloudera), September 2015.
- Advanced Apache Spark Training by Sameer Farooqui (Databricks), April 2015.
- 👍 A Deeper Understanding of Spark Internals by Aaron Davidson (Databricks), July 2014 (44 min).