Advanced Training Tutorials on RDDs and Spark Internals
- Spark Summit East 2015, March 18-19, 2015, Spark Version 1.3
  - (Recommended) Advanced Apache Spark – Sameer Farooqui
    Slides | Video
- Spark Summit 2015, June 15-17, 2015, Spark Version 1.4
- Spark Summit 2014, June 30 - July 2, 2014, Spark Version 1.1
  - Advanced Spark Internals and Tuning – Reynold Xin
    Slides | Video
  - Spark SQL – Michael Armbrust
    Slides | Video
Spark SQL, DataFrames, Datasets and Tungsten
- Spark DataFrames: Simple and Fast Analysis of Structured Data – Michael Armbrust
  Slides | Video
- Structuring Spark: DataFrames, Datasets, and Streaming – Michael Armbrust
  Slides | Video
- From DataFrames to Tungsten: A Peek into Spark’s Future – Reynold Xin
  Slides | Video
- Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal – Josh Rosen
  Slides | Video
Spark Research Papers
Zaharia, Matei, et al. “Spark: Cluster Computing with Working Sets”. Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud '10). USENIX Association, 2010.
Zaharia, Matei, et al. “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing”. Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2012.
Armbrust, Michael, et al. “Spark SQL: Relational Data Processing in Spark”. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. ACM, 2015.
Zaharia, Matei, et al. “Discretized Streams: Fault-Tolerant Streaming Computation at Scale”. Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 2013.
Xin, Reynold S., et al. “GraphX: A Resilient Distributed Graph System on Spark”. First International Workshop on Graph Data Management Experiences and Systems. ACM, 2013.