Apache Spark is a powerful platform that provides users with new ways to store and make use of big data. In this course, get up to speed with Spark, and discover how to leverage this popular processing engine to deliver effective and comprehensive insights into your data. Instructor Ben Sullins provides an overview of the platform, covering the different components that make up Apache Spark. He shows how to analyze data in Spark using PySpark and Spark SQL, explores running machine learning algorithms using MLlib, demonstrates how to create a streaming analytics application using Spark Streaming, and more.
Topics Include:
- Understanding Spark
- Reviewing Spark components
- Where Spark shines
- Understanding data interfaces
- Working with text files
- Loading CSV data into DataFrames
- Using Spark SQL to analyze data
- Running machine learning algorithms using MLlib
- Querying streaming data
- Connecting BI tools to Spark
Duration:
1h 27m
Now available on LinkedIn Learning:
www.linkedin.com/learning/apache-spark-essential-training
and on Lynda.com:
www.lynda.com/Apache-Spark-tutorials/Apache-Spark-Essential-Training/550568-2.html