Apache Spark 2.x Cookbook
Description
Key Features
Contains recipes for solving real-time data-processing problems with Apache Spark
Utilize core Spark modules such as Spark SQL, Spark MLlib, Spark Streaming, and GraphX
A practical guide to help you master Apache Spark as your single big data computing platform
Book Description
While Apache Spark 1.x gained a lot of traction and adoption in its early years, Spark 2.0 delivers notable improvements in its API, performance, and Structured Streaming, and it simplifies the building blocks needed to build better, faster, smarter, and more accessible big data applications. This book covers all of these features in the form of structured recipes for analyzing large and complex data sets.
Starting with installing and configuring Apache Spark with various cluster managers, you will learn to set up development environments. You will then be introduced to working with RDDs, to DataFrames for operating on data with schemas, and to real-time streaming from sources such as Twitter Stream and Apache Kafka. You will also work through recipes on machine learning, including supervised learning, unsupervised learning, recommendation engines, deep learning algorithms, and GPU implementations on Spark.
Last but not least, the final few chapters will help you delve more deeply into graph processing with GraphX, securing your implementations, cluster optimization, and troubleshooting.
What you will learn
Install and configure Apache Spark with various cluster managers
Set up a development environment for Apache Spark
Learn to operate on data in Spark with schemas
Get to grips with real-time streaming analytics using Spark Streaming
Master supervised learning and unsupervised learning using MLlib
Build a recommendation engine using MLlib
Use TensorFrames to manipulate Spark's DataFrames with TensorFlow programs for deep learning
Develop a set of common applications or project types, and solutions that solve complex big data problems