Stream Processing with Apache Spark: Best Practices for Scaling and Optimizing Apache Spark

Authors: François Garillot, Gerard Maas
Publisher: O'Reilly
Publication date: 2019-07-02
ISBN-13: 9781491944240
ISBN-10: 1491944242
Binding: Paperback
Pages: 452

Description


To build analytics tools that deliver faster insights, you need to know how to process data in real time, and that means moving from batch processing to stream processing. Fortunately, Spark, the in-memory data processing framework, has added an extension devoted to fault-tolerant stream processing: Spark Streaming.
If you're familiar with Apache Spark and want to learn how to apply it to streaming jobs, this practical book is a must.

Understand how Spark Streaming fits into the big picture
Learn core concepts such as Spark RDDs, Spark Streaming clusters, and the fundamentals of a DStream
Discover how to create a robust deployment
Dive into streaming algorithmics
Learn how to tune, measure, and monitor Spark Streaming


About the Authors


Gerard Maas is a Principal Engineer at Lightbend, where he works on the seamless integration of Structured Streaming and other scalable stream processing technologies into the Lightbend Platform. Previously, he worked at a cloud-native IoT startup, where he led the data processing team in building streaming pipelines that pushed Spark Streaming to its throughput limits. During that time, he published the first comprehensive guide to tuning Spark Streaming performance.

Gerard has held leading roles at several startups and large enterprises, building data science governance, cloud-native IoT platforms, telecom platforms, and scalable APIs. He is a regular speaker at technology conferences and contributes to open source projects both small and large. Gerard has a degree in Computer Engineering from Simón Bolívar University, Venezuela. You can find him on Twitter as @maasg.

François Garillot is based in Seattle, where he works on distributed computing at Facebook. He received a Ph.D. from École Polytechnique in 2011 and, while at Lightbend in 2015, worked on back-pressure for Spark Streaming. His interests include type systems, using programming languages to make analytics simpler to express, and a passion for Scala, Spark, and roasted arabica. When not at work, he can be found enjoying the mountains of the Pacific Northwest.

Related Books

Managing Big Data RBD (From CI/BI to AI) [管理大數據 RBD (從CI\BI到AI)]

Author: 中源數聚(北京)信息科技有限公司 (Zhongyuan Shuju (Beijing) Information Technology Co., Ltd.)

2019-07-02

Serverless GraphQL APIs with Amazon's AWS AppSync (API-University Series) (Volume 8)

Author: Matthias Biehl

2019-07-02

Getting Started with Varnish Cache: Accelerate Your Web Applications

Author: Thijs Feryn

2019-07-02