Apache Beam is an open source unified programming model for implementing and executing data processing pipelines, including Extract, Transform, and Load (ETL), batch, and stream processing.
This book will help you to confidently build data processing pipelines with Apache Beam. You’ll start with an overview of Apache Beam and understand how to use it to implement basic pipelines. You’ll also learn how to test and run the pipelines efficiently. As you progress, you’ll explore how to structure your code for reusability and also use various Domain Specific Languages (DSLs). Later chapters will show you how to use schemas and query your data using (streaming) SQL. Finally, you’ll understand advanced Apache Beam concepts, such as implementing your own I/O connectors.
By the end of this book, you’ll have gained a deep understanding of the Apache Beam model and be able to apply it to solve problems.
Page count: 407
Year of publication: 2022
Use a single programming model for both batch and stream data processing
Jan Lukavský
BIRMINGHAM—MUMBAI
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Reshma Raman
Senior Editor: David Sugarman
Content Development Editor: Nathanya Dias
Technical Editor: Devanshi Ayare
Copy Editor: Safis Editing
Project Coordinator: Aparna Ravikumar Nair
Proofreader: Safis Editing
Indexer: Sejal Dsilva
Production Designer: Ponraj Dhandapani
Marketing Coordinator: Priyanka Mhatre
First published: January 2022
Production reference: 1161221
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80056-493-0
www.packt.com
Jan Lukavský is a freelance big data architect and engineer who is also a committer of Apache Beam. He is a certified Apache Hadoop professional. He is working on open source big data systems combining batch and streaming data pipelines in a unified model, enabling the rise of real-time, data-driven applications.
I want to thank my family for all their support and patience, especially my wife, Pavla, and my children.
Marcelo Henrique Neppel currently works as a software engineer at Canonical, interacting with technologies including Kubernetes and Juju. Previously, he worked at a big data company coordinating two teams and developing pipelines for projects using Apache Beam, and also at BPlus Tecnologia, working with databases and integrations.
I would like to thank Packt for giving me the opportunity to contribute to this excellent book. I would like to thank my wife, Janaina, and my family, for always supporting me, and also Gabriel Verani, who introduced me to Apache Beam.
This section provides a general introduction to how most streaming data processing systems work, what the general properties of data streams are, and which problems must be solved to ensure computational correctness and to balance throughput and latency in the context of Apache Beam. It also covers how pipelines are implemented, tested, and run.
This section comprises the following chapters:
Chapter 1, Introduction to Data Processing with Apache Beam
Chapter 2, Implementing, Testing, and Deploying Basic Pipelines
Chapter 3, Implementing Pipelines Using Stateful Processing