In today's data-driven world, organizations across different sectors need scalable and efficient solutions for processing large volumes of data. Kubernetes offers an open-source and cost-effective platform for deploying and managing big data tools and workloads, ensuring optimal resource utilization and minimizing operational overhead. If you want to master the art of building and deploying big data solutions using Kubernetes, then this book is for you.
Written by an experienced data specialist, Big Data on Kubernetes takes you through the entire process of developing scalable and resilient data pipelines, with a focus on practical implementation. Starting with the basics, you’ll learn how to install Docker and run your first containerized applications. You’ll then explore Kubernetes architecture and understand its core components. This knowledge will pave the way for exploring essential big data processing tools, such as Apache Spark and Apache Airflow, and you’ll learn how to install and configure them on Kubernetes clusters. Throughout the book, you’ll gain hands-on experience building a complete big data stack on Kubernetes.
By the end of this Kubernetes book, you’ll be equipped with the skills and knowledge you need to tackle real-world big data challenges with confidence.
You can read this e-book in Legimi apps or in any app that supports the following format:
Page count: 344
Year of publication: 2024
Big Data on Kubernetes
A practical guide to building efficient and scalable data solutions
Neylson Crepalde
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Apeksha Shetty
Publishing Product Manager: Apeksha Shetty
Book Project Manager: Aparna Ravikumar Nair
Senior Editor: Sushma Reddy
Technical Editor: Kavyashree K S
Copy Editor: Safis Editing
Proofreader: Sushma Reddy
Indexer: Subalakshmi Govindhan
Production Designer: Gokul Raj S T
DevRel Marketing Executive: Nivedita Singh
First published: July 2024
Production reference: 1210624
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK
ISBN: 978-1-83546-214-0
www.packtpub.com
To my wife, Sarah, and my son, Joao Pedro, for their love and support. To Silvio Salej Higgins for being a great mentor.
– Neylson Crepalde
Neylson Crepalde is a generative AI strategist at Amazon Web Services (AWS). Before this, Neylson was chief technology officer at A3Data, a consulting business focused on data, analytics, and artificial intelligence. In his time as CTO, he worked with the company’s tech team to build a big data architecture on top of Kubernetes that inspired the writing of this book. Neylson holds a PhD in economic sociology, and he was a visiting scholar at the Centre de Sociologie des Organisations at Sciences Po, Paris. Neylson is also a frequent guest speaker at conferences and has taught in MBA programs for more than 10 years.
I want to thank all the people who have worked with me in the development of this great architecture, especially Mayla Teixeira and Marcus Oliveira for their outstanding contributions.
Thariq Mahmood has 16 years of experience in data technology and a strong skill set in Kubernetes, big data, data engineering, and DevOps across public cloud, private cloud, and on-premises environments. He has expertise in data warehousing, data modeling, and data security. He actively contributes to projects on Git and has experience setting up batch and streaming pipelines for various production environments, using Databricks, Hadoop, Spark, Flink, and other cloud-native tools from AWS, Azure, and GCP. He has also implemented MLOps and DevSecOps in numerous projects. He currently helps organizations optimize their big data infrastructure costs and implement data lake and one-lake architectures on Kubernetes.
In this part, you will learn about the fundamentals of containerization and Kubernetes. You will start by understanding the basics of containers and how to build and run Docker images. This will provide you with a solid foundation for working with containerized applications. Next, you will dive into the Kubernetes architecture, exploring its components, features, and core concepts such as pods, deployments, and services. With this knowledge, you will be well equipped to navigate the Kubernetes ecosystem. Finally, you will get hands-on experience by deploying local and cloud-based Kubernetes clusters and then deploying applications you built earlier onto these clusters.
This part contains the following chapters:
Chapter 1, Getting Started with Containers
Chapter 2, Kubernetes Architecture
Chapter 3, Kubernetes – Hands On
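To give a concrete flavor of what you will work with in this part, here is a minimal sketch of the kind of Kubernetes manifest that ties these concepts together: a Deployment that runs two replicas of a containerized application and a Service that exposes them inside the cluster. The application name, labels, container image (a public nginx image), and ports are placeholder assumptions for illustration, not examples taken from the book’s chapters.

# Minimal illustrative manifest: a Deployment plus a Service.
# All names, labels, and the image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
spec:
  replicas: 2                  # two identical pods for basic resilience
  selector:
    matchLabels:
      app: hello-app
  template:
    metadata:
      labels:
        app: hello-app         # pods carry this label so the Service can find them
    spec:
      containers:
        - name: web
          image: nginx:1.25    # any image you build and push with Docker works here
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hello-app
spec:
  selector:
    app: hello-app             # route traffic to pods carrying this label
  ports:
    - port: 80                 # stable in-cluster port
      targetPort: 80           # container port to forward to

Applying such a file with kubectl apply -f hello-app.yaml on a local or cloud-based cluster creates both objects, which is the kind of hands-on deployment this part builds toward.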