Site reliability engineering is all about continuous improvement, finding the balance between business and product demands while working within technological limitations to drive higher revenue. But quantifying and understanding reliability, handling resources, and meeting developer requirements can sometimes be overwhelming. With a focus on reliability from an infrastructure and coding perspective, Becoming a Rockstar SRE brings forth the site reliability engineer (SRE) persona using real-world examples.
This book will acquaint you with the role of an SRE, followed by the why and how of site reliability engineering. It walks you through the jobs of an SRE, from the automation of CI/CD pipelines and reducing toil to reliability best practices. You’ll learn what creates bad code and how to circumvent it with reliable design and patterns. The book also guides you through interacting and negotiating with businesses and vendors on various technical matters and exploring observability, outages, and why and how to craft an excellent runbook. Finally, you’ll learn how to elevate your site reliability engineering career, including certifications and interview tips and questions.
By the end of this book, you’ll be able to identify and measure reliability, reduce downtime, troubleshoot outages, and enhance productivity to become a true rockstar SRE!
Page count: 630
Year of publication: 2023
Electrify your site reliability engineering mindset to build reliable, resilient, and efficient systems
Jeremy Proffitt
Rod Anami
BIRMINGHAM—MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Mohd Riyan Khan
Publishing Product Manager: Surbhi Suman
Senior Editor: Romy Dias
Technical Editor: Shruthi Shetty
Copy Editor: Safis Editing
Project Coordinator: Ashwin Kharwa
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Production Designer: Alishon Mendonca
Marketing Coordinator: Agnes D’souza
First published: March 2023
Production reference: 1290323
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80323-922-4
www.packtpub.com
For my wonderful wife, who still likes me after 18 years. I like you too.
– Jeremy Proffitt
To my God, wife Tati, and son Gabe.
– Rod Anami
Jeremy Proffitt (born January 1977) is obsessed with constantly improving systems and solving problems with an unmatched sense of urgency – the definition of a Site Reliability Engineer (SRE). A master of solutions and technological knowledge, Jeremy is a rockstar SRE with AWS professional certifications in Architecture and DevOps – and has routinely saved millions in potential lost revenue in his career. In his free time, Jeremy enjoys spending time in his rockstar-appropriate technology cave and loves venturing into 3D printing, electronics, and Internet of Things (IoT) projects. By day, Jeremy currently manages a team of top SRE and DevOps talent driving constant improvement and is often cited in the company as a visionary in terms of observability and emergency response.
To the leaders who have helped me see the truth in our work and friends who have stood by and given me the encouragement to follow the wonders of technology, often while in awe of their own work, I say thank you! To my arch-enemies, you have been a wonderful addition that has always challenged me to become better. And finally, to my wife, Jamie, who I still desperately love after 18 years – and mind you, still likes me – I still remember our first date when you took my arm, you stole my heart, and in all our years, I’ve never felt you let go once.
Rod Anami is a seasoned engineer who works with cloud infrastructure and software engineering technologies. As one of the SREs at the Kyndryl CoE, he coaches other SREs on running IT modernization, transformation, and automation projects for clients worldwide. Rod leads the global SRE guild inside Kyndryl, where he helps plant and grow SRE chapters in many countries. Rod is certified as an SRE, technical specialist, and DevOps engineer professional at the ultimate level. He holds AWS, HashiCorp, Azure, and Kubernetes certifications, among many others. He is passionate about contributing to open source software at large with Node.js libraries.
I want to thank my wonderful wife, Tatiana, and my beloved son, Gabriel, for giving me the space and support needed to write this book. I thank my parents, Shizuo and Rita, for raising me with solid character, and the Google site reliability engineering organization for making this fantastic approach and profession open source. I want to thank Kyndryl for backing me on this journey. I had many bosses and leaders: good, bad, and inspiring ones. I want to mention a few who impacted my career immensely by helping me acquire the skills and knowledge for this book: Marcos Cimmino, Tara Sims, Andy Barnes, and Gene Brown. Nothing great is accomplished alone: it requires effort, endurance, enjoyment, colleagues, and God.
Chris Smith is a strategic IT leader with a proven track record across the financial service industry. His passion is to lead organization-wide transformational efforts for Fortune 500 institutions within digital and contact center technology and operations. He is skilled at driving agile adoption, building an engineering-first mindset, and facilitating cloud modernization of core banking services at scale.
Itohanoghosa Eregie is the founder of techinanutshellhack, a platform dedicated to explaining cloud and SRE concepts in their simplest form through short video clips on LinkedIn. She worked as a software developer at Cyberspace Limited before finding her passion as a platform engineer, which earned her an opportunity to work with Dell EMC as a resident platform engineer for MTN Nigeria, one of Africa’s largest telecommunications companies. She is currently employed by Altoros Americas as a VMware Tanzu engineer, involved in customer engagement. Itohan is passionate about building resilient systems in the cloud and ensuring organizations adhere to SRE practices.
Brannen Taylor has almost 30 years of experience in corporate IT from the healthcare, managed services, power, hosted DR, and financial services industries. He has worked with small “mom-and-pop” operations up to ITIL-heavy Fortune 10 companies. He was a network engineer for 20 years and has been a network operations manager for the past 2 years. He has certifications from many vendors such as Nortel, Cisco, and Palo Alto, as well as a few that are vendor-agnostic, many cloud certifications from AWS and Azure, and is now moving into Network DevOps (NetDevOps), focusing on Nautobot, Ansible, and various vendor SDKs. He enjoys scuba diving with his wife and friends and has two grown children.
I would like to thank God for leading me into a career that I love. I want to thank my children for only eye-rolling me a little when I launch into an explanation about binary when they ask me how email works. I want to thank my wife Lara for putting up with me being on call these past 23 years, working unexpectedly long days, nights, and weekends, and non-stop studying. Thank you to my colleagues and the friends I’ve made along the way.
Gene Brown is the Vice President and a Distinguished Engineer at Kyndryl. He leads the SRE profession and certification program and is the global site reliability engineering leader. He is responsible for driving the enablement of SREs across Kyndryl’s countries, practices, and strategic markets through a Center of Excellence with SRE chapter leaders across the services organization globally.
Gene enjoys spending time with clients interested in adopting SRE and likes comparing notes on what has worked well and how to overcome the challenges that come with cultural change. Gene was the co-founder of IBM’s and Kyndryl’s SRE profession with a focus on certifying SREs based on their applied experience in the field of site reliability engineering.
In this first part, you will learn about site reliability engineering, its roots, and current usage outside Google. We emphasize how the site reliability engineer (SRE) persona is the center of gravity of everything orbiting systems reliability. It’s impossible to talk about site reliability engineering without discussing the business of software development, so we tie in not only the statistics used for reliability but also how they impact what companies are ultimately interested in: customer satisfaction and revenue. Finally, we’ll explore why the lack of reliability persists in organizations and discuss some of the lesser-known truths that make site reliability engineering critical and complex.
The following chapters will be covered in this part:
Chapter 1, SRE Job Role – Activities and Responsibilities
Chapter 2, Fundamental Numbers – Reliability Statistics
Chapter 3, Imperfect Habits – Duct Tape Architecture and Spaghetti Code