Cloud-Native Observability with OpenTelemetry is a guide to help you find answers to questions about your applications. This book teaches you how to produce telemetry from your applications using an open standard so that you retain control of your data. OpenTelemetry provides the tools necessary to gain visibility into the performance of your services. It allows you to instrument your application code through vendor-neutral APIs, libraries, and tools.
By reading Cloud-Native Observability with OpenTelemetry, you’ll learn about the concepts and signals of OpenTelemetry - traces, metrics, and logs. You’ll practice producing telemetry for these signals by configuring and instrumenting a distributed cloud-native application using the OpenTelemetry API. The book also guides you through deploying the collector, as well as the telemetry backends you’ll need to make sense of the data once it is emitted. You’ll look at various examples of how to identify application performance issues through telemetry. By analyzing telemetry, you’ll also be able to better understand how an observable application can improve the software development life cycle.
By the end of this book, you’ll be well-versed with OpenTelemetry and able to instrument services using the OpenTelemetry API to produce distributed traces, metrics, and logs, and more.
The e-book can be read in Legimi apps or in any app that supports the following format:
Page count: 366
Year of publication: 2022
Learn to gain visibility into systems by combining tracing, metrics, and logging with OpenTelemetry
Alex Boten
BIRMINGHAM—MUMBAI
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Rahul Nair
Publishing Product Manager: Shrilekha Malpani
Senior Editor: Arun Nadar
Content Development Editor: Sujata Tripathi
Technical Editor: Rajat Sharma
Copy Editor: Safis Editing
Project Coordinator: Shagun Saini
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Ponraj Dhandapani
Marketing Coordinator: Nimisha Dua
First published: April 2022
Production reference: 1140422
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80107-770-5
www.packt.com
To my mother, sister, and father. Thank you for teaching me to persevere in the face of adversity, always be curious, and work hard.
It has never been a better time to be a software engineer.
As engineers, we are motivated by impact and efficiency—and who can argue that both are not skyrocketing, particularly in comparison with time spent and energy invested?
These days, you can build out a scalable, elastic, distributed system to serve your code to millions of users per day with a few clicks—without ever having to personally understand much about operations or architecture. You can write lambda functions or serverless code, hit save, and begin serving them to users immediately.
It feels like having superpowers, especially for those of us who remember the laborious times before. Every year brings more powerful APIs and higher-level abstractions – many, many infinitely complex systems that "just work" at the click of a button or the press of a key.
But when it doesn't "just work," it has gotten harder than ever to untangle the reasons and understand why.
Superpowers don't come for free, it turns out. The winds of change may be sweeping us all briskly out toward a sea of ever-expanding options, infinite flexibility, automated resiliency, and even cost-effectiveness, but these glories have come at the price of complexity—skyrocketing, relentlessly compounding complexity and the cognitive overload that comes with it.
Systems no longer fail in predictable ways. Static dashboards are no longer a viable tool for understanding your systems. And though better tools will help, digging ourselves out of this hole is not merely an issue of switching from one tool to another. We need to rethink the way software gets built, shipped, and maintained, to be production-focused from day 1.
For far too long now, we have been building and shipping software in the dark. Software engineers act like all they need to do is write tests and make sure their code passes. While tests are important, all they can really do is validate the logic of your code and increase your confidence that you have not introduced any serious regressions. Operations engineers, meanwhile, rely on monitoring checks, but those are a blunt tool at best. Most bugs will never rise to the criticality of a paging alert, which means that as a system gets more mature and sophisticated, most issues will have to be found and reported by your users.
And this isn't just a problem of bugs, firefighting, or outages. This is about understanding your software in the wild—as your users run your code on your infrastructure, at a given time. Production remains far too much of a black box for too many people, who are then forced to try and reason about it by reading lines of code and using elaborate mental models.
Because we've all been shipping code blindly, all this time, we ship changes we don't fully understand to a production system that is a hairball of changes we've never truly understood. We've been shipping blindly for years and years now, leaving SRE teams and ops teams to poke at the black boxes and try to clean up the mess—all the while still blindfolded. The fact that anything has ever worked is a testament to the creativity and dedication of these teams.
A funny thing starts happening when people begin instrumenting their code for observability and inspecting it in production—regularly, after every deployment, as a habit. You find bugs everywhere, bugs you never knew existed. It's like picking up a rock and watching all the little nasties lurking underneath scuttle away from the light.
With monitoring tools and aggregates, we were always able to see that errors existed, but we had no way of correlating them to an event or figuring out what was different about the erroring requests. Now, all of a sudden, we are able to look at an error spike and say, "Ah! All of these errors are for requests coming from clients running app version 1.63, calling the /export endpoint, querying the primaries for mysql-shard3, shard5, and shard7, with a payload of over 10 KB, and timing out after 15 seconds." Or we can pull up a trace and see that one of the erroring requests was issuing thousands of serial database queries in a row. So many gnarly bugs and opaque behaviors become shallow once you can visualize them. It's the most satisfying experience in the world.
But yes, you do have to instrument your code. (Auto-instrumentation is about as effective as automated code commenting.) So let's talk about that.
I can hear you now—"Ugh, instrumentation!" Most people would rather get bitten by a rattlesnake than refactor their logging and instrumentation code. I know this, and so does every vendor under the sun. This is why even legacy logging companies are practically printing money. Once they get your data flowing in, it takes an act of God to move it or turn it off.
This is a big part of the reason we, as an industry, are so behind when it comes to public, reusable standards and tooling for instrumentation and observability, which is why I am so delighted to participate in the push for OpenTelemetry. Yes, it's in the clumsy toddler years of technological advancement. But it will get better. It has gotten better. I was cynical about OTel in the early days, but the community excitement and uptake have exceeded my expectations at every step. As well it should. Because the promise of OpenTelemetry is that you may need to instrument your code once, but only once. And then you can move from vendor to vendor without re-instrumenting.
This means vendors will have to compete for your business on features, usability, and cost-effectiveness, instead of vendor lock-in. OTel has the potential to finally break this stranglehold—to make it so you only instrument once, and you can move from vendor to vendor with just a few lines of configuration changes. This is brilliant—this changes everything. This is one battle you should absolutely join and fight.
Software systems aren't going to get simpler anytime soon. Yet the job of developing and maintaining software may paradoxically be poised to get faster and easier, by forcing us to finally adopt better real-time instrumentation and telemetry. Going from monitoring to observability is like the difference between visual flight rules (VFR) and instrument flight rules (IFR) for pilots. Yeah, learning to fly (or code) by instrumentation feels a little strange at first, but once you master it, you can fly so much faster, farther, and more safely than ever before.
It's not just about observability. There are lots of dovetailing trends in tech right now—feature flags, chaos engineering, progressive deployment, and so on—all of which center on production and focus on shrinking the distance and tightening the feedback loops between dev and prod. Together they deliver compounding benefits that help teams move swiftly and safely, devoting more of their time to solving new and interesting puzzles that move the business forward, and less time to toil and yak shaving.
It's not just about observability... but it starts with observability. The ability to see what is happening is the most important feedback loop of all.
And observability starts with instrumentation.
So, here we go.
Charity Majors
CTO, Honeycomb
Alex Boten is a senior staff software engineer at Lightstep and has spent the last 10 years helping organizations adapt to a cloud-native landscape. From building core network infrastructure to mobile client applications and everything in between, Alex has first-hand knowledge of how complex troubleshooting distributed applications is.
This led him to the domain of observability and contributing to open source projects in the space. A contributor, approver, and maintainer in several aspects of OpenTelemetry, Alex has helped evolve the project from its early days in 2019 into the massive community effort that it is today.
More than anything, Alex loves making sense of the technology around us and sharing his learnings with others.
Yuri Grinshteyn strongly believes that reliability is a key feature of any service and works to advocate for site reliability engineering principles and practices. He graduated from Tufts University with a degree in computer engineering and has worked in monitoring, diagnostics, observability, and reliability throughout his career. Currently, he is a site reliability engineer at Google Cloud, where he works with customers to help them achieve appropriate reliability for their services; previously, he worked at Oracle, Compuware, Hitachi Consulting, and Empirix. You can find his work on YouTube, Medium, and GitHub. He and his family live just outside of San Francisco and love taking advantage of everything California has to offer.
In this part, you will learn about the origin of OpenTelemetry and why it was needed. We will then dive into the various components and concepts of OpenTelemetry.
This part of the book comprises the following chapters:
Chapter 1, The History and Concepts of Observability
Chapter 2, OpenTelemetry Signals: Traces, Metrics, and Logs
Chapter 3, Auto-Instrumentation