Cloud-Native Observability with OpenTelemetry

Alex Boten

Description

Cloud-Native Observability with OpenTelemetry is a guide to finding answers to questions about your applications. This book teaches you how to produce telemetry from your applications using an open standard, so that you retain control of your data. OpenTelemetry provides the tools necessary to gain visibility into the performance of your services. It allows you to instrument your application code through vendor-neutral APIs, libraries, and tools.
By reading Cloud-Native Observability with OpenTelemetry, you’ll learn about the concepts and signals of OpenTelemetry - traces, metrics, and logs. You’ll practice producing telemetry for these signals by configuring and instrumenting a distributed cloud-native application using the OpenTelemetry API. The book also guides you through deploying the collector, as well as the telemetry backends you need to understand what to do with the data once it's emitted. You’ll look at various examples of how to identify application performance issues through telemetry. By analyzing telemetry, you’ll also be able to better understand how an observable application can improve the software development life cycle.
By the end of this book, you’ll be well-versed in OpenTelemetry and able to instrument services using the OpenTelemetry API to produce distributed traces, metrics, and logs, and more.

Page count: 366

Year of publication: 2022




Cloud-Native Observability with OpenTelemetry

Learn to gain visibility into systems by combining tracing, metrics, and logging with OpenTelemetry

Alex Boten

BIRMINGHAM—MUMBAI

Cloud-Native Observability with OpenTelemetry

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Rahul Nair

Publishing Product Manager: Shrilekha Malpani

Senior Editor: Arun Nadar

Content Development Editor: Sujata Tripathi

Technical Editor: Rajat Sharma

Copy Editor: Safis Editing

Project Coordinator: Shagun Saini

Proofreader: Safis Editing

Indexer: Pratik Shirodkar

Production Designer: Ponraj Dhandapani

Marketing Coordinator: Nimisha Dua

First published: April 2022

Production reference: 1140422

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80107-770-5

www.packt.com

To my mother, sister, and father. Thank you for teaching me to persevere in the face of adversity, always be curious, and work hard.

Foreword

It has never been a better time to be a software engineer.

As engineers, we are motivated by impact and efficiency—and who can argue that both are not skyrocketing, particularly in comparison with time spent and energy invested?

These days, you can build out a scalable, elastic, distributed system to serve your code to millions of users per day with a few clicks—without ever having to personally understand much about operations or architecture. You can write lambda functions or serverless code, hit save, and begin serving them to users immediately. 

It feels like having superpowers, especially for those of us who remember the laborious times before. Every year brings more powerful APIs and higher-level abstractions – many, many infinitely complex systems that "just work" at the click of a button or the press of a key. 

But when it doesn't "just work," it has gotten harder than ever to untangle the reasons and understand why.

Superpowers don't come for free, it turns out. The winds of change may be sweeping us all briskly out toward a sea of ever-expanding options, infinite flexibility, automated resiliency, and even cost-effectiveness, but these glories have come at the price of complexity—skyrocketing, relentlessly compounding complexity and the cognitive overload that comes with it.

Systems no longer fail in predictable ways. Static dashboards are no longer a viable tool for understanding your systems. And though better tools will help, digging ourselves out of this hole is not merely an issue of switching from one tool to another. We need to rethink the way software gets built, shipped, and maintained, to be production-focused from day 1.

For far too long now, we have been building and shipping software in the dark. Software engineers act like all they need to do is write tests and make sure their code passes. While tests are important, all they can really do is validate the logic of your code and increase your confidence that you have not introduced any serious regressions. Operations engineers, meanwhile, rely on monitoring checks, but those are a blunt tool at best. Most bugs will never rise to the criticality of a paging alert, which means that as a system gets more mature and sophisticated, most issues will have to be found and reported by your users. 

And this isn't just a problem of bugs, firefighting, or outages. This is about understanding your software in the wild—as your users run your code on your infrastructure, at a given time. Production remains far too much of a black box for too many people, who are then forced to try and reason about it by reading lines of code and using elaborate mental models.

Because we've all been shipping code blindly, all this time, we ship changes we don't fully understand to a production system that is a hairball of changes we've never truly understood. We've been shipping blindly for years and years now, leaving SRE teams and ops teams to poke at the black boxes and try to clean up the mess—all the while still blindfolded. The fact that anything has ever worked is a testament to the creativity and dedication of these teams.

A funny thing starts happening when people begin instrumenting their code for observability and inspecting it in production—regularly, after every deployment, as a habit. You find bugs everywhere, bugs you never knew existed. It's like picking up a rock and watching all the little nasties lurking underneath scuttle away from the light. 

With monitoring tools and aggregates, we were always able to see that errors existed, but we had no way of correlating them to an event or figuring out what was different about the erroring requests. Now, all of a sudden, we are able to look at an error spike and say, "Ah! All of these errors are for requests coming from clients running app version 1.63, calling the /export endpoint, querying the primaries for mysql-shard3, shard5, and shard7, with a payload of over 10 KB, and timing out after 15 seconds." Or we can pull up a trace and see that one of the erroring requests was issuing thousands of serial database queries in a row. So many gnarly bugs and opaque behaviors become shallow once you can visualize them. It's the most satisfying experience in the world.

But yes, you do have to instrument your code. (Auto-instrumentation is about as effective as automated code commenting.) So let's talk about that.

I can hear you now—"Ugh, instrumentation!" Most people would rather get bitten by a rattlesnake than refactor their logging and instrumentation code. I know this, and so does every vendor under the sun. This is why even legacy logging companies are practically printing money. Once they get your data flowing in, it takes an act of God to move it or turn it off.

This is a big part of the reason we, as an industry, are so behind when it comes to public, reusable standards and tooling for instrumentation and observability, which is why I am so delighted to participate in the push for OpenTelemetry. Yes, it's in the clumsy toddler years of technological advancement. But it will get better. It has gotten better. I was cynical about OTel in the early days, but the community excitement and uptake have exceeded my expectations at every step. As well it should. Because the promise of OpenTelemetry is that you may need to instrument your code once, but only once. And then you can move from vendor to vendor without re-instrumenting.

This means vendors will have to compete for your business on features, usability, and cost-effectiveness, instead of vendor lock-in. OTel has the potential to finally break this stranglehold—to make it so you only instrument once, and you can move from vendor to vendor with just a few lines of configuration changes. This is brilliant—this changes everything. This is one battle you should absolutely join and fight.

Software systems aren't going to get simpler anytime soon. Yet the job of developing and maintaining software may paradoxically be poised to get faster and easier, because this complexity forces us to finally adopt better real-time instrumentation and telemetry. Going from monitoring to observability is like the difference between visual flight rules (VFR) and instrument flight rules (IFR) for pilots. Yeah, learning to fly (or code) by instrumentation feels a little strange at first, but once you master it, you can fly so much faster, farther, and more safely than ever before.

It's not just about observability. There are lots of dovetailing trends in tech right now—feature flags, chaos engineering, progressive deployment, and so on—all of which center production, and focus on shrinking the distance and tightening the feedback loops between dev and prod. Together they deliver compounding benefits that help teams move swiftly and safely, devoting more of their time to solving new and interesting puzzles that move the business forward, and less time to toil and yak shaving.

It's not just about observability... but it starts with observability. The ability to see what is happening is the most important feedback loop of all.

And observability starts with instrumentation.

So, here we go.

Charity Majors

CTO, Honeycomb

Contributors

About the author

Alex Boten is a senior staff software engineer at Lightstep and has spent the last 10 years helping organizations adapt to a cloud-native landscape. From building core network infrastructure to mobile client applications and everything in between, Alex has first-hand knowledge of how complex troubleshooting distributed applications is.

This led him to the domain of observability and contributing to open source projects in the space. A contributor, approver, and maintainer in several aspects of OpenTelemetry, Alex has helped evolve the project from its early days in 2019 into the massive community effort that it is today.

More than anything, Alex loves making sense of the technology around us and sharing his learnings with others.

About the reviewer

Yuri Grinshteyn strongly believes that reliability is a key feature of any service and works to advocate for site reliability engineering principles and practices. He graduated from Tufts University with a degree in computer engineering and has worked in monitoring, diagnostics, observability, and reliability throughout his career. Currently, he is a site reliability engineer at Google Cloud, where he works with customers to help them achieve appropriate reliability for their services; previously, he worked at Oracle, Compuware, Hitachi Consulting, and Empirix. You can find his work on YouTube, Medium, and GitHub. He and his family live just outside of San Francisco and love taking advantage of everything California has to offer.

Table of Contents

Preface

Section 1: The Basics

Chapter 1: The History and Concepts of Observability

Understanding cloud-native applications

Looking at the shift to DevOps

Reviewing the history of observability

Centralized logging

Using metrics and dashboards

Applying tracing and analysis

Understanding the history of OpenTelemetry

OpenTracing

OpenCensus

Observability for cloud-native software

Understanding the concepts of OpenTelemetry

Signals

Pipelines

Resources

Context propagation

Summary

Chapter 2: OpenTelemetry Signals – Traces, Metrics, and Logs

Technical requirements

Traces

Anatomy of a trace

Details of a span

Additional considerations

Metrics

Anatomy of a metric

Data point types

Exemplars

Additional considerations

Logs

Anatomy of a log

Correlating logs

Additional considerations

Semantic conventions

Summary

Chapter 3: Auto-Instrumentation

Technical requirements

What is auto-instrumentation?

Challenges of manual instrumentation

Components of auto-instrumentation

Limits of auto-instrumentation

Bytecode manipulation

OpenTelemetry Java agent

Runtime hooks and monkey patching

Instrumenting libraries

The Instrumentor interface

Wrapper script

Summary

Section 2: Instrumenting an Application

Chapter 4: Distributed Tracing – Tracing Code Execution

Technical requirements

Configuring the tracing pipeline

Getting a tracer

Generating tracing data

The Context API

Span processors

Enriching the data

ResourceDetector

Span attributes

SpanKind

Propagating context

Additional propagator formats

Composite propagator

Recording events, exceptions, and status

Events

Exceptions

Status

Summary

Chapter 5: Metrics – Recording Measurements

Technical requirements

Configuring the metrics pipeline

Obtaining a meter

Push-based and pull-based exporting

Choosing the right OpenTelemetry instrument

Counter

Asynchronous counter

Up/down counter

Asynchronous up/down counter

Histogram

Asynchronous gauge

Duplicate instruments

Customizing metric outputs with views

Filtering

Dimensions

Aggregation

The grocery store

Number of requests

Request duration

Concurrent requests

Resource consumption

Summary

Chapter 6: Logging – Capturing Events

Technical requirements

Configuring OpenTelemetry logging

Producing logs

Using LogEmitter

The standard logging library

A logging signal in practice

Distributed tracing and logs

OpenTelemetry logging with Flask

Logging with WSGI middleware

Resource correlation

Summary

Chapter 7: Instrumentation Libraries

Technical requirements

Auto-instrumentation configuration

OpenTelemetry distribution

OpenTelemetry configurator

Environment variables

Command-line options

Requests library instrumentor

Additional configuration options

Manual invocation

Double instrumentation

Automatic configuration

Configuring resource attributes

Configuring traces

Configuring metrics

Configuring logs

Configuring propagation

Revisiting the grocery store

Legacy inventory

Grocery store

Shopper

Flask library instrumentor

Additional configuration options

Finding instrumentation libraries

OpenTelemetry registry

opentelemetry-bootstrap

Summary

Section 3: Using Telemetry Data

Chapter 8: OpenTelemetry Collector

Technical requirements

The purpose of OpenTelemetry Collector

Understanding the components of OpenTelemetry Collector

Receivers

Processors

Exporters

Extensions

Additional components

Transporting telemetry via OTLP

Encodings and protocols

Additional design considerations

Using OpenTelemetry Collector

Configuring the exporter

Configuring the collector

Modifying spans

Filtering metrics

Summary

Chapter 9: Deploying the Collector

Technical requirements

Collecting application telemetry

Deploying the sidecar

System-level telemetry

Deploying the agent

Connecting the sidecar and the agent

Adding resource attributes

Collector as a gateway

Autoscaling

OpenTelemetry Operator

Summary

Chapter 10: Configuring Backends

Technical requirements

Backend options for analyzing telemetry data

Tracing

Metrics

Logging

Running in production

High availability

Scalability

Data retention

Privacy regulations

Summary

Chapter 11: Diagnosing Problems

Technical requirements

Introducing a little chaos

Experiment #1 – increased latency

Experiment #2 – resource pressure

Experiment #3 – unexpected shutdown

Using telemetry first to answer questions

Summary

Chapter 12: Sampling

Technical requirements

Concepts of sampling across signals

Traces

Metrics

Logs

Sampling strategies

Samplers available

Sampling at the application level via the SDK

Using the OpenTelemetry Collector to sample data

Tail sampling processor

Summary

Other Books You May Enjoy

Section 1: The Basics

In this part, you will learn about the origin of OpenTelemetry and why it was needed. We will then dive into the various components and concepts of OpenTelemetry.

This part of the book comprises the following chapters:

Chapter 1, The History and Concepts of Observability
Chapter 2, OpenTelemetry Signals: Traces, Metrics, and Logs
Chapter 3, Auto-Instrumentation