High Performance with MongoDB - Asya Kamsky - E-Book

High Performance with MongoDB E-Book

Asya Kamsky

Description

With data as the new competitive edge, performance has become the need of the hour. As applications handle exponentially growing data and user demand for speed and reliability rises, three industry experts distill their decades of experience to offer you guidance on designing, building, and operating databases that deliver fast, scalable, and resilient experiences.
MongoDB’s document model and distributed architecture provide powerful tools for modern applications, but unlocking their full potential requires a deep understanding of architecture, operational patterns, and tuning best practices. This MongoDB book takes a hands-on approach to diagnosing common performance issues and applying proven optimization strategies, from schema design and indexing to storage engine tuning and resource management.
Whether you’re optimizing a single replica set or scaling a sharded cluster, this book provides the tools to maximize deployment performance. Its modular chapters let you explore query optimization, connection management, and monitoring, or follow a complete learning path to build a rock-solid performance foundation. With real-world case studies, code examples, and proven best practices, you’ll be ready to troubleshoot bottlenecks, scale efficiently, and keep MongoDB running at peak performance in even the most demanding production environments.

You can read the e-book in Legimi apps or in any app that supports one of the following formats:

EPUB
MOBI

Page count: 556

Year of publication: 2025




High Performance with MongoDB

Best practices for performance tuning, scaling, and architecture

Asya Kamsky, Ger Hartnett, Alex Bevilacqua

Copyright © 2025 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Portfolio Director: Sunith Shetty

Relationship Lead: Sathya Mohan

Project Managers: Aniket Shetty & Sathya Mohan

Content Engineer: Siddhant Jain

Technical Editor: Aniket Shetty

Copy Editor: Safis Editing

Indexer: Hemangini Bari

Proofreader: Siddhant Jain

Production Designer: Deepak Chavan

First published: September 2025

Production reference: 1220825

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-83702-263-2

www.packtpub.com

To Ben and Mark for their unwavering support

– Asya

To Luci, Clíodhna, Hannah, and Jane. Thank you for all of your support.

– Ger

Thanks to all my amazing friends and colleagues at MongoDB for helping me get the word out about how awesome MongoDB can be.

– Alex

Acknowledgements

We’d like to thank everyone who developed, documented, managed, and supported the MongoDB products over the years. In particular, we’d like to thank colleagues who made suggestions or put together internal content that contributed to this book, including Cesar Albuquerque de Godoy, Pierre Depretz, Dave Walker, Steve Wotring, Chris Harris, and Xiaochen Wu.

Contributors

About the authors

Asya Kamsky is a Principal KnowItAll at MongoDB, where she has worked since 2012. Before discovering MongoDB, she spent nearly two decades coaxing relational databases into doing what they were never designed to do at a series of startups, most of which no one remembers. Asya has an extensive background in software development, databases, and telling people what they’re doing wrong.

Ger Hartnett is a Lead Engineer on MongoDB’s product performance team, where he loves working on fascinating optimization and scaling challenges. Before MongoDB, he founded a startup that built a project communication platform (that had scaling issues). Prior to that, he architected and tuned embedded software at Intel, where he co-authored a book. Even before that, he developed systems at companies including Tellabs, Digital, and Motorola.

Alex Bevilacqua is the Lead Product Manager at MongoDB for Developer Experience. Prior to joining MongoDB in 2018, he worked as a software engineer and systems architect, implementing solutions in a number of languages, technologies, and frameworks. Aside from his passion for programming that consumes more than just his working hours, Alex can typically be found at an arena with one of his kids, both of whom play rep hockey.

About the reviewers

Keith Smith is a systems and storage expert with decades of experience in operating systems and data infrastructure. Based in the U.S., he has contributed to innovations at companies like Sun Microsystems, NetApp, and MongoDB. Keith started his career designing real-time UNIX systems and went on to build extensible operating systems at Harvard, where he earned his PhD. His work has spanned object-based storage, metadata catalogs, and file system architectures. Since 2020, he has led storage engine development at MongoDB. Keith is passionate about scalable systems and continues to push the boundaries of data storage technology.

John Page is a full-stack engineer who joined MongoDB in 2013. Since then, he has held senior technical roles focused on customer success across all phases of the development lifecycle. Before MongoDB, he spent 18 years building document database systems for law enforcement and the U.S. military. Trained as a geologist, John can find silicon in the wild, build a computer from raw components, and write everything from operating systems and databases to front ends and documentation.

James Kovacs is a director of engineering in Database Experience at MongoDB. He focuses on client libraries, object-document mappers, framework integrations, and AI/ML. He has over 25 years of professional software engineering experience, having worked in a wide variety of companies and industries over that time. When not in front of a keyboard, you can find him playing board games/TTRPGs, cooking tasty vegan meals in the kitchen, or at the dojo teaching karate.

Stephanie Eristoff is a software engineer on the Storage Execution team at MongoDB, which she joined in 2024. She brings experience in performance testing of time-series collections and indexes and has also contributed to the Performance and Query Execution teams. Outside of work, Stephanie enjoys baking, running, and biking.

Katya Kamenieva is a product manager at MongoDB, where she has been driving improvements to the aggregation framework, change streams, and other features since 2019. Her work focuses on enhancing how users interact with data in MongoDB. Outside of work, she’s a mom to two young daughters and finds mindfulness through quiet moments, a good cup of coffee, yoga, and watercolor painting.

Jawwad Asghar is a software engineer at MongoDB, where he has been a key member of the performance team since 2022. He helps drive changes to the database by optimizing its performance for diverse and demanding customer use cases. Prior to joining MongoDB, he spent 10 years as a control software engineer in the locomotive industry. When not behind his desk, he enjoys biking around New York City, camping, and going hiking upstate.

Alice Doherty is a software engineer at MongoDB, having joined the Product Performance team in 2023. She offers valuable experience in root-causing unique performance problems, both for customers and for internal engineering teams, as well as driving performance improvements in MongoDB’s products. Outside of work, Alice enjoys staying active, cooking, and any adventures in the outdoors.

Linda Qin is a staff technical services engineer at MongoDB. She joined MongoDB in 2013 and has worked in the technical support team since then. Linda specializes in issue diagnosis and performance tuning, with deep expertise in sharding. She is also involved in developing tools to streamline diagnostics and improve efficiency. Outside of work, Linda enjoys cooking and bushwalking.

Matt Panton is a product manager at MongoDB. Since joining in 2022, Matt has focused on delivering improvements to key distributed systems features such as sharding and replication. His work is centered on making MongoDB scalable and resilient. Outside of work, Matt enjoys playing soccer, running, and cooking.

Sebastien Mendez (but call him Seb) is a lead engineer on MongoDB’s Query Execution team. Back in the day, he was active in the demoscene as “rakiz,” hacking Z80 assembly demos, creating diskmags on his Amstrad CPC 6128, and organizing coding parties. After exploring several industries, Seb discovered a passion for data processing—building a persistence layer, a query engine, and a highly configurable streaming data pipeline. Today, he specializes in MongoDB’s change streams, helping shape their roadmap around observability, performance, and new features. Outside of work, Seb enjoys video gaming, reading sci-fi and fantasy, tinkering with Arduinos, and growing his pop culture T-shirt collection.

Kevin Arhelger is a staff engineer on the Technical Services team at MongoDB, where he focuses on improving performance for customers across diverse environments. He has been professionally administering Linux systems since 2006 and has specialized in software and database performance since 2008. With deep expertise in server administration, cloud automation, and full-stack performance tuning, Kevin brings a holistic approach to solving complex technical challenges.

Manuel Fontan is a senior technologist on the Curriculum team at MongoDB. Previously, he was a senior technical services engineer in the Core team at MongoDB. In between, Manuel worked as a database reliability engineer at Slack for a little over two years and then for Cognite until he rejoined MongoDB. With over 15 years of experience in software development and distributed systems, he is naturally curious and holds a Telecommunications Engineering MSc from Vigo University (Spain) and a Free and Open Source Software MSc from Rey Juan Carlos University (Spain).

Parker Faucher is a self-taught software engineer with over six years of experience in technical education. He has authored more than 100 educational videos for MongoDB, establishing himself as a knowledgeable resource in database technologies. Currently, Parker focuses on AI and search technologies, exploring innovative solutions in these rapidly evolving fields. When not advancing his technical expertise, Parker enjoys spending quality time with his family and maintains an avid interest in collecting comic books.

Sarah Evans is a curriculum engineer at MongoDB, where she specializes in making complex technical concepts accessible for all learners. With over ten years of experience in software engineering, education, and developing curriculum for technical bootcamps, she’s all about making learning MongoDB less intimidating and more exciting. When she’s not helping developers level up their database skills, Sarah can be found having spontaneous dance parties with her family, exploring the great outdoors with her dog, or tending to her garden.

Contents

Acknowledgements

Preface

How this book will help you

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Systems and MongoDB Architecture

What are systems?

Characteristics of systems

Changing systems is a risky business

A system with no delays is simple

A system with delays can behave in unexpected ways

Trying to fix oscillations

Systems surprise us

A typical software system

Algorithmic efficiency (complexity)

Avoid premature optimization

Amdahl’s law (limit of parallel speedup)

Locality and caching

Little’s law (throughput versus latency)

Understanding MongoDB architecture

The document model: MongoDB’s foundation

Key architectural components of MongoDB

The data services system

Query engine

Storage engine/WiredTiger

Libraries

Other system components that mongod uses

Managing complexity in modern data platforms

Flexible data model with rigorous capabilities

Built-in redundancy and resilience

Horizontal scaling with intelligent distribution

Performance tools

Finding bottlenecks

An incremental process for optimization

Summary

References

Schema Design for Performance

Understanding the core principles of schema design

There is no single right way

Data collocation

Read and write trade-offs

Small versus large documents

Common myths

Key strengths of the MongoDB schema design

One-to-many relationships

Embedding weak entities

Dynamic attributes

Caches and snapshots

Optimization for common use cases

Schema evolution

Schema validation

Common schema design mistakes

Overnormalizing

Overembedding

Other common anti-patterns

Schema design patterns by benefit

Patterns for read performance optimization

Patterns for write performance optimization

Patterns for query and analytics optimization

Archive pattern for storage optimization

Real-world application: The Socialite app

Scenario 1: User profile and activity feed

Scenario 2: Chat system

Summary

Indexes

Introduction to indexes

What is an index?

Resource efficiency and trade-offs

Resource usage

Common misconceptions about indexes in MongoDB

Types of indexes in MongoDB

Single-field indexes

Compound indexes

Multikey indexes

Sparse indexes

Wildcard indexes

Partial indexes

Designing efficient indexes

Cardinality and selectivity

Constructing compound indexes

Equality queries

Sorts and range queries

The ESR guideline

Maximizing resources with partial indexes

Covered queries: the performance holy grail

Ascending versus descending index order

Indexing and aggregation pipelines

Summary

Aggregations

MongoDB’s aggregation framework

Core concepts of the aggregation pipeline

Performance considerations

Aggregation pipeline flow

Optimizing aggregation pipelines

Optimization techniques

Filter data early

Avoid unnecessary $unwind and $group

Design efficient $group operations

Avoid common $lookup performance issues

Efficient use of $project and $addFields

Working with large datasets

Aggregation pipeline limits

Managing memory constraints with allowDiskUse

Aggregation in distributed environments

Optimizing aggregation for sharded collections

Understanding shard-local versus merged operations

Monitoring and profiling aggregation performance

Utilizing materialized views

Summary

Replication

Understanding MongoDB replica sets

Components of a replica set

Replication and high availability

Understanding the MongoDB election process

Replica set configuration

Chained replication

Replica set tags and analytics nodes

Replication internals and performance

Flow control

Replication and the oplog

Managing replication lag

Read and write strategies

Read preference

Write concern and durability

Summary

Sharding

Understanding core sharding architecture

Architectural components of a sharded cluster

Sharding a collection and selecting a shard key

Why scatter-gather is bad

Strategic shard key selection

Shard key for targeting operations

Shard key with good granularity

Avoid increasing or decreasing shard key values

Types of sharding

Range-based sharding

Hashed sharding

Zone-based sharding

Advanced sharding administration

Resharding: Whether, when, and how

Balancer considerations

Pre-splitting: Whether, when, and how

Moving unsharded collections

Colocating sharded collection chunks together

Summary

Storage Engines

Exploring storage engines

Overview of WiredTiger

A lookup operation

An update operation

An insert operation

Eviction, checkpointing, and recovery

Compression and encryption

Configuration for improving performance

Changing the size of the WiredTiger cache

Changing syncdelay

Changing minSnapshotHistoryWindowInSeconds

Changing how eviction works

Switching to the in-memory storage engine

Changing the max leaf page size

Summary

Change Streams

Understanding change streams architecture

How change streams work: From write operations to events

Event structure and life cycle

Implementing change streams effectively

Choosing the right scope and filtering strategy

Server-side filtering with aggregation pipelines

Document lookup strategies and performance

Building a price monitoring service

Managing performance and durability

Resource optimization strategies

Handling high-volume event streams

Special considerations for sharded deployments

Advanced patterns and production readiness

Transaction visibility and event batching

Document size limitations and collection life cycle

Monitoring and health checks

Replica set considerations

Performance-tuning recap

Summary

Transactions

Understanding multi-document ACID transactions

History and evolution of transactions in MongoDB

Introduction to ACID properties in MongoDB

Document-level atomicity versus multi-document transactions

Document-level atomicity

Multi-document transactions atomicity

When to use multi-document transactions in MongoDB

Transactions API and session management

Core API

Callback API

Read/write concerns and transaction behavior

Performance considerations with transactions

Replica set versus sharded cluster transactions

WiredTiger cache considerations

Managing transaction runtime limits and errors

Lock acquisition and contention management

Optimizing transaction size and duration

Common transaction anti-patterns and their performance costs

Long-running transactions and their impact on system performance

Unnecessary use of transactions where single-document atomicity would suffice

Single-document transactions

Transactions for read-only operations

Misunderstanding transaction scope and atomicity

Frequent small transactions on hot documents/collections

Improper error handling and retry logic

Insufficient monitoring of transaction metrics

Summary

Client Libraries

What are drivers?

How MongoDB drivers work

Key features of MongoDB drivers

Consistency and reliability through shared specifications

Idiomatic experience

Performance optimization

What are object-document mappers (ODMs)?

Understanding ODMs

Key features of ODMs

Schema enforcement and data validation

Intuitive query APIs

Relationship management

Middleware and life cycle hooks

Type safety and IDE integration

Impact on developer productivity

When to use ODMs

What are application frameworks?

The value of application frameworks

Leveraging ODMs and ORMs in frameworks

Popular MongoDB-compatible frameworks

Best practices when using frameworks with MongoDB

Beyond the basics

Asynchronous and non-blocking patterns

Surfacing and handling failure conditions

Connection management

Read/write concerns and read preferences

Compression and network performance

Summary

Managing Connections and Network Performance

Understanding connection fundamentals

Latency

Connection churn

Network saturation

Understanding the connection lifecycle

Connection establishment

Connection utilization and pooling

Connection termination

MongoDB connection architecture

TCP/IP and the MongoDB Wire Protocol

Driver connection pooling

MongoDB server connection handling

Monitoring and troubleshooting connections

Connection monitoring best practices

Optimizing connection management

Connection pool optimization

Connection timeout configuration

Server-side optimization

Operating system configuration

Performance optimization leveraging network compression

Benefits and trade-offs of network compression

Available compression algorithms

Implementing network compression

Connection strategies for serverless environments

Summary

Advanced Query and Indexing Concepts

Understanding query execution

Plan stages, or “how indexes can be used”

Using the explain command

The queryPlanner section

The executionStats section

Analyzing log messages

Identifying problematic patterns

Query targeting ratio

Waiting for disk or other resources

How to influence query execution

Using hint

Using query settings

Other options

MQL semantics and indexes

Challenges with arrays and multikey indexes

Equality is not just equality

$elemMatch

Deduplication

Challenges with null and $exists

Additional best practices

Updates

findAndModify

Aggregation and query

$sample

$facet

$where, $function, $accumulator, and mapReduce

$text

$regex and indexes

Aggregation versus match expressions

$expr and indexes

$or clause and indexes

Special collections, index types, and features

Time-series collections

Geospatial indexes

Atlas Search

Atlas Vector Search

Collations

Summary

Operating Systems and System Resources

Technical requirements

Managing resources for optimal performance

CPU utilization

Memory management

Storage

Network

Configuring systems for MongoDB performance

Understand the ideal ratio of simultaneous operations per CPU core

Search for other CPU-intensive processes during performance dips

Select the right filesystem for your application

Extent file system (XFS)

Fourth extended filesystem (ext4)

Other file systems

Filesystem settings

Avoid RAID with parity

Adjust readahead settings

Change filesystem cache settings

Check SSD health

Avoid double encryption and compression

Ensure resident memory usage stays below 80%

Networking best practices

Using auto-scaling for Atlas performance

Summary

Monitoring and Observability

Key differences between monitoring and observability

Core MongoDB metrics and signals

Operational metrics

Performance-specific metrics

Monitoring with MongoDB Atlas

Atlas UI features

Real-Time Performance Panel (RTPP)

Performance Advisor

Query Insights

Alerts

Atlas Search issues

Connection issues

Query issues

Oplog issues

Self-managed monitoring tools

mongostat: real-time activity snapshot

mongotop: Collection-level read/write timings

serverStatus: Comprehensive metrics via the shell

Database profiler: Deep dive into slow operations

Integration with external monitoring systems

Prometheus + Grafana

Atlas integration for Prometheus

Visualizing with Grafana

Datadog

Application performance monitoring platforms

OpenTelemetry

Considerations for external tools

Common performance patterns and what to monitor

Disk I/O bottlenecks

Cache pressure (WiredTiger cache utilization)

Replication lag and oplog window drift

Summary

Debugging Performance Issues

Identifying inherently slow operations

Case study: Troubleshooting cluster performance with Atlas

Diagnosing the cause

Root cause identification

Implementing the solution

Results and learnings

Case study: Unexpected admin query causing collection scan

Diagnosing the cause

Implementing the solution

Results and learnings

Managing blocked operations

Case study: Cursor leak investigation

Diagnosing the cause

Root cause identification

Implementing the solution

Results and learnings

Use case: Burst of poorly optimized queries

Diagnosing the cause

Implementing the solution

Results and learnings

Addressing hardware resource constraints

Case study: Atlas cluster resource optimization

Diagnosing the cause

Implementing the solution

Results and learnings

Use case: Diagnosing insufficient IOPs in self‑managed MongoDB

Diagnosing the cause

Implementing the solution

Results and learnings

Systematic approach to performance troubleshooting

Summary

Afterword

Key takeaways

The performance mindset

Practical next steps

Future considerations

Final thoughts

Index

Other Books You May Enjoy
