With data as the new competitive edge, performance has become the need of the hour. As applications handle exponentially growing data and user demand for speed and reliability rises, three industry experts distill their decades of experience to offer you guidance on designing, building, and operating databases that deliver fast, scalable, and resilient experiences.
MongoDB’s document model and distributed architecture provide powerful tools for modern applications, but unlocking their full potential requires a deep understanding of architecture, operational patterns, and tuning best practices. This book takes a hands-on approach to diagnosing common performance issues and applying proven optimization strategies, from schema design and indexing to storage engine tuning and resource management.
Whether you’re optimizing a single replica set or scaling a sharded cluster, this book provides the tools to maximize deployment performance. Its modular chapters let you explore query optimization, connection management, and monitoring or follow a complete learning path to build a rock-solid performance foundation. With real-world case studies, code examples, and proven best practices, you’ll be ready to troubleshoot bottlenecks, scale efficiently, and keep MongoDB running at peak performance in even the most demanding production environments.
Page count: 556
Year of publication: 2025
High Performance with MongoDB
Best practices for performance tuning, scaling, and architecture
Asya Kamsky, Ger Hartnett, Alex Bevilacqua
Copyright © 2025 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Portfolio Director: Sunith Shetty
Relationship Lead: Sathya Mohan
Project Managers: Aniket Shetty & Sathya Mohan
Content Engineer: Siddhant Jain
Technical Editor: Aniket Shetty
Copy Editor: Safis Editing
Indexer: Hemangini Bari
Proofreader: Siddhant Jain
Production Designer: Deepak Chavan
First published: September 2025
Production reference: 1220825
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-83702-263-2
www.packtpub.com
To Ben and Mark for their unwavering support
– Asya
To Luci, Clíodhna, Hannah, and Jane. Thank you for all of your support.
– Ger
Thanks to all my amazing friends and colleagues at MongoDB for helping me get the word out about how awesome MongoDB can be.
– Alex
We’d like to thank everyone who developed, documented, managed, and supported the MongoDB products over the years. In particular, we’d like to thank colleagues who made suggestions or put together internal content that contributed to this book, including Cesar Albuquerque de Godoy, Pierre Depretz, Dave Walker, Steve Wotring, Chris Harris, and Xiaochen Wu.
Asya Kamsky is a Principal KnowItAll at MongoDB, where she has worked since 2012. Before discovering MongoDB, she spent nearly two decades coaxing relational databases into doing what they were never designed to do at a series of startups, most of which no one remembers. Asya has an extensive background in software development, databases, and telling people what they’re doing wrong.
Ger Hartnett is a Lead Engineer on MongoDB’s product performance team, where he loves working on fascinating optimization and scaling challenges. Before MongoDB, he founded a startup that built a project communication platform (that had scaling issues). Prior to that, he architected and tuned embedded software at Intel, where he co-authored a book. Even before that, he developed systems at companies including Tellabs, Digital, and Motorola.
Alex Bevilacqua is the Lead Product Manager at MongoDB for Developer Experience. Prior to joining MongoDB in 2018, he worked as a software engineer and systems architect, implementing solutions in a number of languages, technologies, and frameworks. Aside from his passion for programming that consumes more than just his working hours, Alex can typically be found at an arena with one of his kids, both of whom play rep hockey.
Keith Smith is a systems and storage expert with decades of experience in operating systems and data infrastructure. Based in the U.S., he has contributed to innovations at companies like Sun Microsystems, NetApp, and MongoDB. Keith started his career designing real-time UNIX systems and went on to build extensible operating systems at Harvard, where he earned his PhD. His work has spanned object-based storage, metadata catalogs, and file system architectures. Since 2020, he has led storage engine development at MongoDB. Keith is passionate about scalable systems and continues to push the boundaries of data storage technology.
John Page is a full-stack engineer who joined MongoDB in 2013. Since then, he has held senior technical roles focused on customer success across all phases of the development lifecycle. Before MongoDB, he spent 18 years building document database systems for law enforcement and the U.S. military. Trained as a geologist, John can find silicon in the wild, build a computer from raw components, and write everything from operating systems and databases to front ends and documentation.
James Kovacs is a director of engineering in Database Experience at MongoDB. He focuses on client libraries, object-document mappers, framework integrations, and AI/ML. He has over 25 years of professional software engineering experience, having worked in a wide variety of companies and industries over that time. When not in front of a keyboard, you can find him playing board games/TTRPGs, cooking tasty vegan meals in the kitchen, or at the dojo teaching karate.
Stephanie Eristoff is a software engineer on the Storage Execution team at MongoDB, which she joined in 2024. She brings experience in performance testing of time-series collections and indexes and has also contributed to the Performance and Query Execution teams. Outside of work, Stephanie enjoys baking, running, and biking.
Katya Kamenieva is a product manager at MongoDB, where she has been driving improvements to the aggregation framework, change streams, and other features since 2019. Her work focuses on enhancing how users interact with data in MongoDB. Outside of work, she’s a mom to two young daughters and finds mindfulness through quiet moments, a good cup of coffee, yoga, and watercolor painting.
Jawwad Asghar is a software engineer at MongoDB, where he has been a key member of the performance team since 2022. He helps drive changes to the database by optimizing its performance for diverse and demanding customer use cases. Prior to joining MongoDB, he spent 10 years as a control software engineer in the locomotive industry. When not behind his desk, he enjoys biking around New York City, camping, and going hiking upstate.
Alice Doherty is a software engineer at MongoDB, having joined the Product Performance team in 2023. She offers valuable experience in root-causing unique performance problems, both for customers and for internal engineering teams, as well as driving performance improvements in MongoDB’s products. Outside of work, Alice enjoys staying active, cooking, and any adventures in the outdoors.
Linda Qin is a staff technical services engineer at MongoDB. She joined MongoDB in 2013 and has worked in the technical support team since then. Linda specializes in issue diagnosis and performance tuning, with deep expertise in sharding. She is also involved in developing tools to streamline diagnostics and improve efficiency. Outside of work, Linda enjoys cooking and bushwalking.
Matt Panton is a product manager at MongoDB. Since joining in 2022, Matt has focused on delivering improvements to key distributed systems features such as sharding and replication. His work is centered on making MongoDB scalable and resilient. Outside of work, Matt enjoys playing soccer, running, and cooking.
Sebastien Mendez (but call him Seb) is a lead engineer on MongoDB’s Query Execution team. Back in the day, he was active in the demoscene as “rakiz,” hacking Z80 assembly demos, creating diskmags on his Amstrad CPC 6128, and organizing coding parties. After exploring several industries, Seb discovered a passion for data processing—building a persistence layer, a query engine, and a highly configurable streaming data pipeline. Today, he specializes in MongoDB’s change streams, helping shape their roadmap around observability, performance, and new features. Outside of work, Seb enjoys video gaming, reading sci-fi and fantasy, tinkering with Arduinos, and growing his pop culture T-shirt collection.
Kevin Arhelger is a staff engineer on the Technical Services team at MongoDB, where he focuses on improving performance for customers across diverse environments. He has been professionally administering Linux systems since 2006 and has specialized in software and database performance since 2008. With deep expertise in server administration, cloud automation, and full-stack performance tuning, Kevin brings a holistic approach to solving complex technical challenges.
Manuel Fontan is a senior technologist on the Curriculum team at MongoDB. Previously, he was a senior technical services engineer in the Core team at MongoDB. In between, Manuel worked as a database reliability engineer at Slack for a little over two years and then for Cognite until he rejoined MongoDB. With over 15 years of experience in software development and distributed systems, he is naturally curious and holds a Telecommunications Engineering MSc from Vigo University (Spain) and a Free and Open Source Software MSc from Rey Juan Carlos University (Spain).
Parker Faucher is a self-taught software engineer with over six years of experience in technical education. He has authored more than 100 educational videos for MongoDB, establishing himself as a knowledgeable resource in database technologies. Currently, Parker focuses on AI and search technologies, exploring innovative solutions in these rapidly evolving fields. When not advancing his technical expertise, Parker enjoys spending quality time with his family and maintains an avid interest in collecting comic books.
Sarah Evans is a curriculum engineer at MongoDB, where she specializes in making complex technical concepts accessible for all learners. With over ten years of experience in software engineering, education, and developing curriculum for technical bootcamps, she’s all about making learning MongoDB less intimidating and more exciting. When she’s not helping developers level up their database skills, Sarah can be found having spontaneous dance parties with her family, exploring the great outdoors with her dog, or tending to her garden.
Acknowledgements
Preface
How this book will help you
Who this book is for
What this book covers
To get the most out of this book
Get in touch
Systems and MongoDB Architecture
What are systems?
Characteristics of systems
Changing systems is a risky business
A system with no delays is simple
A system with delays can behave in unexpected ways
Trying to fix oscillations
Systems surprise us
A typical software system
Algorithmic efficiency (complexity)
Avoid premature optimization
Amdahl’s law (limit of parallel speedup)
Locality and caching
Little’s law (throughput versus latency)
Understanding MongoDB architecture
The document model: MongoDB’s foundation
Key architectural components of MongoDB
The data services system
Query engine
Storage engine/WiredTiger
Libraries
Other system components that mongod uses
Managing complexity in modern data platforms
Flexible data model with rigorous capabilities
Built-in redundancy and resilience
Horizontal scaling with intelligent distribution
Performance tools
Finding bottlenecks
An incremental process for optimization
Summary
References
Schema Design for Performance
Understanding the core principles of schema design
There is no single right way
Data collocation
Read and write trade-offs
Small versus large documents
Common myths
Key strengths of the MongoDB schema design
One-to-many relationships
Embedding weak entities
Dynamic attributes
Caches and snapshots
Optimization for common use cases
Schema evolution
Schema validation
Common schema design mistakes
Overnormalizing
Overembedding
Other common anti-patterns
Schema design patterns by benefit
Patterns for read performance optimization
Patterns for write performance optimization
Patterns for query and analytics optimization
Archive pattern for storage optimization
Real-world application: The Socialite app
Scenario 1: User profile and activity feed
Scenario 2: Chat system
Summary
Indexes
Introduction to indexes
What is an index?
Resource efficiency and trade-offs
Resource usage
Common misconceptions about indexes in MongoDB
Types of indexes in MongoDB
Single-field indexes
Compound indexes
Multikey indexes
Sparse indexes
Wildcard indexes
Partial indexes
Designing efficient indexes
Cardinality and selectivity
Constructing compound indexes
Equality queries
Sorts and range queries
The ESR guideline
Maximizing resources with partial indexes
Covered queries: the performance holy grail
Ascending versus descending index order
Indexing and aggregation pipelines
Summary
Aggregations
MongoDB’s aggregation framework
Core concepts of the aggregation pipeline
Performance considerations
Aggregation pipeline flow
Optimizing aggregation pipelines
Optimization techniques
Filter data early
Avoid unnecessary $unwind and $group
Design efficient $group operations
Avoid common $lookup performance issues
Efficient use of $project and $addFields
Working with large datasets
Aggregation pipeline limits
Managing memory constraints with allowDiskUse
Aggregation in distributed environments
Optimizing aggregation for sharded collections
Understanding shard-local versus merged operations
Monitoring and profiling aggregation performance
Utilizing materialized views
Summary
Replication
Understanding MongoDB replica sets
Components of a replica set
Replication and high availability
Understanding the MongoDB election process
Replica set configuration
Chained replication
Replica set tags and analytics nodes
Replication internals and performance
Flow control
Replication and the oplog
Managing replication lag
Read and write strategies
Read preference
Write concern and durability
Summary
Sharding
Understanding core sharding architecture
Architectural components of a sharded cluster
Sharding a collection and selecting a shard key
Why scatter-gather is bad
Strategic shard key selection
Shard key for targeting operations
Shard key with good granularity
Avoid increasing or decreasing shard key values
Types of sharding
Range-based sharding
Hashed sharding
Zone-based sharding
Advanced sharding administration
Resharding: Whether, when, and how
Balancer considerations
Pre-splitting: Whether, when, and how
Moving unsharded collections
Colocating sharded collection chunks together
Summary
Storage Engines
Exploring storage engines
Overview of WiredTiger
A lookup operation
An update operation
An insert operation
Eviction, checkpointing, and recovery
Compression and encryption
Configuration for improving performance
Changing the size of the WiredTiger cache
Changing syncdelay
Changing minSnapshotHistoryWindowInSeconds
Changing how eviction works
Switching to the in-memory storage engine
Changing the max leaf page size
Summary
Change Streams
Understanding change streams architecture
How change streams work: From write operations to events
Event structure and life cycle
Implementing change streams effectively
Choosing the right scope and filtering strategy
Server-side filtering with aggregation pipelines
Document lookup strategies and performance
Building a price monitoring service
Managing performance and durability
Resource optimization strategies
Handling high-volume event streams
Special considerations for sharded deployments
Advanced patterns and production readiness
Transaction visibility and event batching
Document size limitations and collection life cycle
Monitoring and health checks
Replica set considerations
Performance-tuning recap
Summary
Transactions
Understanding multi-document ACID transactions
History and evolution of transactions in MongoDB
Introduction to ACID properties in MongoDB
Document-level atomicity versus multi-document transactions
Document-level atomicity
Multi-document transactions atomicity
When to use multi-document transactions in MongoDB
Transactions API and session management
Core API
Callback API
Read/write concerns and transaction behavior
Performance considerations with transactions
Replica set versus sharded cluster transactions
WiredTiger cache considerations
Managing transaction runtime limits and errors
Lock acquisition and contention management
Optimizing transaction size and duration
Common transaction anti-patterns and their performance costs
Long-running transactions and their impact on system performance
Unnecessary use of transactions where single-document atomicity would suffice
Single-document transactions
Transactions for read-only operations
Misunderstanding transaction scope and atomicity
Frequent small transactions on hot documents/collections
Improper error handling and retry logic
Insufficient monitoring of transaction metrics
Summary
Client Libraries
What are drivers?
How MongoDB drivers work
Key features of MongoDB drivers
Consistency and reliability through shared specifications
Idiomatic experience
Performance optimization
What are object-document mappers (ODMs)?
Understanding ODMs
Key features of ODMs
Schema enforcement and data validation
Intuitive query APIs
Relationship management
Middleware and life cycle hooks
Type safety and IDE integration
Impact on developer productivity
When to use ODMs
What are application frameworks?
The value of application frameworks
Leveraging ODMs and ORMs in frameworks
Popular MongoDB-compatible frameworks
Best practices when using frameworks with MongoDB
Beyond the basics
Asynchronous and non-blocking patterns
Surfacing and handling failure conditions
Connection management
Read/write concerns and read preferences
Compression and network performance
Summary
Managing Connections and Network Performance
Understanding connection fundamentals
Latency
Connection churn
Network saturation
Understanding the connection lifecycle
Connection establishment
Connection utilization and pooling
Connection termination
MongoDB connection architecture
TCP/IP and the MongoDB Wire Protocol
Driver connection pooling
MongoDB server connection handling
Monitoring and troubleshooting connections
Connection monitoring best practices
Optimizing connection management
Connection pool optimization
Connection timeout configuration
Server-side optimization
Operating system configuration
Performance optimization leveraging network compression
Benefits and trade-offs of network compression
Available compression algorithms
Implementing network compression
Connection strategies for serverless environments
Summary
Advanced Query and Indexing Concepts
Understanding query execution
Plan stages, or “how indexes can be used”
Using the explain command
The queryPlanner section
The executionStats section
Analyzing log messages
Identifying problematic patterns
Query targeting ratio
Waiting for disk or other resources
How to influence query execution
Using hint
Using query settings
Other options
MQL semantics and indexes
Challenges with arrays and multikey indexes
Equality is not just equality
$elemMatch
Deduplication
Challenges with null and $exists
Additional best practices
Updates
findAndModify
Aggregation and query
$sample
$facet
$where, $function, $accumulator, and mapReduce
$text
$regex and indexes
Aggregation versus match expressions
$expr and indexes
$or clause and indexes
Special collections, index types, and features
Time-series collections
Geospatial indexes
Atlas Search
Atlas Vector Search
Collations
Summary
Operating Systems and System Resources
Technical requirements
Managing resources for optimal performance
CPU utilization
Memory management
Storage
Network
Configuring systems for MongoDB performance
Understand the ideal ratio of simultaneous operations per CPU core
Search for other CPU-intensive processes during performance dips
Select the right filesystem for your application
Extent file system (XFS)
Fourth extended filesystem (ext4)
Other file systems
Filesystem settings
Avoid RAID with parity
Adjust readahead settings
Change filesystem cache settings
Check SSD health
Avoid double encryption and compression
Ensure resident memory usage stays below 80%
Networking best practices
Using auto-scaling for Atlas performance
Summary
Monitoring and Observability
Key differences between monitoring and observability
Core MongoDB metrics and signals
Operational metrics
Performance-specific metrics
Monitoring with MongoDB Atlas
Atlas UI features
Real-Time Performance Panel (RTPP)
Performance Advisor
Query Insights
Alerts
Atlas Search issues
Connection issues
Query issues
Oplog issues
Self-managed monitoring tools
mongostat: Real-time activity snapshot
mongotop: Collection-level read/write timings
serverStatus: Comprehensive metrics via the shell
Database profiler: Deep dive into slow operations
Integration with external monitoring systems
Prometheus + Grafana
Atlas integration for Prometheus
Visualizing with Grafana
Datadog
Application performance monitoring platforms
OpenTelemetry
Considerations for external tools
Common performance patterns and what to monitor
Disk I/O bottlenecks
Cache pressure (WiredTiger cache utilization)
Replication lag and oplog window drift
Summary
Debugging Performance Issues
Identifying inherently slow operations
Case study: Troubleshooting cluster performance with Atlas
Diagnosing the cause
Root cause identification
Implementing the solution
Results and learnings
Case study: Unexpected admin query causing collection scan
Diagnosing the cause
Implementing the solution
Results and learnings
Managing blocked operations
Case study: Cursor leak investigation
Diagnosing the cause
Root cause identification
Implementing the solution
Results and learnings
Use case: Burst of poorly optimized queries
Diagnosing the cause
Implementing the solution
Results and learnings
Addressing hardware resource constraints
Case study: Atlas cluster resource optimization
Diagnosing the cause
Implementing the solution
Results and learnings
Use case: Diagnosing insufficient IOPS in self-managed MongoDB
Diagnosing the cause
Implementing the solution
Results and learnings
Systematic approach to performance troubleshooting
Summary
Afterword
Key takeaways
The performance mindset
Practical next steps
Future considerations
Final thoughts
Index
Other Books You May Enjoy