The third edition of Bayesian Analysis with Python serves as an introduction to the main concepts of applied Bayesian modeling using PyMC, a state-of-the-art probabilistic programming library, together with other libraries that support and facilitate modeling, such as ArviZ, for exploratory analysis of Bayesian models; Bambi, for flexible and easy hierarchical linear modeling; PreliZ, for prior elicitation; PyMC-BART, for flexible non-parametric regression; and Kulprit, for variable selection.
This updated edition adds a brief, conceptual introduction to probability theory and new topics such as Bayesian additive regression trees (BART), along with updated examples. Refined explanations, informed by feedback and experience from previous editions, underscore the book's emphasis on Bayesian statistics. You will explore various models, including hierarchical models, generalized linear models for regression and classification, mixture models, Gaussian processes, and BART, using synthetic and real datasets.
By the end of this book, you will possess a functional understanding of probabilistic modeling, enabling you to design and implement Bayesian models for your data science challenges. You'll be well-prepared to delve into more advanced material or specialized statistical modeling if the need arises.
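To give a flavor of the workflow these libraries enable, here is a minimal sketch, assuming PyMC and ArviZ are installed. The data and the model (a Gaussian with unknown mean and standard deviation) are illustrative only and are not examples taken from the book.

    import numpy as np
    import pymc as pm
    import arviz as az

    # Illustrative data: 50 draws from a Gaussian
    rng = np.random.default_rng(123)
    data = rng.normal(loc=5.0, scale=2.0, size=50)

    with pm.Model() as model:
        mu = pm.Normal("mu", mu=0, sigma=10)      # weakly informative prior for the mean
        sigma = pm.HalfNormal("sigma", sigma=10)  # prior for the standard deviation
        y = pm.Normal("y", mu=mu, sigma=sigma, observed=data)
        idata = pm.sample()                       # MCMC sampling; returns an InferenceData object

    # Summarize the posterior with ArviZ
    print(az.summary(idata, kind="stats"))

The call to pm.sample returns an ArviZ InferenceData object, the common data structure used for summarizing, diagnosing, and plotting Bayesian models across the PyMC ecosystem.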
Bayesian Analysis with Python
Third Edition
A practical guide to probabilistic modeling
Osvaldo Martin
BIRMINGHAM—MUMBAI
Bayesian Analysis with Python
Third Edition
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Lead Senior Publishing Product Manager: Tushar Gupta
Acquisition Editor – Peer Reviews: Bethany O’Connell
Project Editor: Namrata Katare
Development Editor: Tanya D’cruz
Copy Editor: Safis Editing
Technical Editor: Aniket Shetty
Indexer: Rekha Nair
Proofreader: Safis Editing
Presentation Designer: Pranit Padwal
Developer Relations Marketing Executive: Monika Sangwan
First published: November 2016
Second edition: December 2018
Third edition: January 2024
Production reference: 2300724
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham B3 1RB, UK.
ISBN 978-1-80512-716-1
www.packt.com
In gratitude to my family: Romina, Abril, and Bruno.
As we present this new edition of Bayesian Analysis with Python, it’s essential to recognize the profound impact this book has had on advancing the growth and education of the probabilistic programming user community. The journey from its first publication to this current edition mirrors the evolution of Bayesian modeling itself – a path marked by significant advancements, growing community involvement, and an increasing presence in both academia and industry.
The field of probabilistic programming is in a different place today than it was when the first edition was devised in the middle of the last decade. As long-term practitioners, we have seen firsthand how Bayesian methods grew from a fringe methodology to the primary way of solving some of the most advanced problems in science and various industries. This trend is supported by the continued development of advanced, performant, high-level tools such as PyMC. With this comes a growing number of new applied users, many of whom have limited experience with Bayesian methods, PyMC, or the underlying libraries that probabilistic programming packages increasingly rely on to accelerate computation. In this context, this new edition comes at the perfect time to introduce the next generation of data scientists to this increasingly powerful methodology.
Osvaldo Martin, a teacher, applied statistician, and long-time core PyMC developer, is the perfect guide to help readers navigate this complex landscape. He provides a clear, concise, and comprehensive introduction to Bayesian methods and the PyMC library, and he walks readers through a variety of real-world examples. As the population of data scientists using probabilistic programming grows, it is important to instill good habits and a sound workflow in them; Dr. Martin here provides sound, engaging guidance for doing so.
What makes this book a go-to reference is its coverage of most of the key questions posed by applied users: How do I express my problem as a probabilistic program? How do I know if my model is working? How do I know which model is best? Herein you will find a primer on Bayesian best practices, updated to current standards based on methodological improvements since the release of the last edition. This includes innovations related to the PyMC library itself, which has come a long way since PyMC3, much to the benefit of you, the end-user.
Complementing these improvements is the expansion of the PyMC ecosystem, a reflection of the broadening scope and capabilities of Bayesian modeling. This edition includes discussions on four notable new libraries: Bambi, Kulprit, PreliZ, and PyMC-BART. These additions, along with the continuous refinement of text and code, ensure that readers are equipped with the latest tools and methodologies in Bayesian analysis. This edition is not just an update but a significant step forward in the journey of probabilistic programming, mirroring the dynamic evolution of PyMC and its community.
The previous two editions of this book have been cornerstones for many in understanding and applying Bayesian methods. Each edition, including this latest one, has evolved to incorporate new developments, making it an indispensable resource for both newcomers and experienced practitioners. As PyMC continues to evolve, perhaps even to newer versions by the time this book is read, the content here remains relevant, providing foundational knowledge and insights into the latest advancements. In this edition, readers will find not only a comprehensive introduction to Bayesian analysis but also a window into the cutting-edge techniques that are currently shaping the field. We hope this book serves as both a guide and an inspiration, showcasing the power and flexibility of Bayesian modeling in addressing complex data-driven challenges.
As co-authors of this foreword, we are excited about the journey that lies ahead for readers of this book. You are joining a vibrant, ever-expanding community of enthusiasts and professionals who are pushing the boundaries of what’s possible in data analysis. We trust that this book will be a valuable companion in your exploration of Bayesian modeling and a catalyst for your own contributions to this dynamic field.
Christopher Fonnesbeck, PyMC’s original author and Principal Quantitative Analyst for the Philadelphia Phillies
Thomas Wiecki, CEO & Founder of PyMC Labs
Osvaldo Martin is a researcher at the National Scientific and Technical Research Council (CONICET) in Argentina. He has worked on structural bioinformatics of biomolecules and has used Markov chain Monte Carlo methods to simulate molecular systems. He is currently working on computational methods for Bayesian statistics and probabilistic programming. He has taught courses on structural bioinformatics, data science, and Bayesian data analysis. He was also the head of the organizing committee of PyData San Luis (Argentina) 2017, the first PyData in Latin America. He has contributed to many open-source projects, including ArviZ, Bambi, Kulprit, PreliZ, and PyMC.
I would like to thank Romina for her continuous support. I also want to thank Tomás Capretto, Alejandro Icazatti, Juan Orduz, and Bill Engels for providing invaluable feedback and suggestions on my drafts. Special thanks go to the core developers and all contributors of the Python packages used in this book. Their dedication, love, and hard work have made this book possible.
Joon (Joonsuk) Park is a former quantitative psychologist and currently a machine learning engineer. He graduated from The Ohio State University with a PhD in Quantitative Psychology in 2019. His research during graduate study focused on applications of Bayesian statistics to cognitive modeling and behavioral research methodology. He transitioned into industry data science and has worked as a data scientist since 2020. He has also published several books on psychology, statistics, and data science in Korean.
Join our Discord community to meet like-minded people and learn alongside more than 5000 members at: https://packt.link/bayesian
Bayesian Analysis with Python, Third Edition
Preface
Chapter 1: Thinking Probabilistically
1.1 Statistics, models, and this book’s approach
1.2 Working with data
1.3 Bayesian modeling
1.4 A probability primer for Bayesian practitioners
1.5 Interpreting probabilities
1.6 Probabilities, uncertainty, and logic
1.7 Single-parameter inference
1.8 How to choose priors
1.9 Communicating a Bayesian analysis
1.10 Summary
1.11 Exercises
Join our community Discord space
Chapter 2: Programming Probabilistically
2.1 Probabilistic programming
2.2 Summarizing the posterior
2.3 Posterior-based decisions
2.4 Gaussians all the way down
2.5 Posterior predictive checks
2.6 Robust inferences
2.7 InferenceData
2.8 Groups comparison
2.9 Summary
2.10 Exercises
Join our community Discord space
Chapter 3: Hierarchical Models
3.1 Sharing information, sharing priors
3.2 Hierarchical shifts
3.3 Water quality
3.4 Shrinkage
3.5 Hierarchies all the way up
3.6 Summary
3.7 Exercises
Join our community Discord space
Chapter 4: Modeling with Lines
4.1 Simple linear regression
4.2 Linear bikes
4.3 Generalizing the linear model
4.4 Counting bikes
4.5 Robust regression
4.6 Logistic regression
4.7 Variable variance
4.8 Hierarchical linear regression
4.9 Multiple linear regression
4.10 Summary
4.11 Exercises
Join our community Discord space
Chapter 5: Comparing Models
5.1 Posterior predictive checks
5.2 The balance between simplicity and accuracy
5.3 Measures of predictive accuracy
5.4 Calculating predictive accuracy with ArviZ
5.5 Model averaging
5.6 Bayes factors
5.7 Bayes factors and inference
5.8 Regularizing priors
5.9 Summary
5.10 Exercises
Join our community Discord space
Chapter 6: Modeling with Bambi
6.1 One syntax to rule them all
6.2 The bikes model, Bambi’s version
6.3 Polynomial regression
6.4 Splines
6.5 Distributional models
6.6 Categorical predictors
6.7 Interactions
6.8 Interpreting models with Bambi
6.9 Variable selection
6.10 Summary
6.11 Exercises
Join our community Discord space
Chapter 7: Mixture Models
7.1 Understanding mixture models
7.2 Finite mixture models
7.3 The non-identifiability of mixture models
7.4 How to choose K
7.5 Zero-inflated and hurdle models
7.6 Mixture models and clustering
7.7 Non-finite mixture model
7.8 Continuous mixtures
7.9 Summary
7.10 Exercises
Join our community Discord space
Chapter 8: Gaussian Processes
8.1 Linear models and non-linear data
8.2 Modeling functions
8.3 Multivariate Gaussians and functions
8.4 Gaussian processes
8.5 Gaussian process regression
8.6 Gaussian process regression with PyMC
8.7 Gaussian process classification
8.8 Cox processes
8.9 Regression with spatial autocorrelation
8.10 Hilbert space GPs
8.11 Summary
8.12 Exercises
Join our community Discord space
Chapter 9: Bayesian Additive Regression Trees
9.1 Decision trees
9.2 BART models
9.3 Distributional BART models
9.4 Constant and linear response
9.5 Choosing the number of trees
9.6 Summary
9.7 Exercises
Join our community Discord space
Chapter 10: Inference Engines
10.1 Inference engines
10.2 The grid method
10.3 Quadratic method
10.4 Markovian methods
10.5 Sequential Monte Carlo
10.6 Diagnosing the samples
10.7 Convergence
10.8 Effective Sample Size (ESS)
10.9 Monte Carlo standard error
10.10 Divergences
10.11 Keep calm and keep trying
10.12 Summary
10.13 Exercises
Join our community Discord space
Chapter 11: Where to Go Next
Join our community Discord space
Bibliography
Other Books You May Enjoy
Index