Bayesian Analysis with Python - Osvaldo Martin - E-Book

Description

The third edition of Bayesian Analysis with Python serves as an introduction to the main concepts of applied Bayesian modeling using PyMC, a state-of-the-art probabilistic programming library, together with other libraries that support and facilitate modeling: ArviZ, for exploratory analysis of Bayesian models; Bambi, for flexible and easy hierarchical linear modeling; PreliZ, for prior elicitation; PyMC-BART, for flexible non-parametric regression; and Kulprit, for variable selection.
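To give a flavor of the workflow these libraries support, here is a minimal sketch of the coin-flipping model that the book develops in its opening chapters, using PyMC for inference and ArviZ for summarizing the posterior. This is an illustration rather than an excerpt from the book, and the data values are invented:

```python
import pymc as pm
import arviz as az

# Invented coin-flip data: 1 = heads, 0 = tails
data = [1, 0, 0, 1, 1, 1, 0, 1, 1, 0]

with pm.Model() as coin_model:
    # Prior over the probability of heads
    theta = pm.Beta("theta", alpha=1, beta=1)
    # Likelihood of the observed flips
    y = pm.Bernoulli("y", p=theta, observed=data)
    # Draw posterior samples (NUTS by default)
    idata = pm.sample()

# Exploratory analysis of the posterior with ArviZ
print(az.summary(idata))
```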

This updated edition adds a brief, conceptual introduction to probability theory and new topics such as Bayesian additive regression trees (BART), along with updated examples. Refined explanations, informed by feedback and experience from previous editions, underscore the book's emphasis on Bayesian statistics. You will explore various models, including hierarchical models, generalized linear models for regression and classification, mixture models, Gaussian processes, and BART, using both synthetic and real datasets.
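For readers curious about what the regression models mentioned above look like in code, the following sketch, again with invented data rather than an example from the book, shows how a logistic regression can be specified with Bambi's formula syntax and fitted in two lines:

```python
import pandas as pd
import bambi as bmb

# Invented data: one continuous predictor and a binary outcome
df = pd.DataFrame({"x": [0.3, 0.5, 1.2, 1.8, 2.1, 2.5],
                   "y": [0, 0, 0, 1, 1, 1]})

# Logistic regression specified in Bambi's formula syntax
model = bmb.Model("y ~ x", df, family="bernoulli")
idata = model.fit()  # returns an ArviZ InferenceData object
```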

By the end of this book, you will possess a functional understanding of probabilistic modeling, enabling you to design and implement Bayesian models for your data science challenges. You'll be well-prepared to delve into more advanced material or specialized statistical modeling if the need arises.

You can read this e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 467

Publication year: 2024





Bayesian Analysis with Python

Third Edition

A practical guide to probabilistic modeling

Osvaldo Martin

BIRMINGHAM—MUMBAI

Bayesian Analysis with Python

Third Edition

Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Lead Senior Publishing Product Manager: Tushar Gupta

Acquisition Editor – Peer Reviews: Bethany O’Connell

Project Editor: Namrata Katare

Development Editor: Tanya D’cruz

Copy Editor: Safis Editing

Technical Editor: Aniket Shetty

Indexer: Rekha Nair

Proofreader: Safis Editing

Presentation Designer: Pranit Padwal

Developer Relations Marketing Executive: Monika Sangwan

First published: November 2016

Second edition: December 2018

Third edition: January 2024

Production reference: 2300724

Published by Packt Publishing Ltd.

Grosvenor House
11 St Paul's Square
Birmingham B3 1RB, UK.

ISBN 978-1-80512-716-1

www.packt.com

In gratitude to my family: Romina, Abril, and Bruno.

Foreword

As we present this new edition of Bayesian Analysis with Python, it’s essential to recognize the profound impact this book has had on advancing the growth and education of the probabilistic programming user community. The journey from its first publication to this current edition mirrors the evolution of Bayesian modeling itself – a path marked by significant advancements, growing community involvement, and an increasing presence in both academia and industry.

The field of probabilistic programming is in a different place today than it was when the first edition was devised in the middle of the last decade. As long-term practitioners, we have seen firsthand how Bayesian methods grew from a fringe methodology into a primary way of solving some of the most advanced problems in science and various industries. This trend is supported by the continued development of advanced, performant, high-level tools such as PyMC. With this comes a growing number of new applied users, many of whom have limited experience with Bayesian methods, PyMC, or the underlying libraries that probabilistic programming packages increasingly rely on to accelerate computation. In this context, this new edition comes at the perfect time to introduce the next generation of data scientists to this increasingly powerful methodology.

Osvaldo Martin, a teacher, applied statistician, and long-time core PyMC developer, is the perfect guide to help readers navigate this complex landscape. He provides a clear, concise, and comprehensive introduction to Bayesian methods and the PyMC library, and he walks readers through a variety of real-world examples. As the population of data scientists using probabilistic programming grows, it is important to instill good habits and a sound workflow in them; Dr. Martin provides sound, engaging guidance for doing so.

What makes this book a go-to reference is its coverage of most of the key questions posed by applied users: How do I express my problem as a probabilistic program? How do I know if my model is working? How do I know which model is best? Herein you will find a primer on Bayesian best practices, updated to current standards based on methodological improvements since the release of the last edition. This includes innovations related to the PyMC library itself, which has come a long way since PyMC3, much to the benefit of you, the end-user.

Complementing these improvements is the expansion of the PyMC ecosystem, a reflection of the broadening scope and capabilities of Bayesian modeling. This edition includes discussions on four notable new libraries: Bambi, Kulprit, PreliZ, and PyMC-BART. These additions, along with the continuous refinement of text and code, ensure that readers are equipped with the latest tools and methodologies in Bayesian analysis. This edition is not just an update but a significant step forward in the journey of probabilistic programming, mirroring the dynamic evolution of PyMC and its community.

The previous two editions of this book have been cornerstones for many in understanding and applying Bayesian methods. Each edition, including this latest one, has evolved to incorporate new developments, making it an indispensable resource for both newcomers and experienced practitioners. As PyMC continues to evolve – perhaps even to newer versions by the time this book is read – the content here remains relevant, providing foundational knowledge and insights into the latest advancements. In this edition, readers will find not only a comprehensive introduction to Bayesian analysis but also a window into the cutting-edge techniques that are currently shaping the field. We hope this book serves as both a guide and an inspiration, showcasing the power and flexibility of Bayesian modeling in addressing complex data-driven challenges.

As co-authors of this foreword, we are excited about the journey that lies ahead for readers of this book. You are joining a vibrant, ever-expanding community of enthusiasts and professionals who are pushing the boundaries of what’s possible in data analysis. We trust that this book will be a valuable companion in your exploration of Bayesian modeling and a catalyst for your own contributions to this dynamic field.

Christopher Fonnesbeck, PyMC’s original author and Principal Quantitative Analyst for the Philadelphia Phillies

Thomas Wiecki, CEO & Founder of PyMC Labs

Contributors

About the author

Osvaldo Martin is a researcher at the National Scientific and Technical Research Council (CONICET) in Argentina. He has worked on structural bioinformatics of biomolecules and has used Markov chain Monte Carlo methods to simulate molecular systems. He is currently working on computational methods for Bayesian statistics and probabilistic programming. He has taught courses on structural bioinformatics, data science, and Bayesian data analysis. He was also the head of the organizing committee of PyData San Luis (Argentina) 2017, the first PyData in Latin America. He has contributed to many open-source projects, including ArviZ, Bambi, Kulprit, PreliZ, and PyMC.

I would like to thank Romina for her continuous support. I also want to thank Tomás Capretto, Alejandro Icazatti, Juan Orduz, and Bill Engels for providing invaluable feedback and suggestions on my drafts. Special thanks go to the core developers and all contributors of the Python packages used in this book. Their dedication, love, and hard work have made this book possible.

About the reviewer

Joon (Joonsuk) Park is a former quantitative psychologist and currently a machine learning engineer. He graduated from The Ohio State University with a PhD in Quantitative Psychology in 2019. His graduate research focused on applications of Bayesian statistics to cognitive modeling and behavioral research methodology. He transitioned into industry and has worked as a data scientist since 2020. He has also published several books on psychology, statistics, and data science in Korean.

Join our community Discord space

Join our Discord community to meet like-minded people and learn alongside more than 5000 members at: https://packt.link/bayesian

Table of Contents

Preface

Chapter 1: Thinking Probabilistically
  1.1 Statistics, models, and this book's approach
  1.2 Working with data
  1.3 Bayesian modeling
  1.4 A probability primer for Bayesian practitioners
    1.4.1 Sample space and events
    1.4.2 Random variables
    1.4.3 Discrete random variables and their distributions
    1.4.4 Continuous random variables and their distributions
    1.4.5 Cumulative distribution function
    1.4.6 Conditional probability
    1.4.7 Expected values
    1.4.8 Bayes' theorem
  1.5 Interpreting probabilities
  1.6 Probabilities, uncertainty, and logic
  1.7 Single-parameter inference
    1.7.1 The coin-flipping problem
    1.7.2 Choosing the likelihood
    1.7.3 Choosing the prior
    1.7.4 Getting the posterior
    1.7.5 The influence of the prior
  1.8 How to choose priors
  1.9 Communicating a Bayesian analysis
    1.9.1 Model notation and visualization
    1.9.2 Summarizing the posterior
  1.10 Summary
  1.11 Exercises

Chapter 2: Programming Probabilistically
  2.1 Probabilistic programming
    2.1.1 Flipping coins the PyMC way
  2.2 Summarizing the posterior
  2.3 Posterior-based decisions
    2.3.1 Savage-Dickey density ratio
    2.3.2 Region Of Practical Equivalence
    2.3.3 Loss functions
  2.4 Gaussians all the way down
    2.4.1 Gaussian inferences
  2.5 Posterior predictive checks
  2.6 Robust inferences
    2.6.1 Degrees of normality
    2.6.2 A robust version of the Normal model
  2.7 InferenceData
  2.8 Groups comparison
    2.8.1 The tips dataset
    2.8.2 Cohen's d
    2.8.3 Probability of superiority
    2.8.4 Posterior analysis of mean differences
  2.9 Summary
  2.10 Exercises

Chapter 3: Hierarchical Models
  3.1 Sharing information, sharing priors
  3.2 Hierarchical shifts
  3.3 Water quality
  3.4 Shrinkage
  3.5 Hierarchies all the way up
  3.6 Summary
  3.7 Exercises

Chapter 4: Modeling with Lines
  4.1 Simple linear regression
  4.2 Linear bikes
    4.2.1 Interpreting the posterior mean
    4.2.2 Interpreting the posterior predictions
  4.3 Generalizing the linear model
  4.4 Counting bikes
  4.5 Robust regression
  4.6 Logistic regression
    4.6.1 The logistic model
    4.6.2 Classification with logistic regression
    4.6.3 Interpreting the coefficients of logistic regression
  4.7 Variable variance
  4.8 Hierarchical linear regression
    4.8.1 Centered vs. noncentered hierarchical models
  4.9 Multiple linear regression
  4.10 Summary
  4.11 Exercises

Chapter 5: Comparing Models
  5.1 Posterior predictive checks
  5.2 The balance between simplicity and accuracy
    5.2.1 Many parameters (may) lead to overfitting
    5.2.2 Too few parameters lead to underfitting
  5.3 Measures of predictive accuracy
    5.3.1 Information criteria
    5.3.2 Cross-validation
  5.4 Calculating predictive accuracy with ArviZ
  5.5 Model averaging
  5.6 Bayes factors
    5.6.1 Some observations
    5.6.2 Calculation of Bayes factors
  5.7 Bayes factors and inference
  5.8 Regularizing priors
  5.9 Summary
  5.10 Exercises

Chapter 6: Modeling with Bambi
  6.1 One syntax to rule them all
  6.2 The bikes model, Bambi's version
  6.3 Polynomial regression
  6.4 Splines
  6.5 Distributional models
  6.6 Categorical predictors
    6.6.1 Categorical penguins
    6.6.2 Relation to hierarchical models
  6.7 Interactions
  6.8 Interpreting models with Bambi
  6.9 Variable selection
    6.9.1 Projection predictive inference
    6.9.2 Projection predictive with Kulprit
  6.10 Summary
  6.11 Exercises

Chapter 7: Mixture Models
  7.1 Understanding mixture models
  7.2 Finite mixture models
    7.2.1 The Categorical distribution
    7.2.2 The Dirichlet distribution
    7.2.3 Chemical mixture
  7.3 The non-identifiability of mixture models
  7.4 How to choose K
  7.5 Zero-Inflated and hurdle models
    7.5.1 Zero-Inflated Poisson regression
    7.5.2 Hurdle models
  7.6 Mixture models and clustering
  7.7 Non-finite mixture model
    7.7.1 Dirichlet process
  7.8 Continuous mixtures
    7.8.1 Some common distributions are mixtures
  7.9 Summary
  7.10 Exercises

Chapter 8: Gaussian Processes
  8.1 Linear models and non-linear data
  8.2 Modeling functions
  8.3 Multivariate Gaussians and functions
    8.3.1 Covariance functions and kernels
  8.4 Gaussian processes
  8.5 Gaussian process regression
  8.6 Gaussian process regression with PyMC
    8.6.1 Setting priors for the length scale
  8.7 Gaussian process classification
    8.7.1 GPs for space flu
  8.8 Cox processes
    8.8.1 Coal mining disasters
    8.8.2 Red wood
  8.9 Regression with spatial autocorrelation
  8.10 Hilbert space GPs
    8.10.1 HSGP with Bambi
  8.11 Summary
  8.12 Exercises

Chapter 9: Bayesian Additive Regression Trees
  9.1 Decision trees
  9.2 BART models
    9.2.1 Bartian penguins
    9.2.2 Partial dependence plots
    9.2.3 Individual conditional plots
    9.2.4 Variable selection with BART
  9.3 Distributional BART models
  9.4 Constant and linear response
  9.5 Choosing the number of trees
  9.6 Summary
  9.7 Exercises

Chapter 10: Inference Engines
  10.1 Inference engines
  10.2 The grid method
  10.3 Quadratic method
  10.4 Markovian methods
    10.4.1 Monte Carlo
    10.4.2 Markov chain
    10.4.3 Metropolis-Hastings
    10.4.4 Hamiltonian Monte Carlo
  10.5 Sequential Monte Carlo
  10.6 Diagnosing the samples
  10.7 Convergence
    10.7.1 Trace plot
    10.7.2 Rank plot
    10.7.3 R̂ (R hat)
  10.8 Effective Sample Size (ESS)
  10.9 Monte Carlo standard error
  10.10 Divergences
  10.11 Keep calm and keep trying
  10.12 Summary
  10.13 Exercises

Chapter 11: Where to Go Next

Bibliography

Other Books You May Enjoy

Index
