Description

The second edition of Bayesian Analysis with Python is an introduction to the main concepts of applied Bayesian inference and its practical implementation in Python using PyMC3, a state-of-the-art probabilistic programming library, and ArviZ, a new library for exploratory analysis of Bayesian models.

The main concepts of Bayesian statistics are covered using a practical and computational approach. Synthetic and real data sets are used to introduce several types of models, such as generalized linear models for regression and classification, mixture models, hierarchical models, and Gaussian processes, among others.

By the end of the book, you will have a working knowledge of probabilistic modeling and you will be able to design and implement Bayesian models for your own data science problems. After reading the book you will be better prepared to delve into more advanced material or specialized statistical modeling if you need to.

Format: EPUB

Page count: 440

Publication year: 2018




Bayesian Analysis with Python
Second Edition
Introduction to statistical modeling and probabilistic programming using PyMC3 and ArviZ
Osvaldo Martin
BIRMINGHAM - MUMBAI

Bayesian Analysis with Python Second Edition

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Pavan Ramchandani
Acquisition Editor: Joshua Nadar
Content Development Editor: Unnati Guha
Technical Editor: Sayli Nikalje
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Priyanka Dhadke
Graphics: Jisha Chirayil
Production Coordinator: Arvindkumar Gupta

First published: November 2016
Second edition: December 2018

Production reference: 2261219

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.

ISBN 978-1-78934-165-2

www.packtpub.com

I dedicate this book to Abril.
mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

Packt.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Foreword

Probabilistic programming is a framework that allows you to flexibly build Bayesian statistical models in computer code. Once built, powerful inference algorithms that work independently of the model you formulated can be used to fit your model to data. This combination of flexible model specification and automatic inference provides a powerful tool for the researcher to quickly build, analyze, and iteratively improve novel statistical models. This iterative approach is in stark contrast to the way Bayesian models were fitted to data before: previous inference algorithms usually only worked for one specific model. Not only did this require strong mathematical skills to formulate the model and devise an inference scheme, it also considerably slowed down the iterative cycle: change the model, re-derive your inference. Probabilistic programming thus democratizes statistical modeling by considerably lowering the mathematical understanding and time required to successfully build novel models and gain unique insights into your data.

The idea behind probabilistic programming is not new: BUGS, the first of its kind, was first released in 1989. The kinds of models that could be fitted successfully were extremely limited and inference was slow, rendering these first-generation languages not very practical. Today, there are a multitude of probabilistic programming languages that are widely used in academia and at companies such as Google, Microsoft, Amazon, Facebook, and Uber to solve large and complex problems. What has changed? The key factor in lifting probabilistic programming from being a cute toy to the powerful engine that can solve complex large-scale problems is the advent of Hamiltonian Monte Carlo samplers, which are several orders of magnitude more powerful than previous sampling algorithms. Though originally devised in 1987, it is only the more recent probabilistic programming systems, Stan and PyMC3, that have made these samplers widely available and usable.

This book will give you a practical introduction to this extremely powerful and flexible tool. It will have a big impact on how you think about and solve complex analytical problems. There are few people better suited to have written it than PyMC3 core developer Osvaldo Martin. Osvaldo has the rare talent of breaking complex topics down to make them easily digestible. His deep practical understanding, gained through hard-won experience, allows him to take you, the reader, on the most efficient route through this terrain, which could otherwise easily seem impenetrable. The visualizations and code examples make this book an eminently practical resource through which you will gain an intuitive understanding of the theoretical underpinnings of the methods it covers.

I also would like to commend you, dear reader, for having picked up this book. It is not the fast and easy route. In a time where headlines advertise deep learning as the technique to solve all current and future analytical problems, the more careful and deliberate approach of building a custom model for a specific purpose might not seem quite as attractive. However, you will be able to solve problems that can hardly be solved any other way.

This is not to say that deep learning is not an extremely exciting technique. In fact, probabilistic programming itself is not constrained to classic statistical models. Reading the current machine learning literature, you will find that Bayesian statistics is emerging as a powerful framework to express and understand next-generation deep neural networks. This book will thus equip you not only with the skills to solve hard analytical problems, but also to have a front-row seat in humanity's perhaps greatest endeavor: the development of artificial intelligence. Enjoy!

Thomas Wiecki, PhD

Head of Research at Quantopian.

Contributors

About the author

Osvaldo Martin is a researcher at The National Scientific and Technical Research Council (CONICET), in Argentina. He has worked on structural bioinformatics of proteins, glycans, and RNA molecules. He has experience using Markov Chain Monte Carlo methods to simulate molecular systems and loves to use Python to solve data analysis problems.

He has taught courses about structural bioinformatics, data science, and Bayesian data analysis. He was also the head of the organizing committee of PyData San Luis (Argentina) 2017. He is one of the core developers of PyMC3 and ArviZ.

I would like to thank Romina for her continuous support. I also want to thank Walter Lapadula, Bill Engels, Eric J Ma, and Austin Rochford for providing invaluable feedback and suggestions on my drafts. A special thanks goes to the core developers and all contributors of PyMC3 and ArviZ. This book was possible only because of the dedication, love, and hard work they have put into these libraries and into building a great community around them.

About the reviewer

Eric J Ma is a data scientist at the Novartis Institutes for Biomedical Research. There, he conducts biomedical data science research, with a focus on using Bayesian statistical methods in the service of making medicines for patients. Prior to Novartis, he was an Insight Health Data Fellow in the summer of 2017, and defended his doctoral thesis in the spring of 2017.

Eric is also an open source software developer, and has led the development of nxviz, a visualization package for NetworkX, and pyjanitor, a clean API for cleaning data in Python. In addition, he has made contributions to a range of open source tools, including PyMC3, Matplotlib, bokeh, and CuPy.

His personal life motto is found in Luke 12:48.

Austin Rochford is a principal data scientist at Monetate Labs, where he develops products that allow retailers to personalize their marketing across billions of events a year. He is a mathematician by training and is a passionate advocate of Bayesian methods.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Bayesian Analysis with Python Second Edition

Dedication

About Packt

Why subscribe?

Packt.com

Foreword

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Thinking Probabilistically

Statistics, models, and this book's approach

Working with data

Bayesian modeling

Probability theory

Interpreting probabilities

Defining probabilities

Probability distributions

Independently and identically distributed variables

Bayes' theorem

Single-parameter inference

The coin-flipping problem

The general model

Choosing the likelihood

Choosing the prior

Getting the posterior

Computing and plotting the posterior

The influence of the prior and how to choose one

Communicating a Bayesian analysis

Model notation and visualization

Summarizing the posterior

Highest-posterior density

Posterior predictive checks

Summary

Exercises

Programming Probabilistically

Probabilistic programming

PyMC3 primer

Flipping coins the PyMC3 way

Model specification

Pushing the inference button

Summarizing the posterior

Posterior-based decisions

ROPE

Loss functions

Gaussians all the way down

Gaussian inferences

Robust inferences

Student's t-distribution

Groups comparison

Cohen's d

Probability of superiority

The tips dataset

Hierarchical models

Shrinkage

One more example

Summary

Exercises

Modeling with Linear Regression

Simple linear regression

The machine learning connection

The core of the linear regression models

Linear models and high autocorrelation

Modifying the data before running

Interpreting and visualizing the posterior

Pearson correlation coefficient

Pearson coefficient from a multivariate Gaussian

Robust linear regression

Hierarchical linear regression

Correlation, causation, and the messiness of life

Polynomial regression

Interpreting the parameters of a polynomial regression

Polynomial regression – the ultimate model?

Multiple linear regression

Confounding variables and redundant variables

Multicollinearity or when the correlation is too high

Masking effect variables

Adding interactions

Variable variance

Summary

Exercises

Generalizing Linear Models

Generalized linear models

Logistic regression

The logistic model

The Iris dataset

The logistic model applied to the iris dataset

Multiple logistic regression

The boundary decision

Implementing the model

Interpreting the coefficients of a logistic regression

Dealing with correlated variables

Dealing with unbalanced classes

Softmax regression

Discriminative and generative models

Poisson regression

Poisson distribution

The zero-inflated Poisson model

Poisson regression and ZIP regression

Robust logistic regression

The GLM module

Summary

Exercises

Model Comparison

Posterior predictive checks

Occam's razor – simplicity and accuracy

Too many parameters leads to overfitting

Too few parameters leads to underfitting

The balance between simplicity and accuracy

Predictive accuracy measures

Cross-validation

Information criteria

Log-likelihood and deviance

Akaike information criterion

Widely applicable Information Criterion

Pareto smoothed importance sampling leave-one-out cross-validation

Other Information Criteria

Model comparison with PyMC3

A note on the reliability of WAIC and LOO computations

Model averaging

Bayes factors

Some remarks

Computing Bayes factors

Common problems when computing Bayes factors

Using Sequential Monte Carlo to compute Bayes factors

Bayes factors and Information Criteria

Regularizing priors

WAIC in depth

Entropy

Kullback-Leibler divergence

Summary

Exercises

Mixture Models

Mixture models

Finite mixture models

The categorical distribution

The Dirichlet distribution

Non-identifiability of mixture models

How to choose K

Mixture models and clustering

Non-finite mixture model

Dirichlet process

Continuous mixtures

Beta-binomial and negative binomial

The Student's t-distribution

Summary

Exercises

Gaussian Processes

Linear models and non-linear data

Modeling functions

Multivariate Gaussians and functions

Covariance functions and kernels

Gaussian processes

Gaussian process regression

Regression with spatial autocorrelation

Gaussian process classification

Cox processes

The coal-mining disasters

The redwood dataset

Summary

Exercises

Inference Engines

Inference engines

Non-Markovian methods

Grid computing

Quadratic method

Variational methods

Automatic differentiation variational inference

Markovian methods

Monte Carlo

Markov chain

Metropolis-Hastings

Hamiltonian Monte Carlo

Sequential Monte Carlo

Diagnosing the samples

Convergence

Monte Carlo error

Autocorrelation

Effective sample sizes

Divergences

Non-centered parameterization

Summary

Exercises

Where To Go Next?

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

Bayesian statistics has been developing for more than 250 years. During this time, it has enjoyed as much recognition and appreciation as it has faced disdain and contempt. Throughout the last few decades, it has gained more and more attention from people in statistics and almost all the other sciences, engineering, and even outside the boundaries of the academic world. This revival has been possible due to theoretical and computational advancements developed mostly throughout the second half of the 20th century. Indeed, modern Bayesian statistics is mostly computational statistics. The necessity for flexible and transparent models and a more intuitive interpretation of statistical models and analysis has only contributed to the trend.

In this book we will adopt a pragmatic approach to Bayesian statistics and we will not care too much about other statistical paradigms and their relationships with Bayesian statistics. The aim of this book is to learn how to do Bayesian data analysis; philosophical discussions are interesting, but they have already been undertaken elsewhere in a richer way that is simply outside the scope of these pages.

We will take a modeling approach to statistics, learn how to think in terms of probabilistic models, and apply Bayes' theorem to derive the logical consequences of our models and data. The approach will also be computational; models will be coded using PyMC3, a library for Bayesian statistics that hides most of the mathematical details and computations from the user, and ArviZ, a Python package for exploratory analysis of Bayesian models.

Bayesian methods are theoretically grounded in probability theory, and so it's no wonder that many books about Bayesian statistics are full of mathematical formulas requiring a certain level of mathematical sophistication. Learning the mathematical foundations of statistics could certainly help you build better models and gain intuition about problems, models, and results. Nevertheless, libraries such as PyMC3 allow us to learn and do Bayesian statistics with only a modest amount of mathematical knowledge, as you will be able to verify yourself throughout this book.

Who this book is for

If you are a student, data scientist, researcher in the natural or social sciences, or a developer looking to get started with Bayesian data analysis and probabilistic programming, this book is for you. The book is introductory, so no previous statistical knowledge is required, although some experience in using Python and NumPy is expected.

What this book covers

Chapter 1, Thinking Probabilistically, covers the basic concepts of Bayesian statistics and its implications for data analysis. This chapter contains most of the foundational ideas used in the rest of the book.

Chapter 2, Programming Probabilistically, revisits the concepts from the previous chapter from a more computational perspective. The PyMC3 probabilistic programming library is introduced, as well as ArviZ, a Python library for exploratory analysis of Bayesian models. Hierarchical models are explained with a couple of examples.

Chapter 3, Modeling with Linear Regression, covers the basic elements of linear regression, a very widely used model and the building block of more complex models.

Chapter 4, Generalizing Linear Models, covers how to expand linear models with other distributions than the Gaussian, opening the door to solving many data analysis problems.

Chapter 5, Model Comparison, discusses how to compare, select, and average models using WAIC, LOO, and Bayes factors. The general caveats of these methods are discussed.

Chapter 6, Mixture Models, discusses how to add flexibility to models by mixing simpler distributions to build more complex ones. The first non-parametric model in the book is also introduced: the Dirichlet process.

Chapter 7, Gaussian Processes, covers the basic idea behind Gaussian processes and how to use them to build non-parametric models over functions for a wide array of problems.

Chapter 8, Inference Engines, provides an introduction to methods for numerically approximating the posterior distribution, as well as a very important topic from the practitioner's perspective: how to diagnose the reliability of the approximated posterior.

Chapter 9, Where To Go Next?, provides a list of resources for you to keep learning from beyond this book, and a very short farewell speech.

To get the most out of this book

The code in the book was written using Python version 3.6. To install Python and Python libraries, I recommend using Anaconda, a scientific computing distribution. You can read more about Anaconda and download it at https://www.anaconda.com/download/. This will install many useful Python packages on your system. You will need to install two more packages. To install PyMC3, please use conda:

conda install -c conda-forge pymc3

And you can install ArviZ with the following command:

pip install arviz

An alternative way to install the necessary packages, once Anaconda is installed in your system, is to go to https://github.com/aloctavodia/BAP and download the environment file named bap.yml. Using it, you can install all the necessary packages by doing the following:

conda env create -f bap.yml

The Python packages used to write this book are listed here:

IPython 7.0

Jupyter 1.0 (or Jupyter-lab 0.35)

NumPy 1.14.2

SciPy 1.1

pandas 0.23.4

Matplotlib 3.0.2

Seaborn 0.9.0

ArviZ 0.3.1

PyMC3 3.6

The code presented in each chapter assumes that you have imported at least some of these packages. Instead of copying and pasting the code from the book, I recommend downloading the code from https://github.com/aloctavodia/BAP and running it using Jupyter Notebook (or Jupyter Lab). I will keep this repository updated for new releases of PyMC3 or ArviZ. If you find a technical problem running the code in this book, a typo in the text, or any other mistake, please file an issue in that repository and I will try to solve it as soon as possible.

Most figures in this book are generated using code. A common pattern you will find in this book is the following: a block of code immediately followed by a figure (generated from that code). I hope this pattern will look familiar to those of you using Jupyter Notebook/Lab, and I hope it does not appear annoying or confusing to anyone.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packt.com.

2. Select the SUPPORT tab.

3. Click on Code Downloads & Errata.

4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code and figures for the book are hosted on GitHub at https://github.com/PacktPublishing/Bayesian-Analysis-with-Python-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781789341652_ColorImages.pdf.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

For more information about Packt, please visit packt.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!


Thinking Probabilistically

"Probability theory is nothing but common sense reduced to calculation."
- Pierre Simon Laplace

In this chapter, we will learn about the core concepts of Bayesian statistics and some of the instruments in the Bayesian toolbox. We will use some Python code, but this chapter will be mostly theoretical; most of the concepts we will see here will be revisited many times throughout this book. This chapter, being heavy on the theoretical side, may be a little anxiogenic for the coder in you, but I think it will ease the path in effectively applying Bayesian statistics to your problems.

In this chapter, we will cover the following topics:

Statistical modeling

Probabilities and uncertainty

Bayes' theorem and statistical inference

Single-parameter inference and the classic coin-flip problem

Choosing priors and why people often don't like them, but should

Communicating a Bayesian analysis

Statistics, models, and this book's approach

Statistics is about collecting, organizing, analyzing, and interpreting data, and hence statistical knowledge is essential for data analysis. Two main statistical methods are used in data analysis:

Exploratory Data Analysis (EDA): This is about numerical summaries, such as the mean, mode, standard deviation, and interquartile ranges (this part of EDA is also known as descriptive statistics). EDA is also about visually inspecting the data, using tools you may be already familiar with, such as histograms and scatter plots.

Inferential statistics: This is about making statements beyond the current data. We may want to understand some particular phenomenon, or maybe we want to make predictions for future (as yet unobserved) data points, or we need to choose among several competing explanations for the same observations. Inferential statistics is the set of methods and tools that will help us to answer these types of questions.
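By way of illustration, here is a small snippet computing the numerical summaries just mentioned. This is not an example from the book, and the data values are made up:

import numpy as np
from scipy import stats

data = np.array([51., 48., 55., 49., 52., 54., 48., 50.])  # made-up measurements

# numerical summaries (descriptive statistics)
print("mean:", data.mean())
print("standard deviation:", data.std())
print("mode:", stats.mode(data).mode)           # most frequent value
print("interquartile range:", stats.iqr(data))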

The focus of this book is upon how to perform Bayesian inferential statistics, and then how to use EDA to summarize, interpret, check, and communicate the results of Bayesian inference.

Most introductory statistical courses, at least for non-statisticians, are taught as a collection of recipes that more or less go like this: go to the statistical pantry, pick one tin can and open it, add data to taste, and stir until you obtain a consistent p-value, preferably under 0.05. The main goal in this type of course is to teach you how to pick the proper can. I never liked this approach, mainly because the most common result is a bunch of confused people unable to grasp, even at the conceptual level, the unity of the different learned methods. We will take a different approach: we will also learn some recipes, but these will be homemade rather than canned food; we will learn how to mix fresh ingredients that will suit different gastronomic occasions and, more importantly, that will let you apply concepts far beyond the examples in this book.

Taking this approach is possible for two reasons:

Ontological: Statistics is a form of modeling unified under the mathematical framework of probability theory. Using a probabilistic approach provides a unified view of what may seem like very disparate methods; statistical methods and machine learning (ML) methods look much more similar under the probabilistic lens.

Technical: Modern software, such as PyMC3, allows practitioners, just like you and me, to define and solve models in a relatively easy way. Many of these models were unsolvable just a few years ago or required a high level of mathematical and technical sophistication.

Working with data

Data is an essential ingredient in statistics and data science. Data comes from several sources, such as experiments, computer simulations, surveys, and field observations. If we are the ones in charge of generating or gathering the data, it is always a good idea to first think carefully about the questions we want to answer and which methods we will use, and only then proceed to get the data. In fact, there is a whole branch of statistics dealing with data collection, known as experimental design. In the era of the data deluge, we can sometimes forget that gathering data is not always cheap. For example, while it is true that the Large Hadron Collider (LHC) produces hundreds of terabytes a day, its construction took years of manual and intellectual labor.

As a general rule, we can think of the process generating the data as stochastic, because there is ontological, technical, and/or epistemic uncertainty, that is, the system is intrinsically stochastic, there are technical issues adding noise or restricting us from measuring with arbitrary precision, and/or there are conceptual limitations veiling details from us. For all these reasons, we always need to interpret data in the context of models, including mental and formal ones. Data does not speak but through models.

In this book, we will assume that we already have collected the data. Our data will also be clean and tidy, something rarely true in the real world. We will make these assumptions in order to focus on the subject of this book. I just want to emphasize, especially for newcomers to data analysis, that even when not covered in this book, these are important skills that you should learn and practice in order to successfully work with data.

A very useful skill when analyzing data is knowing how to write code in a programming language, such as Python. Manipulating data is usually necessary given that we live in a messy world with even messier data, and coding helps to get things done. Even if you are lucky and your data is very clean and tidy, coding will still be very useful since modern Bayesian statistics is done mostly through programming languages such as Python or R.

If you want to learn how to use Python for cleaning and manipulating data, I recommend reading the excellent book, Python Data Science Handbook, by Jake VanderPlas.

Bayesian modeling

Models are simplified descriptions of a given system or process that, for some reason, we are interested in. Those descriptions are deliberately designed to capture only the most relevant aspects of the system and not to explain every minor detail. This is one reason a more complex model is not always a better one.

There are many different kinds of models; in this book, we will restrict ourselves to Bayesian models. We can summarize the Bayesian modeling process using three steps:

1. Given some data and some assumptions on how this data could have been generated, we design a model by combining building blocks known as probability distributions. Most of the time these models are crude approximations, but most of the time that is all we need.

2. We use Bayes' theorem to add data to our models and derive the logical consequences of combining the data and our assumptions. We say we are conditioning the model on our data.

3. We criticize the model by checking whether the model makes sense according to different criteria, including the data, our expertise on the subject, and sometimes by comparing several models.
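To make these three steps concrete, here is a minimal sketch of what they can look like in PyMC3 and ArviZ. This is not code from this chapter; the coin-flip data and the Beta/Bernoulli choices are illustrative assumptions, and a model of this kind is developed properly in Chapter 2, Programming Probabilistically:

import pymc3 as pm
import arviz as az

data = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]  # hypothetical coin flips (1 = heads)

# Step 1: design the model by combining probability distributions
with pm.Model() as coin_model:
    theta = pm.Beta('theta', alpha=1., beta=1.)    # prior for the bias of the coin
    y = pm.Bernoulli('y', p=theta, observed=data)  # likelihood, conditioned on the data
    # Step 2: apply Bayes' theorem (via sampling) to get the posterior
    trace = pm.sample(1000)

# Step 3: criticize the model, here by simply inspecting the posterior
az.plot_posterior(trace)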

In general, we will find ourselves performing these three steps in an iterative non-linear fashion. We will retrace our steps at any given point: maybe we made a silly coding mistake, or we found a way to change the model and improve it, or we realized that we need to add more data or collect a different kind of data.

Bayesian models are also known as probabilistic models because they are built using probabilities. Why probabilities? Because probabilities are the correct mathematical tool to model uncertainty, so let's take a walk through the garden of forking paths.

Probability theory

The title of this section may be a little bit pretentious as we are not going to learn probability theory in just a few pages, but that is not my intention. I just want to introduce a few general and important concepts that are necessary to better understand Bayesian methods, and that should be enough for understanding the rest of the book. If necessary, we will expand or introduce new probability-related concepts as we need them. For a detailed study of probability theory, I highly recommend the book, Introduction to Probability, by Joseph K Blitzstein and Jessica Hwang. Another useful book could be Mathematical Theory of Bayesian Statistics by Sumio Watanabe; as the title suggests, this book is more Bayesian-oriented than the first, and also heavier on the mathematical side.

Interpreting probabilities

While probability theory is a mature and well-established branch of mathematics, there is more than one interpretation of probability. From a Bayesian perspective, a probability is a measure that quantifies the uncertainty level of a statement. Under this definition of probability, it is totally valid and natural to ask about the probability of life on Mars, the probability of the mass of the electron being 9.1 × 10⁻³¹ kg, or the probability of the 9th of July of 1816 being a sunny day in Buenos Aires. Notice, for example, that life on Mars exists or does not exist; the outcome is binary, a yes-no question. But given that we are not sure about that fact, a sensible course of action is trying to find out how likely life on Mars is. Since this definition of probability is related to our epistemic state of mind, people often call it the subjective definition of probability. But notice that any scientific-minded person will not use their personal beliefs, or the information provided by an angel, to answer such a question; instead they will use all the relevant geophysical data about Mars, all the relevant biochemical knowledge about necessary conditions for life, and so on. Therefore, Bayesian probabilities, and by extension Bayesian statistics, are as subjective (or objective) as any other well-established scientific method we have.

If we do not have information about a problem, it is reasonable to state that every possible event is equally likely; formally, this is equivalent to assigning the same probability to every possible event. In the absence of information, our uncertainty is maximum. If we know instead that some events are more likely, then this can be formally represented by assigning higher probability to those events and less to the others. Notice that when we talk about events in stats-speak, we are not restricting ourselves to things that can happen, such as an asteroid crashing into Earth or my auntie's 60th birthday party; an event is just any of the possible values (or subset of values) a variable can take, such as the event that you are older than 30, or the price of a Sachertorte, or the number of bikes sold last year around the world.

The concept of probability is also related to the subject of logic. Under Aristotelian or classical logic, we can only have statements that take the values of true or false. Under the Bayesian definition of probability, certainty is just a special case: a true statement has a probability of 1, and a false statement has a probability of 0. We would assign a probability of 1 to the statement, There is Martian life, only after having conclusive data indicating something is growing, and reproducing, and doing other activities we associate with living organisms. Notice, however, that assigning a probability of 0 is harder, because we can always think that there is some Martian spot that is unexplored, or that we have made mistakes with some experiment, or several other reasons that could lead us to falsely believe life is absent on Mars even when it is not. Related to this point is Cromwell's rule, stating that we should reserve the use of the prior probabilities of 0 or 1 for logically true or false statements. Interestingly enough, Richard Cox mathematically proved that if we want to extend logic to include uncertainty, we must use probabilities and probability theory. Bayes' theorem is just a logical consequence of the rules of probability, as we will see soon. Hence, another way of thinking about Bayesian statistics is as an extension of logic when dealing with uncertainty, something that clearly has nothing to do with subjective reasoning in the pejorative sense in which people often use the term subjective.

To summarize, using probability to model uncertainty is not necessarily related to the debate about whether nature is deterministic or random at its most fundamental level, nor is it related to subjective personal beliefs. Instead, it is a purely methodological approach to modeling uncertainty. We recognize most phenomena are difficult to grasp because we generally have to deal with incomplete and/or noisy data, we are intrinsically limited by our evolution-sculpted primate brain, or any other sound reason you could add. As a consequence, we use a modeling approach that explicitly takes uncertainty into account.

From a practical point of view, the most relevant piece of information from this section is that Bayesians use probabilities as a tool to quantify uncertainty.

Now that we've discussed the Bayesian interpretation of probability, let's learn about a few of the mathematical properties of probabilities.

Defining probabilities

Probabilities are numbers in the [0, 1] interval, that is, numbers between 0 and 1, including both extremes. Probabilities follow a few rules; one of these rules is the product rule:

p(A, B) = p(A | B) p(B)

We read this as follows: the probability of A and B is equal to the probability of A given B, times the probability of B. The expression p(A, B) represents the joint probability of A and B. The expression p(A | B) is used to indicate a conditional probability; the name refers to the fact that the probability of A is conditioned on knowing B. For example, the probability that the pavement is wet is different from the probability that the pavement is wet if we know (or given that) it's raining. A conditional probability can be larger than, smaller than, or equal to the unconditioned probability. If knowing B does not provide us with information about A, then p(A | B) = p(A). This will be true only if A and B are independent of each other. On the contrary, if knowing B gives us useful information about A, then the conditional probability could be larger or smaller than the unconditional probability, depending on whether knowing B makes A less or more likely. Let's see a simple example using a fair six-sided die. What is the probability of getting the number 3 if we roll the die, p(die = 3)? This is 1/6, since each of the six numbers has the same chance for a fair six-sided die. And what is the probability of getting the number 3 given that we have obtained an odd number, p(die = 3 | odd)? This is 1/3, because if we know we have an odd number, the only possible numbers are {1, 3, 5} and each of them has the same chance. Finally, what is the probability p(die = 3 | even)? This is 0, because if we know the number is even, then the only possible ones are {2, 4, 6}, and thus getting a 3 is not possible.
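The die probabilities above are easy to verify by brute-force enumeration. The following snippet is not from the book, just a quick sanity check of the three values:

from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # a fair six-sided die

def prob(event, given=outcomes):
    # P(event | given), computed by counting equally likely outcomes
    return Fraction(sum(1 for x in given if event(x)), len(given))

odd = [x for x in outcomes if x % 2 == 1]
even = [x for x in outcomes if x % 2 == 0]

print(prob(lambda x: x == 3))              # 1/6
print(prob(lambda x: x == 3, given=odd))   # 1/3
print(prob(lambda x: x == 3, given=even))  # 0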

As we can see from this simple example, by conditioning on observed data we effectively change the probability of events, and with it, the uncertainty we have about them. Conditional probabilities are the heart of statistics, irrespective of your problem being rolling dice or building self-driving cars.

Bayes' theorem

Now that we have learned some of the basic concepts and jargon from probability theory, we can move to the moment everyone was waiting for. Without further ado, let's contemplate, in all its majesty, Bayes' theorem:

p(θ | y) = p(y | θ) p(θ) / p(y)        (1.4)

Well, it's not that impressive, is it? It looks like an elementary school formula, and yet, paraphrasing Richard Feynman, this is all you need to know about Bayesian statistics.

Learning where Bayes' theorem comes from will help us to understand its meaning.

According to the product rule, we have:

p(θ, y) = p(θ | y) p(y)        (1.5)

This can also be written as:

p(θ, y) = p(y | θ) p(θ)        (1.6)

Given that the terms on the left are equal for equations 1.5 and 1.6, we can combine them and write:

p(θ | y) p(y) = p(y | θ) p(θ)        (1.7)

And if we reorder 1.7, we get expression 1.4, which is Bayes' theorem.

Now, let's see what formula 1.4 implies and why it is important. First, it says that p(θ | y) is not necessarily the same as p(y | θ). This is a very important fact, one that is easy to miss in daily situations even for people trained in statistics and probability. Let's use a simple example to clarify why these quantities are not necessarily the same. The probability of a person being the Pope given that this person is Argentinian is not the same as the probability of being Argentinian given that this person is the Pope. As there are around 44,000,000 Argentinians alive and a single one of them is the current Pope, we have p(Pope | Argentinian) = 1/44,000,000, and we also have p(Argentinian | Pope) = 1.
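As a sanity check, we can plug these numbers into formula 1.4 and recover p(Pope | Argentinian). The world population figure below is my own rough assumption, not a number from the text:

world_pop = 7.6e9     # assumed number of people alive (rough figure)
argentinians = 4.4e7  # around 44,000,000 Argentinians alive

p_argentinian = argentinians / world_pop  # p(Argentinian)
p_pope = 1 / world_pop                    # p(Pope): a single living Pope
p_argentinian_given_pope = 1.0            # the current Pope is Argentinian

# Bayes' theorem: p(Pope | Argentinian) = p(Argentinian | Pope) p(Pope) / p(Argentinian)
print(p_argentinian_given_pope * p_pope / p_argentinian)  # ~2.3e-08, that is, 1/44,000,000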

If we replace θ with hypothesis and y with data, Bayes' theorem tells us how to compute the probability of a hypothesis, θ, given the data, y, and that's the way you will find Bayes' theorem explained in a lot of places. But, how do we turn a hypothesis into something that we can put inside Bayes' theorem? Well, we do it by using probability distributions. So, in general, our hypothesis is a hypothesis in a very, very, very narrow sense; we will be more precise if we talk about finding a suitable value for parameters in our models, that is, parameters of probability distributions. By the way, don't try to set θ to statements such as unicorns are real, unless you are willing to build a realistic probabilistic model of unicorn existence!

Bayes' theorem is central to Bayesian statistics. As we will see in Chapter 2, Programming Probabilistically, using tools such as PyMC3 frees us from the need to explicitly write Bayes' theorem every time we build a Bayesian model. Nevertheless, it is important to know the names of its parts because we will constantly refer to them, and it is important to understand what each part means because this will help us to conceptualize models, so let's do it:

p(θ): Prior

p(y | θ): Likelihood

p(θ | y): Posterior

p(y): Marginal likelihood

The prior distribution should reflect what we know about the value of the parameter θ before seeing the data, y. If we know nothing, like Jon Snow, we could use flat priors that do not convey too much information. In general, we can do better than flat priors, as we will learn in this book. The use of priors is why some people still talk about Bayesian statistics as subjective, even when priors are just another assumption that we make when modeling, and hence are just as subjective (or objective) as any other assumption, such as likelihoods.

The likelihood is how we will introduce data into our analysis. It is an expression of the plausibility of the data given the parameters. In some texts, you will find people calling this term the sampling model, the statistical model, or just the model. We will stick to the name likelihood, and we will call the combination of priors and likelihood the model.

The posterior distribution is the result of the Bayesian analysis and reflects all that we know about a problem (given our data and model). The posterior is a probability distribution for the parameters in our model and not a single value. This distribution is a balance between the prior and the likelihood. There is a well-known joke: A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule. One excellent way to kill the mood after hearing this joke is to explain that if the likelihood and priors are both vague, you will get a posterior reflecting vague beliefs about seeing a mule rather than strong ones. Anyway, I like the joke, and I like how it captures the idea of a posterior being somehow a compromise between prior and likelihood. Conceptually, we can think of the posterior as the updated prior in the light of (new) data. In fact, the posterior from one analysis can be used as the prior for a new analysis. This makes Bayesian analysis particularly suitable for analyzing data that becomes available in sequential order. One example could be an early warning system for natural disasters that processes online data coming from meteorological stations and satellites. For more details, read about online machine learning methods.
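Below is a minimal sketch of this posterior-becomes-prior idea, using the standard conjugate Beta-Binomial update so no sampling machinery is needed; the batch numbers are illustrative assumptions, not data from the book:

from scipy import stats

alpha, beta = 1, 1  # start from a flat Beta(1, 1) prior

# (heads, tosses) batches arriving in sequential order
batches = [(3, 5), (7, 10), (60, 100)]

for heads, tosses in batches:
    # conjugate update: the posterior is Beta(alpha + heads, beta + tails)
    alpha += heads
    beta += tosses - heads
    posterior = stats.beta(alpha, beta)
    # this posterior becomes the prior for the next batch of data
    print(f"posterior mean = {posterior.mean():.3f}")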

The last term is the marginal likelihood, also known as evidence