Bayes’s rule concerns the conditional probability of an event B *given* the occurrence of another event A. In mathematical terms, this conditional probability, P(B|A), is equal to P(B) × P(A|B) / P(A). Students in introductory statistics courses are often given exercises whose solutions require interpreting and applying this Bayesian algebra. The correct results are often counterintuitive.
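To see why the results can feel counterintuitive, here is a minimal sketch of the classic screening-test exercise. The numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) and the function name `posterior` are illustrative assumptions, not figures from any study.

```python
def posterior(prior_b, p_a_given_b, p_a_given_not_b):
    """Return P(B|A) = P(B) * P(A|B) / P(A) for a binary event B."""
    # P(A) by the law of total probability over B and not-B
    p_a = prior_b * p_a_given_b + (1 - prior_b) * p_a_given_not_b
    return prior_b * p_a_given_b / p_a


# Hypothetical inputs: B = "has the condition", A = "tests positive"
p_condition_given_positive = posterior(
    prior_b=0.01,          # assumed prevalence
    p_a_given_b=0.95,      # assumed sensitivity
    p_a_given_not_b=0.05,  # assumed false-positive rate
)
print(round(p_condition_given_positive, 3))  # 0.161
```

Under these assumed numbers, a positive result from a "95% accurate" test implies only about a 16% chance of having the condition, because the low base rate dominates.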

Bayes’s rule becomes more interesting – and controversial – when applied to learning associated with scientific or research questions. There, events B and A become H (hypothesis) and E (evidence), respectively, so P(H|E) = P(H) × P(E|H) / P(E), or, in Bayesian words, posterior probability = prior probability × likelihood / normalizing constant. In plainer English, this “diachronic” P(H|E) version of Bayes’s rule shows how learning occurs – how prior probabilities are updated into posterior probabilities by new information.
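A small sketch can make the diachronic reading concrete. The example below is hypothetical: it assumes two competing hypotheses about a coin (“fair”, or “biased” with a 0.75 chance of heads) and updates the beliefs after each observed head. The function name `update` and all the numbers are illustrative assumptions.

```python
def update(priors, likelihoods):
    """One Bayesian update over a set of hypotheses.

    priors:      dict mapping hypothesis -> P(H)
    likelihoods: dict mapping hypothesis -> P(E|H)
    Returns a dict mapping hypothesis -> P(H|E).
    """
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    p_evidence = sum(unnormalized.values())  # normalizing constant P(E)
    return {h: value / p_evidence for h, value in unnormalized.items()}


# Assumed starting beliefs and assumed P(heads | hypothesis)
beliefs = {"fair": 0.5, "biased": 0.5}
p_heads = {"fair": 0.5, "biased": 0.75}

# Observe three heads in a row; each posterior becomes the next prior
for _ in range(3):
    beliefs = update(beliefs, p_heads)

print({h: round(p, 3) for h, p in beliefs.items()})
# {'fair': 0.229, 'biased': 0.771}
```

Each pass through the loop treats the previous posterior as the new prior, which is the sense in which the rule describes learning from evidence.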

According to The Economist: “The essence of the Bayesian approach is to provide a mathematical rule explaining how you should change your existing beliefs in the light of new evidence. In other words, it allows scientists to combine new data with their existing knowledge or expertise.” Nobel laureate Daniel Kahneman, author of “Thinking, Fast and Slow,” only wishes humans were effective Bayesian thinkers. His experiments show how we often mishandle “priors,” or base rates, in our calculation of posteriors.

It hasn’t been a smooth ride for Bayesian analyses in the 250 years since the theorem was first published. For much of the 1900s especially, Bayesian statistics was rejected as too subjective by mainstream “frequentists” – progenitors of the statistical paradigm dominant today.

What’s the big disagreement? “The Bayesian and frequentist approaches differ fundamentally in their characterizations of probability. Frequentists see probability as the objectively measured, relative frequency of an outcome over a large number of trials. Bayesians, in contrast, view probability as a more subjective concept tied to an individual’s judgment of the likelihood of an outcome. For frequentists, the uncertainty surrounding probability is in the events; for Bayesians, the uncertainty has to do with interpretation by observers.”

Today, much to the benefit of business learning, Bayesian statistics is enjoying a revitalization in the statistical, scientific and research worlds, where its continuous learning model is well suited to many analytics problems.

Alas, there’s a significant divide between the basic Bayesian concepts and real-world Bayesian methods: understanding the algebra and psychology of Bayes’s rule is one thing; implementing rigorous Bayesian models is quite another. The transition from the concepts to topics such as Markov Chain Monte Carlo (MCMC) methods, hierarchical models, and Bayesian inference Using Gibbs Sampling (BUGS) is anything but straightforward. Bayesian analysis seems to go from simple to complex with no intermediate steps.

Computer scientist and statistician Allen Downey comes to the rescue with his brief but excellent O’Reilly book “Think Stats” and a wonderful complementary two-and-a-half-hour YouTube lecture, “Bayesian statistics made (as) simple (as possible).” Both the book and video provide “an introduction to Bayesian statistics using Python.” Students who work through the material will learn a good deal about both statistics and programming.

In Downey’s words: “The thesis of this book is that if you know how to program, you can use that skill to help you understand probability and statistics. These topics are often presented from a mathematical perspective, and that approach works well for some people. But some important ideas in this area are hard to work with mathematically and relatively easy to approach computationally.”

*This blog originally appeared at information-management, written by Steve Miller*
