Jonathan Olmsted — written Oct 23, 2014
The purpose of this gallery post is several fold:

- to demonstrate the use of the new and improved C++-level implementation of R's `sample()` function (see here), and
- to demonstrate a simulation-based approach to Bayesian inference, Sampling Importance Resampling (SIR).

The application in this post uses an example from Jackman's *Bayesian Analysis for the Social Sciences* (page 72), an example which now has a 30-year history in political science (see Jackman for more references). The focus is on the extent to which the probability of revolution varies with facing a foreign threat or not. Facing a foreign threat is measured by "defeated …" or "not defeated …" over a span of 20 years. The countries studied are in Latin America. During this period, there were only three revolutions: Bolivia (1952), Mexico (1910), and Nicaragua (1979).
| | Revolution | No Revolution |
|---|---|---|
| Defeated and invaded or lost territory | 1 | 7 |
| Not defeated for 20 years | 2 | 74 |
The goal is to learn about the true, unobservable probabilities of revolution given a recent defeat or the absence of one. That is, we care about
\[\theta_1 = \Pr (\text{revolution} ~ \vert ~ \text{defeat})\]

and

\[\theta_2 = \Pr (\text{revolution} ~ \vert ~ \text{no defeat}).\]

And, beyond that, we care about whether \(\theta_1\) and \(\theta_2\) differ.
These data are assumed to arise from a Binomial process, where the likelihood of the probability parameter value, \(\theta\), is
\[L(\theta; n, k) = {n \choose k} \theta^k (1 - \theta)^{n - k},\]

where \(n\) is the total number of revolutions and non-revolutions and \(k\) is the number of revolutions. The MLE for this model is just the sample proportion, so a Frequentist statistician would be wondering whether \(\hat{\theta}_1 = \frac{1}{1 + 7} = .125\) was sufficiently larger than \(\hat{\theta}_2 = \frac{2}{2 + 74} = .026\) to be unlikely to have happened by chance alone (given the null hypothesis that the two proportions were identical).
A Bayesian statistician could approach the question a bit more directly and compute the probability that \(\theta_1 - \theta_2 > 0.\) To do this, we first need samples from the posterior distribution of \(\theta_1\) and \(\theta_2\). In this post, we will get these samples via Sampling Importance Resampling.
Sampling Importance Resampling allows us to sample from the posterior distribution, \(p(\theta | \text{data})\), where

\[p(\theta | \text{data}) \propto L(\theta; \text{data}) \times p(\theta),\]

by resampling from a series of draws from the prior, \(p(\theta)\). Denote one of those \(n\) draws from the prior distribution, \(p(\theta)\), as \(\theta_i\). Draw \(i\) from the prior sample is then resampled, with replacement, into the posterior sample with probability
\[q_i = \frac{L(\theta_i; \text{data})}{\sum_{j=1}^{n} L(\theta_j; \text{data})}.\]

We begin by drawing many samples from a series of prior distributions. Although using a Beta prior distribution on the \(\theta\) parameter admits a closed-form solution, the point here is to demonstrate a simulation-based approach. On the other hand, a Gamma prior distribution over \(\theta\) is very much not conjugate, and simulation is the best approach.
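Before turning to the C++ implementation below, here is a minimal base-R sketch of this resample-from-the-prior step; all object names are illustrative, not from the original post.

```r
## Minimal base-R sketch of SIR with a Binomial likelihood.
## All object names here are illustrative.
set.seed(42)
prior_draws <- runif(10000)           # 10,000 draws from a Beta(1, 1) prior
lik <- dbinom(1, 8, prior_draws)      # L(theta_i; data): 1 revolution in 8 cases
q <- lik / sum(lik)                   # the resampling weights q_i
post_draws <- sample(prior_draws, 10000, replace = TRUE, prob = q)
```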
In particular, we will consider our posterior beliefs about the difference in probabilities under five different prior distributions.
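The code that builds `dfPriorInfo` is not included in this excerpt; a data frame matching the printout below could be constructed along these lines (a sketch):

```r
## Five prior specifications: distribution family plus its two parameters.
dfPriorInfo <- data.frame(id = 1:5,
                          dist = c("beta", "beta", "gamma", "beta", "beta"),
                          par1 = c(1, 1, 3, 10, .5),
                          par2 = c(1, 5, 20, 10, .5),
                          stringsAsFactors = FALSE)
```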
```
  id  dist par1 par2
1  1  beta  1.0  1.0
2  2  beta  1.0  5.0
3  3 gamma  3.0 20.0
4  4  beta 10.0 10.0
5  5  beta  0.5  0.5
```
Using the data frame `dfPriorInfo` and the plyr package, we will draw a total of 20,000 values from each of the prior distributions. This can be done in any number of ways and is completely independent of using Rcpp for the SIR magic.
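One way to do this (a sketch; the original code is not reproduced here) is to let plyr apply the matching `r*` random-number generator to each row of `dfPriorInfo`:

```r
library(plyr)

set.seed(1)
## For each prior specification, draw 20,000 values from that distribution;
## e.g., rbeta(20000, 1, 1) for the first row, rgamma(20000, 3, 20) for the third.
dfPrior <- ddply(dfPriorInfo, "id", function(row) {
    rdist <- get(paste0("r", as.character(row$dist)))
    data.frame(draws = rdist(20000, row$par1, row$par2))
})
```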
However, we can confirm that our draws are as we expect and that we have the right number of them (5 * 20k = 100k).
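Assuming the draws live in a data frame `dfPrior`, as in the sketch above, the checks are simply:

```r
head(dfPrior)
dim(dfPrior)
```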
```
  id     draws
1  1 0.7124225
2  1 0.5910231
3  1 0.0595327
4  1 0.4718945
5  1 0.4485650
6  1 0.0431667
```

```
[1] 100000 2
```
Now, we write a C++ snippet that will create our R-level function to generate a sample of `D` values from the prior draws (`prdraws`) given their likelihood after the data (i.e., the number of successes, `nsucc`, and the number of failures, `nfail`).
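The post's C++ source is not included in this excerpt. The following sketch matches the description above and uses `RcppArmadillo::sample()` from the `RcppArmadilloExtensions/sample.h` header; details may differ from the author's original code.

```cpp
#include <RcppArmadilloExtensions/sample.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector samplePost(const NumericVector prdraws, // draws from the prior
                         const int D,                 // size of posterior sample
                         const int nsucc,             // observed successes
                         const int nfail) {           // observed failures
    int N = prdraws.size();
    NumericVector wts(N);
    // Binomial likelihood of each prior draw, used as its resampling weight;
    // the binomial coefficient is constant across draws, so it can be dropped.
    for (int n = 0; n < N; n++) {
        wts(n) = pow(prdraws(n), nsucc) * pow(1.0 - prdraws(n), nfail);
    }
    // Weighted sampling with replacement: the C++-level mirror of R's sample().
    return RcppArmadillo::sample(prdraws, D, true, wts);
}
```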
The most important feature to mention here is the use of some new and improved extensions which effectively provide an equivalent, performant mirror of R's `sample()` function at the C++ level. Note that you need RcppArmadillo 0.4.500.0 or newer for this version of `sample()`.
The return value of this function is a length-`D` vector of draws from the posterior distribution given the draws from the prior distribution, where the likelihood is used as a filtering weight.
To use the `samplePost()` function, we create the R representation of the data as follows.
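The exact definitions are not shown in this excerpt, but from the 2 x 2 table above they amount to something like this (the variable names are illustrative):

```r
## Counts from the 2 x 2 table above; the variable names are illustrative.
nsucc1 <- 1;  nfail1 <- 7     # defeated: 1 revolution, 7 non-revolutions
nsucc2 <- 2;  nfail2 <- 74    # not defeated: 2 revolutions, 74 non-revolutions
```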
As a simple example, consider drawing a posterior sample of size 30 for the "defeated" case from a discrete prior distribution with equal weight on the \(\theta\) values of .125 (the MLE), .127, and .8.
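Assuming the C++ snippet above has been compiled (e.g., via `Rcpp::sourceCpp()`), the call is along these lines; the exact split between .125 and .127 depends on the RNG state.

```r
table(samplePost(c(.125, .127, .8), 30, 1, 7))
```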
```
0.125 0.127 
    9    21 
```

We see there is a mixture of .125 and .127 values, but no .8 values. \(\theta\) values of .8 were simply too unlikely (given the likelihood) to be resampled from the prior.
Again making use of the plyr package, we construct samples of size 20,000 for both \(\theta_1\) and \(\theta_2\) under each of the 5 prior distribution samples. These posterior draws are stored in the data frame `dfPost`.
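Hedging again on the details of the original code, the construction might look like this:

```r
D <- 20000
## For each prior, filter the same 20,000 prior draws through both likelihoods.
dfPost <- ddply(dfPrior, "id", function(df) {
    data.frame(theta1 = samplePost(df$draws, D, nsucc1, nfail1),
               theta2 = samplePost(df$draws, D, nsucc2, nfail2))
})
```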
```
  id    theta1    theta2
1  1 0.3067334 0.0130865
2  1 0.1421879 0.0420830
3  1 0.3218130 0.0634511
4  1 0.0739756 0.0363466
5  1 0.1065267 0.0460336
6  1 0.0961749 0.0440790
```

```
[1] 100000 3
```
Here, we are visualizing the posterior draws for the quantity of interest \((\theta_1 - \theta_2)\) — the difference in probabilities of revolution. These posterior draws are grouped according to the prior distribution used. A test of whether revolution is more likely given a foreign threat is operationalized by the probability that \(\theta_1 - \theta_2\) is positive. This probability for each distribution is shown in white. For all choices of the prior here, the probability that “foreign threat matters” exceeds .90.
The full posterior distribution of \(\theta_1 - \theta_2\) is shown for each of the five priors in blue. A solid, white vertical band indicates "no effect". In all cases, the majority of the mass is clearly to the right of this band.
Recall that the priors are, themselves, over the individual revolution probabilities, \(\theta_1\) and \(\theta_2\). The general shape of each of these prior distributions of the \(\theta\) parameter is shown in a grey box by the white line. For example, \(Beta(1, 1)\) is actually a uniform distribution over the parameter space, \([0, 1]\). On the other hand, \(Beta(.5,.5)\) has most of its mass at the two tails.
At least across these specifications of the prior distributions on \(\theta\), the conclusion that "foreign threats matter" finds a good deal of support. What is interesting about this application is that despite these posterior distributions of the difference in probabilities, the p-value associated with Fisher's Exact Test for 2 x 2 tables is just .262. In other words, the standard Frequentist test would not reject the null hypothesis of independence at conventional levels.
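For reference, that p-value comes from a call like the following:

```r
## Fisher's Exact Test on the 2 x 2 table from the top of the post;
## the two-sided p-value is about .262, as reported above.
fisher.test(matrix(c(1, 7, 2, 74), nrow = 2, byrow = TRUE))
```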