The code that I used to generate the images in this article can be found on my GitHub.
Posterior Predictive
Given a posterior distribution $p(\theta | y)$, the posterior predictive distribution is defined as
$p(\tilde{y} | y) = \int p(\tilde{y} | \theta) p(\theta | y) d\theta$
and it represents the distribution of a new observation $\tilde{y}$ given your updated information about a parameter $\theta$ and natural variation around the observation that arises from the data-generating process.
Since applied Bayesian techniques have tended towards fully computational MCMC procedures, the posterior predictive is usually obtained through simulation - let's say you have a sample $\theta_j^*$ ($j = 1,2,...,k$) from the posterior distribution of $\theta$ and you want to know about a new observation from some process that uses the parameter $\theta$.
$\tilde{y} \sim p(\tilde{y} | \theta)$
To obtain the posterior predictive, you would simulate a set of observations $\tilde{y}^*_j$ from $p(\tilde{y} | \theta_j^*)$ (in other words, simulate a new observation from the data model for each draw from the MCMC for your parameter). The distribution of these $\tilde{y}^*_j$ approximates the posterior predictive distribution for $\tilde{y}$.
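The two-step recipe above can be sketched generically in R. The function name `posterior_predictive` and the toy Beta/Binomial setup below are my own illustration, not part of any package:

```r
# Sketch of the two-step recipe: given posterior draws theta_star
# (e.g. from an MCMC run) and a function that simulates one new
# observation from the data model, return one predictive draw per
# posterior draw.
posterior_predictive <- function(theta_star, simulate_one) {
  sapply(theta_star, simulate_one)
}

# Hypothetical usage: a Beta posterior with a Binomial(250, theta) data model
set.seed(1)
theta_star <- rbeta(5000, 76, 176)
y_tilde <- posterior_predictive(theta_star, function(t) rbinom(1, 250, t))
```

The histogram of `y_tilde` then approximates the posterior predictive distribution.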
In general, this process is very specific to the problem at hand. It is possible in a few common scenarios, however, to calculate the posterior predictive distribution analytically. One example that is useful in baseball analysis is the beta-binomial model.
Beta-Binomial Example
Let's say a batter obtains a 0.300 OBP in 250 PA - that corresponds to 75 on-base events and 175 not on-base events. What can you say about the distribution of on-base events in a new set of 250 PA?
Suppose that the distribution of on-base events is given by a binomial distribution with $n = 250$ and chance of getting on base $\theta$, which is the same in both sets of PA.
$y | \theta \sim Binomial(250, \theta)$
For the prior distribution, let's suppose that a $Beta(1,1)$ distribution was used - this is a uniform distribution between zero and one - so any possible value for $\theta$ is equally likely. Since the beta and binomial are conjugate distributions, the posterior distribution of $\theta$ (the batter's chance of getting on base) is also a beta distribution:
$p(\theta | y = 75) = \dfrac{\theta^{75+1-1}(1-\theta)^{175+1-1}}{\beta(75+1,175+1)}$, that is, $\theta | y = 75 \sim Beta(76,176)$
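The conjugate update is just addition of counts, which is easy to verify with a few lines of R (a quick sketch, using my own variable names):

```r
# Conjugate beta-binomial update: the posterior parameters are the
# prior parameters plus the success and failure counts.
prior_a <- 1; prior_b <- 1        # Beta(1, 1) uniform prior
on_base <- 75                     # on-base events in 250 PA
outs <- 250 - on_base             # 175 not on-base events
post_a <- prior_a + on_base       # 76
post_b <- prior_b + outs          # 176
post_a / (post_a + post_b)        # posterior mean, about 0.3016
```

Note that the posterior mean is pulled only very slightly away from the observed 0.300 OBP, since the uniform prior carries little information.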
Now, suppose we are planning to observe another 250 PA for the same batter, and we want to know the distribution of on-base events $\tilde{y}$ in the new 250 PA. This distribution is also binomial
$p(\tilde{y} | \theta) = \displaystyle {250 \choose \tilde{y}} \theta^{\tilde{y}}(1-\theta)^{250-\tilde{y}}$
The posterior predictive distribution for the number of on-base events in another 250 PA is then obtained by multiplying the two densities and integrating out $\theta$.
$p(\tilde{y} | y = 75) = \displaystyle \int_0^1 {250 \choose \tilde{y}} \theta^{\tilde{y}}(1-\theta)^{250-\tilde{y}} \cdot \dfrac{\theta^{75}(1-\theta)^{175}}{\beta(76,176)} d\theta$
The resulting distribution is known as the beta-binomial distribution, which has density
$p(\tilde{y} | y = 75) =\displaystyle {250 \choose \tilde{y}} \dfrac{\beta(76 + \tilde{y}, 426-\tilde{y})}{\beta(76,176)}$
(The beta-binomial distribution is obtained from the beta-binomial model - it does get a bit confusing, but they are different things. The beta-binomial distribution can be thought of as a binomial with extra variance.)
Now I can use the posterior predictive for inference. If, for example, I wanted to know the probability that a player will have a 0.300 OBP in another 250 PA (corresponding, again, to $\tilde{y}$ = 75 on-base events) then I can calculate that as
$p(\tilde{y} = 75 | y = 75) = \displaystyle {250 \choose 75} \dfrac{\beta(76 + 75, 426-75)}{\beta(76,176)} \approx 0.0389$
That is, our updated information says there's a 3.89% chance of getting exactly a 0.300 OBP in a new 250 PA by the same player.
The full analytic posterior predictive distribution of OBP in a new 250 PA looks like this:

[Figure: analytic posterior predictive distribution of OBP in a new 250 PA]
This can also be accomplished by simulation - first, by simulating a large number $k$ of $\theta^*$ values from the posterior
$\theta^*_j \sim Beta(76,176)$
And then using those $\theta^*$ values to simulate from the data model $p(\tilde{y} | \theta^*)$
$y^*_j \sim Binomial(250, \theta^*_j)$
The estimated probability of a 0.300 OBP (equivalent to 75 on-base events) is then
$P(\tilde{y} = 75 | y = 75) \approx \displaystyle \dfrac{\textrm{# } y^*_j \textrm{ that equal 75}}{k} $
This is much easier to do in R - with $k = 1000000$, the code to quickly perform this is
> theta <- rbeta(1000000, 76,176) #Simulate from Posterior
> y <- rbinom(1000000, 250, theta) #Simulate from data model
> mean(y == 75) #Estimate P(y = 75)
[1] 0.039077
Notice that the result is very close to the analytic answer of 3.89%. The simulated posterior predictive distribution for OBP looks very similar to the analytic version:

[Figure: simulated posterior predictive distribution of OBP in a new 250 PA]
What's the Use?
Aside from doing (as the name implies) prediction, the posterior predictive is very useful for model checking. If data simulated from the posterior predictive (new data) looks similar to the data you used to fit the model (old data), that is evidence that the model is a good fit. Conversely, if your posterior predictive data looks nothing like your original data set, you may have misspecified your model. (I'm criminally oversimplifying here - I recommend Bayesian Data Analysis by Gelman et al. for all the detail on model checking using the posterior predictive distribution.)
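For the beta-binomial example, a minimal posterior predictive check might ask where the observed count falls among replicated datasets. This is only a sketch of the idea, not a full model-checking workflow:

```r
# Posterior predictive check sketch: simulate replicated on-base
# counts from the posterior predictive and see where the observed
# value (75) falls among them. A tail probability near 0 or 1 would
# suggest the model fails to reproduce the observed data.
set.seed(42)
k <- 100000
theta_rep <- rbeta(k, 76, 176)        # draws from the posterior
y_rep <- rbinom(k, 250, theta_rep)    # replicated datasets
mean(y_rep >= 75)   # posterior predictive p-value; near 0.5 here, as expected
```

The value lands near 0.5 because the model was fit to this very data point; checks become informative when applied to held-out data or to test statistics the model was not directly fit to.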