This post is an extension of the previous one: I will use the same target distribution and a similar Gaussian proposal distribution. The Python script will also be easier to follow if you have already read the previous post about importance sampling.

Rejection sampling may be the most familiar Monte Carlo sampling method. When introducing the Monte Carlo method to somebody, it is intuitive and effective to give the example of computing the area of a circle (or any shape) using random samples. Perhaps I should have posted about rejection sampling before Metropolis-Hastings or importance sampling.

Anyway, that area-computing trick is a simplified form of rejection sampling. See the next figure.

Randomly draw samples in the box, and reject any sample that lands above the curve. Red dots are rejected. Then the area under the curve is

(number of blue dots) / (number of red dots + number of blue dots) × (area of box)
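This recipe can be reproduced in a few lines. Here is a minimal sketch (my own addition, with an arbitrary seed and sample count) that estimates the area of the unit circle, i.e. $\pi$, by accepting the points that fall inside it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Draw uniform points in the 2 x 2 box [-1, 1] x [-1, 1].
x = rng.uniform(-1, 1, n)
y = rng.uniform(-1, 1, n)

# "Blue dots": points that land inside the unit circle.
inside = x**2 + y**2 <= 1.0

# (accepted / total) * (area of box) estimates the circle's area, pi.
area_estimate = inside.mean() * 4.0
print(area_estimate)  # close to pi ~ 3.14159
```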

The next Python example recovers the target distribution using a similar approach.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as stats
%matplotlib inline
%precision 8

plt.style.use('ggplot')

def P_star(x):
    return np.exp(0.4*(x-0.4)**2 - 0.08*x**4)

def Q_star(x):
    return 2.5*np.exp(-x**2/4.0)

x = np.linspace(-5, 5)

plt.figure(figsize=(12,4))
plt.plot(x, P_star(x), label='P*')
plt.plot(x, Q_star(x), label='Q*')
plt.plot(x, 4*Q_star(x), label ='cQ*')
plt.legend(loc ='best')


$P^* (x)$ and $Q^* (x)$ are the same as in the previous post. However, we multiply $Q^* (x)$ by $c = 4$ so that $cQ^* (x)$ covers the target distribution over the whole range. Why? $cQ^* (x)$ plays the same role that the box did in the colorful figure above. This will become clear in the following examples.

x = np.linspace(-5, 5)

max = P_star(-1.75233)

plt.figure(figsize=(12,4))
plt.subplot(121)
plt.plot(x, P_star(x))
plt.axhline(max, color='grey')

x_arrow = -1.0

# Note: these arrow calls were cut off in the original; head/color styling is assumed.
plt.arrow(x_arrow, 0, 0, P_star(x_arrow)-0.1, linewidth=1,
          head_width=0.2, head_length=0.1, fc='k', ec='k')

plt.arrow(x_arrow, max, 0, -(max-P_star(x_arrow)-0.1), linewidth=1,
          head_width=0.2, head_length=0.1, fc='k', ec='k')

plt.text(x_arrow+.25, 2, 'Reject', fontsize=16)
plt.text(x_arrow+.25, 0.5, 'Accept', fontsize=16)

plt.axis([-5,5,0,4])

plt.title('Rejection sampling concepts', fontsize=20)

plt.subplot(122)

n = 100000

x_samples = np.random.uniform(-5, 5, n)

independent_samples = np.random.uniform(0, max, n)

v = x_samples[independent_samples < P_star(x_samples)]

plt.plot(x, P_star(x), linewidth=2)

factor = 7.85218 # Integrated value of P_star(x) from -5 to 5.

hist, bin_edges = np.histogram(v, bins=100, density=True)

bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2.

plt.step(bin_centers, factor*hist, linewidth=2)

plt.axis([-5,5,0,4])

plt.title('Histogram of accepted samples', fontsize=20);


Here, the grey line gives the height of the box, as in the red/blue figure above. We randomly draw samples below the grey line (i.e., from the rectangle), and accept a sample only if it lies under the curve.

v = x_samples[independent_samples < P_star(x_samples)]


This line does the accepting and rejecting. It is exactly how we computed the area under the curve above.
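In fact, the same acceptance fraction gives the area directly. Here is a self-contained sketch (arbitrary seed, same box as the script above) that turns the fraction of accepted samples into an estimate of the integral of $P^*$ over $[-5, 5]$, which should land near the factor 7.85218 used for rescaling:

```python
import numpy as np

def P_star(x):
    return np.exp(0.4*(x - 0.4)**2 - 0.08*x**4)

rng = np.random.default_rng(1)
n = 100_000

p_max = P_star(-1.75233)            # height of the box (peak of P*)
x_samples = rng.uniform(-5, 5, n)
heights = rng.uniform(0, p_max, n)

accepted = heights < P_star(x_samples)

# (accepted / total) * (area of box) estimates the integral of P* on [-5, 5].
box_area = 10.0 * p_max
integral_estimate = accepted.mean() * box_area
print(integral_estimate)  # close to 7.85218
```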

The right figure is simply a histogram of the accepted samples. It represents the target distribution up to a constant, so I rescaled it to match $P^* (x)$.

This is a special case of the rejection sampling method. The independent box sampler, which draws samples below the grey line, does not perform very well: too many samples are rejected. If we choose a proposal distribution $cQ^* (x)$ that is close to the target distribution $P^* (x)$, the performance improves dramatically.

## General rejection sampling

In the following example, I will not choose such a better distribution. To be honest, it is a worse sampler. However, using a Gaussian proposal distribution makes the mechanics of rejection sampling easier to understand.

x = np.linspace(-5, 5)

plt.figure(figsize=(12,4))
plt.subplot(121)
plt.plot(x, P_star(x))
plt.plot(x, 4*Q_star(x))

x_arrow = -1.0

# Note: these arrow calls were cut off in the original; head/color styling is assumed.
plt.arrow(x_arrow, 0, 0, P_star(x_arrow)-0.1, linewidth=1,
          head_width=0.2, head_length=0.1, fc='k', ec='k')

plt.arrow(x_arrow, 4*Q_star(-1), 0, -(4*Q_star(-1)-P_star(x_arrow)-0.1), linewidth=1,
          head_width=0.2, head_length=0.1, fc='k', ec='k')

plt.text(x_arrow+.25, 6, 'Reject', fontsize=16)
plt.text(x_arrow+.25, 0.5, 'Accept', fontsize=16)

plt.axis([-5,5,0,10])

plt.title('Rejection sampling concepts', fontsize=20)

plt.subplot(122)

def cQ_samples(N):
    sample = []
    x = np.random.uniform(-5, 5, N)
    for i in range(N):
        if np.random.uniform(0, 10) <= 4.0*Q_star(x[i]):
            sample.append(x[i])
    return len(sample), sample

def accept_samples(N):
    s = cQ_samples(N)[1]
    sample = []
    for i in range(len(s)):
        if np.random.uniform(0, 4.0*Q_star(s[i])) <= P_star(s[i]):
            sample.append(s[i])
    return len(sample), sample

plt.plot(x, P_star(x), linewidth=2)

N = 100000

factor = 7.85218 # Integrated value of P_star(x) from -5 to 5.

hist, bin_edges = np.histogram(accept_samples(N)[1], bins=100, density=True)

bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2.0

plt.step(bin_centers, factor*hist, linewidth=2)

plt.axis([-5,5,0,10])

plt.title('Histogram of accepted samples', fontsize=20);


Do you see how this script differs from the previous trivial case? Here we first draw samples under $cQ^* (x)$, and then, among those samples, we keep only the ones that fall under the target distribution $P^* (x)$.

Why did I make the problem more complicated? As I mentioned, we should try to choose a better $cQ^* (x)$. A Gaussian distribution usually has a higher peak than other distributions, so it does a poor job here of covering the elephant-shaped target distribution. I chose it because it is the most famous probability distribution that many people know. However, you can choose a better one yourselves.
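To quantify "worse", here is a small sketch (my own addition, not part of the original script) that compares empirical acceptance rates of the box sampler and the Gaussian proposal with $c = 4$. It draws from the normalized proposal directly, using the fact that $Q^* (x)$ is proportional to a normal density with $\sigma^2 = 2$:

```python
import numpy as np

def P_star(x):
    return np.exp(0.4*(x - 0.4)**2 - 0.08*x**4)

def Q_star(x):
    return 2.5*np.exp(-x**2/4.0)

rng = np.random.default_rng(2)
n = 100_000

# Box sampler: uniform x on [-5, 5], uniform height up to the peak of P*.
x_box = rng.uniform(-5, 5, n)
box_rate = np.mean(rng.uniform(0, P_star(-1.75233), n) < P_star(x_box))

# Gaussian proposal: x ~ N(0, 2) (Q* is proportional to this density),
# accepted with probability P*(x) / (4 Q*(x)).
x_q = rng.normal(0.0, np.sqrt(2.0), n)
cq_rate = np.mean(rng.uniform(0, 4.0*Q_star(x_q), n) < P_star(x_q))

print(box_rate, cq_rate)
```

With this setup the Gaussian proposal actually accepts a smaller fraction of its draws than the plain box does, which is exactly the sense in which it is a worse sampler here.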

In realistic, practical problems, we usually have to choose the proposal distribution ourselves, because we often know very little about the target density. We then need some physical intuition to choose both the type of proposal distribution and the constant $c$.
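For this particular pair of densities, though, we can check $c$ numerically. A quick grid search (a sketch I am adding here, not part of the original post) finds the smallest valid constant, namely the maximum of $P^* (x)/Q^* (x)$ over the support:

```python
import numpy as np

def P_star(x):
    return np.exp(0.4*(x - 0.4)**2 - 0.08*x**4)

def Q_star(x):
    return 2.5*np.exp(-x**2/4.0)

# Smallest c with c * Q*(x) >= P*(x) everywhere on [-5, 5]:
# the maximum of the ratio P*/Q* over a fine grid.
x = np.linspace(-5, 5, 10001)
c_min = np.max(P_star(x) / Q_star(x))
print(c_min)  # roughly 3.1, so the choice c = 4 covers P* with some margin
```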

This mechanism becomes a fatal flaw of rejection sampling in higher-dimensional systems. In an $N$-dimensional problem, the constant becomes $c^N = \displaystyle \prod_{i = 1}^N c_i$. For example, if all the $c_i$ are identically 1.2, which does not sound so bad, the product already exceeds 6 in a 10-dimensional system.
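A two-line check of that arithmetic (assuming identical $c_i = 1.2$ in every dimension):

```python
# With c_i = 1.2 per dimension, the overall constant is 1.2**N,
# and the acceptance probability shrinks like 1 / 1.2**N.
for N in (1, 10, 20, 100):
    c_total = 1.2**N
    print(N, c_total, 1.0 / c_total)
```

At $N = 10$ the constant is already about 6.19, and by $N = 100$ fewer than one draw in ten million is accepted.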