Importance sampling is the first sampling method I faced when I studied Monte Carlo method. Nevertheless, I haven’t seen many examples for the importance sampling. Maybe it is because the importance sampling is not effective for high dimensional systems. The weak point of the importance sampling is that the performance of it is determined by how well we choose the disposal distribution close to the target distribution.

Here, I will present a simple example of the importance sampling.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as stats
%matplotlib inline'ggplot')
def P_star(x):
    return np.exp(0.4*(x-0.4)**2-0.08*x**4)

def Q_star(x):
    return 2.5*np.exp(-x**2/4.0)

def phi(x):
    return 4*np.sin(x)

x = np.linspace(-5, 5)

plt.plot(x, P_star(x), label='target')
plt.plot(x, Q_star(x), label='disposal')
plt.plot(x, phi(x), label ='phi(x)')
plt.legend(loc ='best')


The target distribution up to normal constant is given. Actually, we can numerically compute the normal constant, but we will assume that we do not know it. I set the disposal distribution, $Q^* (x)$ as a Guassian distribution up to normal constant. The convergence of the importance sampling method is faster as $Q^* (x)$ is chosen close to the target distribution, $P^* (x)$. For some specific $x$, the proportion, $P^* (x) /Q^* (x)$ tells us how important the sample is. It is called “weight”.

Our goal is to evaluate the expectation value of the function, $ \phi(x)$.

$$ < \phi(x) > = \int_{R} \phi(x)~ P(x) ~dx $$

The expectation value of $\phi(x)$ in the probability density, $P(x)$ is about -1.3186311907073391. I calculated it numerically by using Mathematica. Not so difficult to compute the normal constant for the target and disposal density, $Z$, $Z_Q$. We will obtain the expectation value by using the importance method and compare with this value.

The weight from r-th sample, $$ w_r \equiv {P^* (x) \over Q^* (x)} $$ and then, the expectation value from the weight,

$$ <\phi(x)> \equiv { \sum_r w_r~\phi(x^{®}) \over \sum_r w_r} $$

prec = -1.3186311907073391

We draw the samples from the disposal density distribution.

max = P_star(-1.75233)
def Q_samples(N):
    sample = []
    steps = (5.0-(-5.0))/N
    x = np.random.uniform(-5,5,N)
    for i in range(N):
        if np.random.uniform(0,max)< Q_star(x[i]):
            sample = sample +[x[i]]
    return len(sample), sample
def weight(N):
    n,s = Q_samples(N)
    w = np.zeros(n)
    for i in range(n):
        w[i] = P_star(s[i])/Q_star(s[i])
    return w

Among 1000 initial samples, 293 samples are selected by the disposal distribution.


def expectation_value(N):
    n,s = Q_samples(N)
    w = np.zeros(n)
    for i in range(n):
        w[i] = P_star(s[i])/Q_star(s[i])

    return sum(phi(s)*w)/sum(w)
err = prec - (-1.2879605902572107)

See the error is quite small compare to the expectation. Less than 3 %.

When N is bigger than $O(10^5)$, the processing time gets much longer. I want to avoid it, and also would like to make x-axis log scaled. This choice will give us a good converging plot.


The base 10 logarithm seems too small, so I chose binary logarithm.

exp_List = []
for i in range(3,18):
    exp_List = exp_List + [expectation_value(2**i)]


Nicely converges. This expectation value becomes very good near N = $2^{12} \sim 4018$. Want to see closer?

plt.plot(range(11,18),exp_List[8:], 'o-')


You might try to compute the expectation value with a bad disposal distribution and see what happens.