Rejection sampling 101

Working with my friends Ed and Sean on some zero-truncated Poisson logNormal (ZTPLN) community ecology work, I’ve come to realize I’ve done far too little coding of random value generation from odd statistical distributions in the form of rejection sampling. As this forms a core part of MCMC, I thought it high time for me to code some rejection sampling algorithms in R. So this morning I delved into Gammerman and Lopes (2006) for a little nerd gospel.

The basic idea is to use a distribution that is easy to sample from in order to generate values from one that is difficult to sample from. The key requirement is that the easy distribution envelops the difficult distribution we want to generate samples from. Formally, *enveloping* means that, for any given value $x$, the easy distribution $f(x)$ is always $\geq g(x)$. Informally, this looks like

where the red (in this case easy) $f(x)$ distribution sits over top of the black $g(x)$ distribution we’re interested in sampling from. For this example I’m following Gammerman and Lopes’ use of an enveloping Cauchy distribution $f(x)$ to generate samples from a Normal distribution $g(x)$. Of course, given a uniform random number generator, Normals are relatively easy to sample from anyway, but the process is the same as for a more complex distribution.
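Anticipating the enveloping constant $A$ introduced below, the condition the sampler will actually rely on is

$$A \, f(x) \geq g(x) \quad \text{for all } x.$$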

The first step in setting up the rejection sampler is to write down the expression (the probability density function, or pdf) for the target density of interest $g(x)$, in this case a Normal. The pdf for a Normal is given by
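$$g(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

which in R code is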


# Normal pdf
normd = function(x, mu, sigma){
  1/(sqrt(2*pi)*sigma) * exp(-(x - mu)^2/(2*sigma^2))
}

which describes the familiar shape of a standard Normal ($\sim N(0,1)$) distribution.
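As a quick sanity check, this hand-rolled density should agree with R’s built-in dnorm at any value of $x$, for example at $x = 0$:


# Check the hand-rolled pdf against R's built-in Normal density
normd(0, 0, 1)   # 0.3989423
dnorm(0)         # 0.3989423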

The next step is to choose a class of enveloping distribution ($f(x)$), and parameters for it, that a) make $f(x) \gt g(x)$ for all possible values of $x$ and b) are hopefully close enough in shape to $g(x)$ to give an efficient algorithm. This process no doubt becomes easier with experience, and there are adaptive rejection sampling algorithms to make it more efficient, but we can also construct a suitable enveloping distribution through trial and error.

Starting with arbitrary values, let’s see how a Cauchy distribution with location $= 0$ and scale $= 2$ compares to our $g(x) \sim N(0,1)$:


# Plot densities of f(x) and g(x)
plot(seq(-5, 5, 0.01), dnorm(seq(-5, 5, 0.01)),type="l",xlab="",ylab="Density",ylim=c(0,0.5))
lines(seq(-5, 5, 0.01), dcauchy(seq(-5, 5, 0.01), location=0, scale = 2), col=2)
text(0,0.45,labels="N(0,1)")
text(2.3,0.4,labels=paste("Cauchy(0,",2,")",sep=""),col=2)

This is certainly non-enveloping, so maybe if we try multiplying $Cauchy(0,2)$ by some constant it will cover $g(x)$ for all values of $x$. These *enveloping constants* (here called $A$) are a key part of rejection sampling, and are the core of the tuning process in adaptive rejection sampling algorithms. Multiplying $Cauchy(0,2)$ by $A=3$ seems to do the trick:


# Plot g(x) and modified f(x)
plot(seq(-5, 5, 0.01), dnorm(seq(-5, 5, 0.01)),type="l",xlab="",ylab="Density",ylim=c(0,0.5))
lines(seq(-5, 5, 0.01), dcauchy(seq(-5, 5, 0.01), location=0, scale = 2)*3, col=2)
text(0,0.45,labels="N(0,1)")
text(2.3,0.4,labels=paste("Cauchy(0,",2,")*3",sep=""),col=2)
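The picture suggests the envelope now holds; a quick numerical check over a grid of $x$ values (a rough check only, since a grid can’t cover every possible $x$) agrees:


# Check that A*f(x) >= g(x) across a grid of x values
xx = seq(-10, 10, 0.01)
all(3 * dcauchy(xx, location=0, scale=2) >= dnorm(xx))   # TRUE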

Now that we have a suitable (though perhaps not optimal) enveloping distribution, it is time for the rejection sampling step. This incredibly nifty trick works by first drawing a random value, say $u$, from a uniform ($U(0,1)$) distribution, so that $u$ lies between zero and one. Next, a candidate value for $x$ is drawn from the enveloping distribution $f(x)$. Finally, there is an acceptance/rejection step: if $u$ is less than the density of $g(x)$ at $x$ divided by $A$ times the density of $f(x)$ at $x$, then we accept $x$ as having been drawn from $g(x)$; if not, we sample another value for $x$ from $f(x)$ and try again.
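In symbols, a candidate $x$ drawn from $f(x)$ is accepted when

$$u < \frac{g(x)}{A \, f(x)},$$

where the ratio on the right never exceeds one because $A f(x) \geq g(x)$ everywhere.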

In R code this becomes:


# Rejection sampler
rejsamp = function(A){
  while(1){
    # Draw a single value from between zero and one
    u = runif(1)
    # Draw a candidate value for x from the enveloping distribution
    x = rcauchy(1, location=0, scale=2)
    # Accept or reject the candidate value; if rejected, try again
    if(u < normd(x, 0, 1) / (A * dcauchy(x, location=0, scale=2))) return(x)
  }
}

Note the use of the R functions runif to draw random probabilities, rcauchy to draw random values from $f(x)$, and dcauchy to provide the density of $f(x)$ at $x$. The d*name* syntax (dnorm, dcauchy, and so on) gives the canned pdfs already programmed into R.

To run this rejection sampler we can use the replicate function, which applies a given function as many times as specified and returns the results in an array. So, to generate 10k samples with $A=3$:


sampx = replicate(10000, rejsamp(3))

and having a look:


hist(sampx, freq=FALSE, ylim=c(0,0.5), breaks=100, main="N(0,1) by rejection sampling",xlab="")
lines(seq(-5, 5, 0.01), dnorm(seq(-5, 5, 0.01)))
lines(seq(-5, 5, 0.01), dcauchy(seq(-5, 5, 0.01), location=0, scale = 2)*3, col=2)
text(2.3,0.4,labels="Cauchy(0,2)*3",col=2)

Et voilà! We have sampled from $g(x)$!
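One last note on efficiency: because both densities here are normalised, the expected acceptance rate of the sampler is $1/A$, so with $A=3$ roughly a third of candidate draws are kept; a tighter envelope would waste fewer draws. A quick empirical sketch of that rate:


# Rough empirical acceptance rate; should come out near 1/3 for A = 3
cand = rcauchy(100000, location=0, scale=2)
mean(runif(100000) < normd(cand, 0, 1) / (3 * dcauchy(cand, location=0, scale=2)))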
