Post-Doctoral Fellow – Advances to State-Space Models for Fisheries Science

Applications are invited for a CANSSI/Fields post-doctoral position in connection with a CANSSI Collaborative Research Team project to develop robust and statistically rigorous state-space models (SSM) methodologies for fisheries science with Joanna Mills Flemming and Chris Field (both at Dalhousie University). The research will be motivated by ongoing discussions with fisheries collaborators aimed at addressing pressing fisheries issues in Canada. Goals include:

- Building robust statistical methodologies for SSMs used in fisheries science and management that allow for both errors and misspecification in the observation process and in the state/dynamic process.

- Developing relevant model diagnostics that can be used to evaluate model goodness-of-fit for both the observation and state process. In practice, stock assessment scientists and fisheries managers need to know if their advice is sensitive to reasonable alternative assumptions.

The successful applicant should be familiar with SSMs, possess excellent computational skills and be well organized. Interested applicants should send their CV and Letter of interest as well as arrange to have two letters of reference sent to Joanna Mills Flemming (Joanna.Flemming – [at] – Dal.Ca) before April 1, 2014. The position is for two years, contingent upon funding, with start date to be determined.

Correction to Hussey et al. 2014

In a frustrating development, my co-author Nigel Hussey reported problems from other researchers in replicating our trophic position calculations from our recent Ecology Letters paper. In it, trophic position is, regrettably (and completely my fault) missing a +TPbase (trophic position of the baseline organism) at the end. The correct equation is:


My apologies to Nigel and my co-authors for not catching this in the proofs.

Ecology Letters Volume 17, Issue 2, pages 239–250, February 2014

Correction to JNAFS 30: 77-82.

For those following the latest in Von Bertalanffy growth curve fitting for sharks, my colleague Steve Campana discovered this week that our 2002 paper looking at age and growth in blue sharks contained an error in the reported growth curve parameters. The Von B curve has a couple of forms (for a recent review on best practices see Pardo et al. (2013) MEE 4:353-360) but in our 2002 paper we used the most standard:

$$ L_{\inf}*(1-e^{-k*(age-t_{0})} $$

In the paper we reported the three parameters as $L_{\inf}=302$, $k=0.58$, and $t_{0}=-0.24$. These are way off, and I’m at a loss as to how I estimated them so badly ten years ago. In any case, we have issued a correction, given revised Bayesian estimates $L_{\inf}=336 [319, 353]$, $k=0.15 [0.13, 0.17]$, and $t_{0}=-2.2 [-2.6, -1.8]$ (posterior medians and 95% uncertainty intervals). The data, new fits, and old estimates show just how far off the old ones were. NB: PLOT YOUR MODEL AND DATA PROPERLY!!

Pig smoothed

Intrinsic Conditional Autoregressive (CAR) model – Lip Cancer in PyMC

The autoregression continues this week with an intrinsic conditional autoregressive, or CAR model, in PyMC. There are many, many ways to introduce spatial effects into a modelling framework; due to the availability of cheap MCMC sampling, particularly in WinBUGS, CAR models have been widely used over the past decade to structure the spatial dependence or clustering of gridded data.

The canonical example is in estimating the log-relative risk of lip cancer among 56 districts in Scotland between 1975 and 1980; the original data come from (1, Kemp et al. 1985). Disease mapping is perhaps the oldest form of epidemiology, including the famous cholera map of Dr. John Snow that helped him figure out the bacteria was waterborne, rather than being from ‘bad air.’ In the Scottish case, there is some interest in how lip cancer rates relate to the proportion of the population engaged in agriculture, forestry, or fishing (AFF), as exposure to sunlight is a risk factor.

The data consist of observed (O) cases of lip cancer, and expected (E) rates, based on the age of the local population:

import numpy as np

county = np.array(["skye_lochalsh", "banff_buchan", "caithness,berwickshire", "ross_cromarty", "orkney", "moray", "shetland", "lochaber", "gordon", "western_isles", "sutherland", "nairn", "wigtown", "NE.fife", "kincardine", "badenoch", "ettrick", "inverness", "roxburgh", "angus", "aberdeen", "argyll_bute", "clydesdale", "kirkcaldy", "dunfermline", "nithsdale", "east_lothian", "perth_kinross", "west_lothian", "cumnock_doon", " stewartry", " midlothian", "stirling", "kyle_carrick", "inverclyde", "cunninghame", "monklands", "dumbarton", "clydebank", "renfrew", "falkirk", "clackmannan", "motherwell", "edinburgh", "kilmarnock", "east_kilbride", "hamilton", "glasgow", "dundee", "cumbernauld", "bearsden", "eastwood", "strathkelvin", "tweeddale", "annandale"])

O = np.array([9, 39, 11, 9, 15, 8, 26, 7, 6, 20, 13, 5, 3, 8, 17, 9, 2, 7, 9, 7, 16, 31, 11, 7, 19, 15, 7, 10, 16, 11, 5, 3, 7, 8, 11, 9, 11, 8, 6, 4, 10, 8, 2, 6, 19, 3, 2, 3, 28, 6, 1, 1, 1, 1, 0, 0])

N = len(O)

E = np.array([1.4, 8.7, 3.0, 2.5, 4.3, 2.4, 8.1, 2.3, 2.0, 6.6, 4.4, 1.8, 1.1, 3.3, 7.8, 4.6, 1.1, 4.2, 5.5, 4.4, 10.5, 22.7, 8.8, 5.6, 15.5, 12.5, 6.0, 9.0, 14.4, 10.2, 4.8, 2.9, 7.0, 8.5, 12.3, 10.1, 12.7, 9.4, 7.2, 5.3, 18.8, 15.8, 4.3, 14.6, 50.7, 8.2, 5.6, 9.3, 88.7, 19.6, 3.4, 3.6, 5.7, 7.0, 4.2, 1.8])

logE = np.log(E)

Which for the observed cases of lip cancer look like:

And for the expected cases, based on demographics:

So there was a big disparity between observed and expected rates of lip cancer in the late 1970’s. The exposure to sunlight factor may be responsible, or it may be spatially related to some unmeasured factor (in general space or time stand in for things we haven’t measured). The remaining data are for the CAR calculations outlined below.

aff = np.array([16, 16, 10, 24, 10, 24, 10, 7, 7, 16, 7, 16, 10, 24, 7, 16, 10, 7, 7, 10, 7, 16, 10, 7, 1, 1, 7, 7, 10, 10, 7, 24, 10, 7, 7, 0, 10, 1, 16, 0, 1, 16, 16, 0, 1, 7, 1, 1, 0, 1, 1, 0, 1, 1, 16, 10])/10.

num = np.array([4, 2, 2, 3, 5, 2, 5, 1,  6, 4, 4, 3, 4, 3, 3, 6, 6, 6 ,5, 3, 3, 2, 6, 8, 3, 4, 4, 4,11,  6, 7, 4, 4, 9, 5, 4, 5, 6, 5, 5, 7, 6, 4, 5, 4, 6, 6, 4, 9, 3, 4, 4, 4, 5, 5, 6])

adj = np.array([[5, 9,11,19],
[3, 8],
[2, 7,16,22],
[1, 5, 9,12],
[3, 5,11],
[5, 7,17,19],
[7, 9,13,16,19,29],
[1, 5, 9,13,17],

# Change to Python indexing (i.e. -1)
for i in range(len(adj)):
    for j in range(len(adj[i])):
        adj[i][j] = adj[i][j]-1

weights = np.array([[1,1,1,1],
[1, 1],
[1, 1,1,1],
[1, 1, 1,1],
[1, 1,1],
[1, 1,1,1],
[1, 1,1,1,1,1],
[1, 1, 1,1,1],

Wplus = array([sum(w) for w in weights])

This model has been analyzed in many places, but I’ll use the model specified on page 165 (2, Banerjee et al.), the WinBUGS code for which is available on Brad Carlin’s website. The data are counts so a Poisson model is a good starting point, with the linear model being used to explain deviations from the expected values in each district:

$$ O_{i}|\psi_{i}\sim P(E_{i}e^{\psi_{i}}) $$
$$ \psi_{i}=\beta_{0}+\beta_{1}AFF_{i}+\theta_{i}+\phi_{i} $$

Here the $\beta_{0}$ and $\beta_{1}$ are the intercept and slope for the relationship to proportion of the district engaged in outdoor occupations. The $\theta_{i}$’s are hierarchical, district level effects (despite having only one observation per district) that add extra Poisson variation:

$$ \theta_{i}\sim N(0,\tau_{h}) $$

given a common precision:

$$ \tau_{h}\sim \Gamma(3.2761,1.81) $$

The $\Gamma(3.2761,1.81)$ priors are given in Table 5.4 of (2, Banerjee et al.) as alternative priors to a more conventional $\Gamma(0.001,0.001)$ uninformative prior. The $\phi_{i}$’s are the spatial or clustering effects given by the CAR part of the model:

$$ \phi_{i}\sim CAR(\tau_{c}) $$

with precision

$$ \tau_{c}\sim \Gamma(1.0,1.0) $$

In WinBUGS then, this model is specified as (from Carlin’s website):

   for (i in 1 : regions) {
      O[i] ~ dpois(mu[i])
      log(mu[i]) <- log(E[i]) + beta0 + beta1*aff[i]/10 + phi[i] + theta[i]
      theta[i] ~ dnorm(0.0,tau.h)
   phi[1:regions] ~ car.normal(adj[], weights[], num[], tau.c)

   beta0 ~ dnorm(0.0, 1.0E-5)  # vague prior on grand intercept
   beta1 ~ dnorm(0.0, 1.0E-5)  # vague prior on covariate effect

   tau.h ~ dgamma(3.2761, 1.81)    
   tau.c ~ dgamma(1.0, 1.0)  

   sd.h <- sd(theta[]) # marginal SD of heterogeneity effects
   sd.c <- sd(phi[])   # marginal SD of clustering (spatial) effects

   alpha <- sd.c / (sd.h + sd.c)

The WinBUGS code has three extra bits – sd.h, sd.c, and alpha – posterior nodes that keep tabs on the marginal standard deviations of global heterogeneity and spatial variation, and on their relative contributions to overall variation.

Looking a little more closely at this model brings up two important points. The first is that $\theta_{i}$ and $\phi_{i}$ represent two random effects, global (hierarchical) and spatial, from a single datapoint in each district, making them unidentified. So how can this be overcome?? By specifying hyperpriors $\tau_{h}$ and $\tau_{c}$; however these must be chosen sensibly so as to not favour one component over the other. To do so is tricky, because while $\tau_{h}$ is a marginal precision, $\tau_{c}$ is conditional according to the CAR structure we chose. This raises the second point – what the hell does $\sim CAR(\tau_{c})$ mean? What does this ‘distribution’ look like and what is its structure? In the WinBUGS code for this model, all we get is:

phi[1:regions] ~ car.normal(adj[], weights[], num[], tau.c)

which tells us about as much as $\sim CAR(\tau_{c})$. Because (sadly) I understand code far better than algebra, to find out I delved into Appendix 1 of the GeoBUGS manual, to find out what the car.normal() model meant. It contained a few key sentences:

…the joint multivariate Gaussian model can be expressed in the form of a set of conditional distribuitons
$$S_{i}|S_{-i}\sim Normal(\mu_{i}+\Sigma_{j=1}^{N}\gamma C_{ij}(s_{j}-\mu_{i}), \phi M_{ii})$$

($S_{-i}$ means all elements but i)

Meaning that, rather than calculating the entire massive multivariate normal for the entire dataset (with associated big matrix inversions), the spatial component can be conditionally normal given a mean ($\mu_{i}$) weighted by the neighbouring data cells ($\Sigma_{j=1}^{N}\gamma C_{ij}(s_{j}-\mu_{i})$) and the conditional variance for cell i. Then the intrinsic CAR model (i.e. car.normal()) is given as

$$ S_{i}|S_{-i}\sim Normal(S.bar_{i}, v/n_{i}) $$

which means that, in the intrinsic case

Si has a normal distribution with conditional mean given by the average of the neighbouring Sj ‘s and conditional variance inversely proportional to the number of neighbours ni.

This makes a lot of sense – the value we observe at i is informed by the average of neighbouring values. To be sure, I delved into the very bowels of WinBUGS code – written in the obscure Component Pascal language – to find that the car.normal() distribution is exactly what the manual said:

PROCEDURE (node: Node) Sample (OUT res: SET);
        i, len: INTEGER;
        mu, tau, value, wPlus: REAL;
    len := LEN(node.neighs);
    i := 0;
    mu := 0.0;
    wPlus := 0.0;
    WHILE i < len DO
        mu := mu + node.neighs[i].value * node.weights[i];
        wPlus := wPlus + node.weights[i];
    mu := mu / wPlus;
    tau := node.tau.Value() * wPlus;
    value := MathRandnum.Normal(mu, tau);
    res := {}
END Sample;

This means the CAR sample is a Normally-distributed random variable, with the mean value for observation i, $\mu$, being the weighted (node.weights[i]) average of the values bordering that observation (node.neighs[i].value), given a precision (this is WinBUGS/PyMC remember) that is increased according to the weighted contribution of the number of neighbours present (wPlus). In this model the weights are all given as 1, indicating simply whether the other observations are neighbours or not.

Here then is the source of conditioning on $\tau_{c}$, the weighting scheme used and the number of observations neighbouring each observation. As a result, the prior used for $\tau_{c}$ above comes from (3, Bernardinelli et al.), who argue that the prior marginal standard deviation of $\phi_{i}$ is approximately equal to the prior conditional standard deviation divided by 0.7. As noted by (2, Banerjee et al.), any scale parameter for the prior $\Gamma$ that satisfies:

$$ sd(\theta_{i}) = \frac{1}{\sqrt{\tau_{h}}} \approx \frac{1}{0.7\sqrt{\bar{m}\tau_{h}}} \approx sd(\phi_{i}) $$

where $\bar{m}$ is the average number of neighbours should be a fair prior for each source of variation. Understanding this, all the elements are in place for a [PyMC] CAR model.

First the priors:

# Vague prior on intercept
beta0 = Normal('beta0', mu=0.0, tau=1.0e-5, value=0)
# Vague prior on covariate effect
beta1 = Normal('beta1', mu=0.0, tau=1.0e-5, value=0)

# Random effects (hierarchial) prior
tau_h = Gamma('tau_h', alpha=3.2761, beta=1.81, value=1)
# Spatial clustering prior
tau_c = Gamma('tau_c', alpha=1.0, beta=1.0, value=1)

and the random effects:

# Regional random effects
theta = Normal('theta', mu=0.0, tau=tau_h, value=np.zeros(N))

Next comes the CAR part, which is efficiently handled by a Python list comprehension within a decorator instantiation of stochastic node:

def mu_phi(tau=tau_c, value=np.zeros(N)):
    # Calculate mu based on average of neighbours 
    mu = np.array([ sum(weights[i]*value[adj[i]])/Wplus[i] for i in xrange(N)])
    # Scale precision to the number of neighbours
    taux = tau*Wplus
    return normal_like(value,mu,taux)

# Zero-centre phi
phi = Lambda('phi', lambda mu=mu_phi: mu-mean(mu))

This estimates each mu by summing the values of the adjacent observations (indexed by the ragged array adj) multiplied by the weights (the other ragged array, weights), and dividing by the total weight (Wplus, which because the weights are all 1 is simply the number of adjacent observations); the $\tau$ for each observation is the overall $tau_{c}$ is multipled by the total weight (number of observations). These values are then given a normal likelihood that is subsequently zero-centred at $\phi$. Finally the mean (linear) model and the Poisson likelihood:

# Mean model
mu = Lambda('mu', lambda b0=beta0,b1=beta1,theta=theta,phi=phi: exp(logE+b0+b1*aff+theta+phi) )

# Likelihood
Yi = Poisson('Yi', mu=mu, value=O, observed=T)

As for the WinBUGS model above, we will also keep tabs on the proportions of total variance for the random and spatial components:

# Marginal SD of heterogeniety effects
sd_h = Lambda('sd_h', lambda theta=theta: np.std(theta))
# Marginal SD of clustering (spatial) effects
sd_c = Lambda('sd_c', lambda phi=phi: np.std(phi))
# Proportion sptial variance
alpha = Lambda('alpha', lambda sd_h=sd_h,sd_c=sd_c: sd_c/(sd_h+sd_c))

All that remains is to run the two models in their various frameworks and compare to the results presented in Table 5.4 of (2, Banerjee et al.). Looks pretty good:


That is except the spatial precision tau_c is somewhat more precise in PyMC than WinBUGS. However the proportion of spatial variance is almost identical

$$ \alpha_{winbugs} = 0.57 [0.45, 0.67] $$
$$ \alpha_{pymc} = 0.57 [0.49, 0.65] $$
$$ \alpha_{banerjee} = 0.57 [0.46, 0.68] $$

Next up – time!


[1] Kemp I, Boyle P, Smans M, Muir C. Atlas of Cancer in Scotland, 1975–1980: Incidence and Epidemiologic Perspective. IARC Scientific Publication 72. Lyon, France: International Agency for Research on Cancer; 1985.

[2] Banerjee S, Carlin BP, Gelfand AE. Hierarchical Modeling and Analysis for Spatial Data. Boca Raton, FL: Chapman and Hall; 2004.

[3] Bernardinelli L, Clayton D, Montomoli C, Ghislandi M, Songini M. Bayesian estimates of disease maps: how important are priors. Statistics in Medicine 1995;14:2411-31.

Hierarchical Priors for Multinomial Data

As part of an ongoing project to understand autoregressive structures in hierarchical models I came across some WinBUGS code code that puts a non-Wishart prior on a Multivariate Normal precision matrix for a Multinomial-logistic model of pig weight histograms. Why might one do such a thing? Well, often when putting together histograms of things the distribution doesn’t look as smooth as one would expect. In the case of these pigs, the distribution of weights is slightly uneven:

Pig data

The objective in this model (from pg.177 of Congdon’s 2001 Bayesian Statistical Modelling book) is to smooth the observed histogram so that it looks more like the density of the population from which the sample is drawn, q(y).

Because histograms are broken into bins into which the observations falling are stacked, Congdon’s approach was to use a Multinomial distribution to estimate expected frequencies in weight bins ranging from 19 to 39. The key twist is that the smoothing occurs by assuming that adjacent points on the histogram have similar frequencies and cannily using an AR1 process to structure this belief.

Prior specification using matrices is a tricky business, not one that I’m an expert in by a long shot. But it is important for some water quality modelling I’ve been working on that will have a spatial AR process in the mix.

So on to the PyMC implementation…


First, the observed frequency of pig weight gains from Table 5.8 of Congdon:

import numpy as np

# Observed frequencies
weights = np.array([1,1,0,7,5,10,30,30,41,48,66,72,56,46,45,22,24,12,5,0,1])

# Total number of observations
n = sum(weights)

# Total number of bins
s = len(weights)


Next in our Bayesian model comes prior specification. For this model the covariance matrix $V=P^{-1}$ (that is, the inverse of the precision matrix, $P$) assumes a smoothness structure:

\[ V_{ij}=\sigma^{2}\rho^{|i-j|} \]

which means the covariance between bins i and j is equal to some common variance, $\sigma^{2}$, times a correlation coefficient $\rho$, and that this relationship diminishes exponentially by how far the bins are away from each other (i.e. $|i-j|$). This, Congdon says, is akin to the smoothing we’d see in an AR1 time series and expresses our prior belief as to the histograms’s smoothness. If $\tau = \sigma^{-2}(1-\rho^{2})^{-1}$, then the resulting precision matrix $P$ has the form:

$$ \begin{align} P_{ij} = P_{ji}=0, & for & i=1,…,J-2; j=2+i, J \end{align}$$

$$ \begin{align} P_{jj}=\tau(1+\rho^{2}) & & j=2,…,J-1 \end{align}$$

$$ P_{11,JJ} = \tau $$

$$ \begin{align} P_{j,j+1} = P_{j+1,j} = -\rho\tau & & j=2,…,J-1 \end{align} $$

To build this precision matrix in PyMC requires first defining the priors for $\rho$ and $\tau$:

# Prior on elements of AR Precision Matrix 
rho = Uniform('rho', lower=0, upper=1)
tau = Uniform('tau', lower=0.5, upper=10)

Variable $\rho$ is the correlation and, because we expect adjacent bins to be similar, it is bounded positively between zero and one; $\tau$ is a precision bounded between 0.5 and 10 due to prior information that Congdon cites in his text; we could easily use $\tau\sim U(0,100)$ priors instead, the results are the same. Next the precision matrix itself:

# Define precision matrix
def P(tau=tau, rho=rho):
    M = (np.zeros(s**2)).reshape(s,s)
    M[range(1,s-1),range(1,s-1)] = np.ones(s-2)*(tau*(1+rho**2))
    M[(0,s-1),(0,s-1)] = tau
    M[range(0,s-1),range(1,s)] = np.ones(s-1)*(-tau*rho)
    M[range(1,s),range(0,s-1)] = np.ones(s-1)*(-tau*rho)
    return M

This uses the decorator instantiation of PyMC (@deterministic) that allows the matrix to be specified as a regular Python function (def). A bit about the code here to explain; in the same order as the algebra above and for each line of PyMC code in function P:

  1. Sets up an empty matrix M that is the same dimension as the number of bins;
  2. Sets the diagonal elements to $\tau(1+\rho^{2})$ except for the upper left and lower right elements;
  3. Sets the upper left and lower right elements to $\tau$;
  4. Sets the off-diagonal elements to $-\rho\tau$;
  5. Return the matrix M.


Now that we have priors for P and have P itself is set up, we can estimate the multiple logit model parameters using a Multivariate Normal distribution:

# MVN for logit parameters
theta = MvNormal('theta', mu=np.ones(s)*-np.log(s), tau=P)

The mu=np.ones(s)*-np.log(s) part of this model correspond to a neutral (equal) prior for the Multinomial probabilities $\pi_{j}$ that will be used to estimate the frequencies in the smoothed histogram. From this, we can apply a multiple logit tranformation to get the bin probabilities:

# Bin probabilities
pi = Lambda('pi', lambda g=theta: np.exp(g)/sum(np.exp(g)))


These probabilities are then used with the data to complete the likelihood:

# Likelihood
Yi = Multinomial('Yi', n=n, p=pi, value=weights, observed=True)

Finally, we can calculate the expected bin frequencies for the smoothed histogram from the posterior probabilities for each bin and the total number of pigs weighed:

# Smoothed frequencies
Sm = Lambda('Sm', lambda pi=pi: n*pi)

Put this all together and run for $10e^{5}$ iterations, we get a rather satisfactory result, quite near the estimates from Congdon’s original Table 5.8 and the WinBUGS example:

Pig smoothed