The autoregression continues this week with an *intrinsic conditional autoregressive*, or CAR, model in PyMC. There are many, many ways to introduce spatial effects into a modelling framework; thanks to the availability of cheap MCMC sampling, particularly in WinBUGS, CAR models have been widely used over the past decade to structure the spatial dependence or clustering of gridded data.

The canonical example is estimating the log-relative risk of lip cancer among 56 districts in Scotland between 1975 and 1980; the original data come from (1, Kemp et al. 1985). Disease mapping is perhaps the oldest form of epidemiology, including the famous cholera map of Dr. John Snow that helped him figure out the disease was waterborne, rather than arising from ‘bad air.’ In the Scottish case, there is some interest in how lip cancer rates relate to the proportion of the population engaged in agriculture, forestry, or fishing (AFF), as exposure to sunlight is a risk factor.

The data consist of observed (O) counts of lip cancer cases, and expected (E) counts, based on the age structure of the local population:

```
import numpy as np
county = np.array(["skye_lochalsh", "banff_buchan", "caithness", "berwickshire", "ross_cromarty", "orkney", "moray", "shetland", "lochaber", "gordon", "western_isles", "sutherland", "nairn", "wigtown", "NE.fife", "kincardine", "badenoch", "ettrick", "inverness", "roxburgh", "angus", "aberdeen", "argyll_bute", "clydesdale", "kirkcaldy", "dunfermline", "nithsdale", "east_lothian", "perth_kinross", "west_lothian", "cumnock_doon", "stewartry", "midlothian", "stirling", "kyle_carrick", "inverclyde", "cunninghame", "monklands", "dumbarton", "clydebank", "renfrew", "falkirk", "clackmannan", "motherwell", "edinburgh", "kilmarnock", "east_kilbride", "hamilton", "glasgow", "dundee", "cumbernauld", "bearsden", "eastwood", "strathkelvin", "tweeddale", "annandale"])
O = np.array([9, 39, 11, 9, 15, 8, 26, 7, 6, 20, 13, 5, 3, 8, 17, 9, 2, 7, 9, 7, 16, 31, 11, 7, 19, 15, 7, 10, 16, 11, 5, 3, 7, 8, 11, 9, 11, 8, 6, 4, 10, 8, 2, 6, 19, 3, 2, 3, 28, 6, 1, 1, 1, 1, 0, 0])
N = len(O)
E = np.array([1.4, 8.7, 3.0, 2.5, 4.3, 2.4, 8.1, 2.3, 2.0, 6.6, 4.4, 1.8, 1.1, 3.3, 7.8, 4.6, 1.1, 4.2, 5.5, 4.4, 10.5, 22.7, 8.8, 5.6, 15.5, 12.5, 6.0, 9.0, 14.4, 10.2, 4.8, 2.9, 7.0, 8.5, 12.3, 10.1, 12.7, 9.4, 7.2, 5.3, 18.8, 15.8, 4.3, 14.6, 50.7, 8.2, 5.6, 9.3, 88.7, 19.6, 3.4, 3.6, 5.7, 7.0, 4.2, 1.8])
logE = np.log(E)
```

Which for the observed cases of lip cancer look like:

*[map of observed lip cancer cases by district]*

And for the expected cases, based on demographics:

*[map of expected lip cancer cases by district]*

So there was a big disparity between observed and expected rates of lip cancer in the late 1970s. The sunlight-exposure factor may be responsible, or it may be spatially related to some unmeasured factor (in general, space or time stand in for things we haven’t measured). The remaining data are for the CAR calculations outlined below.
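As a quick numerical check (a sketch using just the first five districts from the arrays above), the raw standardized morbidity ratios $O_{i}/E_{i}$ make the disparity obvious:

```python
import numpy as np

# Observed and expected counts for the first five districts (from the data above)
O = np.array([9, 39, 11, 9, 15])
E = np.array([1.4, 8.7, 3.0, 2.5, 4.3])

# Standardized morbidity ratio: observed counts relative to expectation
smr = O / E
print(np.round(smr, 2))  # Skye-Lochalsh sits at over six times its expected count
```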

```
aff = np.array([16, 16, 10, 24, 10, 24, 10, 7, 7, 16, 7, 16, 10, 24, 7, 16, 10, 7, 7, 10, 7, 16, 10, 7, 1, 1, 7, 7, 10, 10, 7, 24, 10, 7, 7, 0, 10, 1, 16, 0, 1, 16, 16, 0, 1, 7, 1, 1, 0, 1, 1, 0, 1, 1, 16, 10])/10.
num = np.array([4, 2, 2, 3, 5, 2, 5, 1, 6, 4, 4, 3, 4, 3, 3, 6, 6, 6, 5, 3, 3, 2, 6, 8, 3, 4, 4, 4, 11, 6, 7, 4, 4, 9, 5, 4, 5, 6, 5, 5, 7, 6, 4, 5, 4, 6, 6, 4, 9, 3, 4, 4, 4, 5, 5, 6])
# Ragged adjacency lists (1-based, as in the WinBUGS data);
# subtract 1 to convert to Python (zero-based) indexing
adj = [np.array(a) - 1 for a in [
    [5, 9, 11, 19],
    [7, 10],
    [6, 12],
    [18, 20, 28],
    [1, 11, 12, 13, 19],
    [3, 8],
    [2, 10, 13, 16, 17],
    [6],
    [1, 11, 17, 19, 23, 29],
    [2, 7, 16, 22],
    [1, 5, 9, 12],
    [3, 5, 11],
    [5, 7, 17, 19],
    [31, 32, 35],
    [25, 29, 50],
    [7, 10, 17, 21, 22, 29],
    [7, 9, 13, 16, 19, 29],
    [4, 20, 28, 33, 55, 56],
    [1, 5, 9, 13, 17],
    [4, 18, 55],
    [16, 29, 50],
    [10, 16],
    [9, 29, 34, 36, 37, 39],
    [27, 30, 31, 44, 47, 48, 55, 56],
    [15, 26, 29],
    [25, 29, 42, 43],
    [24, 31, 32, 55],
    [4, 18, 33, 45],
    [9, 15, 16, 17, 21, 23, 25, 26, 34, 43, 50],
    [24, 38, 42, 44, 45, 56],
    [14, 24, 27, 32, 35, 46, 47],
    [14, 27, 31, 35],
    [18, 28, 45, 56],
    [23, 29, 39, 40, 42, 43, 51, 52, 54],
    [14, 31, 32, 37, 46],
    [23, 37, 39, 41],
    [23, 35, 36, 41, 46],
    [30, 42, 44, 49, 51, 54],
    [23, 34, 36, 40, 41],
    [34, 39, 41, 49, 52],
    [36, 37, 39, 40, 46, 49, 53],
    [26, 30, 34, 38, 43, 51],
    [26, 29, 34, 42],
    [24, 30, 38, 48, 49],
    [28, 30, 33, 56],
    [31, 35, 37, 41, 47, 53],
    [24, 31, 46, 48, 49, 53],
    [24, 44, 47, 49],
    [38, 40, 41, 44, 47, 48, 52, 53, 54],
    [15, 21, 29],
    [34, 38, 42, 54],
    [34, 40, 49, 54],
    [41, 46, 47, 49],
    [34, 38, 49, 51, 52],
    [18, 20, 24, 27, 56],
    [18, 24, 30, 33, 45, 55]]]
# Binary adjacency weights: 1 for each neighbouring district
weights = [np.ones(len(a)) for a in adj]
# Total weight (here, just the number of neighbours) for each district
Wplus = np.array([w.sum() for w in weights])
```
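One property worth checking before modelling (a sketch using a hypothetical four-district map, since the principle is the same for the 56 Scottish districts): the neighbourhood structure must be symmetric, i.e. if district *j* appears in district *i*'s adjacency list, then *i* must appear in *j*'s.

```python
import numpy as np

# Hypothetical four-district adjacency list, zero-based as in the data above
adj = [np.array([1, 2]), np.array([0, 2]), np.array([0, 1, 3]), np.array([2])]

# A valid CAR neighbourhood is symmetric: j in adj[i] <=> i in adj[j]
symmetric = all(i in adj[j] for i in range(len(adj)) for j in adj[i])
print(symmetric)  # True
```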

This model has been analyzed in many places, but I’ll use the model specified on page 165 of (2, Banerjee et al.), the WinBUGS code for which is available on Brad Carlin’s website. The data are counts, so a Poisson model is a good starting point, with the linear model used to explain deviations from the expected values in each district:

$$ O_{i}|\psi_{i}\sim P(E_{i}e^{\psi_{i}}) $$

$$ \psi_{i}=\beta_{0}+\beta_{1}AFF_{i}+\theta_{i}+\phi_{i} $$

Here $\beta_{0}$ and $\beta_{1}$ are the intercept and slope for the relationship to the proportion of the district engaged in outdoor occupations. The $\theta_{i}$’s are hierarchical, district-level effects (despite there being only one observation per district) that add extra-Poisson variation:

$$ \theta_{i}\sim N(0,\tau_{h}) $$

given a common precision:

$$ \tau_{h}\sim \Gamma(3.2761,1.81) $$

The $\Gamma(3.2761,1.81)$ priors are given in Table 5.4 of (2, Banerjee et al.) as alternative priors to a more conventional $\Gamma(0.001,0.001)$ uninformative prior. The $\phi_{i}$’s are the spatial or clustering effects given by the CAR part of the model:

$$ \phi_{i}\sim CAR(\tau_{c}) $$

with precision

$$ \tau_{c}\sim \Gamma(1.0,1.0) $$

In WinBUGS then, this model is specified as (from Carlin’s website):

```
model
{
    for (i in 1 : regions) {
        O[i] ~ dpois(mu[i])
        log(mu[i]) <- log(E[i]) + beta0 + beta1*aff[i]/10 + phi[i] + theta[i]
        theta[i] ~ dnorm(0.0, tau.h)
    }
    phi[1:regions] ~ car.normal(adj[], weights[], num[], tau.c)
    beta0 ~ dnorm(0.0, 1.0E-5)   # vague prior on grand intercept
    beta1 ~ dnorm(0.0, 1.0E-5)   # vague prior on covariate effect
    tau.h ~ dgamma(3.2761, 1.81)
    tau.c ~ dgamma(1.0, 1.0)
    sd.h <- sd(theta[])          # marginal SD of heterogeneity effects
    sd.c <- sd(phi[])            # marginal SD of clustering (spatial) effects
    alpha <- sd.c / (sd.h + sd.c)
}
```

The WinBUGS code has three extra bits – `sd.h`, `sd.c`, and `alpha` – posterior nodes that keep tabs on the marginal standard deviations of global heterogeneity and spatial variation, and on their relative contributions to overall variation.

Looking a little more closely at this model brings up two important points. The first is that $\theta_{i}$ and $\phi_{i}$ represent two random effects – global (hierarchical) and spatial – estimated from a single data point in each district, making them individually unidentifiable. So how can this be overcome? By specifying the hyperpriors $\tau_{h}$ and $\tau_{c}$; however, these must be chosen sensibly so as to not favour one component over the other. Doing so is tricky, because while $\tau_{h}$ is a marginal precision, $\tau_{c}$ is conditional, according to the CAR structure we chose. This raises the second point – what the hell does $\sim CAR(\tau_{c})$ mean? What does this ‘distribution’ look like, and what is its structure? In the WinBUGS code for this model, all we get is:

```
phi[1:regions] ~ car.normal(adj[], weights[], num[], tau.c)
```

which tells us about as much as $\sim CAR(\tau_{c})$ does. Because (sadly) I understand code far better than algebra, I delved into Appendix 1 of the GeoBUGS manual to find out what the `car.normal()` model meant. It contained a few key sentences:

…the joint multivariate Gaussian model can be expressed in the form of a set of conditional distributions

$$ S_{i}|S_{-i}\sim \mathrm{Normal}\Big(\mu_{i}+\sum_{j=1}^{N}\gamma C_{ij}(S_{j}-\mu_{j}),\ \phi M_{ii}\Big) $$

($S_{-i}$ means all elements but *i*)

Meaning that, rather than calculating the entire massive multivariate normal for the whole dataset (with its associated big matrix inversions), the spatial component can be conditionally normal, given a mean ($\mu_{i}$) weighted by the neighbouring data cells ($\sum_{j=1}^{N}\gamma C_{ij}(S_{j}-\mu_{j})$) and the conditional variance for cell *i*. The intrinsic CAR model (*i.e.* `car.normal()`) is then given as

$$ S_{i}|S_{-i}\sim \mathrm{Normal}(\bar{S}_{i},\ v/n_{i}) $$

which means that, in the intrinsic case

$S_{i}$ has a normal distribution with conditional mean given by the average of the neighbouring $S_{j}$’s and conditional variance inversely proportional to the number of neighbours $n_{i}$.

This makes a lot of sense – the value we observe at *i* is informed by the average of neighbouring values. To be sure, I delved into the very bowels of the WinBUGS code – written in the obscure Component Pascal language – to find that the `car.normal()` distribution is exactly what the manual said:

```
PROCEDURE (node: Node) Sample (OUT res: SET);
    VAR
        i, len: INTEGER;
        mu, tau, value, wPlus: REAL;
BEGIN
    len := LEN(node.neighs);
    i := 0;
    mu := 0.0;
    wPlus := 0.0;
    WHILE i < len DO
        mu := mu + node.neighs[i].value * node.weights[i];
        wPlus := wPlus + node.weights[i];
        INC(i)
    END;
    mu := mu / wPlus;
    tau := node.tau.Value() * wPlus;
    value := MathRandnum.Normal(mu, tau);
    node.SetValue(value);
    res := {}
END Sample;
```

This means the CAR sample is a normally-distributed random variable, with the mean value for observation *i*, $\mu$, being the weighted (`node.weights[i]`) average of the values bordering that observation (`node.neighs[i].value`), given a precision (this is WinBUGS/PyMC, remember) that increases with the weighted contribution of the number of neighbours present (`wPlus`). In this model the weights are all 1, simply indicating whether the other observations are neighbours or not.
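For the algebra-averse like me, that sampler is easy to sketch in Python (the names `car_conditional_sample` and `neigh_values` are mine, not WinBUGS’); note that WinBUGS parameterises the normal by precision, which I convert to a standard deviation here:

```python
import numpy as np

def car_conditional_sample(neigh_values, weights, tau, rng=np.random.default_rng(1)):
    """Draw phi_i from its full conditional under the intrinsic CAR model."""
    w_plus = weights.sum()
    mu = (weights * neigh_values).sum() / w_plus  # weighted neighbour average
    precision = tau * w_plus                      # more neighbours, more precision
    return rng.normal(mu, 1.0 / np.sqrt(precision))

# e.g. a node with three equally-weighted neighbours
draw = car_conditional_sample(np.array([0.2, -0.1, 0.4]), np.ones(3), tau=2.0)
```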

Here then is the source of the conditioning on $\tau_{c}$: the weighting scheme used, and the number of observations neighbouring each observation. As a result, the prior used for $\tau_{c}$ above comes from (3, Bernardinelli et al.), who argue that the prior marginal standard deviation of $\phi_{i}$ is approximately equal to the prior conditional standard deviation divided by 0.7. As noted by (2, Banerjee et al.), any scale parameter for the prior $\Gamma$ that satisfies:

$$ sd(\theta_{i}) = \frac{1}{\sqrt{\tau_{h}}} \approx \frac{1}{0.7\sqrt{\bar{m}\tau_{c}}} \approx sd(\phi_{i}) $$

where $\bar{m}$ is the average number of neighbours, should be a fair prior for each source of variation. Understanding this, all the elements are in place for a PyMC CAR model.
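To see the heuristic in action — a back-of-the-envelope sketch, where `m_bar` is my rough average neighbour count for the Scotland map and $\tau_{h}$ is set to its prior mean under $\Gamma(3.2761, 1.81)$ — the matching conditional precision is $\tau_{c} \approx \tau_{h}/(0.7^{2}\bar{m})$:

```python
import numpy as np

m_bar = 4.8            # approximate average number of neighbours per district
tau_h = 3.2761 / 1.81  # prior mean of tau_h under Gamma(3.2761, 1.81)

sd_theta = 1.0 / np.sqrt(tau_h)                # marginal sd of theta
tau_c = tau_h / (0.7**2 * m_bar)               # conditional precision that matches
sd_phi = 1.0 / (0.7 * np.sqrt(m_bar * tau_c))  # implied marginal sd of phi

print(round(sd_theta, 3), round(sd_phi, 3))  # the two prior sd's agree
```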

First the priors:

```
from pymc import Normal, Gamma, Poisson, Lambda, stochastic

# Vague prior on intercept
beta0 = Normal('beta0', mu=0.0, tau=1.0e-5, value=0)
# Vague prior on covariate effect
beta1 = Normal('beta1', mu=0.0, tau=1.0e-5, value=0)
# Random effects (hierarchical) prior
tau_h = Gamma('tau_h', alpha=3.2761, beta=1.81, value=1)
# Spatial clustering prior
tau_c = Gamma('tau_c', alpha=1.0, beta=1.0, value=1)
```

and the random effects:

```
# Regional random effects
theta = Normal('theta', mu=0.0, tau=tau_h, value=np.zeros(N))
```

Next comes the CAR part, which is efficiently handled by a Python list comprehension within a decorator instantiation of a stochastic node:

```
from pymc import stochastic, Lambda, normal_like

@stochastic
def mu_phi(tau=tau_c, value=np.zeros(N)):
    # Calculate mu based on the average of each district's neighbours
    mu = np.array([sum(weights[i]*value[adj[i]])/Wplus[i] for i in range(N)])
    # Scale the precision by the number of neighbours
    taux = tau*Wplus
    return normal_like(value, mu, taux)

# Zero-centre phi
phi = Lambda('phi', lambda mu=mu_phi: mu - np.mean(mu))
```
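To make the ragged-array indexing inside `mu_phi` concrete, here is a toy four-district example (hypothetical values) of the same neighbour-average computation:

```python
import numpy as np

# Hypothetical four-district map with unit weights
value = np.array([1.0, 2.0, 3.0, 4.0])
adj = [np.array([1, 2]), np.array([0, 2]), np.array([0, 1, 3]), np.array([2])]
weights = [np.ones(len(a)) for a in adj]
Wplus = np.array([w.sum() for w in weights])

# The same list comprehension as in mu_phi
mu = np.array([sum(weights[i]*value[adj[i]])/Wplus[i] for i in range(4)])
print(mu)  # each entry is the average of its neighbours' values
```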

This estimates each $\mu$ by summing the values of the adjacent observations (indexed by the ragged array `adj`) multiplied by the weights (the other ragged array, `weights`), and dividing by the total weight (`Wplus`, which, because the weights are all 1, is simply the number of adjacent observations); the $\tau$ for each observation is the overall $\tau_{c}$ multiplied by the total weight (number of neighbours). These values are then given a normal likelihood that is subsequently zero-centred as $\phi$. Finally, the mean (linear) model and the Poisson likelihood:

```
# Mean model
mu = Lambda('mu', lambda b0=beta0, b1=beta1, theta=theta, phi=phi:
            np.exp(logE + b0 + b1*aff + theta + phi))
# Likelihood
Yi = Poisson('Yi', mu=mu, value=O, observed=True)
```

As for the WinBUGS model above, we will also keep tabs on the proportions of total variance for the random and spatial components:

```
# Marginal SD of heterogeneity effects
sd_h = Lambda('sd_h', lambda theta=theta: np.std(theta))
# Marginal SD of clustering (spatial) effects
sd_c = Lambda('sd_c', lambda phi=phi: np.std(phi))
# Proportion of spatial variance
alpha = Lambda('alpha', lambda sd_h=sd_h, sd_c=sd_c: sd_c/(sd_h+sd_c))
```

All that remains is to run the two models in their various frameworks and compare to the results presented in Table 5.4 of (2, Banerjee et al.). Looks pretty good:

*[table of posterior estimates from PyMC and WinBUGS]*

That is, except that the spatial precision `tau_c` is somewhat more precise in PyMC than in WinBUGS. However, the proportion of spatial variance is almost identical:

$$ \alpha_{winbugs} = 0.57 [0.45, 0.67] $$

$$ \alpha_{pymc} = 0.57 [0.49, 0.65] $$

$$ \alpha_{banerjee} = 0.57 [0.46, 0.68] $$

Next up – time!

Bibliography

[1] Kemp I, Boyle P, Smans M, Muir C. Atlas of Cancer in Scotland, 1975–1980: Incidence and Epidemiologic Perspective. IARC Scientific Publication 72. Lyon, France: International Agency for Research on Cancer; 1985.

[2] Banerjee S, Carlin BP, Gelfand AE. Hierarchical Modeling and Analysis for Spatial Data. Boca Raton, FL: Chapman and Hall; 2004.

[3] Bernardinelli L, Clayton D, Montomoli C, Ghislandi M, Songini M. Bayesian estimates of disease maps: how important are priors? Statistics in Medicine 1995;14:2411-31.