As an attempt to improve the state of my blog, I’m going to start linking to PyMC2 and PyMC3 code examples on the code repository. Enjoy.
As an attempt to improve the state of my blog, I’m going to start linking to PyMC2 and PyMC3 code examples on the code repository. Enjoy. Applications are invited for a CANSSI/Fields postdoctoral position in connection with a CANSSI Collaborative Research Team project to develop robust and statistically rigorous statespace models (SSM) methodologies for fisheries science with Joanna Mills Flemming and Chris Field (both at Dalhousie University). The research will be motivated by ongoing discussions with fisheries collaborators aimed at addressing pressing fisheries issues in Canada. Goals include: – Building robust statistical methodologies for SSMs used in fisheries science and management that allow for both errors and misspecification in the observation process and in the state/dynamic process. – Developing relevant model diagnostics that can be used to evaluate model goodnessoffit for both the observation and state process. In practice, stock assessment scientists and fisheries managers need to know if their advice is sensitive to reasonable alternative assumptions. The successful applicant should be familiar with SSMs, possess excellent computational skills and be well organized. Interested applicants should send their CV and Letter of interest as well as arrange to have two letters of reference sent to Joanna Mills Flemming (Joanna.Flemming – [at] – Dal.Ca) before April 1, 2014. The position is for two years, contingent upon funding, with start date to be determined. In a frustrating development, my coauthor Nigel Hussey reported problems from other researchers in replicating our trophic position calculations from our recent Ecology Letters paper. In it, trophic position is, regrettably (and completely my fault) missing a +TPbase (trophic position of the baseline organism) at the end. The correct equation is: My apologies to Nigel and my coauthors for not catching this in the proofs. Ecology Letters Volume 17, Issue 2, pages 239–250, February 2014 http://onlinelibrary.wiley.com/doi/10.1111/ele.12226/full
For those following the latest in Von Bertalanffy growth curve fitting for sharks, my colleague Steve Campana discovered this week that our 2002 paper looking at age and growth in blue sharks contained an error in the reported growth curve parameters. The Von B curve has a couple of forms (for a recent review on best practices see Pardo et al. (2013) MEE 4:353360) but in our 2002 paper we used the most standard: $$ L_{\inf}*(1e^{k*(aget_{0})} $$ In the paper we reported the three parameters as $L_{\inf}=302$, $k=0.58$, and $t_{0}=0.24$. These are way off, and I’m at a loss as to how I estimated them so badly ten years ago. In any case, we have issued a correction, given revised Bayesian estimates $L_{\inf}=336 [319, 353]$, $k=0.15 [0.13, 0.17]$, and $t_{0}=2.2 [2.6, 1.8]$ (posterior medians and 95% uncertainty intervals). The data, new fits, and old estimates show just how far off the old ones were. NB: PLOT YOUR MODEL AND DATA PROPERLY!!
The autoregression continues this week with an intrinsic conditional autoregressive, or CAR model, in PyMC. There are many, many ways to introduce spatial effects into a modelling framework; due to the availability of cheap MCMC sampling, particularly in WinBUGS, CAR models have been widely used over the past decade to structure the spatial dependence or clustering of gridded data. The canonical example is in estimating the logrelative risk of lip cancer among 56 districts in Scotland between 1975 and 1980; the original data come from (1, Kemp et al. 1985). Disease mapping is perhaps the oldest form of epidemiology, including the famous cholera map of Dr. John Snow that helped him figure out the bacteria was waterborne, rather than being from ‘bad air.’ In the Scottish case, there is some interest in how lip cancer rates relate to the proportion of the population engaged in agriculture, forestry, or fishing (AFF), as exposure to sunlight is a risk factor. The data consist of observed (O) cases of lip cancer, and expected (E) rates, based on the age of the local population:
Which for the observed cases of lip cancer look like: And for the expected cases, based on demographics: So there was a big disparity between observed and expected rates of lip cancer in the late 1970’s. The exposure to sunlight factor may be responsible, or it may be spatially related to some unmeasured factor (in general space or time stand in for things we haven’t measured). The remaining data are for the CAR calculations outlined below.
This model has been analyzed in many places, but I’ll use the model specified on page 165 (2, Banerjee et al.), the WinBUGS code for which is available on Brad Carlin’s website. The data are counts so a Poisson model is a good starting point, with the linear model being used to explain deviations from the expected values in each district: $$ O_{i}\psi_{i}\sim P(E_{i}e^{\psi_{i}}) $$ Here the $\beta_{0}$ and $\beta_{1}$ are the intercept and slope for the relationship to proportion of the district engaged in outdoor occupations. The $\theta_{i}$’s are hierarchical, district level effects (despite having only one observation per district) that add extra Poisson variation: $$ \theta_{i}\sim N(0,\tau_{h}) $$ given a common precision: $$ \tau_{h}\sim \Gamma(3.2761,1.81) $$ The $\Gamma(3.2761,1.81)$ priors are given in Table 5.4 of (2, Banerjee et al.) as alternative priors to a more conventional $\Gamma(0.001,0.001)$ uninformative prior. The $\phi_{i}$’s are the spatial or clustering effects given by the CAR part of the model: $$ \phi_{i}\sim CAR(\tau_{c}) $$ with precision $$ \tau_{c}\sim \Gamma(1.0,1.0) $$ In WinBUGS then, this model is specified as (from Carlin’s website):
The WinBUGS code has three extra bits – Looking a little more closely at this model brings up two important points. The first is that $\theta_{i}$ and $\phi_{i}$ represent two random effects, global (hierarchical) and spatial, from a single datapoint in each district, making them unidentified. So how can this be overcome?? By specifying hyperpriors $\tau_{h}$ and $\tau_{c}$; however these must be chosen sensibly so as to not favour one component over the other. To do so is tricky, because while $\tau_{h}$ is a marginal precision, $\tau_{c}$ is conditional according to the CAR structure we chose. This raises the second point – what the hell does $\sim CAR(\tau_{c})$ mean? What does this ‘distribution’ look like and what is its structure? In the WinBUGS code for this model, all we get is:
which tells us about as much as $\sim CAR(\tau_{c})$. Because (sadly) I understand code far better than algebra, to find out I delved into Appendix 1 of the GeoBUGS manual, to find out what the
($S_{i}$ means all elements but i) Meaning that, rather than calculating the entire massive multivariate normal for the entire dataset (with associated big matrix inversions), the spatial component can be conditionally normal given a mean ($\mu_{i}$) weighted by the neighbouring data cells ($\Sigma_{j=1}^{N}\gamma C_{ij}(s_{j}\mu_{i})$) and the conditional variance for cell i. Then the intrinsic CAR model (i.e.
which means that, in the intrinsic case
This makes a lot of sense – the value we observe at i is informed by the average of neighbouring values. To be sure, I delved into the very bowels of WinBUGS code – written in the obscure Component Pascal language – to find that the
This means the CAR sample is a Normallydistributed random variable, with the mean value for observation i, $\mu$, being the weighted ( Here then is the source of conditioning on $\tau_{c}$, the weighting scheme used and the number of observations neighbouring each observation. As a result, the prior used for $\tau_{c}$ above comes from (3, Bernardinelli et al.), who argue that the prior marginal standard deviation of $\phi_{i}$ is approximately equal to the prior conditional standard deviation divided by 0.7. As noted by (2, Banerjee et al.), any scale parameter for the prior $\Gamma$ that satisfies: $$ sd(\theta_{i}) = \frac{1}{\sqrt{\tau_{h}}} \approx \frac{1}{0.7\sqrt{\bar{m}\tau_{h}}} \approx sd(\phi_{i}) $$ where $\bar{m}$ is the average number of neighbours should be a fair prior for each source of variation. Understanding this, all the elements are in place for a [PyMC] CAR model. First the priors:
and the random effects:
Next comes the CAR part, which is efficiently handled by a Python list comprehension within a decorator instantiation of stochastic node:
This estimates each mu by summing the values of the adjacent observations (indexed by the ragged array
As for the WinBUGS model above, we will also keep tabs on the proportions of total variance for the random and spatial components:
All that remains is to run the two models in their various frameworks and compare to the results presented in Table 5.4 of (2, Banerjee et al.). Looks pretty good: That is except the spatial precision $$ \alpha_{winbugs} = 0.57 [0.45, 0.67] $$ Next up – time! Bibliography [1] Kemp I, Boyle P, Smans M, Muir C. Atlas of Cancer in Scotland, 1975–1980: Incidence and Epidemiologic Perspective. IARC Scientific Publication 72. Lyon, France: International Agency for Research on Cancer; 1985. [2] Banerjee S, Carlin BP, Gelfand AE. Hierarchical Modeling and Analysis for Spatial Data. Boca Raton, FL: Chapman and Hall; 2004. [3] Bernardinelli L, Clayton D, Montomoli C, Ghislandi M, Songini M. Bayesian estimates of disease maps: how important are priors. Statistics in Medicine 1995;14:241131.

InferenceJournalsModellingThe Rest 