-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathtrends.tex
executable file
·185 lines (123 loc) · 16.6 KB
/
trends.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
\documentclass[12pt,letter,draft]{article}
\usepackage{amsmath,amssymb,amsthm,epsfig,rotfloat,psfrag,natbib}
%% fonts
\usepackage[charter]{mathdesign}
\usepackage[scaled=.95]{inconsolata}
%% page margins, inter-paragraph space and no chapters
\usepackage[margin=1in]{geometry}
\setlength{\parskip}{0.5em}
\usepackage{lineno, natbib}
%% Some term shortcuts
\newcommand{\Nij}{\ensuremath{N_{ij}}}
\newcommand{\nij}{\ensuremath{n_{ij}}}
\newcommand{\zij}{\ensuremath{z_{ij}}}
\newcommand{\yij}{\ensuremath{y_{ij}}}
\newcommand{\dij}{\ensuremath{\mathbf{d}_{ij}}}
\newcommand{\xij}{\ensuremath{\mathbf{x}_{ij}}}
\newcommand{\bzi}{\ensuremath{\mathbf{z}_i}}
\newcommand{\fN}{\ensuremath{\mathcal{N}}}
\newcommand{\by}{\ensuremath{\mathbf{y}}}
\newcommand{\bz}{\ensuremath{\mathbf{z}}}
\newcommand{\bX}{\ensuremath{\mathbf{X}}}
\newcommand{\bx}{\ensuremath{\mathbf{x}}}
\newcommand{\bM}{\ensuremath{\mathbf{M}}}
\newcommand{\bT}{\ensuremath{\mathbf{T}}}
\newcommand{\bb}{\ensuremath{\boldsymbol{\beta}}}
\newcommand{\ba}{\ensuremath{\boldsymbol{\alpha}}}
\newcommand{\beps}{\ensuremath{\boldsymbol{\epsilon}}}
\newcommand{\bn}{\ensuremath{\boldsymbol{\eta}}}
\newcommand{\bd}{\ensuremath{\boldsymbol{\delta}}}
\newcommand{\bphi}{\ensuremath{\boldsymbol{\phi}}}
\newcommand{\bg}{\ensuremath{\boldsymbol{\gamma}}}
\bibliographystyle{apalike}
\begin{document}
\begin{center}
\Large A General Paradigm for Estimating Trends from Ecological Monitoring Data
\bigskip\\
\normalsize
{\sc Jay M. Ver Hoef\footnotemark[1] and Devin S. Johnson}\smallskip\\
{\em National Marine Mammal Laboratory, NOAA\\
7600 Sand Point Way NE, Seattle,
WA 98115 USA }\\ \medskip
\end{center}
\footnotetext[1]{Email: [email protected]}
\raggedright \setlength{\parindent}{0.3in}
\renewcommand{\baselinestretch}{1.7}\normalsize
\clubpenalty=0
\linenumbers
{\em Abstract.\ } xxx.
{\em Key words: xxx}
\centerline{\sc Introduction}
Estimating ``trends'' from survey data has a long standing tradition in ecology and resource management. Estimates of trends are often used to inform resource management decisions, invoke resource protection measures, or remove resource protection measures after satisfactory growth in abundance of a population has been reached. The term, ``trend,'' is a nebulous concept, however, with changing definition depending on the situation at hand. In the broadest sense, a trend, is simply the change in abundance over a specified time interval. , of which both components may change depending on the question being asked of the data.
...
\citet{Link:2002zd} allude to this approach by characterizing a composite trend, but, fail to pursue this notion in general.
...
The work presented here is motivated by on-going monitoring of two marine mammal species in Alaska, harbor seals ({\it Phoca vitulina}) and Steller sea lions ({\it Eumetopias jubatus}).
Hierarchical or state-space models have become a popular framework for analyzing abundance trends...
...
First, whether the trend was linear, quadratic, etc., mattered, and also whether the survey window was the last 15 year, 10 years, etc. Secondly, it wasn't very stable. Some small sites, even with fairly strong priors, would tend to "blow up." That is, a site that changed from 0 to 1 in early 2000's might have estimates in the 100,000's by 2010; but then the Markov Chain would go back to near 0, so there was huge variability. So, I opted for a more flexible form where trend is simply a temporally autocorrelated process. However, this has no "parameters" that naturally estimate trend, so I ended up computing the simple linear regression (least squares for a line) on the the posterior distribution of the annual abundance estimates. I did this for the last 5, 10, and 15 years. I'm quite pleased with this approach. It appears very stable and is completely flexible; allowing us to add years without having to reconsider an underlying trend model as we accumulate data.
...
\section{Methods}
In this section we present a general approach for estimating trends from populations for whom abundance has been assessed by some survey method at several sites and times. The goal is to estimate the change in some function of abundance over a time span of interest.
\subsection{Notation}
To begin the description of the general framework, we need to provide some notational background. The backbone of trend estimation is the true abundance at a given site and time. We recognize the fact that truth, when it comes to abundance at a given point in space and time, is a nebulous concept that will have different meaning depending on the species of interest. Nevertheless, we assume that there is some concept of a true abundance. We denote $\Nij$ as this true abundance at site $s_i$ and time $t_j$ and $\zij=\log(\Nij)$. In addition, there may be known or hypothesized vector of drivers to abundance change, say $\mathbf{d}_{ij}$, that could be added to better predict true abundance where it is unknown. The set of times for which population growth is related will be denoted $T$. For simplicity, we will assume $T = \{t_1\dots,t_J\}$ is discrete, but we envision that the proposed method could certainly extend to continuous time spans. The set of all sites will be denoted by $S = \{s_1,\dots,s_I\} $. Again, $S$ could be extended to a continuous spatial domain as well, we refrain here for convenience. The product space of all sites and times is $A = S \times T$.
To estimate trend, a subset of sites are surveyed (censused or sampled) at a subset of times. We denote $\nij$ as the surveyed abundance at site $s_i$ and time $t_j$. The set of surveyed sites and times is $B\subseteq A$. The log-scale value of the surveyed abundance is $\yij=\log(\nij)$. There may or may not be systematic or random deviations of $\nij$ from $\Nij$. For each $i$ and $j$, there may exist covariates, say $\bx_{ij}$, that can help account for these deviations. We use the notational convention that $\mathbf{N}_C$ represents a vector of all $\Nij$ such that $(s_i, t_j)$ are elements of the set $C$ of sites and time. If $C=A$ or $B$, we leave the subscript off, i.e., $\mathbf{n}_B = \mathbf{n}$, unless it is not clear from the context.
We define a trend to be a function of the true abundances at a subset of sites relative to a specified set of times, say $\tau_C=\tau(\mathbf{N}, C)$. For example we might calculate $\tau_A$ as the least-squares slope of $\mathbf{z}$ with respect to $T$, or, $\tau_A$ might be the difference (derivative) rate of population change, $(\Nij/N_{i,j-1}) - (N_{i,j-1}/N_{i,j-2})$, $j=3,\dots,J$, in which case, $\tau_A$ is a vector. By defining trend in this fashion we consider trend to be a summary of abundance time-series not a parameter in a probabilistic model as it is traditionally. These two views are not mutually exclusive. For example, if
\subsection{Bayesian inference for trends}
In this section we take a broad general approach without defining to many specifics as the research at-hand will dictate many details applicable to the population of interest. Following this section we illustrate two example case studies that have real conservation implications. Here we seek to present a paradigm shift in the way ecologists consider estimating trends.
To estimate trends we begin with the general hierarchical model with the observation component denoted $[n_{ij}\ |\ \Nij,\xij, \varphi_o]$ and the process component denoted $[\Nij\ |\ \mathbf{N}_{A_{ij}},\dij,\varphi_p]$, where $A_{ij} = A\setminus (s_i,t_j)$, i.e., $A_{ij}$ is the set of all sites and times previous to $(s_i,t_j)$ We use the symbol $\varphi$ to generically denote a set of parameters. Note, here we assume that the observation, $\nij$, depends only on $\Nij$ and is independent of the true abundance at other sites and times, while, the true abundance is allowed to depend on, potentially, the true abundance at all other sites and times. The restriction on $\nij$ could be relaxed, but, we do not envision that situation as a widespread issue, so, we made that notational simplification. From this pair of models we can form the posterior distribution of $\Nij$ and any parameters using Bayes rule,
\begin{equation}
[\mathbf{N}, \varphi_o, \varphi_p\ |\ \mathbf{n}, \mathbf{x}, \mathbf{d}] \propto \prod_{(i,j)\in A}\left\{[n_{ij}\ |\ \Nij,\xij, \varphi_o]^{1_B(i,j)} [\Nij\ |\ \mathbf{N}_{A_{ij}}, \dij, \varphi_p]\right\}[\varphi_o, \varphi_p],
\end{equation}
where $1_B(i,j)$ is the indicator that $(s_i,t_j)$ is in the collection of surveyed sites and times, $B$ and $[\varphi_o, \varphi_p]$ is the prior distribution of $(\varphi_o, \varphi_p)$. Finally, the marginal posterior distribution of $\mathbf{N}$ is
\begin{equation}
[\mathbf{N}|\mathbf{n}] \propto \int [\mathbf{N}, \varphi_o, \varphi_p\ |\ \mathbf{n}, \mathbf{x}, \mathbf{d}]d(\varphi_o,\varphi_p).\end{equation}
The most straightforward method for summarizing properties of $[\mathbf{N}\ |\ \mathbf{n}]$ is using a method such as Markov chain Monte Carlo (MCMC; see \citealt{Givens:2005dy} for a general description) to obtain the posterior sample $\mathbf{N}^{(1)},\dots,\mathbf{N}^{(M)}$. However, trends are our main focus. To approximate the posterior distribution of a trend, $[\tau(\mathbf{N})\ |\ \mathbf{n}]$, one need only apply the trend function to the abundance sample to obtain a sample from the posterior trend distribution. From the sample, $\tau(\mathbf{N}^{(1)}),\dots,\tau(\mathbf{N}^{(M)})$, one can approximate measures of central tendency, variance, credible intervals, etc, using the empirical versions calculated from the sample.
\subsection{Realized vs. predicted abundance?}
Here I think it would be good to discuss the benefits/differences between the posterior and posterior predictive versions of the method.
The posterior predictive distribution of abundance is
\begin{equation}
[\mathbf{N}^{rep}|\mathbf{n}] \propto \int \prod_{(i,j)\in A}\left\{[\Nij^{rep}\ |\ \mathbf{N}_{A_{ij}}^{rep}, \dij, \varphi_p]\right\}[\varphi_p\ |\ \mathbf{n}, \mathbf{x}, \mathbf{d}]d\varphi_p,\end{equation}
where $\Nij^{rep}$ is the {\em replicated} abundance.
Sampling from the posterior predictive distribution usually involves only one additional step to the MCMC sampler mentioned previously. For $\varphi_p^{(1)},\dots,\varphi_p^{(M)}$
\section{Case study: Steller sea lion monitoring}
In this section we present estimation of current trends for the western Distinct Population Segment (wDPS) of Steller sea lions. The Steller sea lion is a pinniped that inhabits the North Pacific giving birth and breeding at terrestrial rookeries from California to Alaska and Russia \citep{}. The wDPS consists of those rookeries and haul-outs in Alaska west of 144$^\circ$ W longitude and is of considerable conservation interest due to its precipitous decline in the 1980s and subsequent failure to recover is some portions of its range \citep{}. Figure A1 illustrates the Steller sea lion sites of the wDPS.
To develop conservation management plans for the wDPS trends for aggregations of counts are desired within 6 geographical regions \citep{}(see Fig. A1). That is to say, for a geographical region $S_r$, we would like to estimate the temporal trend of $\tilde N_{r1},\dots,\tilde N_{rT}$, where $\tilde N_{rj} = \sum_{s_i \in S_r} N_{ij}$. For this application we used data from $t_1=1990,\dots,t_{32}=2012$. Our goal is to summarize population growth over the interval 2000--2012, so, we considered the trend of interest, $\tau_C(\mathbf{N})$, to be the the least-squares linear slope of $\tilde z_{rj} = \log(\tilde N_{rj})$ where $C=\{(s_1,\dots,s_{204}) \times (2000,\dots,2012)\}$.
The observation model used was,
\begin{equation}
[n_{ij}\ |\ \Nij,x_j, \gamma,\sigma] = \left\{
\begin{array}{ll}
\mathcal{L}(x_j \gamma + z_{ij}, \sigma^2)& \mbox{if } N_{ij}>0\\
0 & \mbox{if } N_{ij}=0
\end{array}\right.,
\end{equation}
where $x_j$ is the indicator that $t_j < 2004$ and $\mathcal{L}$ represents a log-normal density function. The coefficient $\gamma$ adjusts for the fact that prior to the surveys conducted in 2004 photographs of sites were taken at oblique angles with handheld cameras, while in 2004 and after photographs were taken vertically at fixed altitude with medium format cameras. The medium format method produces higher counts on average than the oblique photo method. In this example, there are no replicated surveys, therefore, there is no information to estimate $\sigma$. Thus, we set it to a trivially small value $\approx 0$.
Next, we model the $\Nij$ using the zero-inflated log-normal model:
\begin{equation}
[\Nij\ |\ \beta_{i0}, \beta_{i1}, \eta_{ij}] = \left\{
\begin{array}{ll}
\mathcal{L}(\beta_{i0} + \beta_{i1}t + \eta_{ij}, \zeta^2) & \mbox{with prob. } p_{ij}\\
0 & \mbox{with prob. } 1-p_{ij}
\end{array}\right.,
\end{equation}
where $(\beta_{i0},\beta_{i1})$ are the linear growth parameters and $[\eta_{i1},\dots,\eta_{iT}|\xi] = \fN(\mathbf{0},\xi^2\mathbf{Q}^{-1})$ is a random walk of order 2 (RW2; see \citealt{} for a definition of the unscaled precision matrix $\mathbf{Q}$). The RW2 model is a smooth time series model similar to a thin-plate spline \citep{Speckman:2003xa}. Note, that the $\Nij$ model we used in this section did directly model dependence in the abundance but rather indirectly through the smooth RW2 times series $(\eta_{i1},\dots,\eta_{iT})$. A zero-inflated log-normal model was chosen rather than traditional count data models, e.g., Poisson or negative-binomial, for two reasons. First, it is the model that has been previously used when estimating regional wDPS trends \citep{} and we would like our results to be comparable. Second, and more importantly, sea lion survey sites can have substantial over {\em and} under inflation relative to the traditional count models. Under inflation occurs at big rookeries where year-to-year variation is often smaller than the average count due to breeding system of sea lions. Under this model we can interpret ``true abundance'' to mean the survey count of nonpups using the medium-format photography method. Thus, to account for the variability in counts in replicated surveys we use the posterior predictive distribution $[\Nij^{rep}\ |\ \mathbf{n}]$, to estimate growth trends.
To complete the model, the zero-inflation probability was modeled by:
\begin{equation}
\mbox{probit}(p_{ij}) = \theta_{i0} + \theta_{i1}t ,
\end{equation}
where $\theta_{i0}$ and $\theta_{i1}$ are linear coefficients.
Unfortunately, not all surveyed sites in the wDPS possessed enough data to estimate the parameters in the full version of the stated model. Therefore, we fitted the following abundance sub-models to sites with less than 11 positive ($\Nij>0$) counts. If the number of positive counts was in [0--5] we set $\beta_{i1}=0$ and $\eta_{ij}=0$, for all $j$, if the number of positive counts was in [6--10] we set $\eta_{ij}=0$, for all $j$, and if the number of positive counts was greater than 10 the full model was used. In addition, if all of the observed counts were positive the zero-inflation model was not used for that particular site (i.e., $p_{ij}=1$ for all $j$) and if the total number of surveys was less than 6 we set $\theta_{i1}=0$. To complete the specifics of the analysis we used the prior distributions $[\gamma]=\mathcal{N}(-0.039, 0.011^2)$, a flat distribution for each $[\beta_{i0}]$, $[\beta_{i1}] = \mathcal{N}(0,0.071^2)$, $[\xi^{-2}]=\mathcal{G}(0.5,0.0005)$ and $[\zeta^{-2}]=\mathcal{G}(0.5,0.0005)$ ($\mathcal{G}$ refers to a gamma distribution). The informative prior for $\gamma$ was derived from an oblique/medium-format pilot study in SE Alaska \citep{}. The informative prior distribution for each $\beta_{i1}$ was used to constrain average growth to be within $\pm20$\% per year, a more than reasonable constraint for a $K$-selected species. Lastly, A flat prior for the zero-inflation parameters, $[\theta_{i0},\theta_{i1}]$, was used.
For this analysis a burnin of 5,000 iterations was discarded then the MCMC sampler was run for 50,000 iterations. I retained every 5th iteration due to storage constraints for a total sample of 10,000 draws from the posterior predictive count at each site in each year.
\section{Case study: Harbor seal monitoring}
Jay,... Insert Harbor seal trends here...
\section{Discussion}
\nocite{*}
\bibliography{trendsBiblio}
\clearpage
\begin{figure}[htbp] % figure placement: here, top, bottom, or page
\centering
\includegraphics[width=0.9\textwidth]{wdpsNonpupsRegion2000-2012.png}
\caption{Trends of Steller sea lion counts in the western Distinct Population Segment. The back line illustrates the posterior predictive median count. The gray envelope is the 90\% highest probability density prediction interval. The blue line is the median posterior predictive linear trend.}
\label{fig:example}
\end{figure}
\end{document}