Occupancy modeling species–environment relationships with non-ignorable survey designs
Statistical models supporting inferences about species occurrence patterns in relation to environmental gradients are fundamental to ecology and conservation biology. A common implicit assumption is that the sampling design is ignorable and does not need to be formally accounted for in analyses. The analyst assumes the data are representative of the target population, and statistical modeling proceeds. However, if data sets from probability and non-probability surveys are combined, or if unequal selection probabilities are used, the design may be non-ignorable. We outline the use of pseudo-maximum likelihood estimation for site-occupancy models to account for such non-ignorable survey designs. This estimation method accounts for the survey design by weighting each site's contribution to the pseudo-likelihood equation according to the design. In our empirical example, legacy and newer randomly selected locations were surveyed for bats to bridge a historic statewide effort with an ongoing nationwide program. We provide a worked example using bat acoustic detection/non-detection data and show how analysts can diagnose whether their design is ignorable. Using simulations, we assessed whether our approach is viable for modeling data sets composed of sites contributed outside of a probability design. Pseudo-maximum likelihood estimates differed from the usual maximum likelihood occupancy estimates for some bat species. The simulations showed that the maximum likelihood estimator of species–environment relationships under non-ignorable sampling designs was biased, whereas the pseudo-maximum likelihood estimator was design-unbiased. However, in our simulation study, designs with a large proportion of legacy or non-probability sites caused problems in estimating standard errors. These problems were likely a result of highly variable weights compounded by small sample sizes (5% or 10% sampling intensity and four revisits).
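The weighting idea can be illustrated with a minimal Python sketch, assuming the basic single-season occupancy model (occupancy probability ψ linear on the logit scale, a constant detection probability p across visits) and design weights w_i supplied by the analyst. Each site's log-likelihood contribution is multiplied by w_i, so with all w_i = 1 the function reduces to the ordinary maximum likelihood objective. The function names, the constant-detection simplification, and the plain gradient-ascent optimizer are illustrative assumptions, not the authors' released code; the sketch returns point estimates only and does not attempt design-based standard errors (for example, a sandwich estimator).

```python
import math


def logistic(z):
    """Inverse-logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def site_likelihood(y, psi, p):
    """Likelihood of one site's detection history under the basic occupancy model.

    The site is occupied with probability psi; given occupancy, each visit
    detects the species independently with probability p.
    """
    detected = sum(y)
    j = len(y)
    occupied_part = psi * p ** detected * (1.0 - p) ** (j - detected)
    # A site with no detections may also simply be unoccupied.
    unoccupied_part = (1.0 - psi) if detected == 0 else 0.0
    return occupied_part + unoccupied_part


def pseudo_loglik(theta, X, Y, w):
    """Weighted (pseudo) log-likelihood: sum_i w_i * log L_i(theta).

    theta = [beta_0, ..., beta_k, alpha] with logit(psi_i) = x_i' beta and
    logit(p) = alpha (constant detection is a simplifying assumption).
    With every w_i = 1 this is the ordinary occupancy log-likelihood.
    """
    *beta, alpha = theta
    p = logistic(alpha)
    total = 0.0
    for x_i, y_i, w_i in zip(X, Y, w):
        psi = logistic(sum(b * x for b, x in zip(beta, x_i)))
        total += w_i * math.log(site_likelihood(y_i, psi, p))
    return total


def fit(X, Y, w, steps=2000, lr=0.05, h=1e-5):
    """Maximize the pseudo-log-likelihood by simple gradient ascent.

    Gradients are central differences of the objective divided by sum(w);
    a deliberately simple optimizer for illustration only.
    """
    theta = [0.0] * (len(X[0]) + 1)
    scale = 1.0 / sum(w)
    for _ in range(steps):
        grad = []
        for k in range(len(theta)):
            up = theta.copy()
            up[k] += h
            down = theta.copy()
            down[k] -= h
            diff = pseudo_loglik(up, X, Y, w) - pseudo_loglik(down, X, Y, w)
            grad.append(scale * diff / (2.0 * h))
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    # Hypothetical toy data: intercept plus one covariate, four visits per site.
    X = [[1.0, -1.0], [1.0, -0.5], [1.0, 0.0], [1.0, 0.5], [1.0, 1.0], [1.0, 1.5]]
    Y = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0],
         [1, 1, 0, 1], [0, 1, 1, 0], [1, 1, 1, 0]]
    w = [1.0, 1.0, 4.0, 4.0, 1.0, 1.0]  # hypothetical design weights
    print("weighted:  ", [round(t, 3) for t in fit(X, Y, w)])
    print("unweighted:", [round(t, 3) for t in fit(X, Y, [1.0] * len(X))])
```

Fitting the same data once with design weights and once with all weights equal to 1, then comparing the two sets of estimates, gives an informal sense of whether the weights matter; this is not necessarily the formal diagnostic used in the paper.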
Aggregating data sets from multiple sources yields larger sample sizes and can extend the spatial extent of statistical inferences. Our results suggest that ignoring the mechanism by which locations were selected for data collection (i.e., the sampling design) could result in erroneous model-based conclusions. Therefore, to ensure robust and defensible recommendations for evidence-based conservation decision-making, survey design information, in addition to the data themselves, must be available to analysts. Details for constructing the weights used in estimation and code for implementation are provided.
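Design weights are commonly taken as the inverse of each site's inclusion probability. The sketch below is a hypothetical illustration rather than the paper's weighting scheme: it computes inverse-probability weights, rescales them to sum to the sample size (which leaves pseudo-maximum likelihood point estimates unchanged, since the maximizer is invariant to a positive constant multiple of the objective), and reports Kish's effective sample size, one common gauge of how much highly variable weights reduce the information available for estimation. How inclusion probabilities are assigned to legacy or non-probability sites is design-specific and is not shown; the probabilities in the example are made up.

```python
def inverse_probability_weights(incl_prob):
    """Design weights w_i = 1 / pi_i from first-order inclusion probabilities."""
    if any(not 0.0 < p <= 1.0 for p in incl_prob):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return [1.0 / p for p in incl_prob]


def scale_to_sample_size(w):
    """Rescale weights so they sum to n, the number of sampled sites."""
    n = len(w)
    total = sum(w)
    return [n * wi / total for wi in w]


def kish_effective_n(w):
    """Kish's effective sample size (sum w)^2 / sum w^2.

    Equals n when all weights are equal and shrinks as weights become
    more variable.
    """
    return sum(w) ** 2 / sum(wi * wi for wi in w)


if __name__ == "__main__":
    # Made-up inclusion probabilities: three probability-sample sites and
    # two hypothetical legacy sites.
    pi = [0.05, 0.05, 0.10, 0.50, 0.50]
    w = scale_to_sample_size(inverse_probability_weights(pi))
    print("scaled weights:", [round(x, 3) for x in w])
    print("effective n:", round(kish_effective_n(w), 2), "of", len(w))
```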
Irvine, K.M., Rodhouse, T.J., Wright, W.J. and Olsen, A.R.
Ecological Applications, 28(6), pp.1616-1625