A Bayesian semiparametric GLMM for historical and newly collected presence-only data: An application to species richness of Ross Sea Mollusca

C. Ghiglione; S. Schiaparelli
2017-01-01

Abstract

Historical data sets from vast and relatively inaccessible areas are sources of potentially unique information that remains valuable for biodiversity studies today. In many research fields, from climate change to the projection of species loss, great efforts have been made to integrate historical data sets with recent data to create databases that are as complete as possible. Unlocking the information contained in presence-only data, which predominate in such databases, is a challenge for statistical modeling because the opportunistic nature of the data-gathering process introduces insidious observational errors. In this article, we propose a statistical method for the joint analysis of historical and newly collected presence-only data: a Bayesian semiparametric generalized linear mixed model (GLMM) with Dirichlet process random effects. The potential of the method is illustrated on the Ross Sea section of SOMBASE, an international compilation of Southern Ocean Mollusc distributional records from 1899 to 2004 and beyond. Despite sampling bias and non-detection errors, the proposed model draws latent information from the data, so that the resulting estimates of the parameters of interest are not only consistent with those obtained in indirectly related studies based on well-structured data but also suggest interesting directions for further research.
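The abstract names the model class but not its exact form. As a hedged illustration only, the Python sketch below simulates from a generic Bernoulli GLMM whose unit-level random effects follow a truncated stick-breaking approximation of a Dirichlet process prior, G ~ DP(alpha, G0) with G0 = Normal(0, base_sd^2). Everything concrete here is an assumption and not the authors' specification: the logit link, the single covariate `x`, the grouping of records into hypothetical "sampling units", the truncation level `n_atoms`, and all parameter values. The sketch only generates data from the prior; it does not perform the posterior inference (for example, MCMC) that fitting such a model requires, and it does not model the presence-only observation process.

```python
import math
import random

def stick_breaking_weights(alpha, n_atoms, rng):
    # Truncated stick-breaking: V_k ~ Beta(1, alpha), w_k = V_k * prod_{j<k} (1 - V_j).
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom takes the leftover mass so the weights sum to 1
    return weights

def draw_dp_effects(n_units, alpha, n_atoms, base_sd, rng):
    # Atoms theta_k ~ G0 = Normal(0, base_sd^2); each unit picks atom k with probability w_k,
    # so units sharing an atom share the same random effect (the DP clustering property).
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    atoms = [rng.gauss(0.0, base_sd) for _ in range(n_atoms)]
    labels = rng.choices(range(n_atoms), weights=weights, k=n_units)
    return [atoms[k] for k in labels], labels

def inv_logit(eta):
    # Numerically stable logistic function (avoids overflow in math.exp for large |eta|).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def simulate_detections(x, unit, effects, beta0, beta1, rng):
    # Bernoulli outcome with logit P(y = 1) = beta0 + beta1 * x + b_unit (hypothetical model).
    return [int(rng.random() < inv_logit(beta0 + beta1 * xi + effects[u]))
            for xi, u in zip(x, unit)]

if __name__ == "__main__":
    rng = random.Random(2017)
    effects, labels = draw_dp_effects(n_units=20, alpha=1.0, n_atoms=50, base_sd=1.5, rng=rng)
    print("distinct random-effect values:", len(set(labels)))
    unit = [i % 20 for i in range(200)]               # 200 hypothetical records over 20 units
    x = [rng.uniform(-1.0, 1.0) for _ in range(200)]  # hypothetical standardized covariate
    y = simulate_detections(x, unit, effects, beta0=-0.5, beta1=1.0, rng=rng)
    print("observed detection rate:", sum(y) / len(y))
```

Because a draw from a Dirichlet process is discrete, several units typically end up sharing the same random-effect value. This clustering is what lets DP random effects represent multimodal or irregular heterogeneity that a single normal random-effects distribution would smooth over; in a setting like the one described above, such heterogeneity could plausibly include differences in sampling effort between historical and recent records.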
Files in this item: Carota et al Environmetrics 2017.pdf (publisher's version, Adobe PDF, 436.68 kB, closed access)

Use this identifier to cite or link to this document: https://hdl.handle.net/11567/897085
Citations: PMC n/a; Scopus 1; ISI 1