Reproducibility of pyrosequencing data for biodiversity assessment in complex communities

Publication Title

Methods in Ecology and Evolution








High-throughput sequencing is rapidly becoming a popular method to profile complex communities and has generated deep insights into community biodiversity. However, the reproducibility of this method for biodiversity assessment remains largely unexplored. Here we evaluated reproducibility by analysing 454-pyrosequenced biological replicates of two complex plankton communities collected from one freshwater port and one marine port. We also tested whether reproducibility potentially influences biodiversity estimates, notably α- and β-diversity. Our evaluation of reproducibility revealed a complex scenario, having both technical and biological significance. At the Operational Taxonomic Unit (OTU) level, reproducibility was 100% for high-abundance OTUs (>100 sequences) but lower for low-abundance OTUs. In some cases, 88% of irreproducible OTUs had high sequence similarity to existing records, suggesting that some singletons may reflect rare lineages/genotypes in communities. However, spurious amplification of distantly related taxonomic groups generated mainly low-abundance OTUs that were characterized by low reproducibility. At a broad taxonomic level (i.e. order level), reproducibility decreased as the abundance of OTUs decreased and was particularly low for distantly related taxonomic groups such as algae and protists that were not the targets of our zooplankton biodiversity survey. At a lower taxonomic level (i.e. family level), overall reproducibility was high (>80%) for crustaceans, the dominant group in zooplankton samples. Therefore, we suggest that random variation during both sample collection and sequencing processes can be responsible for low reproducibility. Our analyses also suggest that random sampling processes may influence both α- and β-diversity estimates. Our results add to growing evidence that caution needs to be applied when designing and interpreting experiments utilizing high-throughput sequencing data for biodiversity assessments.
Technical replicates are needed to statistically correct for intra-sample variation, while field-based replicate samples are desirable to substantiate results. An overestimation of species diversity can occur when OTUs are uniquely characterized by spuriously amplified sequences and errors/artifacts. Therefore, careful management of low-abundance OTUs is required to reveal unique/rare lineages. Our results suggest that further studies are needed to determine the ecological significance of low-abundance OTUs in complex communities.
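
As a rough illustration of the kind of comparison the abstract describes, the following Python sketch scores OTU reproducibility between two replicates, split by abundance class. It is not the authors' pipeline. It assumes that an OTU counts as reproducible when it is detected in both replicates, and it borrows the >100-sequence cutoff from the abstract as a default. The function names, the toy counts, and the choice of OTU richness (for α-diversity) and Jaccard dissimilarity (for β-diversity) are all hypothetical.

```python
from collections import defaultdict


def reproducibility_by_abundance(rep_a, rep_b, high_cutoff=100):
    """Fraction of OTUs detected in both replicates, per abundance class.

    Assumption (not taken from the paper): an OTU is "reproducible" if it
    has at least one sequence in each replicate, and its abundance class
    is decided by its summed count across the two replicates.
    """
    seen = defaultdict(int)    # OTUs observed in each abundance class
    shared = defaultdict(int)  # OTUs found in both replicates
    for otu in set(rep_a) | set(rep_b):
        a, b = rep_a.get(otu, 0), rep_b.get(otu, 0)
        if a + b == 0:
            continue  # ignore OTUs with no sequences at all
        cls = "high" if a + b > high_cutoff else "low"
        seen[cls] += 1
        if a > 0 and b > 0:
            shared[cls] += 1
    return {cls: shared[cls] / seen[cls] for cls in seen}


def richness(rep):
    """Alpha-diversity proxy: number of OTUs with at least one sequence."""
    return sum(1 for n in rep.values() if n > 0)


def jaccard_dissimilarity(rep_a, rep_b):
    """Beta-diversity proxy: 1 - |shared OTUs| / |all OTUs| (presence/absence)."""
    a = {otu for otu, n in rep_a.items() if n > 0}
    b = {otu for otu, n in rep_b.items() if n > 0}
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


# Hypothetical toy counts (OTU -> number of sequences), not data from the study.
rep_a = {"OTU1": 250, "OTU2": 120, "OTU3": 3, "OTU4": 1}
rep_b = {"OTU1": 310, "OTU2": 95, "OTU3": 2, "OTU5": 1}

if __name__ == "__main__":
    print(reproducibility_by_abundance(rep_a, rep_b))
    print(richness(rep_a), richness(rep_b))
    print(jaccard_dissimilarity(rep_a, rep_b))
```

In this toy case, the two singletons (OTU4 and OTU5) each appear in only one replicate, so they lower the reproducibility of the low-abundance class and raise the Jaccard dissimilarity between replicates. This is the mechanism by which low-abundance OTUs could affect diversity estimates.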


This is an accepted manuscript version of an article whose version of record was published in: Methods in Ecology and Evolution: http://dx.doi.org/10.1111/2041-210X.12230