Amplicon sequence variant

An amplicon sequence variant (ASV) is any one of the inferred single DNA sequences recovered from a high-throughput analysis of marker genes. Because these analyses, also called "amplicon reads," are created following the removal of erroneous sequences generated during PCR and sequencing, using ASVs makes it possible to distinguish sequence variation by a single nucleotide change. The uses of ASVs include classifying groups of species based on DNA sequences, finding biological and environmental variation, and determining ecological patterns.

ASVs were first described in 2013, by Eren and colleagues.^[1] Before that, for many years the standard unit for marker-gene analysis was the operational taxonomic unit (OTU), which is generated by clustering sequences based on a threshold of similarity. Compared to ASVs, OTUs reflect a coarser notion of similarity. Though there is no single threshold, the most commonly chosen value is 3%, which means these units share 97% of the DNA sequence. ASV methods on the other hand are able to resolve sequence differences by as little as a single nucleotide change, thus avoiding similarity-based operational clustering units altogether. Therefore, ASVs represent a finer distinction between sequences.

ASVs are also referred to as exact sequence variants (ESVs), zero-radius OTUs (ZOTUs), sub-OTUs (sOTUs), haplotypes, or oligotypes.^[2]^[3]

This compares ASVs and OTUs. This chart provides a check mark in regards to whether or not that that marker-gene analysis method is precise, traceable, reproducible, or comprehensive.

This graph shows a real sequence that was sequenced over a hundred times. The black dots are called the error cloud, with the Y-axis being how many types that specific error showed up in this set. The red vertical line represents the 3% cut-off, that means everything to the right of this line is new biology and everything to the left is an error. This demonstrates the errors or new biology that can be missed when using OTUs, since OTUs will include these in the 3% dissimilarity threshold.

This is the same real sequence that was sequenced over a hundred times as the above graph. The black dots are called the error cloud, with the Y-axis being how many types that specific error showed up in this set. Now this diagram shows how ASVs prevent these errors associated with OTUs from being included in the data set because ASVs limit the errors to being below the black curved line and new biology being those dots above the curved black line. This means that ASVs are more exact in measuring differences among sequences.

This visually demonstrates how OTUs pick up erroneous amplicon reads created from PCR and sequencing. When these sequences are amplified into clustered units, these errors are pick-up and placed into clustered units. OTUs therefore pick up a wider set of data points and have the potential to accidentally group two distinct DNA sequences into the same unit as seen by only two colors or DNA sequences being picked up into OTUs instead of four colors (DNA sequences).

This visually shows how ASVs remove and correct errors from PCR, when compared to the OTU diagram above. ASVs are able to create groups for all four colors or DNA sequences observed. This allows ASVs to be more precise in finding sequence variation

Uses of ASVs versus OTUs

The introduction of ASV methods was marked by a debate about their utility. Although OTUs do not provide such precise and accurate measurements of sequence variation, they are still an acceptable and valuable approach. In one research study, Glassman and Martiny confirmed the suitability of OTUs for investigating broad-scale ecological diversity.^[4] They concluded that OTUs and ASVs provided similar results, with ASVs enabling a slightly stronger detection of fungal and bacterial diversity. And their work indicated that even though species diversification can be measured more accurately with ASVs, the use of OTUs in well-constructed studies is generally valid to demonstrate diversification at broad scales.

Some have argued that ASVs should replace OTUs in marker-gene analysis. Their arguments focus on the precision, tractability, reproducibility, and comprehensiveness they can bring to marker-gene analysis. For these researchers, the utility of finer sequence resolution (precision) and the advantage of being able to easily compare sequences between different studies (tractability and reproducibility) make ASVs the better option for analyzing sequence differences. By contrast, since OTUs depend on the specifics of the similarity thresholds used to generate them, the units within any OTU can vary across researchers, experiments, and databases. Thus comparison across OTU-based studies and datasets can be very challenging.^[3]

ASV methods

Popular methods for resolving ASVs including DADA2,^[5] Deblur,^[6] MED,^[7] and UNOISE.^[8] These methods work broadly by generating an error model tailored to an individual sequencing run and employing algorithms that use the model to distinguish between true biological sequences and those generated by error.

References

^ Eren AM, Maignien L, Sul WJ, Murphy LG, Grim SL, Morrison HG, Sogin ML (December 2013). "Oligotyping: Differentiating between closely related microbial taxa using 16S rRNA gene data". Methods in Ecology and Evolution. 4 (12): 1111–1119. Bibcode:2013MEcEv...4.1111E. doi:10.1111/2041-210X.12114. hdl:1912/6377. PMC 3864673. PMID 24358444.
^ Porter TM, Hajibabaei M (January 2018). "Scaling up: A guide to high-throughput genomic approaches for biodiversity analysis". Molecular Ecology. 27 (2): 313–338. Bibcode:2018MolEc..27..313P. doi:10.1111/mec.14478. PMID 29292539.
^ ^a ^b Callahan BJ, McMurdie PJ, Holmes SP (December 2017). "Exact sequence variants should replace operational taxonomic units in marker-gene data analysis". The ISME Journal. 11 (12): 2639–2643. Bibcode:2017ISMEJ..11.2639C. doi:10.1038/ismej.2017.119. PMC 5702726. PMID 28731476.
^ Glassman SI, Martiny JB (July 2018). "Broadscale Ecological Patterns Are Robust to Use of Exact Sequence Variants versus Operational Taxonomic Units". mSphere. 3 (4). doi:10.1128/mSphere.00148-18. PMC 6052340. PMID 30021874.
^ Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes (2015-08-06). "DADA2: High resolution sample inference from amplicon data". bioRxiv. doi:10.1101/024034.
^ Amir A, McDonald D, Navas-Molina JA, Kopylova E, Morton JT, Zech Xu Z, et al. (2017-04-25). Gilbert JA (ed.). "Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns". mSystems. 2 (2). doi:10.1128/mSystems.00191-16. PMC 5340863. PMID 28289731.
^ Eren AM, Morrison HG, Lescault PJ, Reveillaud J, Vineis JH, Sogin ML (March 2015). "Minimum entropy decomposition: unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences". The ISME Journal. 9 (4): 968–979. Bibcode:2015ISMEJ...9..968E. doi:10.1038/ismej.2014.195. PMC 4817710. PMID 25325381.
^ Edgar RC (2016-10-15). "UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing". bioRxiv. doi:10.1101/081257.

[1] Eren AM, Maignien L, Sul WJ, Murphy LG, Grim SL, Morrison HG, Sogin ML (December 2013). "Oligotyping: Differentiating between closely related microbial taxa using 16S rRNA gene data". Methods in Ecology and Evolution. 4 (12): 1111–1119. Bibcode:2013MEcEv...4.1111E. doi:10.1111/2041-210X.12114. hdl:1912/6377. PMC 3864673. PMID 24358444.

[2] Porter TM, Hajibabaei M (January 2018). "Scaling up: A guide to high-throughput genomic approaches for biodiversity analysis". Molecular Ecology. 27 (2): 313–338. Bibcode:2018MolEc..27..313P. doi:10.1111/mec.14478. PMID 29292539.

[Replace-3] Callahan BJ, McMurdie PJ, Holmes SP (December 2017). "Exact sequence variants should replace operational taxonomic units in marker-gene data analysis". The ISME Journal. 11 (12): 2639–2643. Bibcode:2017ISMEJ..11.2639C. doi:10.1038/ismej.2017.119. PMC 5702726. PMID 28731476.

[4] Glassman SI, Martiny JB (July 2018). "Broadscale Ecological Patterns Are Robust to Use of Exact Sequence Variants versus Operational Taxonomic Units". mSphere. 3 (4). doi:10.1128/mSphere.00148-18. PMC 6052340. PMID 30021874.

[5] Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes (2015-08-06). "DADA2: High resolution sample inference from amplicon data". bioRxiv. doi:10.1101/024034.

[6] Amir A, McDonald D, Navas-Molina JA, Kopylova E, Morton JT, Zech Xu Z, et al. (2017-04-25). Gilbert JA (ed.). "Deblur Rapidly Resolves Single-Nucleotide Community Sequence Patterns". mSystems. 2 (2). doi:10.1128/mSystems.00191-16. PMC 5340863. PMID 28289731.

[7] Eren AM, Morrison HG, Lescault PJ, Reveillaud J, Vineis JH, Sogin ML (March 2015). "Minimum entropy decomposition: unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences". The ISME Journal. 9 (4): 968–979. Bibcode:2015ISMEJ...9..968E. doi:10.1038/ismej.2014.195. PMC 4817710. PMID 25325381.

[8] Edgar RC (2016-10-15). "UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing". bioRxiv. doi:10.1101/081257.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]