PEPPER v0.1 release

@kishwarshafin released this 09 Oct 20:11
· 8 commits to r0.1 since this release

PEPPER v0.1 release notes (haploid assembly polisher)

PEPPER is a recurrent neural network-based haploid genome assembly polisher. This is the first release of the haploid assembly polishing component of PEPPER. We tested PEPPER's performance on several human genome samples, Zymo microbial community samples, and non-model organisms. The performance of PEPPER suggests that we can achieve highly accurate genome assemblies using ONT reads only.

Installation

PEPPER can be installed with pip.

python3 -m pip install pepper-polish
# If you get a permission error, try:
python3 -m pip install --user pepper-polish

python3 -m pepper.pepper --help
python3 -m pepper.pepper polish --help
# Expected output: PEPPER VERSION:  0.1.1

Models

The model files are available here: https://github.com/kishwarshafin/pepper/tree/r0.1/models

MinION_r10_native_microbial.pkl : For R10.3 Guppy 3.4.8 (Microbial)
MinION_r10_pcr_microbial.pkl : For R10.3 Guppy 3.4.8 (Microbial)
PEPPER_polish_haploid_guppy360.pkl : Supports Guppy 3.0.5 to Guppy 4+ (Large genomes; trained to be sensitive to the heterozygosity of the genome, can be used in phase-aware polishing)
PromethION_r941_guppy305_HAC_human.pkl : Supports Guppy 3.0.5 to Guppy 4+ (Large genomes)
PromethION_r941_guppy305_HAC_microbial.pkl : Supports Guppy 3.0.5 to Guppy 4+ (Microbial)
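
As a convenience, the model list above can be kept as a small lookup table when scripting a polishing run. The sketch below is only an illustration: the `MODELS` dictionary and the `models_for` helper are hypothetical names and are not part of PEPPER. The file names and notes are copied from the list above.

```python
from typing import List

# Model file names and notes copied from the release notes.
# MODELS and models_for are hypothetical helpers, not part of the PEPPER package.
MODELS = {
    "MinION_r10_native_microbial.pkl": "R10.3, Guppy 3.4.8 (microbial)",
    "MinION_r10_pcr_microbial.pkl": "R10.3, Guppy 3.4.8 (microbial)",
    "PEPPER_polish_haploid_guppy360.pkl": "Guppy 3.0.5 to Guppy 4+ (large genomes, heterozygosity-aware)",
    "PromethION_r941_guppy305_HAC_human.pkl": "Guppy 3.0.5 to Guppy 4+ (large genomes)",
    "PromethION_r941_guppy305_HAC_microbial.pkl": "Guppy 3.0.5 to Guppy 4+ (microbial)",
}


def models_for(genome_type: str) -> List[str]:
    """Return the model file names whose note mentions the given genome type."""
    needle = genome_type.lower()
    return sorted(name for name, note in MODELS.items() if needle in note.lower())


if __name__ == "__main__":
    for name in models_for("microbial"):
        print(name, "->", MODELS[name])
```

The lookup matches on the free-text note, so `models_for("large genomes")` returns the two large-genome models and `models_for("microbial")` returns the three microbial ones.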

Motivation

Assemblies generated from ONT data usually have low base-level quality and require further polishing. Existing polishers like Racon-Medaka can improve the base-level quality of an assembly but perform poorly in transcriptome completeness. Previously, we introduced a new polisher suite, MarginPolish-HELEN, with superior performance in transcriptome completeness and base-level accuracy. However, MarginPolish-HELEN has runtime and cost overhead. To overcome this issue, we developed PEPPER, which uses local realignment of reads to the assembly to produce highly accurate polished genome assemblies while remaining sensitive to the structural integrity of the assembly. PEPPER can be paired with Shasta, Flye, Canu, or any other ONT-based assembler. As a standalone assembly polisher, PEPPER's performance is superior to that of any other existing ONT assembly polisher, including MarginPolish-HELEN.

We participated in the HPRC assembly bakeoff, where the Shasta-PEPPER HG002 assembly achieved an assembly quality of Q35 with transcriptome completeness similar to that reported in the Shasta-MarginPolish-HELEN paper.
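
To put the Q35 figure in perspective, here is a minimal sketch of the conversion between a quality value and a per-base error rate. It assumes Q35 is a Phred-scaled value (Q = -10 · log10(p)), which is the usual convention for assembly quality but is not stated explicitly in these notes. The function names are illustrative only.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled quality value Q to a per-base error probability."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(p)


q35_error = phred_to_error_rate(35)
print(f"Q35 error rate: {q35_error:.2e}")                   # 3.16e-04
print(f"About one error every {1 / q35_error:,.0f} bases")  # 3,162
```

Under that assumption, Q35 corresponds to roughly one base-level error per 3,000 bases of assembled sequence.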

Extension to variant calling

In collaboration with Google Health, we paired a modified version of PEPPER's haploid assembly polisher mode with DeepVariant to achieve state-of-the-art performance in reference-based small variant calling with ONT reads. This effort was recognized in the PrecisionFDA Truth Challenge V2, where PEPPER-DeepVariant won top awards in the ONT category. This work is still in development, and future releases will include details about the modules we are developing to enable ONT-based variant calling.

Collaboration with the Darwin Tree of Life project and other projects

The Darwin Tree of Life project plans to sequence and assemble all known species of animals, plants, fungi and protists in Britain and Ireland. The project picked Shasta to generate de novo ONT assemblies efficiently and, after evaluating multiple existing assembly polishers, picked PEPPER to polish the assemblies. We are collaborating with Ksenia Krasheninnikova from the Wellcome Sanger Institute, who is actively evaluating PEPPER on non-model vertebrate genomes and helping us improve our methods.

We are also collaborating with several other groups to use PEPPER to polish ONT-based genome assemblies. We have applied PEPPER to polish tomato genomes, non-human vertebrate genomes, highly heterozygous plant genomes and microbial genomes. In all cases, we saw better performance than existing polishing tools in both the structural integrity of the genome assembly and base-level quality.

Future direction

PEPPER builds a foundation upon which we plan to develop a set of next-generation genome inference tools for ONT reads. In collaboration with Google Health, we were able to use PEPPER as a primary candidate finder that enabled DeepVariant to identify variants from ONT reads accurately. We plan to keep improving the variant-calling pipeline. Moreover, Shasta is now producing haplotype-resolved genome assemblies, and we plan to deploy a diploid assembly polishing pipeline soon.