Workflow for de novo transcriptome assembly from paired-end reads, protein prediction and annotation, and quality checks. It includes:
- Quality check (fastqc)
- De novo transcriptome assembly (Trinity)
- Assembly quality checks (with bowtie2 and support scripts from Trinity)
- Prediction of open reading frames (TransDecoder)
- Functional annotation (with blastp and hmmer)
The workflow is run with Snakemake: https://snakemake.readthedocs.io/en/stable/index.html
To run every step successfully, you need to pre-install: FastQC, Trinity, TransDecoder, BLAST, and HMMER.
They can be installed with the conda package manager from the bioconda channel.
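Before launching the workflow, it can help to confirm that the tools are actually on your PATH. The snippet below is a minimal sketch, not part of the workflow itself: `missing_tools` is a hypothetical helper, and the executable names (`Trinity`, `TransDecoder.LongOrfs`, `blastp`, `hmmscan`, `bowtie2`) as well as the package names in the commented install line are assumptions about a typical bioconda setup.

```shell
# Assumed bioconda package names; adjust to your environment:
# conda install -c conda-forge -c bioconda fastqc trinity transdecoder blast hmmer

# Print the name of every given executable that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names below are assumptions, not taken from the workflow rules.
missing=$(missing_tools fastqc Trinity TransDecoder.LongOrfs blastp hmmscan bowtie2)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All tools found"
fi
```

If anything is reported as missing, install it into the active conda environment and run the check again.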
From the workflow directory, run the following example commands:
Dry-run
snakemake -np --use-conda
Run
snakemake --cores <max_n_cores> -p --use-conda
Generate pipeline diagram (requires `dot` from Graphviz)
snakemake --dag --use-conda | dot -Tsvg > dag.svg
To run the workflow on new data:
- Move new fastq files to input/fastq_file
- Change sample names in config/sample.tsv
- Update resources/databases with those needed to run blast and hmmer
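After updating these locations, a quick pre-flight check can catch a misplaced file before a long run starts. This is a minimal sketch: `check_inputs` is a hypothetical helper that only tests whether the three paths named above exist. It does not validate the contents of `config/sample.tsv` or of the databases.

```shell
# Print every expected path that is missing under the given workflow root
# (defaults to the current directory).
check_inputs() {
    root=${1:-.}
    for path in input/fastq_file config/sample.tsv resources/databases; do
        [ -e "$root/$path" ] || echo "missing: $path"
    done
}

# Run from the workflow directory; no output means all three paths exist.
check_inputs .
```

Following it with the dry-run (`snakemake -np --use-conda`) then shows whether Snakemake can resolve every input for the new samples.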