smsk_selection: A Snakemake pipeline to find orthologs and marks of positive selection

1. Description

In brief, this pipeline:

Predicts proteins from transcriptomes with TransDecoder.
Finds orthogroups with OrthoFinder and the methods from Yang et al.
Finds patterns of positive selection with FastCodeML.
Annotates transcripts with TransDecoder / Trinotate.
Assesses transcriptome completeness with BUSCO.

2. First steps

Install conda
Install snakemake:

conda install --yes snakemake

Clone this repo. If you get an SSL certificate error, add -c http.sslVerify=false to the git command:

git clone --recursive https://github.com/jlanga/smsk_orthofinder.git

Compile the necessary dependencies: phyx, guidance and fastcodeml:

bash src/compile_deps.sh

Enter the paths to your samples in samples.tsv.
Run the pipeline as is:

snakemake --use-conda --jobs

or run it inside a Docker container:

bash src/docker_run.sh -j 4
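
Before launching a run, it can help to confirm that every path entered in samples.tsv exists. The following is a minimal sketch, not part of the pipeline: check_sample_paths is a hypothetical helper, and it assumes a tab-separated sheet with one header line and the file path in the second column. Adjust the column number to match your own samples.tsv.

```shell
# Hypothetical helper (not shipped with the pipeline): list the paths in a
# tab-separated sample sheet that do not exist on disk.
# Assumptions: the first line is a header, and the path is in column 2
# unless another column number is given as the second argument.
check_sample_paths() {
    sheet="$1"
    column="${2:-2}"
    missing=0
    # A here-document (rather than a pipe) keeps the loop in the current
    # shell, so the counter is still set after the loop ends.
    while IFS= read -r path; do
        [ -z "$path" ] && continue
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            missing=$((missing + 1))
        fi
    done <<EOF
$(tail -n +2 "$sheet" | cut -f "$column")
EOF
    # Succeed only when every listed path exists.
    [ "$missing" -eq 0 ]
}
```

Calling check_sample_paths samples.tsv prints one line per missing path and returns a non-zero status if any path is missing, so it can be chained in front of the snakemake command with &&.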

3. File organization

The folder hierarchy follows the one described in A Quick Guide to Organizing Computational Biology Projects:

smsk_selection
├── data: raw data, downloaded FASTA files, databases, etc.
├── README.md
├── Snakefile: pipeline runner
├── results: processed data
│   ├── busco: single-copy orthologs (SCOs) identified
│   ├── cdhit: clustered transcriptome
│   ├── homologs: clustered orthogroups as in Yang et al.
│   ├── orthofinder: clustered orthogroups by OrthoFinder
│   ├── selection: alignments and positive selection results
│   ├── transcriptome: links to input transcriptomes
│   ├── transdecoder: predicted CDS
│   ├── tree: ML and Bayesian species tree from 4-fold degenerate sites
│   └── trinotate: transcriptome annotation
└── src: additional source code, tarballs, snakefiles, etc.

4. Requirements

Running this pipeline should only require snakemake and conda / mamba. Together they download the packages required for each step.

If in doubt, the Dockerfile lists the packages that need to be installed.
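
A quick way to check these requirements is to test whether the commands are on your PATH. This is a minimal sketch, not part of the pipeline: require_tools is a hypothetical helper name, and it relies only on the standard command -v shell built-in.

```shell
# Hypothetical helper (not shipped with the pipeline): report which of the
# given commands cannot be found on PATH.
require_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool"
            status=1
        fi
    done
    # Non-zero when at least one command is missing.
    return "$status"
}
```

For example, require_tools snakemake conda prints nothing and returns zero when both commands are installed. If you use mamba instead of conda, pass mamba as the second argument.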
