SPLASH 2

Introduction

SPLASH is an unsupervised, reference-free, unifying framework for discovering regulated sequence variation through statistical analysis of k-mer composition in both DNA and RNA sequence data. It builds on our observation that the detection of sample-regulated sequence variation, such as alternative splicing, RNA editing, gene fusions, V(D)J recombination, transposable element mobilization, allele-specific splicing, genetic variation in a population, and many other regulated events, can be unified in theory and in practice. This is achieved with a simple model that analyzes the k-mer composition of raw sequencing reads (Chaung et al. 2022). SPLASH finds constant sequences (anchors) that are followed by a set of sequences (targets) showing sample-specific variation, and it provides valid p-values. Because SPLASH is reference-free, it sidesteps the computational challenges associated with alignment, which makes it significantly faster and more efficient than alignment-based approaches and enables discovery and statistical precision not currently available, even from pseudo-alignment.

The first version of the SPLASH pipeline, implemented mainly in Python with Nextflow, proved its usefulness. Here we provide a new and improved implementation in C++ and Python (Kokot et al. 2024). This version is much more efficient and allows datasets larger than 1 TB to be analyzed in hours on a workstation or even a laptop.

How does it work

A key concept of SPLASH is the analysis of the composition of anchor–target substring pairs across many samples. Within a read, the two substrings can be adjacent or separated by a gap.
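As a rough illustration of the anchor–target idea, the Python sketch below slides over each read, extracts every anchor and the target that follows it after an optional gap, and tallies target counts per sample. The function names (`anchor_target_pairs`, `count_targets`), the k-mer lengths, and the toy reads are assumptions made for this example. It is not the SPLASH implementation, and it does not compute the SPLASH test statistic or p-values.

```python
from collections import Counter, defaultdict


def anchor_target_pairs(read, anchor_len, target_len, gap=0):
    """Yield every (anchor, target) pair found in a single read.

    The target starts `gap` bases after the end of the anchor.
    """
    span = anchor_len + gap + target_len
    for i in range(len(read) - span + 1):
        anchor = read[i:i + anchor_len]
        start = i + anchor_len + gap
        yield anchor, read[start:start + target_len]


def count_targets(samples, anchor_len, target_len, gap=0):
    """Build a nested table: anchor -> sample -> Counter of targets."""
    table = defaultdict(lambda: defaultdict(Counter))
    for sample, reads in samples.items():
        for read in reads:
            pairs = anchor_target_pairs(read, anchor_len, target_len, gap)
            for anchor, target in pairs:
                table[anchor][sample][target] += 1
    return table


# Toy input (hypothetical reads): both samples share the anchor ACGT,
# but the sequence that follows it differs between them.
samples = {
    "sample_A": ["ACGTACCTGA", "ACGTACCTGA"],
    "sample_B": ["ACGTACGGGA"],
}

table = count_targets(samples, anchor_len=4, target_len=3, gap=1)
for sample, targets in sorted(table["ACGT"].items()):
    print(sample, dict(targets))
```

An anchor whose target counts differ strongly between samples, as `ACGT` does in this toy input, is the kind of pattern that a per-anchor statistical test would then evaluate.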

The image below presents the SPLASH pipeline at a high level.

Compactors

Compactors is a new statistical approach to local seed-based assembly. It is part of the SPLASH package and is particularly suited to assembling regions that are diverse across samples (see figure below). However, it can also be used as an independent assembler on any type of seed provided by the user.

Installation, usage, example

Please visit our Wiki page.

References

Marek Kokot, Roozbeh Dehghannasiri, Tavor Baharav, Julia Salzman, and Sebastian Deorowicz. Scalable and unsupervised discovery from raw sequencing reads using SPLASH2, Nature Biotechnology (2024), https://doi.org/10.1038/s41587-024-02381-2

Kaitlin Chaung, Tavor Baharav, Ivan Zheludev, and Julia Salzman. A statistical, reference-free algorithm subsumes myriad problems in genome science and enables novel discovery, bioRxiv (2022)

Tavor Baharav, David Tse, and Julia Salzman. An Interpretable, Finite Sample Valid Alternative to Pearson’s X² for Scientific Discovery, bioRxiv (2023)

George Henderson, Adam Gudys, Tavor Baharav, Punit Sundaramurthy, Marek Kokot, Peter L. Wang, Sebastian Deorowicz, Allison F. Carey, and Julia Salzman. Ultra-efficient, unified discovery from microbial sequencing with SPLASH and precise statistical assembly, bioRxiv 2024.01.18.576133 (2024)
