Plotting #3: Sequencing QC Plots/Analysis

Plotting Sequencing/Alignment Metrics

In addition to other QC metrics previously discussed it can also be helpful to plot the metrics generated during alignment of sequencing data to see if there are differences that might arise as result of sequencing differences as opposed to biological differences.

For this tutorial, I will be utilizing de-identified example data from my own experience but I will walk through process for using your own data.

library(ggplot2)
library(dplyr)
library(magrittr)
library(Seurat)
library(scCustomize)
library(qs)

Loading Data

Before we can plot these metrics we first need to import raw data.

Accepted Data Types

These functions currently only work with data from 10X Genomics processed via Cell Ranger. If there is interest in adapting to other platforms I’d be happy to entertain PRs.

Data Location

Inside the output folder from Cell Ranger count is a file called “metrics_summary” which contains the metrics normally displayed in the html summary. See abbreviated example below.

Parent_Directory
├── sample_01
│   └── outs
│       └── filtered_feature_bc_matrix/
│       └── raw_feature_bc_matrix/
│       └── web_summary.html
│       └── metrics_summary.csv
└── sample_02
│   └── outs
│       └── filtered_feature_bc_matrix/
│       └── raw_feature_bc_matrix/
│       └── web_summary.html
│       └── metrics_summary.csv
└── sample_03

NOTE: If the path to these files on your system is different that the default 10X path be sure to specify the secondary_path param and set default_10X = FALSE when importing data.

Support for Cell Ranger `multi` pipeline

Read_Metrics_10X includes support for data processed using Cell Ranger multi pipeline by setting the parameter cellranger_multi = TRUE. The function will automatically alter it’s default secondary path value and will return both gene expression and TCR metrics as separate data.frames.

Read Data

We can read all metrics files within a given parent directory using the Read_Metrics_10X function.

raw_metrics <- Read_Metrics_10X(base_path = "path/to/parent/directory/", default_10X = TRUE)

head(raw_metrics)

The result of this function will be a data.frame with one row per library.

sample_id	Estimated_Number_of_Cells	Mean_Reads_per_Cell	Median_Genes_per_Cell	Number_of_Reads	Valid_Barcodes	Sequencing_Saturation	Q30_Bases_in_Barcode	Q30_Bases_in_RNA_Read	Q30_Bases_in_UMI	Reads_Mapped_to_Genome	Reads_Mapped_Confidently_to_Genome	Reads_Mapped_Confidently_to_Intergenic_Regions	Reads_Mapped_Confidently_to_Intronic_Regions	Reads_Mapped_Confidently_to_Exonic_Regions	Reads_Mapped_Confidently_to_Transcriptome	Reads_Mapped_Antisense_to_Gene	Fraction_Reads_in_Cells	Total_Genes_Detected	Median_UMI_Counts_per_Cell
sample_1	2526	193933	724	489875507	96.3%	89.8%	96.0%	94.2%	95.7%	95.9%	86.5%	5.2%	21.7%	59.5%	56.1%	2.3%	60.1%	21105	1840
sample_2	2642	179146	986	473302676	97.7%	88.3%	95.9%	94.3%	95.6%	95.8%	88.7%	4.6%	18.4%	65.7%	63.0%	1.7%	56.7%	21321	3394
sample_3	7337	79420	2454	582707755	93.5%	72.3%	96.5%	94.8%	96.2%	95.5%	90.8%	7.1%	20.1%	63.7%	58.3%	4.3%	72.5%	23556	6685
sample_4	2352	165755	722	389854975	97.0%	86.9%	96.0%	94.1%	95.6%	95.0%	84.0%	4.8%	20.3%	58.9%	55.7%	2.2%	66.4%	21141	2360
sample_5	3697	152621	794	564238318	97.7%	89.2%	96.0%	94.3%	95.7%	96.3%	85.7%	3.8%	20.9%	61.0%	58.1%	1.9%	66.3%	21764	2224
sample_6	5043	101180	1126	510250215	94.5%	74.6%	96.4%	94.8%	96.0%	96.1%	81.9%	5.1%	15.8%	61.0%	56.5%	3.5%	78.5%	22565	8792

Optional parameters

Similarly to the scCustomize functions for reading raw count matrices (See Read & Write Data Functions Vignette) there are a few optional parameters for Read_Metrics_10X.

secondary_path/default_10X tells the function how to navigate from the parent directory (base_path) to the “metrics_summary.csv” files.
- If the directory specified is directly from Cell Ranger output or has an identical structure (see above; parent/library_name/outs/files) then user can specify default_10X = TRUE and function will find required files.
- If structure is different then user must supply secondary_path.
- If all files are in one directory simply supply secondary_path = "".
lib_list if only a subset of libraries are desired then user can supply the names of those libraries (sub-directories) to the lib_list parameter to limit which files are read in. Default is NULL and will read all samples from parent directory.
lib_names How to name files in output data.frame. Default will use sub-directory names to label the samples. Alternatively can provide a vector a sample/library names to use instead.

Add Meta Data

In order to make the resulting plots more meaningful we need to added corresponding sample level meta data.
NOTE: Make sure that one column in your meta data data.frame matches the “sample_id” column to metrics data.frame so that they are properly joined.

head(meta_metrics)

sample_id	Seq_Batch	Exp_Batch	Full_Batch
sample_1	A	1	1_A
sample_2	A	2	2_A
sample_3	B	3	3_B
sample_4	A	1	1_A
sample_5	A	2	2_A
sample_6	B	3	3_B

Now we join the two data.frames before plotting.

metrics_final <- right_join(raw_metrics, meta_metrics)

## Joining with `by = join_by(sample_id)`

Plotting

In total there are 15 Sequencing QC plotting functions. See Reference Manual for full details.

However, to simplify plotting there are also two wrapper functions which will return patchwork layouts of several of these functions in a single plot.

Basic Metrics Summary

A set of basic metrics can be plotted using Seq_QC_Plot_Basic_Combined

Seq_QC_Plot_Basic_Combined(metrics_dataframe = metrics_final, plot_by = "Full_Batch")

Alignment Metrics Summary

A set of alignment-based metrics can be plotted using Seq_QC_Plot_Alignment_Combined

Seq_QC_Plot_Alignment_Combined(metrics_dataframe = metrics_final, plot_by = "Full_Batch")

Basic Statistics

All of the Seq_QC_Plot_... functions also contain parameter to calculate significance between all groups using ggpubr::stat_compare_means().
NOTE: See function help pages and ggpubr for more info to make sure the right test is being used for your data.

Seq_QC_Plot_Basic_Combined(metrics_dataframe = metrics_final, plot_by = "Full_Batch", significance = T)

Barcode Rank Plots

scCustomize can also be used to create customized barcode rank plots from raw feature barcode data using outputs from DropletUtils package. scCustomize provides the function Iterate_Barcode_Rank_Plot which reads data, processes using DropletUtils, and returns plots.

The plots are adaptation of the plots from DropletUtils package moved to ggplot2 framework with some additional customization and the ability to return rastered plots.

Iterate_Barcode_Rank_Plot(dir_path_h5 = "assets/Barcode_Rank_Example/", plateau = 20000, file_path = "plots/",
    file_name = "Barcode_Rank_Plots.pdf")

Example of barcode rank plot output. (10K PBMCs from 10X Genomics Public Data).

Compiled: February 28, 2024