User Guide
Introduction
This Shiny app is a wrapper around DESeq2, an R package for “Differential gene expression analysis based on the negative binomial distribution”.
It is meant to provide an intuitive interface that lets researchers upload, analyze, visualize, and explore RNA-seq count data interactively, without any prior R programming knowledge.
This tool supports simple or multi-factorial experimental design. It also allows for exploratory analysis when no replicates are available.
The app also provides svaseq Surrogate Variable Analysis for detecting hidden batch effects. You can then include the Surrogate Variables (SVs) as adjustment factors in downstream analysis (e.g., differential expression). For more information on svaseq, go to this link.
For details on how DESeq2 is used for RNA-seq count data analysis and visualization, see the DESeq2 documentation.
The app produces various visualizations and output data.
1. Example data (simple/multi-factorial experiment)
- We recommend trying one or both of the preloaded example datasets to walk through the analysis and become familiar with the app. You should then be able to replicate the analysis on your own datasets.
- The simple-factorial example loads the Tissue Comparison Data (Human) (RNA-seq counts) from this published study.
- The multi-factorial example loads the Mouse Data (RNA-seq counts and experimental design metadata) from this published study.
2. Upload your own data (gene counts)
- A .csv/.txt file containing a table of gene counts.
- The first column should contain gene names/IDs, followed by one column of counts per sample. The file can be either comma- or tab-delimited.
- If your counts are not merged, you can use this Gene Count Merger to consolidate all of your sample count files.
- For convenience, if this is a simple-factor experiment with replicates, and each sample name ends with an underscore followed by the replicate number (see figure 2), the conditions are set automatically by parsing the sample names and replicate numbers.
- If there are no replicates, select the No replicates option to set the default experiment conditions for the next step. This is necessary because newer versions of DESeq2 (> 1.22.0) do not work on experiments without replicates. See here for more details.
- Avoid special characters and spaces in sample names (figure 1); the only exception is underscores used to denote replicates (figure 2).
- The first column can contain either gene IDs or gene names.
- Prefilter: you can also set the minimum number of counts a gene must have to be included.
- For an example simple-factor counts file, download and view the sampleCounts.csv file.
- For an example multi-factorial experiment, download and view this counts file and the following metadata file.
- Experimental design metadata can either be uploaded as a .csv file or constructed in-page in the “Edit Conditions” step.
figure 1 (No Replicates)
figure 2 (Replicates)
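The upload rules above (first column holds gene names/IDs, remaining columns hold per-sample counts, comma or tab delimiter, optional minimum-count prefilter) can be illustrated with a short Python sketch. This is not the app's code (the app is written in R); the function name `read_counts`, the default `min_count=10`, and the choice to apply the prefilter to each gene's total count across all samples are assumptions made for illustration.

```python
import csv
import io


def read_counts(text, min_count=10):
    """Parse a gene-count table and apply a simple prefilter.

    Assumptions (illustrative, not the app's code): the first row is a
    header, the first column holds gene IDs or names, the other columns hold
    integer counts, and the prefilter compares each gene's total count across
    all samples with min_count.
    """
    first_line = text.splitlines()[0]
    # Treat the file as tab-delimited if the header contains a tab, else comma.
    delimiter = "\t" if "\t" in first_line else ","
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    header, body = rows[0], rows[1:]
    samples = header[1:]
    counts = {}
    for row in body:
        gene, values = row[0], [int(v) for v in row[1:]]
        # Prefilter: keep the gene only if its total count reaches min_count.
        if sum(values) >= min_count:
            counts[gene] = values
    return samples, counts


samples, counts = read_counts("gene\tctrl_1\tctrl_2\nGeneA\t5\t7\nGeneB\t0\t1\n")
print(samples, counts)  # ['ctrl_1', 'ctrl_2'] {'GeneA': [5, 7]}
```

Whether the real prefilter works on totals, per-sample counts, or row means is not stated in this guide; adjust the `sum(values) >= min_count` test accordingly.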
Set up the experiment condition table
- By default, if there are replicates, the sample name is set as the condition for those samples.
- For example, for the samples with replicates in figure 2, the default condition table is:
figure 3 (Condition Table)
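The automatic condition assignment for replicate-style names (figure 2) amounts to stripping the `_<replicate number>` suffix from each sample name. A minimal Python sketch, assuming that naming pattern; `default_condition_table` and `REPLICATE_SUFFIX` are hypothetical names, and keeping the full name when no suffix is present is an assumption rather than documented app behavior:

```python
import re

# Hypothetical naming rule: "<condition>_<replicate number>", e.g. "control_2".
REPLICATE_SUFFIX = re.compile(r"^(?P<condition>.+)_(?P<replicate>\d+)$")


def default_condition_table(sample_names):
    """Return (sample, condition) pairs parsed from replicate-style names."""
    table = []
    for name in sample_names:
        match = REPLICATE_SUFFIX.match(name)
        # Assumption: names without a "_<number>" suffix keep their full name.
        condition = match.group("condition") if match else name
        table.append((name, condition))
    return table


print(default_condition_table(["control_1", "control_2", "treated_1", "treated_2"]))
# [('control_1', 'control'), ('control_2', 'control'), ('treated_1', 'treated'), ('treated_2', 'treated')]
```

Because the regular expression is greedy, a name such as `liver_fed_3` maps to the condition `liver_fed`; only the final numeric suffix is treated as the replicate number.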
[App screenshot: Upload Gene Counts (.csv/.txt counts file, tab or comma delimited); Gene Counts Table; Design Formula; Conditions/Factors, set with Option 1) Edit Table or Option 2) Upload experiment design table (meta table, .csv/.txt, tab or comma delimited)]
1. Initialize DESeq2 Dataset
Initialize the DESeq2 dataset with the current counts and experimental design conditions.
Surrogate variable analysis (svaseq): hidden batch effects
We can sometimes identify the source of batch effects and, using statistical models, remove any sample-specific variation that can be predicted from features such as sequence content or gene length. Here we use Surrogate Variable Analysis (SVA), which does not require knowing exactly how the counts vary across batches. It uses only the biological condition and looks for large-scale variation that is orthogonal to it. This approach therefore requires that the technical variation be orthogonal to the biological conditions.
For more information, see the following link.
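The core idea in the paragraph above (remove what the biological condition explains, then look for the dominant remaining pattern across samples) can be sketched in a few lines of Python. This is a simplified illustration, not svaseq's algorithm: svaseq by default uses an iteratively reweighted procedure and can estimate several SVs. The input is assumed to be log-transformed, normalized counts, and the function names are hypothetical.

```python
import math


def condition_residuals(log_expr, conditions):
    """Remove the biological-condition effect from every gene (row).

    With a single categorical factor, the least-squares fit for each gene is
    its per-condition mean, so the residual is the value minus that mean.
    """
    residuals = []
    for row in log_expr:
        means = {}
        for group in set(conditions):
            values = [v for v, c in zip(row, conditions) if c == group]
            means[group] = sum(values) / len(values)
        residuals.append([v - means[c] for v, c in zip(row, conditions)])
    return residuals


def first_surrogate_variable(residuals, iterations=100):
    """Leading right singular vector of the residual matrix (power iteration)."""
    n = len(residuals[0])
    # Sample-by-sample cross-product matrix R^T R.
    gram = [[sum(r[i] * r[j] for r in residuals) for j in range(n)] for i in range(n)]
    vec = [1.0 + 0.1 * i for i in range(n)]  # arbitrary starting vector
    for _ in range(iterations):
        nxt = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return [0.0] * n  # no residual variation left to explain
        vec = [x / norm for x in nxt]
    return vec


# Toy data: conditions A,A,B,B plus a hidden batch that alternates by sample.
log_expr = [[2.0, 0.0, 6.0, 4.0], [5.0, 1.0, 2.0, -2.0]]
conditions = ["A", "A", "B", "B"]
sv = first_surrogate_variable(condition_residuals(log_expr, conditions))
print([round(x, 3) for x in sv])  # [-0.5, 0.5, -0.5, 0.5]
```

The recovered vector separates samples 1 and 3 from samples 2 and 4 (the sign is arbitrary), which is the hidden batch. In the DESeq2 workflow, estimated SVs are added as columns of colData and included in the design, for example `~ SV1 + SV2 + condition`.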
Estimate Surrogate Variables (SVA)
Run DESeq
DESeq run settings:
The DESeq function performs differential expression analysis based on the negative binomial distribution, using the following steps:
- 1. estimation of size factors (estimateSizeFactors)
- 2. estimation of dispersion (estimateDispersions)
- 3. negative binomial GLM fitting and Wald statistics (nbinomWaldTest)
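DESeq2's default size-factor estimation (the first step above) uses the median-of-ratios method: each sample's factor is the median, over genes, of the ratio between that sample's count and the gene's geometric mean across samples. The Python sketch below covers only that step, under simplifying assumptions (a plain list-of-rows matrix and no user-supplied normalization factors); dispersion estimation and GLM fitting are not shown, and the function names are illustrative.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    Genes with a zero in any sample are skipped because their geometric mean
    is zero; statistics.median raises StatisticsError if every gene is skipped.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median of the log ratios, back-transformed, gives each sample's factor.
    return [math.exp(statistics.median(r)) for r in log_ratios]


def normalized_counts(counts, factors):
    """Divide every count by the size factor of its sample."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


# The second sample was sequenced about twice as deeply as the first.
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(counts)
print([round(f, 4) for f in factors])  # [0.7071, 1.4142]
```

After dividing by these factors, the first three genes have equal normalized counts in both samples, which is what "normalized with respect to library size" means here.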
Design Formula:
Showing only the first 5 rows of the colData table:
(Optional) Surrogate Variable Analysis (SVA)
Run Surrogate Variable Analysis (for hidden batch detection)
You may choose to include the computed Surrogate Variables (SVs) in your design formula for downstream differential expression analysis.
Regularized Log Transformation
This function transforms the count data to the log2 scale in a way that minimizes differences between samples for rows with small counts and normalizes with respect to library size. The rlog transformation produces a variance-stabilizing effect similar to that of varianceStabilizingTransformation, though rlog is more robust when the size factors vary widely. The transformation is useful for checking for outliers or as input to machine learning techniques such as clustering or linear discriminant analysis.
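As an illustration of using transformed values to check for outliers, the sketch below computes Euclidean sample-to-sample distances on a genes-by-samples matrix of rlog or VST values (the DESeq2 vignette builds a similar distance matrix with `dist(t(assay(vsd)))`). The mean-distance summary is a simple heuristic assumed here for illustration, not a method used by the app, and `math.dist` requires Python 3.8 or later.

```python
import math


def sample_distances(matrix):
    """Euclidean distance between every pair of samples (columns)."""
    n = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n)]
    return [[math.dist(columns[i], columns[j]) for j in range(n)] for i in range(n)]


def mean_distance_to_others(distances):
    """Average distance from each sample to all other samples."""
    n = len(distances)
    # Each row includes a zero self-distance, so divide by n - 1.
    return [sum(row) / (n - 1) for row in distances]


# Toy transformed matrix: 2 genes x 3 samples; the second sample stands apart.
transformed = [[0.0, 3.0, 0.0], [0.0, 4.0, 0.0]]
d = sample_distances(transformed)
print(mean_distance_to_others(d))  # [2.5, 5.0, 2.5]
```

A distance matrix like `d` is also the usual input to hierarchical clustering of samples.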
Variance Stabilizing Transformation
This function calculates a variance stabilizing transformation (VST) from the fitted dispersion-mean relation(s) and then transforms the count data (normalized by division by the size factors or normalization factors), yielding a matrix of values which are now approximately homoskedastic (having constant variance along the range of mean values). The transformation also normalizes with respect to library size. The rlog is less sensitive to size factors, which can be an issue when size factors vary widely. These transformations are useful when checking for outliers or as input for machine learning techniques such as clustering or linear discriminant analysis.
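To see why a VST yields roughly constant variance, consider a simplified case (an assumption, not what DESeq2 fits): a negative binomial count with mean mu and a single constant dispersion alpha, so Var = mu + alpha * mu^2. Integrating 1/sqrt(Var) gives h(mu) = (2/sqrt(alpha)) * asinh(sqrt(alpha * mu)), and by the delta method Var(h(X)) ≈ h'(mu)^2 * Var(X) = 1. DESeq2's `vst` instead uses the fitted dispersion-mean trend and returns values on the log2 scale, so its output differs from this sketch; the function names below are hypothetical.

```python
import math


def vst_constant_dispersion(mu, alpha):
    """Variance-stabilizing transform for NB counts with Var = mu + alpha * mu**2.

    Simplifying assumption: one constant dispersion alpha for all genes.
    """
    return 2.0 * math.asinh(math.sqrt(alpha * mu)) / math.sqrt(alpha)


def delta_method_variance(mu, alpha, eps=1e-5):
    """Approximate Var(h(X)) as h'(mu)**2 * Var(X), using a numeric derivative."""
    slope = (vst_constant_dispersion(mu + eps, alpha)
             - vst_constant_dispersion(mu - eps, alpha)) / (2 * eps)
    return slope ** 2 * (mu + alpha * mu ** 2)


for mu in (5.0, 50.0, 500.0):
    # Should be close to 1 at every mean: the variance no longer depends on mu.
    print(mu, delta_method_variance(mu, alpha=0.1))
```

For raw counts the variance grows with the mean; after the transform it is approximately constant, which is the homoskedasticity described above.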
Differential Expression Analysis
Volcano Plot
Venn Diagram
Threshold Settings:
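The volcano plot and Venn diagram both depend on thresholds applied to DESeq2 results. By default, DESeq2's `results()` reports Benjamini-Hochberg adjusted p-values (`pAdjustMethod = "BH"`). The Python sketch below shows BH adjustment, an up/down/not-significant classification, and the region sizes of a two-comparison Venn diagram. The cutoffs `fc_cutoff=1.0` and `padj_cutoff=0.05` are hypothetical defaults, not the app's settings, and DESeq2's independent filtering (which can set some adjusted p-values to NA) is not modeled.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log2fc, padj, fc_cutoff=1.0, padj_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' (hypothetical default cutoffs)."""
    if padj < padj_cutoff and log2fc >= fc_cutoff:
        return "up"
    if padj < padj_cutoff and log2fc <= -fc_cutoff:
        return "down"
    return "ns"


def venn_counts(de_genes_a, de_genes_b):
    """Sizes of the three regions of a two-set Venn diagram."""
    a, b = set(de_genes_a), set(de_genes_b)
    return len(a - b), len(a & b), len(b - a)


padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print([round(p, 4) for p in padj])  # [0.04, 0.0533, 0.0533, 0.5]
print(venn_counts(["g1", "g2", "g3"], ["g2", "g4"]))  # (2, 1, 1)
```

In a volcano plot, the `classify` label typically determines point color, with log2 fold change on the x-axis and -log10 adjusted p-value on the y-axis.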
Gene Expression Boxplot
Plot Settings:
Heatmap
* This heatmap uses normalized counts, which can be viewed or downloaded below the figure.