DE analysis App

Introduction

This interactive web application (DEApp) is developed in R with Shiny to (1) conduct differential expression (DE) analysis with edgeR, limma-voom, and DESeq2 based on the provided input data (raw count results and experimental design information), and (2) cross-validate the DE analysis results across these 3 methods.

The goal of this App is to provide biologists with an easy way to conduct and cross-validate DE analysis with 3 different methods on their own data.

If you are ready to run the DE analysis on your own data, you can skip the introduction below and upload your data through the 'Single-factor Experiment' page or the 'Multi-factor Experiment' page.

1. Data input

The input for this App consists of 2 files in '.txt' or '.csv' format: the 'Raw Count Data' and the 'Meta-data Table'.

1.1 Raw Count Data

This input file should contain the summarized count results of all samples in the experiment. An example of the expected input data format is presented below:

Columns correspond to samples, and rows correspond to mapped genomic features (e.g. genes, exons, transcripts, miRNAs).

A demo 'Raw Count Data' text file for a single-factor experiment is provided in the 'data' folder as 'pnas-count_singleFactor.txt'; it is also accessible here.

1.2 Meta-data Table

This input file contains the summarized experimental design information for each sample. This App can conduct DE analysis of both single-factor and multi-factor experiments, and the experimental design is described in the 'Meta-data Table' as shown below:

1.2.1 Single-factor Experiment

If the experiment has a single experimental factor, such as 'Group', the input 'Meta-data Table' file should be prepared as below:

The corresponding demo 'Meta-data Table' text file for the single-factor experiment is provided in the 'data' folder as 'pnas-count_singleFactor-meta.txt'; it is also available here.

1.2.2 Multi-factor Experiment

For a multi-factor experiment, the 'Meta-data Table' should look as below:

An example 'Meta-data Table' csv file for a multi-factor experiment is provided in the 'data' folder as 'ReadCounts-Chen-edgeRSpringer-multiFactor-meta.csv'; it is also available here. The corresponding 'Raw Count Data' csv file is provided in the 'data' folder as 'ReadCounts-Chen-edgeRSpringer-multiFactor.csv'; it is accessible here.

2. Filter low expression tags

The 'Data Summarization' section of this App filters out genetic features with very low counts. The guideline for this step is to keep genetic features that are expressed in at least one sample of each factor level.

For example, if there are 3 combined factor levels in the experiment, then at least 3 samples should have an expression level above the expression cutoff value, which is given in counts per million (CPM).

The expression cutoff (CPM) value is related to read counts through the library size and normalization factors by the formula $$\text{CPM} = \frac{\text{counts}}{\text{library size} \times \text{normalization factors} \times 10^{-6}}$$ For example, if the expression cutoff is 10 CPM, and the library size and normalization factors are approximately $2 \times 10^6$ and 1 for the majority of samples, then the 10 CPM cutoff corresponds to about 20 read counts. Therefore, in this example, a genetic feature that does not reach about 20 read counts (10 CPM) in at least 3 samples is classified as a low expression feature and is removed from downstream DE analysis.
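The arithmetic above can be checked with a short sketch. This is an illustration in Python, not the App's own R code; the function names are hypothetical, and treating a sample exactly at the cutoff as passing (`>=`) is an assumption, since the text does not specify the boundary case.

```python
def cpm(count, library_size, norm_factor=1.0):
    """Counts per million: counts / (library size * normalization factor * 1e-6)."""
    return count / (library_size * norm_factor * 1e-6)


def cutoff_to_count(cpm_cutoff, library_size, norm_factor=1.0):
    """Read count that corresponds to a given CPM cutoff for one sample."""
    return cpm_cutoff * library_size * norm_factor * 1e-6


def keep_feature(counts, library_sizes, norm_factors, cpm_cutoff, min_samples):
    """Keep a feature if enough samples reach the CPM cutoff.

    Assumption: a sample exactly at the cutoff counts as passing.
    """
    n_pass = sum(
        cpm(c, lib, nf) >= cpm_cutoff
        for c, lib, nf in zip(counts, library_sizes, norm_factors)
    )
    return n_pass >= min_samples


# Hypothetical experiment: 6 samples, each with a library size of 2 million
# reads and a normalization factor of 1.
library_sizes = [2_000_000] * 6
norm_factors = [1.0] * 6

# One feature's raw counts across the 6 samples.
feature_counts = [25, 30, 40, 5, 0, 3]
kept = keep_feature(feature_counts, library_sizes, norm_factors,
                    cpm_cutoff=10, min_samples=3)
```

With these inputs, three samples exceed 20 reads (10 CPM), so the feature meets the "at least 3 samples" rule and `kept` is true.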

3. DE analysis methods

This App implements 3 different methods to conduct DE analysis: edgeR, limma-voom, and DESeq2, all available in the 'DE analysis' section. For a single-factor experiment, the DE analysis can be conducted between any 2 levels of that factor. For a multi-factor experiment, all the experimental factors are combined into one combined factor, so that DE analysis can be conducted between any 2 selected combined factor levels.
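The idea of a combined factor can be sketched as follows. This is a minimal Python illustration, not the App's implementation; the column names ('Treatment', 'Time'), the sample values, and the '.' separator are all hypothetical.

```python
from itertools import combinations


def combine_factors(meta_rows, factor_names, sep="."):
    """Return one combined level per sample by joining its factor values."""
    return [sep.join(str(row[f]) for f in factor_names) for row in meta_rows]


def level_pairs(combined_levels):
    """All unordered pairs of distinct combined levels, in sorted order."""
    return list(combinations(sorted(set(combined_levels)), 2))


# Hypothetical meta-data table with two experimental factors.
meta = [
    {"Sample": "s1", "Treatment": "ctrl", "Time": "0h"},
    {"Sample": "s2", "Treatment": "ctrl", "Time": "24h"},
    {"Sample": "s3", "Treatment": "drug", "Time": "0h"},
    {"Sample": "s4", "Treatment": "drug", "Time": "24h"},
]

combined = combine_factors(meta, ["Treatment", "Time"])
pairs = level_pairs(combined)  # each pair is one possible DE comparison
```

Two factors with two levels each give four combined levels, and any 2 of them can be selected for a comparison.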

Analysis Workflow

Step 1: Data Input. Upload your input data ('Raw Count Data' and 'Meta-data Table') via the 'Data Input' section panel for a single-factor or multi-factor experiment. A summary of your input data will be presented.

Step 2: Data Summarization. Filter out the low expression genetic features via the 'Data Summarization' section panel. Summarized count results after filtering will be presented there.

Step 3: DE analysis (3 methods). Conduct DE analysis in the 'Data Analysis' section.

Step 4: Methods Comparison. Compare and cross-validate DE analysis results via the 'Methods Comparison' section panel.
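The cross-validation in Step 4 amounts to comparing the lists of DE features reported by the three methods. The sketch below is a hedged Python illustration of how the regions of a three-set Venn diagram could be computed; it is not the App's code, and the gene identifiers are made up.

```python
def venn_regions(edger, limma_voom, deseq2):
    """Split DE feature lists from three methods into the 7 Venn regions."""
    a, b, c = set(edger), set(limma_voom), set(deseq2)
    return {
        "edgeR only": a - b - c,
        "limma-voom only": b - a - c,
        "DESeq2 only": c - a - b,
        "edgeR & limma-voom only": (a & b) - c,
        "edgeR & DESeq2 only": (a & c) - b,
        "limma-voom & DESeq2 only": (b & c) - a,
        "all three": a & b & c,
    }


# Hypothetical DE feature lists from each method.
regions = venn_regions(
    edger=["g1", "g2", "g3", "g4"],
    limma_voom=["g2", "g3", "g5"],
    deseq2=["g3", "g4", "g5", "g6"],
)
consensus = sorted(regions["all three"])  # features called DE by every method
```

Features in the "all three" region are the ones supported by every method, which is the most conservative reading of the comparison.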

The analysis execution workflow for this App is illustrated in a PDF file that can be downloaded here.

Frequently Asked Questions and Answers

Q1. Can I use this App to analyze RPKM data, such as quantified results from cuffquant?

A: No, this App can only be used to analyze raw count data via edgeR, limma-voom, and DESeq2. The recommended 'Raw Count Data' input is usually obtained from the 'HTSeq' or 'featureCounts' programs.

Q2. What kinds of statistical analysis methods are implemented in this App for DE analysis?

A: This App applies 3 different methods to conduct DE analysis: edgeR, limma-voom, and DESeq2.

Q3. Can I use this App to conduct DE analysis from a multi-factor experiment?

A: Yes, this App supports DE analysis of both single-factor and multi-factor experiments, as reflected in the 'Meta-data Table' input. For a multi-factor experiment, all the experimental factors are combined into one combined factor, so that DE analysis can be conducted between any 2 chosen combined factor levels. Additionally, please make sure there is more than 1 biological replicate (>=2 samples) for each factor or combined-factor level.
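The replicate requirement can be checked before uploading. The following Python sketch is an illustration only, with a hypothetical function name and made-up level labels; it reports which factor or combined-factor levels have fewer than 2 samples.

```python
from collections import Counter


def levels_lacking_replicates(levels, min_replicates=2):
    """Return the sorted levels that have fewer than min_replicates samples."""
    counts = Counter(levels)
    return sorted(level for level, n in counts.items() if n < min_replicates)


# Hypothetical combined-factor level for each sample in the meta-data table.
sample_levels = ["ctrl.0h", "ctrl.0h", "drug.0h", "drug.24h", "drug.24h"]
missing = levels_lacking_replicates(sample_levels)
```

Here 'drug.0h' has only one sample, so it would be flagged; an empty result means every level has at least 2 samples.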

Q4. What normalization factor is used for DE analysis in DEApp?

A: According to the edgeR user's guide, the normalization factor presented in the 'Data Summarization' panel is calculated with the calcNormFactors() function in edgeR. This function normalizes for RNA composition by finding a set of scaling factors for the library sizes that minimize the log-fold changes between the samples for most genes. The default method for computing these scale factors uses a trimmed mean of M-values (TMM) between each pair of samples. The product of the original library size and the scaling factor is used in the downstream DE analyses.

Q5. What is the maximum size of uploaded data? What can I do if my data exceeds the maximum upload size?

A: The maximum upload size is 30 MB. If your data exceeds the maximum size, you can install the application on your local computer.

Feedback

This App is developed and maintained by Yan Li at the bioinformatics core, Center for Research Informatics (CRI), Biological Science Division (BSD), University of Chicago.

As a bioinformatics core, we are actively improving and expanding our NGS analysis services and analysis products. If you have any questions, comments, or suggestions, feel free to contact our core at bioinformatics@bsd.uchicago.edu or the developer at yli22@bsd.uchicago.edu

Input data: Single-factor Experiment

Input 1: Raw Count Data

Upload your 'Raw Count Data' here. If no file is selected, the demo file for the single-factor experiment will be used and displayed.

Separator: Comma, Semicolon, or Tab

The demo file of 'Raw Count Data' for the single-factor experiment is available here

Input 2: Meta-data Table

Upload your 'Meta-data Table' here. If no file is selected, the corresponding demo file for the single-factor experiment will be used and displayed.

Separator: Comma, Semicolon, or Tab

The corresponding 'Meta-data Table' of the demo file for the single factor experiment is accessible here

Input Information Summary

Input Raw Count Summary

Sample Group Information Summary

Input data: Multi-factor Experiment

Input 1: Raw Count Data

Upload your 'Raw Count Data' here. If no file is selected, the demo file for the multi-factor experiment will be used and displayed.

Separator: Comma, Semicolon, or Tab

An example of full 'Raw Count Data' csv file for the multi-factor experiment is accessible here

Input 2: Meta-data Table

Upload your 'Meta-data Table' here. If no file is selected, the corresponding demo file for the multi-factor experiment will be used and displayed.

Separator: Comma, Semicolon, or Tab

An example of 'Meta-data Table' csv file corresponding to the example of input 1 - 'Raw Count Data' for the multi-factor experiment is accessible here

Input Information Summary

Input Raw Count Summary

Sample Group Information Summary

Low Expression Removal

Raw Count Summary

Low Expression Removal Options

Low expression mapped genetic features will be removed using a CPM value cutoff that must be met in at least a chosen number of samples (No. Samples).

After Low Expression Removal

Sample Normalization Results

Sample MDS Exploration

edgeR DE Analysis Options

DE Analysis Group Levels

Please select any 2 levels from the above available group levels for DE analysis.

Level 1

Level 2

DE Analysis Filtering Criteria

DE Analysis is based on

Nominal p-value

FDR adjusted p-value

p-value or FDR adjusted p-value

Fold Change (FC)

Estimated BCV Summary

Estimated Dispersion

DE Analysis Results

DE Results Summary

Download

Volcano Exploration

Limma-voom DE Analysis Options

DE Analysis Group Levels

Please select any 2 levels from the above available group levels for DE analysis.

Group 1

Group 2

DE Analysis Filtering Criteria

DE Analysis is based on

Nominal p-value

FDR adjusted p-value

p-value or FDR adjusted p-value

Fold Change (FC)

Estimated Dispersion

DE Analysis Results

DE Results summary

Download

Volcano Exploration

DESeq2 DE Analysis Options

DE Analysis Group Levels

Please select any 2 levels from the above available group levels for DE analysis.

Group 1

Group 2

DE Analysis Filtering Criteria

DE Analysis is based on

Nominal p-value

FDR adjusted p-value

p-value or FDR adjusted p-value

Fold Change (FC)

Estimated Dispersion

DE Analysis Results

DE Results summary

Download

Volcano Exploration

DE Analysis Comparison Options

Methods for Comparison

DE analysis method selection

edgeR

limma-voom

DESeq2

DE Analysis Group Levels

Please select any 2 levels from the above available group levels for DE analysis.

Group 1

Group 2

DE Analysis Filtering Criteria

DE Analysis is based on

Nominal p-value

FDR adjusted p-value

Nominal p-value or FDR adjusted p-value

Fold Change (FC)

Comparison Summary

Comparison Venn-Diagram

Gene list can be downloaded here:

Download