Analysis setup 

...

Analysis pipeline steps and reports

The steps performed by the pipeline include:

  1. Trim adapter sequences
  2. Run FastQC for quality control of the samples (runs in parallel with the other steps)
  3. Map reads to the selected reference genome
  4. When the protocol is MARS-Seq, add UMI and gene information to the reads 
  5. Quantify gene expression by counting reads
  6. When the protocol is MARS-Seq and DESeq2 is selected, count UMIs to correct for PCR duplicates
  7. Detect Differentially Expressed (DE) genes for a model with a single factor

Steps 4 and 6 are performed only for MARS-Seq data; step 6 additionally requires that DESeq2 is selected.
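
The idea behind step 6 can be pictured as collapsing reads that share the same gene and UMI, so that PCR copies of one original molecule are counted once. The following is a minimal sketch of that idea, not the pipeline's actual code: the function name `count_umis` and the (sample, gene, umi) tuple layout are hypothetical, and the sketch ignores sequencing errors within UMIs, which real tools may handle.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (sample, gene).

    `reads` is an iterable of (sample, gene, umi) tuples. This layout is
    an assumption made for illustration only. Reads that share sample,
    gene and UMI are treated as PCR duplicates and counted once.
    """
    seen = defaultdict(set)
    for sample, gene, umi in reads:
        seen[(sample, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Hypothetical toy input: four reads, one of them a PCR duplicate.
reads = [
    ("s1", "GeneA", "ACGT"),
    ("s1", "GeneA", "ACGT"),  # same gene and UMI as the read above
    ("s1", "GeneA", "TTGA"),
    ("s1", "GeneB", "ACGT"),
]
umi_counts = count_umis(reads)
```

In this toy input, GeneA has three reads but only two distinct UMIs, so its UMI count is 2 rather than 3.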


Upon completion, you will receive an email with links to the results report. For a detailed, interactive explanation of the report, use the relevant e-learning module.

The report includes several sections:

  1. Sequencing and Mapping QC
    1. Figure 1 - Plots the average quality of each base across all reads. Quality scores of 30 and above (predicted error rate of 1 in 1000) are considered good
    2. Figure 2 - Histogram showing the number of reads for each sample in the raw data
    3. Figure 3 - Histogram showing the percent of reads discarded after trimming the adapters (after adapter removal, some reads, such as polyA/T or low-quality reads, may be too short; the pipeline discards them)
    4. Figure 4 - Histogram with the number of reads for each sample in each step of the pipeline
    5. Figure 5 - Plots showing sequence coverage on and near gene regions
    6. Figure 6 -
      1. Histogram showing the percent of reads that mapped uniquely and not uniquely per sample
      2. Histogram showing the percent of the uniquely mapped reads that mapped to genes (genes included must have at least 5 reads)
  2. Exploratory Analysis
    1. Figure 7 - Heatmap plotting the highly expressed genes (above 5% of total expression). For example, the expression of gene RN45S in sample SRR3112243 amounts to 15% of the total expression
    2. Figure 8 - Heatmap of Pearson correlation between samples according to the gene expression values
    3. Figure 9 - Clustering dendrogram of the samples according to the gene expression
    4. Figure 10 - PCA analysis
      1. Histogram of the percent of variability explained by each principal component
      2. PCA plot of PC1 vs PC2
      3. PCA plot of PC1 vs PC3
  3. Differential Expression Analysis (this section appears only if you run the DESeq2 analysis) - a table with the number of differentially expressed (DE) genes in each category (up/down) for the different contrasts. In addition, links to p-value distributions, volcano plots and heatmaps, as well as a table of the DE genes with dot plots of their expression values
  4. Bioinformatics Pipeline Methods - description of the utilized pipeline methods 
  5. Links to additional results - links for downloading tables of raw counts, normalized counts, log-normalized values (rld) and statistical data for the contrasts. For a model with batches, "combat" values are calculated with the "sva" package instead of rld; these are batch-corrected, normalized log2 count values.
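
Two of the thresholds mentioned in the report sections above can be checked by hand: the quality score cutoff in Figure 1 and the 5% cutoff for highly expressed genes in Figure 7. The sketch below is illustrative only; the function names and the toy counts are hypothetical and are not taken from the pipeline. It uses the standard Phred definition, Q = -10 * log10(p), and assumes that "percent of total expression" means a gene's share of the summed counts in one sample.

```python
def phred_to_error_rate(q):
    """Convert a Phred quality score to a predicted error probability.

    Standard Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


def highly_expressed(counts, threshold=0.05):
    """Return genes whose share of a sample's total counts exceeds `threshold`.

    `counts` maps gene name to count for one sample. Treating the share of
    summed counts as "percent of total expression" is an assumption.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        gene: count / total
        for gene, count in counts.items()
        if count / total > threshold
    }


# Hypothetical counts for a single sample (not real data).
sample_counts = {"RN45S": 150, "GeneB": 40, "GeneC": 810}
error_rate_q30 = phred_to_error_rate(30)   # 1 error in 1000 bases
top_genes = highly_expressed(sample_counts)
```

With these toy counts, RN45S accounts for 15% of the total and passes the cutoff, while GeneB at 4% does not.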

Annotation file:

For counting reads per gene we use annotation files (GTF format) from RefSeq or GENCODE (the latter is more elaborate, i.e. it contains more genes and transcripts). In MARS-Seq analysis we use a modified version of each gene that includes 1000 bp upstream of the TES (transcription end site) on the transcript and 100 bp downstream of the TES.
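
The MARS-Seq window around the TES depends on the strand, because "upstream" and "downstream" follow the direction of transcription. The sketch below shows one way to compute such a window; it is an assumption-laden illustration, not the pipeline's code. The function name `tes_window` is hypothetical, coordinates are taken to be 1-based and inclusive as in GTF, and the 1000 bp are measured in genomic coordinates, ignoring introns, which may differ from how the pipeline measures distance "on the transcript".

```python
def tes_window(start, end, strand, upstream=1000, downstream=100):
    """Return (window_start, window_end) around the transcription end site.

    `start` and `end` are 1-based inclusive genomic coordinates of a
    transcript, as in a GTF file. On the "+" strand the TES is `end`;
    on the "-" strand the TES is `start`. Distances are genomic, so
    introns are ignored (a simplifying assumption of this sketch).
    """
    if strand == "+":
        tes = end
        lo, hi = tes - upstream, tes + downstream
    elif strand == "-":
        tes = start
        lo, hi = tes - downstream, tes + upstream
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    # Clamp so the window never starts before the first base of the sequence.
    return max(1, lo), hi


plus_window = tes_window(5000, 9000, "+")    # (8000, 9100)
minus_window = tes_window(5000, 9000, "-")   # (4900, 6000)
```

For a transcript on the minus strand the TES is the lowest coordinate, so the 100 bp downstream extension lies below `start` and the 1000 bp upstream region lies above it.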


Please regard this analysis as a good starting point and not an end result.


LINK:

Transcriptome pipeline for Weizmann Institute users:  http://utap.wexac.weizmann.ac.il

...