UTAP: DESeq from counts matrix pipeline guidelines

This pipeline performs a DESeq2 differential expression analysis on a matrix of raw gene counts from replicated samples in at least two conditions.

Pipeline website: http://utap.wexac.weizmann.ac.il

Before you start:

This pipeline runs on the wexac cluster. 
Please prepare the following in advance:

  1. An account (userID) on wexac, obtained through your department administrator.
  2. A "Collaboration" folder within your lab folder on wexac, with read and write permission for Bioinformatics Unit staff. This must be set up by the computing center (hpc@weizmann.ac.il).
  3. Sufficient free storage space on wexac (more than 400 GB), allocated by your department administrator.

Setting up a new analysis

Before running the pipeline, prepare your counts matrix file and transfer it to your Collaboration folder.

Counts matrix file format and structure:

  1. The counts matrix file should be in Excel or txt (tab-delimited text) format.
  2. Genes should be in rows (one row per gene) and samples in columns: the first column holds the gene names, and each subsequent column holds the counts of one sample.
  3. When creating the counts matrix in Excel, format the cells of the gene column as Text before entering the gene names (right-click the column and select Format Cells), so that all gene symbols, including those that look like dates, keep the names you enter. Excel's default General format treats gene symbols that look like dates (e.g. SEPT6, MARCH5) as dates in d-mmm format (6-Sep and 5-Mar in these examples), and sometimes produces duplicated names. (Note that changing the format to Text afterwards does NOT restore the names that were originally entered; instead, they are converted to numbers!) HGNC has committed to renaming such genes, but this may take some time.
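Before uploading, it can help to sanity-check a tab-delimited counts matrix locally. The following is a minimal Python sketch, not part of UTAP, and check_counts_matrix is a hypothetical name. It assumes a header row whose first cell labels the gene column and whose remaining cells are sample names, followed by one row per gene with raw integer counts. It flags rows with the wrong number of counts, duplicated gene names, values that are not plain non-negative integers, and gene names that look like Excel's d-mmm date conversion (e.g. 6-Sep). An Excel file would first need to be saved as tab-delimited text.

```python
import csv
import re

# Excel's d-mmm rendering of a converted gene symbol, e.g. "6-Sep" or "5-Mar".
DATE_LIKE = re.compile(r"\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)")
# Raw counts are expected to be plain non-negative integers such as "0" or "152".
RAW_COUNT = re.compile(r"[0-9]+")


def check_counts_matrix(lines):
    """Return a list of problems found in a tab-delimited counts matrix.

    Hypothetical helper (not part of UTAP). `lines` is any iterable of text
    lines, e.g. an open file handle. Assumed layout: a header row whose first
    cell labels the gene column and whose other cells are sample names,
    followed by one row per gene with raw counts.
    """
    rows = csv.reader(lines, delimiter="\t")
    header = next(rows, None)
    if header is None:
        return ["file is empty"]

    problems = []
    samples = header[1:]
    if len(samples) < 2:
        problems.append("fewer than two sample columns")
    if len(set(samples)) != len(samples):
        problems.append("duplicated sample names in header")

    seen_genes = set()
    for line_no, row in enumerate(rows, start=2):
        if not row:
            continue  # skip blank lines
        gene, counts = row[0], row[1:]
        if len(counts) != len(samples):
            problems.append(
                f"line {line_no}: expected {len(samples)} counts, got {len(counts)}"
            )
        if gene in seen_genes:
            problems.append(f"line {line_no}: duplicated gene name {gene!r}")
        seen_genes.add(gene)
        if DATE_LIKE.fullmatch(gene):
            problems.append(f"line {line_no}: gene name {gene!r} looks like an Excel date")
        for value in counts:
            if not RAW_COUNT.fullmatch(value):
                problems.append(f"line {line_no}: non-integer count {value!r} for {gene!r}")
    return problems
```

For example, open the exported file with open("counts.txt", newline="") (a hypothetical file name) and print each entry returned by check_counts_matrix; an empty list means none of these checks failed. Passing lines rather than a path keeps the helper easy to test on small in-memory examples.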

Then log in to utap.wexac.weizmann.ac.il with Firefox or Chrome (the pipeline is NOT compatible with Internet Explorer) using your Weizmann userID and password, and click Run pipeline.

In the Choose pipeline box, click DESeq_from counts_matrix.

Select your counts matrix file using the 'Input folder' field, and click run DESeq to identify differentially expressed genes with the DESeq2 package, as described in the DESeq2 manual.

All samples in the counts matrix file are listed in the popup "choice box".

Enter your desired report folder name in the relevant field.

To run DESeq2 (via 'Deseq run'), create at least two categories by entering the category names and dragging the relevant samples into them. Additional explanations can be found on the page:
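It can also help to plan the categories before filling in the choice box. The sketch below is a hypothetical Python helper (check_categories is not part of UTAP). It assumes the planned assignment is written as a dictionary mapping each category name to a list of sample names, and checks that at least two categories contain samples, that every sample exists in the counts matrix, and that each category has at least two replicates, in line with the replicated design described at the top of this page. Rejecting a sample placed in two categories is an added assumption for a simple two-group comparison; the pipeline itself may enforce these rules differently.

```python
def check_categories(samples, categories):
    """Check a planned sample-to-category assignment (hypothetical helper).

    samples: sample names from the counts matrix header.
    categories: dict mapping a category name to a list of sample names.
    Returns a list of problems (empty if the plan looks usable).
    """
    problems = []
    # At least two categories must actually contain samples.
    non_empty = [name for name, members in categories.items() if members]
    if len(non_empty) < 2:
        problems.append("at least two categories with samples are required")

    assigned = {}  # sample name -> first category it was placed in
    for name, members in categories.items():
        for sample in members:
            if sample not in samples:
                problems.append(f"{sample!r} in category {name!r} is not in the counts matrix")
            elif sample in assigned:
                # Assumption: each sample belongs to exactly one category.
                problems.append(f"{sample!r} is assigned to both {assigned[sample]!r} and {name!r}")
            else:
                assigned[sample] = name
        if len(members) < 2:
            problems.append(f"category {name!r} has fewer than two replicates")
    return problems
```

For example, with samples A1, A2, B1 and B2, the plan {"ctrl": ["A1", "A2"], "treat": ["B1", "B2"]} (hypothetical names) passes every check, while a plan with a single category is reported as missing a second one.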

UTAP: Transcriptome (RNA-seq and MARS-seq) pipelines guidelines

Transcriptome pipeline for Weizmann Institute users:  http://utap.wexac.weizmann.ac.il

Demo of the UTAP interface (for internal and external users):  http://utap-demo.weizmann.ac.il

Acknowledgments

Citation

Kohen et al. BMC Bioinformatics (2019) 20:154 https://doi.org/10.1186/s12859-019-2728-2 (PMID: 30909881)

Bioinformatics support staff for UTAP: 

  • UTAP development and maintenance team:  utap@weizmann.ac.il
  • Dena Leshkowitz
  • Ester Feldmesser 
  • Gil Stelzer
  • Bareket Dassa
  • Noa Wigoda

Visit our website: http://www.weizmann.ac.il/LS_CoreFacilities/bioinformatics-lscf/about