16.2.22 UTAP requirements & description (Weizmann users)

UTAP: User-friendly Transcriptome Analysis Pipeline

 

UTAP is an intuitive and scalable pipeline that enables fast, user-friendly data analysis. The pipeline executes the full process, starting from sequencing reads (fastq files) or from a counts matrix. Output files are placed in a structured folder system, and the results are summarized in a rich, comprehensive report.


Before you start:

In order to run a transcriptome analysis, your lab must have:

  1. An account on Wexac

  2. Sufficient free storage space (more than 400 GB)

Requirements 1 and 2 can be set up by your department secretary or administrator.

3. A "Collaboration" folder within your lab folder, with read and write permissions for the LSCF (Life Science Core Facility) Bioinformatics Unit.

This must be set up by the computing center (contact hpc@weizmann.ac.il).

4. Input data folders inside the Collaboration folder, organized in the required structure.

Fastq files must be organized into subfolders placed directly under the selected folder (the root folder), as shown below. Each subfolder is named after its sample.

Fastq file names must start with the same sample name as their subfolder and, for single-read data, end with "_R1.fastq" (or "_R1.fastq.gz"). The files can be either compressed (.fastq.gz) or uncompressed (.fastq).

For each protocol, the read files of a sample must have IDENTICAL names except for the suffix, for example: "SAMPLE1_R1.fastq" (or "SAMPLE1_R1.fastq.gz") and "SAMPLE1_R2.fastq" (or "SAMPLE1_R2.fastq.gz").

The required read files for each protocol are detailed in the table below:

NOTES:

  1. SE: single-end; PE: paired-end.

  2. Illumina-compatible protocols are all protocols in which the barcode is located in the adapters, including RNA-Seq, ChIP-Seq, ATAC-Seq & Ribo-Seq.
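As an illustration of the rule that mate files must have identical names except for the suffix, the following minimal Python sketch (not part of UTAP; the function names are hypothetical) strips the _R<n>.fastq or _R<n>.fastq.gz suffix and checks that all files of one sample share the same base name, with each read number appearing only once. It assumes the basic naming convention described above; compressed and uncompressed mates are treated alike.

```python
import re

# Hypothetical helper: matches the read suffix _R<n>.fastq or _R<n>.fastq.gz.
READ_SUFFIX = re.compile(r"_R(\d+)\.fastq(?:\.gz)?$")


def read_base_name(filename: str) -> tuple[str, str]:
    """Split 'SAMPLE1_R2.fastq.gz' into ('SAMPLE1', '2')."""
    match = READ_SUFFIX.search(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not end with _R<n>.fastq or _R<n>.fastq.gz")
    return filename[: match.start()], match.group(1)


def mates_are_consistent(filenames: list[str]) -> bool:
    """True if all files share one base name and no read number is repeated."""
    parsed = [read_base_name(name) for name in filenames]
    base_names = {base for base, _read in parsed}
    reads = [read for _base, read in parsed]
    return len(base_names) == 1 and len(reads) == len(set(reads))


if __name__ == "__main__":
    print(mates_are_consistent(["SAMPLE1_R1.fastq", "SAMPLE1_R2.fastq.gz"]))  # True
    print(mates_are_consistent(["SAMPLE1_R1.fastq", "SAMPLE_1_R2.fastq"]))    # False
```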

File structure example:

  • root_folder 

    • sample1

      • sample1_R1.fastq

      • sample1_R2.fastq

      • sample1_R3.fastq

    • sample2

      • sample2_R1.fastq.gz

      • sample2_R2.fastq.gz

      • sample2_R3.fastq
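Before starting a run, a layout like the one above could be checked with a short script. The sketch below is a hypothetical helper, not part of UTAP: scan_root lists the fastq files in each subfolder directly under the root folder, and find_layout_problems reports files whose names do not start with the subfolder (sample) name followed by "_", or do not end with _R<n>.fastq or _R<n>.fastq.gz. It also loosely accepts Illumina-style names ending in _R<n>_<chunk>.fastq; the pipeline's own validation may differ.

```python
import re
from pathlib import Path

# Hypothetical check, not part of UTAP. Accepts names ending in _R<n>.fastq[.gz]
# and, loosely, Illumina-style names ending in _R<n>_<chunk>.fastq[.gz].
FASTQ_SUFFIX = re.compile(r"_R\d+(?:_\d+)?\.fastq(?:\.gz)?$")


def scan_root(root: Path) -> dict[str, list[str]]:
    """Map each subfolder directly under root to the fastq file names inside it."""
    return {
        folder.name: sorted(
            f.name for f in folder.iterdir() if f.is_file() and ".fastq" in f.name
        )
        for folder in sorted(root.iterdir())
        if folder.is_dir()
    }


def find_layout_problems(layout: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means the layout looks valid."""
    problems = []
    for sample, files in layout.items():
        if not files:
            problems.append(f"{sample}: no fastq files")
        for name in files:
            # File names must begin with the sample (subfolder) name.
            if not name.startswith(sample + "_"):
                problems.append(f"{sample}/{name}: does not start with '{sample}_'")
            if not FASTQ_SUFFIX.search(name):
                problems.append(f"{sample}/{name}: unexpected read suffix")
    return problems


if __name__ == "__main__":
    # The example layout above, written as a dict instead of real folders.
    example = {
        "sample1": ["sample1_R1.fastq", "sample1_R2.fastq", "sample1_R3.fastq"],
        "sample2": ["sample2_R1.fastq.gz", "sample2_R2.fastq.gz", "sample2_R3.fastq"],
    }
    print(find_layout_problems(example))  # []
```

For a real folder, call find_layout_problems(scan_root(Path("root_folder"))), replacing "root_folder" (a placeholder) with the path of your own root folder.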

The pipeline also supports the Illumina fastq naming convention, i.e. files named _S*_L00*_R1.fastq or _S*_L00*_R1_0*.fastq (with the R2 files named in the same way).

For example:

  • root_folder 

    • sample1

      • sample1_S0_L001_R1_001.fastq

      • sample1_S0_L001_R1_002.fastq

      • sample1_S0_L002_R1_001.fastq

      • sample1_S0_L002_R1_002.fastq

      • sample1_S0_L001_R2_001.fastq 

      • sample1_S0_L001_R2_002.fastq

      • sample1_S0_L002_R2_001.fastq

      • sample1_S0_L002_R2_002.fastq

    • sample2

      • sample2_S0_L001_R1_001.fastq

      • sample2_S0_L001_R1_002.fastq

      • sample2_S0_L002_R1_001.fastq

      • sample2_S0_L002_R1_002.fastq

      • sample2_S0_L001_R2_001.fastq 

      • sample2_S0_L001_R2_002.fastq

      • sample2_S0_L002_R2_001.fastq

      • sample2_S0_L002_R2_002.fastq
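Files following this convention can be parsed with a regular expression. The sketch below is a hypothetical example, not UTAP code: it extracts the sample, lane, read and optional chunk number from each name and groups the files of one sample by read, ordered by lane and then chunk. The pattern mirrors the two forms listed above and assumes single-digit read numbers.

```python
import re
from collections import defaultdict

# Hypothetical pattern for <sample>_S<n>_L00<lane>_R<read>[_0<chunk>].fastq[.gz]
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>00\d)_R(?P<read>\d)"
    r"(?:_(?P<chunk>0\d*))?\.fastq(?:\.gz)?$"
)


def group_by_read(filenames: list[str]) -> dict[str, list[str]]:
    """Group file names by read number ("1", "2", ...), sorted by lane, then chunk."""
    groups = defaultdict(list)
    for name in filenames:
        match = ILLUMINA_NAME.match(name)
        if match is None:
            raise ValueError(f"{name!r} does not follow the _S*_L00*_R* convention")
        # A missing chunk number ("") sorts before any numbered chunk.
        groups[match["read"]].append((match["lane"], match["chunk"] or "", name))
    return {
        read: [name for _lane, _chunk, name in sorted(entries)]
        for read, entries in sorted(groups.items())
    }


if __name__ == "__main__":
    files = [
        "sample1_S0_L002_R1_001.fastq",
        "sample1_S0_L001_R2_001.fastq",
        "sample1_S0_L001_R1_002.fastq",
        "sample1_S0_L001_R1_001.fastq",
    ]
    for read, names in group_by_read(files).items():
        print(f"R{read}: {names}")
```

Grouping per read keeps all lanes and chunks of R1 together and separate from R2, which is the information a script would need before, for example, checking that every R1 chunk has a matching R2 chunk.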


Running analyses in the portal:

After completing all of these UTAP setup requirements, log in to the http://utap.wexac.weizmann.ac.il website with the user name and password of your Weizmann account, using Firefox or Chrome (the pipeline is NOT compatible with Internet Explorer).

The portal has four sections, linked from the upper navigation bar: e-learning, User Datasets, Upload data and Run pipeline.

  1. e-learning: contains a link to interactive e-learning modules (for RNA-Seq & MARS-Seq) with detailed explanations of the library structure, pipeline analysis & report components.

The modules also include instructions on how to run the pipeline.

Note: we highly recommend going through the e-learning modules to understand the analysis & results.

2. User Datasets: contains a table with all of the available information about the user's analysis runs.

This includes the name and status of each run (RUNNING/SUCCESSFUL/FAILED), the pipeline used, and the date & time the run was executed.

Refresh the page to see whether any status has changed. In addition, an email is sent when each run ends.

3. Upload data: currently not available for fastq files.

You can upload a folder containing a gene counts matrix to your Collaboration folder.

To analyze the uploaded matrix, run the "DESeq2 from counts matrix" pipeline.

4. Run pipeline:

The following pipelines are supported (select a pipeline for further information):

1) Transcriptomes from RNA-Seq, MARS-Seq, SCRB-Seq

2) DESeq2 from counts matrix   

3) ATAC-Seq 

4) ChIP-Seq 

5) Ribo-Seq

6) Demultiplexing from BCL files

7) Demultiplexing from RUNID

8) Demultiplexing from FASTQ

List of links 

Pipeline website for Weizmann Institute users: https://utap.wexac.weizmann.ac.il/

Information on UTAP versions

Acknowledgments

Citation:  

Kohen et al. BMC Bioinformatics (2019) 20:154 https://doi.org/10.1186/s12859-019-2728-2 (PMID: 30909881)

Bioinformatics support staff for UTAP: 

  • UTAP development and maintenance team: utap@weizmann.ac.il

  • Dena Leshkowitz

  • Gil Stelzer

  • Bareket Dassa

  • Noa Wigoda

Visit our website: http://www.weizmann.ac.il/LS_CoreFacilities/bioinformatics-lscf/about