The bioinformatics unit provides several services for obtaining NGS data. The entire process can be divided into 2 main steps -
UTAP: User-friendly Transcriptome Analysis Pipeline
From samples to analyzed NGS data, and UTAP (analysis pipeline steps and reports), an automatic pipeline for analyzing transcriptome data
Both steps can be set up at the first entry point, before sequencing has completed. Alternatively, you can stop after sequencing has completed and run the transcriptome analysis pipeline at a later time.
In order to sequence your samples on the NextSeq machines you are required to have an account (userID and password) on the SusanC3 server. You may request a userID from Irit Orr (irit.orr@weizmann.ac.il or 08-934-2470).
In order to upload your sequencing data to the Illumina cloud services you may need an Illumina account.
Once you log in to SusanC3, select "Apps" and then "NGS Start Run"
Select the NextSeq machine you are using
Raw (Bcl and Fastq) output files from the NextSeq machine are temporarily stored on the Stefan server (same userID and password as SusanC3). Upon sequencing completion and automatic transfer to the Stefan server, you may choose to simply download raw fastq files from the Stefan server (created by the NextSeq machines). Instructions may be found here - Getting your NextSeq data.
In order to use post-sequencing services, such as demultiplexing and quality control, please select these services as shown below
If you want demultiplexing and quality control services, you must upload a sample sheet to the SusanC3 server at least 5 minutes after the NextSeq machine has begun running and before sequencing has completed. The sample sheet must be prepared in a format (csv or xlsx) corresponding to one of three experiment protocols -
We recommend you test the sample sheet format here. The same sample sheet may contain samples for several users which should be detailed within the file.
DO NOT mix protocols on a run.
Note: Valid characters for sample names are A-Z a-z 0-9 . _ -
Do NOT use special characters such as " ' ` ? , ; + = @ # $ % ^ & () [] {} <> / \ in sample names.
Do NOT use Hebrew, Arabic, Chinese or any character-set other than English (Roman alphabet)
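The naming rules above can be checked before uploading a sample sheet. The following is a minimal sketch, not part of the NextSeq or UTAP services; the function name `invalid_sample_names` is hypothetical and the check only covers the character rules listed above.

```python
import re

# Allowed characters, as listed above: A-Z a-z 0-9 . _ -
VALID_SAMPLE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def invalid_sample_names(names):
    """Return the names that are empty or contain a disallowed character.

    Hypothetical helper for illustration; the server-side check may differ.
    """
    return [name for name in names if not VALID_SAMPLE_NAME.fullmatch(name)]


if __name__ == "__main__":
    samples = ["ctrl_1", "treated-2.rep1", "bad name", "x+y"]
    # Lists the names that break the rules (here: the space and the '+').
    print(invalid_sample_names(samples))
```

Names with spaces, special characters, or non-English letters are reported; an empty result means all names use only the valid characters.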
The sample sheet will be analyzed and you may add notes at this stage
You may review the sample names before submission
If a sample sheet is uploaded in the correct format, demultiplexing and quality control will be executed automatically.
An email will be sent to you informing you of sequencing completion, with instructions for downloading Bcl and/or Fastq files. If you selected post-sequencing services, the files will be demultiplexed and an additional link for viewing QC results will be included in the email (http://stefan.weizmann.ac.il/fqc/RUN_ID).
Please back up your files and delete them from the Stefan server using the following service. Note that your files will be automatically deleted 3 months after they are created.
In order to run a transcriptome analysis your lab must have -
Requirements 1 and 2 may be set up by your administrator. Requirement 3 must be set up by the computing center (hpc@weizmann.ac.il).
In addition, you must have a userID on Wexac, which may be set up by your administrator.
Transcriptome analysis of sequencing results that immediately follows demultiplexing and quality control can be set up at the first entry point (after uploading a sample sheet).
In the final step of this setup process please click "Run pipeline" for setting up the transcriptome pipeline
You may also save the link (the parameters will change according to the uploaded sample sheet) as appearing above if you wish to run the pipeline at a later time.
Notice: users who are defined as "superuser" on the UTAP system cannot use the link. These users need to cancel the superuser definition and then use the link. See instructions here: UTAP - admin site
Fill in the project name, select the genome and annotation.
If your protocol is RNA-seq, you will get this screen:
Fill in the project name, select the genome and annotation.
Select whether your protocol is stranded (the sequenced reads preserve the original strand of the RNA fragments) or non-stranded. If you don't know, select the "find automatically" option.
Type your adapters for each read (R1 and R2). These adapters will be removed from the reads by the pipeline. You can keep the default adapters if you use the P5 and P7 adapters of the TruSeq protocol.
Select whether you want to identify differentially expressed genes using the DESeq2 package (see the DESeq2 manual). If you select this option, by default, two categories must be created (fill in the category names)
Sort the samples by selecting them and using the arrows to move to the appropriate categories
You may add additional categories by using the appropriate buttons
If the samples were prepared in different pools, you can add this information: after moving the samples into the category boxes, click on the "Add Batch Effect" button, then select the samples that belong to one batch and click on the "Batch 1" button. Repeat the operation for the other groups of samples.
All steps of the pipeline (mapping, counts etc.) will be run on all samples, but DESeq2 will be run only on the samples that were assigned to categories.
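To make the relation between categories, batches and the DESeq2 step concrete, here is a minimal sketch of the information entered in the form. It is an illustration only and does not show how UTAP stores this internally; the function names, sample names and category names are all hypothetical.

```python
def build_design(categories, batches=None):
    """Map each categorized sample to its category and optional batch.

    categories: dict of category name -> list of sample names
    batches:    dict of batch name -> list of sample names (optional)
    """
    batches = batches or {}
    # Reverse lookup: sample name -> batch name
    batch_of = {s: b for b, samples in batches.items() for s in samples}
    design = {}
    for category, samples in categories.items():
        for sample in samples:
            if sample in design:
                raise ValueError(f"{sample} is in more than one category")
            design[sample] = {"category": category, "batch": batch_of.get(sample)}
    return design


def samples_for_deseq(all_samples, design):
    """Only samples that were assigned to a category take part in DESeq2."""
    return [s for s in all_samples if s in design]


if __name__ == "__main__":
    design = build_design(
        {"control": ["c1", "c2"], "treated": ["t1", "t2"]},
        {"Batch 1": ["c1", "t1"], "Batch 2": ["c2", "t2"]},
    )
    # "extra" is mapped and counted like the others but has no category,
    # so it is left out of the differential expression step.
    print(samples_for_deseq(["c1", "c2", "t1", "t2", "extra"], design))
```

In this sketch a sample may belong to at most one category, and a sample without a category is simply excluded from the differential expression comparison.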
Finally, submit the run for analysis. When sequencing has completed, the files will be copied from the Stefan (temporary) server to the Collaboration folder within your lab folder on Wexac and analysis will be executed as defined.
The pipeline website: http://ngspipe.wexac.weizmann.ac.il:7000
The transcriptome pipelines run on the Wexac cluster. In order to run a new transcriptome analysis you must first transfer demultiplexed sequencing data (fastq files) to your Collaboration folder on Wexac. If you set up the transcriptome analysis as described above after uploading a sample sheet, your demultiplexed fastq files will be automatically copied to the appropriate Collaboration folder in your lab's Wexac account. Within the Collaboration folder, a directory structure will be created according to the transcriptome analysis setup.
If you wish to run an analysis AFTER sequencing has completed, paste the link provided in the email you received (upon demultiplexing and QC completion) into a browser. You then only need to set up the analysis (enter project name, create categories etc.) and the files will be copied for you to your Collaboration folder on Wexac as described above.
If you wish to run a new analysis from existing files in the Collaboration folder, or you uploaded data to Wexac from an external source (sequencing not performed at the LSCF), log in to http://ngspipe.wexac.weizmann.ac.il:7000 using your Weizmann userID and password. Click "Run pipeline"
Select the pipeline you desire
Browse within your Collaboration directory structure and select, with the appropriate button, the root folder that contains the sample subfolders for analysis. Note that if you wish to go up one level (or more), please click the desired folder level on the path at the top of the window.
Fastq files must be organized, within the selected folder (root folder), into subfolders as shown below. The subfolder names are derived from the sample names.
Each fastq file name must start with the name of its subfolder (the sample name) and end with "_R1.fastq" (or "_R1.fastq.gz") for single-read data. In the case of paired-end data (required for Mars-Seq), corresponding files must exist whose names are IDENTICAL except for the ending "_R2.fastq" (or "_R2.fastq.gz") instead of "_R1.fastq" (or "_R1.fastq.gz").
Where R is the read number.
For example:
The pipeline also supports the fastq file naming convention _S*_L00*_R1.fastq or _S*_L00*_R1_0*.fastq.
For example:
If there is an error with the folder you selected, you need to fix it in order to run the pipeline.
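Before selecting a root folder, the layout can be checked locally. The sketch below is an assumption-laden illustration, not the check that UTAP itself performs: the function name `check_root_folder` is hypothetical, and it only tests the simple "_R1.fastq" / "_R1.fastq.gz" endings (file names ending in _R1_0*.fastq are not handled here).

```python
from pathlib import Path

# Endings accepted for read 1 in this sketch
R1_SUFFIXES = ("_R1.fastq", "_R1.fastq.gz")


def check_root_folder(root, paired_end=False):
    """Return a list of problems found under root (empty list: none found)."""
    root = Path(root)
    problems = []
    subfolders = sorted(p for p in root.iterdir() if p.is_dir())
    if not subfolders:
        problems.append(f"{root}: no sample subfolders found")
    for folder in subfolders:
        sample = folder.name  # subfolder name is taken as the sample name
        r1_files = sorted(
            f for f in folder.iterdir()
            if f.name.startswith(sample) and f.name.endswith(R1_SUFFIXES)
        )
        if not r1_files:
            problems.append(f"{sample}: no {sample}*_R1.fastq(.gz) file")
        if paired_end:
            for r1 in r1_files:
                # Same name, with the last "_R1.fastq" swapped for "_R2.fastq"
                r2_name = "_R2.fastq".join(r1.name.rsplit("_R1.fastq", 1))
                if not (folder / r2_name).exists():
                    problems.append(f"{sample}: missing {r2_name}")
    return problems
```

Calling `check_root_folder("path/to/root", paired_end=True)` on a Mars-Seq folder (the path is a placeholder) returns one message per sample that lacks an R1 file or a matching R2 file.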
An output folder is filled in automatically, based on the selected input folder. If the output folder should be a different one, select the desired output folder.
Continue by filling in all the fields and if you select to run DESeq2, create categories and sort the samples accordingly as shown in analysis setup BEFORE sequencing completion.
Finally, submit the run.
The steps performed by the pipeline -
Steps 4 and 6 are performed only for Mars-Seq
Step 6 is performed only if DESeq2 is selected
Upon completion you will get an email with links to the results report
The report includes several sections -
To count the reads per gene, we use annotation files (GTF format) from Ensembl or GENCODE. In MARS-seq analysis we extend the 3' UTR exon away from the transcript on the DNA, and extend or cut the 3' UTR exon towards the 5' direction on the mRNA.
RNA-Seq example: public data set from Klepikova AV et al. BMC Genomics. 2015 Jun 18;16:466
https://bip.weizmann.ac.il/rna-seq
Mars-seq example: public data set from Feigelson SW et al. Cell Rep. 2018 Jan 23;22(4):849-859
https://bip.weizmann.ac.il/mars-seq
Please regard this analysis as a good starting point and not an end result.
Other pipelines available through UTAP:
See: ATAC-Seq manual
Pipeline nextseq:
Help page:
https://susanc3.weizmann.ac.il/ngsb/howto
QC:
http://stefan.weizmann.ac.il/fqc/{type RUN ID here}
List of runs and Deleting run:
http://susanc3.weizmann.ac.il/ngsb/storage
Transcriptome pipeline:
http://ngspipe.wexac.weizmann.ac.il:7000
Demo UTAP interface:
UTAP - User-friendly Transcriptome Analysis Pipeline (for external users)
Kohen et al. BMC Bioinformatics (2019) 20:154 https://doi.org/10.1186/s12859-019-2728-2