16.2.22 UTAP: SCRB-Seq analysis pipeline
SCRB-Seq analysis in UTAP pipelines
SCRB-Seq analysis in UTAP is done in two steps: first, demultiplexing from BCL files, and second, running a transcriptome analysis. The available options are:
1A. Demultiplexing from BCL with or without pools
1B. Demultiplexing from BCL with dual-index
2. Transcriptome SCRB-Seq analysis
Demultiplexing from BCL
Demultiplexing SCRB-Seq libraries from BCL files to fastq files can be done on either pooled or non-pooled data.
Pooled data is demultiplexed in two steps: first by the pool indexes, and then by the sample indexes within each pool.
FastQC is run after each demultiplexing step, and the results are summarized in a MultiQC report.
After demultiplexing, the resulting fastq files are modified (the R1 and R2 reads are swapped) to prepare them for the transcriptome SCRB-Seq analysis, which is based on the MARS-Seq analysis.
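As a rough illustration of the R1/R2 swap, the sketch below exchanges the read label in a fastq file name. The Illumina-style naming pattern (for example sample_S1_R1_001.fastq.gz) and the function name swapped_name are assumptions made for this example; the source does not describe how UTAP performs the swap internally.

```python
import re

# Assumed naming pattern: ..._R1_001.fastq(.gz) or ..._R2_001.fastq(.gz).
# Only the read label directly before the trailing "_NNN.fastq" is matched.
READ_LABEL = re.compile(r"_R([12])(?=_\d{3}\.fastq(?:\.gz)?$)")


def swapped_name(filename):
    """Return the file name with its R1/R2 label exchanged (hypothetical helper)."""
    def swap(match):
        return "_R2" if match.group(1) == "1" else "_R1"

    new_name, count = READ_LABEL.subn(swap, filename)
    if count == 0:
        raise ValueError(f"no R1/R2 label found in {filename!r}")
    return new_name
```

Renaming both files of a pair with this helper (through a temporary name, to avoid overwriting) would leave the cDNA insert in the file labelled R1 and the UMI and barcode in the file labelled R2.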
Setting up a new analysis
After selecting SCRB-Seq as the sample preparation protocol you will get the following form:
Fill in the project name and select the input folder that contains your BCL files.
The output will be written to /home/labs/USER_LAB/Collaboration/your project name + runid
If you wish to delete the BCL files after demultiplexing is completed, set the “Delete BCL files” field to yes.
Fill in the lengths of the UMI and of the barcode that you used to prepare your libraries.
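To show why both lengths are needed, here is a minimal sketch that splits a read into its UMI and barcode parts. The function name split_umi_barcode and the umi_first flag are hypothetical; the order of the UMI and the barcode within the read depends on your library design and is treated as an assumption here.

```python
def split_umi_barcode(read, umi_len, barcode_len, umi_first=True):
    """Split the start of a read into (umi, barcode) using the two lengths.

    umi_first is an assumption about the library layout; set it to False
    if the barcode precedes the UMI in your reads.
    """
    needed = umi_len + barcode_len
    if len(read) < needed:
        raise ValueError(f"read has {len(read)} bases, expected at least {needed}")
    if umi_first:
        umi = read[:umi_len]
        barcode = read[umi_len:needed]
    else:
        barcode = read[:barcode_len]
        umi = read[barcode_len:needed]
    return umi, barcode
```

If the lengths entered in the form do not match the library, the two parts would be cut at the wrong position, so it is worth checking them against the library preparation protocol.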
Demultiplex non-pooled data
Fill in the table form (shown below) with your sample names and their indexes. If you have dual-index libraries, press the “Add Dual-Index” button and the table will be extended by an additional column (Index2) in which you can fill in the second index.
The data entered in this table will be used to generate the sample sheet for the demultiplexing analysis.
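For orientation, the sketch below turns rows like those entered in the table into a small CSV sample sheet. The [Data] section with Sample_ID, index and index2 columns follows the layout commonly used by Illumina demultiplexing tools; whether UTAP writes exactly this format is an assumption, and build_sample_sheet is a hypothetical name.

```python
import csv
import io


def build_sample_sheet(rows, dual_index=False):
    """Build a CSV sample sheet from table rows (hypothetical helper).

    rows: list of dicts with keys 'sample', 'index' and, when dual_index
    is True, 'index2'. The column layout is an assumed Illumina-style one.
    """
    header = ["Sample_ID", "index"]
    if dual_index:
        header.append("index2")

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["[Data]"])
    writer.writerow(header)
    for row in rows:
        line = [row["sample"], row["index"].upper()]
        if dual_index:
            line.append(row["index2"].upper())
        writer.writerow(line)
    return buffer.getvalue()
```

Pressing “Add Dual-Index” in the form corresponds to dual_index=True in this sketch: each row then carries a second index column.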
The output folder will look as follows:
1_pools
2_fastqc
3_multiQC
Demultiplex pooled data
If you have pooled data, press the “Add pools” button and the following form will open:
Fill in your pool names and their indexes. If you have dual-index libraries, press the “Add Dual-Index” button and the table will be extended by an additional column (Index2) in which you can fill in the second index.
The data entered in this table will be used to generate the sample sheet for the first demultiplexing step, which is done on the pooled data.
In the second table (shown below), fill in the sample names and their indexes, and for each sample fill in the name of the pool that contains it.
The data entered in this table will be used to generate the sample sheet for the second demultiplexing step, which is done on the samples.
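The relation between the two tables can be sketched as follows: each sample row names a pool, and the second demultiplexing step works on the samples of one pool at a time. The function name samples_by_pool and the dictionary keys are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def samples_by_pool(sample_rows, pool_names):
    """Group (sample, index) pairs by the pool that contains them.

    sample_rows: list of dicts with keys 'sample', 'index' and 'pool'.
    pool_names: the pool names entered in the first table.
    """
    known_pools = set(pool_names)
    grouped = defaultdict(list)
    for row in sample_rows:
        if row["pool"] not in known_pools:
            raise ValueError(
                f"sample {row['sample']!r} refers to unknown pool {row['pool']!r}"
            )
        grouped[row["pool"]].append((row["sample"], row["index"]))
    return dict(grouped)
```

A sample whose pool name does not appear in the first table cannot be assigned to any first-step output, which is why the sketch rejects it.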
The output folder will look as follows:
1_pools (fastq files, where R1 contains UMI+barcode, R2 contains the cDNA insert)
2_fastqc
3_multiQC
3_samples (fastq files, where R1 contains the cDNA insert, R2 contains UMI+barcode)
4_fastqc
5_multiQC
After filling in all the required information, press the “Check sample sheet” button to validate the sample names and indexes.
If the validation is successful, press “run analysis”.
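The kind of checks such a validation might perform can be sketched as below. The specific rules (allowed characters in names, indexes limited to A, C, G and T, no duplicates) are assumptions for illustration; the source does not list what the “Check sample sheet” button actually verifies.

```python
import re

# Assumed rules, not taken from UTAP: names use letters, digits, "_" or "-",
# and indexes consist only of the bases A, C, G and T.
VALID_NAME = re.compile(r"[A-Za-z0-9_-]+")
VALID_INDEX = re.compile(r"[ACGT]+")


def check_sample_sheet(rows):
    """Return a list of error messages for (name, index) rows; empty if valid."""
    errors = []
    seen_names = set()
    seen_indexes = set()
    for row_no, (name, index) in enumerate(rows, start=1):
        if not VALID_NAME.fullmatch(name):
            errors.append(f"row {row_no}: invalid sample name {name!r}")
        if name in seen_names:
            errors.append(f"row {row_no}: duplicate sample name {name!r}")
        seen_names.add(name)
        if not VALID_INDEX.fullmatch(index):
            errors.append(f"row {row_no}: invalid index {index!r}")
        if index in seen_indexes:
            errors.append(f"row {row_no}: duplicate index {index!r}")
        seen_indexes.add(index)
    return errors
```

For pooled data, a check like this would be applied to the samples of each pool separately, since the same sample index can legitimately appear in different pools.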
SCRB-Seq transcriptome analysis with UTAP:
After demultiplexing is completed, you can continue the analysis with the “transcriptome SCRB-Seq” pipeline (the setup form for running the pipeline is similar to that of transcriptome – MARS-Seq).
The pipeline performs the following steps:
Trim adapter sequences
Run FastQC for quality control of the samples, in parallel to the other steps described here
Map reads to the selected reference genome
Add UMI and gene information to the reads
Quantify gene expression by counting reads
Count UMIs and correct for clashes
Detect Differentially Expressed (DE) genes for a model with a single factor
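To illustrate how UMI counting differs from read counting in the steps above, here is a minimal sketch that counts distinct UMIs per gene, so that reads sharing both gene and UMI are counted once. The function name count_umis and the input format are hypothetical. The correction for clashes is not shown, because the source does not describe how the pipeline performs it.

```python
from collections import defaultdict


def count_umis(records):
    """Count distinct UMIs per gene.

    records: iterable of (gene, umi) pairs, one pair per mapped read
    (an assumed input format for this sketch).
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in records:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}
```

In this sketch, three reads of the same gene carrying two different UMIs yield a UMI count of 2, whereas plain read counting would report 3.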