...
Fastq file name conventions: Fastq file names must start with the same sample name as the subfolder that contains them, and end with "_R1.fastq" (or "_R1.fastq.gz") for single-read data. For paired-end data, each such file must have a corresponding file whose name is IDENTICAL except that it contains the suffix "_R2.fastq" (or "_R2.fastq.gz") instead of "_R1.fastq" (or "_R1.fastq.gz"). The digit after R is the read number.
For example:
- root_folder
  - sample1
    - sample1_R1.fastq
    - sample1_R2.fastq (must exist in paired-end)
Or:
- root_folder
  - sample2
    - sample2_R1.fastq.gz
    - sample2_R2.fastq.gz (must exist in paired-end)
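The naming rules above can be checked before submitting a run. The following is a minimal sketch, not part of UTAP itself: the function name `check_sample_folder` and its `paired_end` flag are hypothetical, and the check covers only the simple "_R1"/"_R2" convention described above.

```python
from pathlib import Path

# Suffixes that the convention above allows for read 1.
R1_SUFFIXES = ("_R1.fastq", "_R1.fastq.gz")


def check_sample_folder(sample_dir, paired_end=False):
    """Return a list of naming problems found in one sample subfolder.

    Hypothetical helper for illustration; it is not a UTAP function.
    """
    sample_dir = Path(sample_dir)
    sample = sample_dir.name
    problems = []

    r1_files = [p for p in sorted(sample_dir.iterdir())
                if p.name.endswith(R1_SUFFIXES)]
    if not r1_files:
        problems.append(f"{sample}: no *_R1.fastq or *_R1.fastq.gz file found")

    for r1 in r1_files:
        # File names must start with the name of the subfolder.
        if not r1.name.startswith(sample):
            problems.append(f"{r1.name}: name does not start with '{sample}'")
        if paired_end:
            # Swap only the final "_R1" for "_R2"; the rest must be identical.
            head, _, tail = r1.name.rpartition("_R1")
            r2 = r1.with_name(head + "_R2" + tail)
            if not r2.exists():
                problems.append(f"{r1.name}: missing mate {r2.name}")
    return problems
```

An empty list means the folder follows the convention. Running the check on every subfolder of root_folder before submission would catch a missing "_R2" file early.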
The pipeline also supports the fastq file name conventions _S*_L00*_R1.fastq or _S*_L00*_R1_0*.fastq.
For example:
- root_folder
  - sample1
    - sample1_S0_L001_R1_001.fastq
    - sample1_S0_L001_R1_002.fastq
    - sample1_S0_L002_R1_001.fastq
    - sample1_S0_L002_R1_002.fastq
    - sample1_S0_L001_R2_001.fastq (must exist in paired-end)
    - sample1_S0_L001_R2_002.fastq (must exist in paired-end)
    - sample1_S0_L002_R2_001.fastq (must exist in paired-end)
    - sample1_S0_L002_R2_002.fastq (must exist in paired-end)
  - sample2
    - sample2_S0_L001_R1_001.fastq
    - sample2_S0_L001_R1_002.fastq
    - sample2_S0_L002_R1_001.fastq
    - sample2_S0_L002_R1_002.fastq
    - sample2_S0_L001_R2_001.fastq (must exist in paired-end)
    - sample2_S0_L001_R2_002.fastq (must exist in paired-end)
    - sample2_S0_L002_R2_001.fastq (must exist in paired-end)
    - sample2_S0_L002_R2_002.fastq (must exist in paired-end)
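To make the lane and chunk convention concrete, here is a small sketch that parses such file names with a regular expression. It is an illustration only: the pattern is one reading of _S*_L00*_R1.fastq and _S*_L00*_R1_0*.fastq (digits after S, a three-digit lane, an optional three-digit chunk), and the name `parse_fastq_name` is hypothetical rather than part of UTAP.

```python
import re

# Assumed interpretation of the convention:
#   <sample>_S<number>_L<3-digit lane>_R<1|2>[_<3-digit chunk>].fastq
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)"
    r"_S(?P<sample_number>\d+)"
    r"_L(?P<lane>\d{3})"
    r"_R(?P<read>[12])"
    r"(?:_(?P<chunk>\d{3}))?"
    r"\.fastq$"
)


def parse_fastq_name(name):
    """Split a file name into its parts, or return None if it does not match."""
    match = FASTQ_NAME.match(name)
    if match is None:
        return None
    chunk = match.group("chunk")
    return {
        "sample": match.group("sample"),
        "sample_number": int(match.group("sample_number")),
        "lane": int(match.group("lane")),
        "read": int(match.group("read")),
        "chunk": int(chunk) if chunk is not None else None,
    }
```

Grouping the parsed names by (sample, lane, chunk) and checking that each group has both read 1 and read 2 is one way to verify that a paired-end folder is complete.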
UTAP user interface input information required:
...
Finally, click the “Run analysis” button to submit the analysis. Once the analysis is completed, you will be notified by email (usually after a few hours).
All of the output files will be stored in your Wexac Collaboration folder.
At this point, no report is created.
Analysis workflow
Pipeline steps and associated tools:
- Quality control: Reads are quality trimmed using cutadapt. In this process, primers corresponding to the TruSeq protocol are removed (output is in folder 1).
- Quality control: Read quality is evaluated using FastQC (in output folder 2), and a single report file, containing the quality reports for all of the samples, is generated using MultiQC (in output folder 3).
- Mapping to genome: The quality trimmed paired-end reads are mapped to the mouse or human genome using Bowtie2.
- Following the alignment, mitochondrial genes are removed from the analysis. Duplicate reads are removed using picard-tools. The remaining unique reads are sorted and indexed using samtools sort and samtools index.
- Alignment statistics: Statistics on the alignment are generated using samtools flagstat.
- Visualization in graphs: The analyzed reads are graphically visualized using ngsplot.
- Select nucleosome-free fragments: Fragments shorter than 120 bp are selected.
- Peak calling: Peaks are called using MACS2.
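The source does not say how the pipeline implements the nucleosome-free selection step. As a rough sketch of the idea, the filter below keeps SAM records whose absolute template length (TLEN, the ninth SAM column) is below 120 bp. The function names and the choice to work on plain SAM text are assumptions made for illustration.

```python
def is_nucleosome_free(sam_line, max_len=120):
    """Return True if the record's fragment length is between 1 and max_len - 1."""
    fields = sam_line.rstrip("\n").split("\t")
    # Column 9 of a SAM record (index 8) is TLEN, the signed template length.
    fragment_length = abs(int(fields[8]))
    # TLEN is 0 when the fragment length is unavailable; such records are dropped.
    return 0 < fragment_length < max_len


def select_nucleosome_free(sam_lines, max_len=120):
    """Yield header lines unchanged and only the short-fragment alignment records."""
    for line in sam_lines:
        if line.startswith("@") or is_nucleosome_free(line, max_len):
            yield line
```

For example, passing the lines of an uncompressed SAM file through `select_nucleosome_free` yields a stream that still starts with the original header and contains only fragments shorter than 120 bp.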
...