
...

Fastq file name conventions: Fastq file names must start with the same sample name as their subfolder and end with "_R1.fastq" (or "_R1.fastq.gz") for single-read data. For paired-end data, a corresponding file must also exist whose name is IDENTICAL except that it ends with "_R2.fastq" (or "_R2.fastq.gz") instead of "_R1.fastq". The number after "R" is the read number.

For example:

root_folder
- sample1
  - sample1_R1.fastq
  - sample1_R2.fastq (must exist in paired-end)
- sample2 (or, with gzip-compressed files:)
  - sample2_R1.fastq.gz
  - sample2_R2.fastq.gz (must exist in paired-end)
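The naming rule above can be checked before submitting a run. The following Python sketch uses a hypothetical helper, pair_fastq_files (it is not part of UTAP), that looks at one sample folder, keeps files that start with the folder name and end in "_R1.fastq" or "_R1.fastq.gz", and, for paired-end data, requires the identically named "_R2" file to exist. It only illustrates the rule as stated here; UTAP's own validation may differ.

```python
from pathlib import Path

# Read-1 suffixes accepted by the naming rule (plain and gzip-compressed).
R1_SUFFIXES = ("_R1.fastq", "_R1.fastq.gz")


def pair_fastq_files(sample_dir, paired_end=True):
    """Return (R1, R2) name pairs for one sample folder (hypothetical helper).

    Files must start with the folder (sample) name and end in _R1.fastq(.gz).
    For paired-end data the _R2 file with an otherwise identical name must exist.
    """
    sample_dir = Path(sample_dir)
    sample = sample_dir.name
    pairs = []
    for r1 in sorted(sample_dir.iterdir()):
        name = r1.name
        if not r1.is_file() or not name.startswith(sample):
            continue
        suffix = next((s for s in R1_SUFFIXES if name.endswith(s)), None)
        if suffix is None:
            continue  # not a read-1 file (e.g. an _R2 mate or an unrelated file)
        if not paired_end:
            pairs.append((name, None))
            continue
        # Build the mate name by swapping only the _R1 suffix for _R2.
        mate = name[: -len(suffix)] + suffix.replace("_R1", "_R2")
        if not (sample_dir / mate).exists():
            raise FileNotFoundError(f"missing mate for {name}: {mate}")
        pairs.append((name, mate))
    return pairs
```

For the sample1 folder in the example, this returns [("sample1_R1.fastq", "sample1_R2.fastq")]; a paired-end folder with an R1 file but no matching R2 file raises an error instead of silently running as single-read.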

The pipeline also supports the fastq file naming conventions _S*_L00*_R1.fastq and _S*_L00*_R1_0*.fastq.

For example:

root_folder
- sample1
  - sample1_S0_L001_R1_001.fastq
  - sample1_S0_L001_R1_002.fastq
  - sample1_S0_L002_R1_001.fastq
  - sample1_S0_L002_R1_002.fastq
  - sample1_S0_L001_R2_001.fastq (must exist in paired-end)
  - sample1_S0_L001_R2_002.fastq (must exist in paired-end)
  - sample1_S0_L002_R2_001.fastq (must exist in paired-end)
  - sample1_S0_L002_R2_002.fastq (must exist in paired-end)
- sample2
  - sample2_S0_L001_R1_001.fastq
  - sample2_S0_L001_R1_002.fastq
  - sample2_S0_L002_R1_001.fastq
  - sample2_S0_L002_R1_002.fastq
  - sample2_S0_L001_R2_001.fastq (must exist in paired-end)
  - sample2_S0_L001_R2_002.fastq (must exist in paired-end)
  - sample2_S0_L002_R2_001.fastq (must exist in paired-end)
  - sample2_S0_L002_R2_002.fastq (must exist in paired-end)
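With the _S*_L00*_R1(_0*) convention, each sample can have several R1 files (one per lane and chunk), and each needs its own R2 mate. The Python sketch below is a hypothetical check (pair_illumina_names and the regular expression are illustrative, not UTAP code) that parses these names and pairs every R1 file with the R2 file that differs only in the read number. Accepting a ".gz" extension here is an assumption made by analogy with the simpler convention; the text above does not state it for this format.

```python
import re

# <prefix>_S<n>_L00<n>_R<1|2>[_0<n>].fastq[.gz]
# (".gz" is an assumption, by analogy with the _R1.fastq.gz convention).
ILLUMINA_RE = re.compile(
    r"^(?P<prefix>.+_S\d+_L00\d+)_R(?P<read>[12])(?P<chunk>_0\d+)?"
    r"(?P<ext>\.fastq(?:\.gz)?)$"
)


def pair_illumina_names(sample, filenames, paired_end=True):
    """Pair R1 files with their R2 mates (hypothetical helper)."""
    names = set(filenames)
    pairs = []
    for name in sorted(names):
        m = ILLUMINA_RE.match(name)
        # Only read-1 files that start with the sample name are considered.
        if not m or m["read"] != "1" or not name.startswith(sample):
            continue
        if not paired_end:
            pairs.append((name, None))
            continue
        # The mate keeps sample, S, lane and chunk; only R1 becomes R2.
        mate = f"{m['prefix']}_R2{m['chunk'] or ''}{m['ext']}"
        if mate not in names:
            raise FileNotFoundError(f"missing mate for {name}: {mate}")
        pairs.append((name, mate))
    return pairs
```

Applied to the eight sample1 files in the example, this yields four pairs, for instance ("sample1_S0_L001_R1_001.fastq", "sample1_S0_L001_R2_001.fastq").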

UTAP user interface input information required:

...

Finally, click the “Run analysis” button to submit the analysis. You will be notified by email once the analysis is complete (usually after a few hours).
All output files are stored in your Wexac Collaboration folder.
At this point, no report is generated.

Analysis workflow

Pipeline steps and associated tools:

Quality control: Reads are quality-trimmed using cutadapt; in this step, primers corresponding to the TruSeq protocol are removed (output in folder 1).
Quality control: Read quality is evaluated using FastQC (output in folder 2), and a report summarizing the quality of all samples is generated using MultiQC (output in folder 3).
Mapping to genome: The quality-trimmed paired-end reads are mapped to the mouse or human genome using Bowtie2.
Following alignment, reads mapping to the mitochondrial genome are removed from the analysis. Duplicate reads are removed using Picard tools. The remaining unique reads are sorted and indexed using samtools sort and samtools index.
Alignment statistics: Statistics on the alignment are generated using samtools flagstat.
Visualization: The analyzed reads are visualized graphically using ngsplot.
Nucleosome-free fragment selection: Fragments shorter than 120 bp are selected.
Peak calling: Peaks are called using MACS2.
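Two of the steps above are simple per-read filters: dropping mitochondrial reads and keeping nucleosome-free fragments shorter than 120 bp. The Python sketch below expresses both on SAM text lines, using the standard SAM columns RNAME (3rd) and TLEN (9th). It is an illustration only, not how UTAP implements these steps; the helper name keep_nucleosome_free and the mitochondrial contig names "chrM" and "MT" are assumptions (the actual name depends on the reference genome used).

```python
# Assumed mitochondrial contig names (UCSC-style and Ensembl-style references).
MITO_CONTIGS = {"chrM", "MT"}
MAX_NFR_LENGTH = 120  # nucleosome-free cutoff from the workflow (< 120 bp)


def keep_nucleosome_free(sam_lines, max_len=MAX_NFR_LENGTH):
    """Yield SAM header lines plus non-mitochondrial, short-fragment alignments.

    Hypothetical helper: filters on RNAME (column 3) and TLEN (column 9).
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the SAM header unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        rname, tlen = fields[2], int(fields[8])
        if rname in MITO_CONTIGS:
            continue  # drop reads aligned to the mitochondrial genome
        # TLEN is signed (negative for the rightmost mate) and 0 when undefined,
        # so both mates of a short fragment pass and unpaired records are dropped.
        if 0 < abs(tlen) < max_len:
            yield line
```

Filtering on the absolute template length keeps both mates of each short fragment together, which matters for paired-end peak calling downstream.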

...
