...

Fastq file name conventions: Fastq file names must start with the same sample name as their subfolder and end with "_R1.fastq" (or "_R1.fastq.gz") for single-read data. For paired-end data, each "_R1" file must have a corresponding file whose name is IDENTICAL except that it contains the suffix "_R2.fastq" (or "_R2.fastq.gz") instead of "_R1.fastq" (or "_R1.fastq.gz"). The digit after "R" is the read number.

For example:

  • root_folder 
    • sample1
      • sample1_R1.fastq
      • sample1_R2.fastq (must exist in paired-end)
    • sample2 (or, with gzip-compressed files)
      • sample2_R1.fastq.gz
      • sample2_R2.fastq.gz (must exist in paired-end)
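
The naming rules above can be checked before submitting a run. The following is a minimal sketch, not part of the pipeline itself: the function name check_sample_files and its arguments are hypothetical, and it only covers the basic "_R1"/"_R2" convention.

```python
import re

# Suffixes accepted by the basic convention: _R1/_R2, optionally gzip-compressed.
_SUFFIX = re.compile(r"_R([12])\.fastq(\.gz)?$")


def check_sample_files(sample, file_names, paired_end):
    """Return a list of naming problems for one sample folder (empty if none).

    Hypothetical helper: `sample` is the subfolder name, `file_names` the
    file names found inside it.
    """
    problems = []
    r1_files = []
    names = set(file_names)
    for name in sorted(names):
        if not name.startswith(sample):
            problems.append(f"{name}: does not start with sample name '{sample}'")
            continue
        match = _SUFFIX.search(name)
        if match is None:
            problems.append(f"{name}: does not end with _R1/_R2.fastq(.gz)")
            continue
        if match.group(1) == "1":
            r1_files.append(name)
    if not r1_files:
        problems.append(f"{sample}: no _R1 file found")
    if paired_end:
        for r1 in r1_files:
            # The mate must be identical except for R2 in place of R1.
            r2 = _SUFFIX.sub(lambda m: f"_R2.fastq{m.group(2) or ''}", r1)
            if r2 not in names:
                problems.append(f"{r1}: missing mate {r2}")
    return problems
```

For example, check_sample_files("sample1", ["sample1_R1.fastq", "sample1_R2.fastq"], paired_end=True) returns an empty list, while omitting the "_R2" file reports a missing mate.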

The pipeline also supports the fastq file name conventions _S*_L00*_R1.fastq or _S*_L00*_R1_0*.fastq.

For example:

  • root_folder 
    • sample1
      • sample1_S0_L001_R1_001.fastq
      • sample1_S0_L001_R1_002.fastq
      • sample1_S0_L002_R1_001.fastq
      • sample1_S0_L002_R1_002.fastq
      • sample1_S0_L001_R2_001.fastq (must exist in paired-end)
      • sample1_S0_L001_R2_002.fastq (must exist in paired-end)
      • sample1_S0_L002_R2_001.fastq (must exist in paired-end)
      • sample1_S0_L002_R2_002.fastq (must exist in paired-end)
    • sample2
      • sample2_S0_L001_R1_001.fastq
      • sample2_S0_L001_R1_002.fastq
      • sample2_S0_L002_R1_001.fastq
      • sample2_S0_L002_R1_002.fastq
      • sample2_S0_L001_R2_001.fastq (must exist in paired-end)
      • sample2_S0_L001_R2_002.fastq (must exist in paired-end)
      • sample2_S0_L002_R2_001.fastq (must exist in paired-end)
      • sample2_S0_L002_R2_002.fastq (must exist in paired-end)
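
File names in the _S*_L00*_R1_0* convention can be split into their parts with a regular expression. The sketch below is illustrative only: the pattern is one interpretation of the wildcards above (digits in place of each "*"), and the helper names group_reads and unpaired_r1 are hypothetical, not pipeline functions.

```python
import re
from collections import defaultdict

# One reading of "_S*_L00*_R1.fastq" / "_S*_L00*_R1_0*.fastq":
# each "*" is taken to be one or more digits (an assumption).
PATTERN = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d+)_R(?P<read>[12])"
    r"(?:_(?P<chunk>\d+))?\.fastq$"
)


def group_reads(file_names):
    """Group matching file names by sample, then by read number (R1/R2)."""
    groups = defaultdict(lambda: {"R1": [], "R2": []})
    for name in sorted(file_names):
        match = PATTERN.match(name)
        if match is None:
            continue  # not in this naming convention
        groups[match.group("sample")]["R" + match.group("read")].append(name)
    return dict(groups)


def unpaired_r1(file_names):
    """Return R1 files whose R2 counterpart (same name, R2 for R1) is absent."""
    names = set(file_names)
    missing = []
    for name in sorted(names):
        match = PATTERN.match(name)
        if match and match.group("read") == "1":
            mate = name[: match.start("read")] + "2" + name[match.end("read"):]
            if mate not in names:
                missing.append(name)
    return missing
```

For paired-end data, unpaired_r1 should return an empty list when every lane and chunk has both reads present.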


UTAP user interface input information required: 

...

Finally, click the “Run analysis” button to submit the analysis. Once the analysis is completed, you will be notified by email (usually after a few hours).
All of the output files will be stored in your Wexac Collaboration folder.
Currently, no report is generated.

Analysis workflow

Pipeline steps and associated tools:

  1. Quality control: Reads are quality-trimmed using cutadapt. In this process, primers corresponding to the TruSeq protocol are removed (output is in folder 1).
  2. Quality control: Read quality is evaluated using FastQC (in output folder 2), and a report file containing quality reports for all of the samples is generated using MultiQC (in output folder 3).
  3. Mapping to genome: The quality-trimmed paired-end reads are mapped to the mouse or human genome using Bowtie2.
  4. Following the alignment, mitochondrial genes are removed from the analysis. Duplicated reads are removed using picard-tools. The remaining unique reads are sorted and indexed using samtools sort and samtools index.
  5. Generate statistics on the alignment using flagstat.
  6. Visualization in graphs: The analyzed reads are graphically visualized using ngsplot.
  7. Select nucleosome-free fragments: Fragments shorter than 120 bp are selected.
  8. Peak calling: Peaks are called using MACS2.
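
To illustrate the fragment-length selection in step 7, here is a minimal sketch that filters SAM-format text lines by template length. How the pipeline actually implements this step is not stated here; the function names and the max_len parameter are hypothetical. The sketch relies only on the SAM convention that header lines start with "@" and that column 9 (TLEN) holds the signed template length, with 0 meaning unavailable.

```python
def is_nucleosome_free(sam_line, max_len=120):
    """True if the alignment's template length is known and shorter than max_len."""
    fields = sam_line.rstrip("\n").split("\t")
    tlen = int(fields[8])  # column 9 of a SAM record: signed template length
    return tlen != 0 and abs(tlen) < max_len


def select_short_fragments(sam_lines, max_len=120):
    """Yield header lines unchanged, plus alignments from short fragments."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep the header so the output is still valid SAM
        elif is_nucleosome_free(line, max_len):
            yield line
```

Because TLEN is negative for the downstream mate of a pair, the absolute value is compared, so both mates of a short fragment are kept.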

...