The ATAC-Seq (Assay for transposase-accessible chromatin using sequencing) pipeline facilitates the analysis of ATAC-Seq data in order to capture open and accessible regions of chromatin across the genome. The pipeline receives paired-end reads as input, performs quality control and pre-processing steps, and maps the reads onto mouse or human genomes. Nucleosome-free fragments are selected, peaks are identified and analyzed.
This pipeline is available in https://utap.wexac.weizmann.ac.il/
Analysis setup
The ATAC-Seq pipeline run on the Wexac cluster. In order to run a new analysis, your fastq files must be in your Collaboration folder on Wexac, in the correct structure (UTAP requirements & description).
To run a new analysis from existing files in the Collaboration folder login to https://utap.wexac.weizmann.ac.il using your Weizmann userID and password and click on "Run pipeline":
Click "Run pipeline"
Browse within your Collaboration folder and select the folder containing your sample files (fastq). Fastq files must be organized, within the selected folder (root folder), into subfolders as shown below.
Note that if you wish to go up one level (or more), click the desired folder level on the path at the top of the window.
UTAP user interface input information required:
If you want the output folder to be different from the one automatically filled in (based on the selected input folder), overwrite the output folder name in the text box associated with the screen’s Output folder: field with your name of choice.
Fill in a project name, and select the reference genome to which the reads will be aligned.
For mouse, you have the option to choose a TSS file, containing either a broad or narrow definition of the genes’ TSS (Transcription Start Site) regions (based on Nature. 2016 Jun 30;534(7609):652-7 - The landscape of accessible chromatin in mammalian preimplantation embryos).
Default adapters are P5 and P7 adapters of the TruSeq protocol.
By default, the ATAC-Seq pipeline runs without a “naked-DNA” control.
Finally, click the “Run analysis” button to submit the analysis. Once the analysis is completed, you will be notified by email (usually after a few hours).
All of the output files will be stored in your Wexac Collaboration folder.
At this point no report is being created.
Pipeline steps and associated tools:
Reads shifting: To account for the 9-bp duplication created by DNA repair of the nick by Tn5 transposase (as described by Buenrostro et al. 2013 PMID: 24097267), aligned reads are shifted + 4 bp and − 5 bp for positive and negative strand respectively, using using bedtools bamtobed and awk commands.
Peak annotation: The predicted peaks are collected from all samples using multi_intersectBed and then annotated according to the corresponding genome using Homer (with default parameters). Analysis of peaks distribution in genomic regions, and around TSS is done using ChIPseeker, together with a venn diagram of overlap of peaks sets.
Output folders:
1_cutadapt
2_fastqc
3_multiqc
4_mapping
5_process_alignment
6_nucleosome_free
7_shift_bam
8_ngs_plot
9_picard_plot
10_tss_count
11_call_peak
12_peaks_annotation
13_reports
Log files (one directory above the output directory):
snakemake_stdout.txt