Pipeline website: http://utap.wexac.weizmann.ac.il
The ChIP-seq (Chromatin Immuno-Precipitation followed by Sequencing) pipeline facilitates the analysis of ChIP-seq data in order to identify genome-wide DNA binding sites for transcription factors and other proteins. The pipeline receives single-end or paired-end reads as input (the type of input is determined automatically from the number of fastq files generated per sample), performs quality control and pre-processing steps, and maps the reads onto the mouse or human genome. After some post-processing, peak analysis is performed on the identified DNA binding fragments, and significant peaks (compared to a control background, if present) are selected and analyzed.
Before you start:
This pipeline runs on the Wexac cluster.
Please prepare the following in advance:
Setting up a new ChIP-seq analysis
To run a new ChIP-seq analysis, you must first transfer demultiplexed sequencing data (fastq files) to your Collaboration folder. Within the Collaboration folder, the files must conform to the directory structure described below.
Then, log in to utap.wexac.weizmann.ac.il via Firefox or Chrome (the pipeline is NOT compatible with Internet Explorer) using your Weizmann userID and password, and click on "Run pipeline".
2. Provide the name of the input folder:
Browse within your Collaboration folder, and select the folder containing your sample (fastq) files. Fastq files must be organized, within the selected folder (depicted as root_folder in the example below), into subfolders as shown below.
Note that if you wish to go up one level (or more), click the desired folder level on the path at the top of the folder-browsing window.
Fastq file name conventions: Fastq file names must start with the same sample name as the subfolders, and end with "_R1.fastq" (or "_R1.fastq.gz") for single-read data. For paired-end data, corresponding files must exist that are IDENTICAL in their name, but contain the suffix "_R2.fastq" (or "_R2.fastq.gz") instead of "_R1.fastq" (or "_R1.fastq.gz").
For example:
The pipeline also supports the fastq file format conventions _S*_L00*_R1.fastq or _S*_L00*_R1_0*.fastq.
For example:
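The naming rules above can be checked before a run with a short script. The following Python code is a minimal sketch, not part of the pipeline: the function name classify_sample is hypothetical, the pattern "_0*" is interpreted here as "_0" followed by digits, and ".gz" is accepted for every convention, which is an assumption. The function takes a sample (subfolder) name and the fastq file names found in that subfolder, and reports whether the sample looks like single-read or paired-end data.

```python
import re


def classify_sample(sample: str, filenames: list[str]) -> str:
    """Return "single" or "paired" for the fastq files of one sample subfolder.

    Hypothetical helper: raises ValueError when a file name does not follow
    the conventions described in this guide.
    """
    # File names must start with the sample name and end with _R1/_R2,
    # optionally followed by a chunk suffix such as _001 (assumed form of _0*).
    pattern = re.compile(
        rf"^{re.escape(sample)}(?P<mid>.*)_R(?P<read>[12])"
        r"(?P<chunk>_0\d*)?\.fastq(?:\.gz)?$"
    )
    r1, r2 = set(), set()
    for name in filenames:
        match = pattern.match(name)
        if match is None:
            raise ValueError(f"{name!r} does not match the convention for {sample!r}")
        # Everything except the R1/R2 token must be identical between mates.
        key = (match["mid"], match["chunk"] or "", name.endswith(".gz"))
        (r1 if match["read"] == "1" else r2).add(key)
    if not r1:
        raise ValueError(f"sample {sample!r} has no _R1 file")
    if not r2:
        return "single"
    if r1 != r2:
        raise ValueError(f"R1 and R2 files of sample {sample!r} do not pair up")
    return "paired"
```

For instance, classify_sample("sample1", ["sample1_R1.fastq", "sample1_R2.fastq"]) returns "paired", while a subfolder that contains only sample1_R1.fastq.gz is classified as "single". The sample names used here are illustrative only.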
3. Optionally change the name of the output folder
If you want the output folder name to differ from the one filled in automatically (based on the selected input folder), type the name of your choice into the text box of the Output folder: field.
Additional setups
Fill in a project name, and select the reference genome to which the reads will be aligned.
Default adapters are the P5 and P7 adapters of the TruSeq protocol.
4. Run with control
Choose “run with control” (in the drop-down menu associated with the run with control line) in order to enable comparison of each treatment with its corresponding control. When you select this option, a new group of control and treatment boxes will open. Organize the samples by selecting them and using the arrows to move them to the appropriate boxes.
If you have more than one treatment-versus-control comparison, click the "Add group" button as shown in the figure below.
Each group must contain at least one sample in each of the treatment and control boxes.
When moving a sample to the control box, a copy of the sample is retained, so that you can use it again in a new group.
If you move more than one sample to the treatment or control box, the pipeline will automatically combine the samples into one big treatment/control sample.
Important: All of the pipeline steps (mapping, counts etc.) will be run (only) on the samples in the treatment/control boxes.
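The grouping rules above can be expressed as a small check. The following Python code is a minimal sketch under stated assumptions: the Group class and the check_groups function are hypothetical names that are not part of the pipeline, and they only model the rules described in this section (each group needs at least one treatment and one control sample, a control may be reused across groups, and only samples placed in a box are processed).

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One treatment-versus-control comparison (hypothetical model)."""
    treatment: list[str] = field(default_factory=list)
    control: list[str] = field(default_factory=list)


def check_groups(groups: list[Group]) -> set[str]:
    """Validate the groups and return the samples that would be processed."""
    used: set[str] = set()
    for index, group in enumerate(groups, start=1):
        # Each group must contain at least one sample in each box.
        if not group.treatment or not group.control:
            raise ValueError(
                f"group {index} needs at least one treatment and one control sample"
            )
        # Several samples in one box are combined by the pipeline;
        # here we only record which samples take part in the run.
        used.update(group.treatment, group.control)
    return used
```

With two groups that share the same control, for example Group(["tf_rep1", "tf_rep2"], ["input"]) and Group(["histone_rep1"], ["input"]), the function returns the four distinct sample names. Any sample in the input folder that is absent from this set would not be processed. The sample names are illustrative only.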
5. Run the pipeline
Finally, click the “Run analysis” button to submit the analysis. You will be notified by email when the analysis is ready (usually after a few hours). All of its output files will be stored in the relevant subfolders within your Wexac Collaboration folder.
Output folders:
0_concatenating_fastq
1_cutadapt
2_fastqc
3_multiQC
4_mapping
5_filtered_alignment
6_peaks_prediction
7_peaks_annotation
8_graphs
9_BigWig
10_reports
Log files (stored one directory above the output directory)
snakemake_stdout.txt (stored one directory above the output directory)
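After the notification email arrives, the output can be inspected with a short script. The following Python code is a minimal sketch: the function names are hypothetical, the folder names are copied from the list above, and it is an assumption that every folder is created in every run (some may legitimately be absent, depending on the input).

```python
from pathlib import Path

# Output subfolders as listed in this guide.
EXPECTED_FOLDERS = [
    "0_concatenating_fastq",
    "1_cutadapt",
    "2_fastqc",
    "3_multiQC",
    "4_mapping",
    "5_filtered_alignment",
    "6_peaks_prediction",
    "7_peaks_annotation",
    "8_graphs",
    "9_BigWig",
    "10_reports",
]


def missing_output_folders(output_dir: Path) -> list[str]:
    """Return the expected subfolders that do not exist under output_dir."""
    return [name for name in EXPECTED_FOLDERS if not (output_dir / name).is_dir()]


def log_file_path(output_dir: Path) -> Path:
    """Return where snakemake_stdout.txt is expected: one directory above the output."""
    return output_dir.parent / "snakemake_stdout.txt"
```

If missing_output_folders returns a non-empty list, the file returned by log_file_path is a reasonable first place to look for the step that failed.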