Pipeline website: http://ngsbio.wexac.weizmann.ac.il
The ChIP-seq (Chromatin Immunoprecipitation followed by Sequencing) pipeline facilitates the analysis of ChIP-seq data in order to identify genome-wide DNA binding sites for transcription factors and other proteins. The pipeline receives single-read or paired-end reads as input (the input type is determined automatically from the number of fastq files generated per sample), performs quality control and pre-processing steps, and maps the reads to the mouse or human genome. After some post-processing, peak analysis is performed on the identified DNA binding fragments, and significant peaks (compared to a control background, if present) are selected and analyzed.
Before you start:
This pipeline runs on the Wexac cluster.
Please prepare the following in advance:
To run a new ChIP-seq analysis, you must first transfer the demultiplexed sequencing data (fastq files) to your Collaboration folder. Within the Collaboration folder, a directory structure will be created to support the analysis setup described below.
Setting up a new ChIP-seq analysis
If you wish to run a new ChIP-seq analysis from existing files in the Collaboration folder, or if you’ve uploaded new sequencing data (not produced within the LSCF) to Wexac from an external source, log in to ngsbio.wexac.weizmann.ac.il with your Weizmann userID and password, using Firefox or Chrome (the pipeline is NOT compatible with Internet Explorer).
2. Select the input folder:
Browse within your Collaboration folder and select the folder containing your sample files (fastq). Within the selected folder (depicted as the root folder in the example below), the fastq files must be organized into subfolders as shown below.
Note that if you wish to go up one level (or more), click the desired folder level on the path at the top of the window.
Fastq file name conventions: Fastq file names must start with the same sample name as their subfolder, and end with "_R1.fastq" (or "_R1.fastq.gz") for single-read data. For paired-end data, a corresponding file must exist for each R1 file, with an IDENTICAL name except for the suffix "_R2.fastq" (or "_R2.fastq.gz") in place of "_R1.fastq" (or "_R1.fastq.gz"). The digit after R is the read number.
For example:
The pipeline also supports the fastq file naming conventions _S*_L00*_R1.fastq and _S*_L00*_R1_0*.fastq.
For example:
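As an illustration of the naming rules above, the following Python sketch checks a set of file names before upload and reports whether each sample would look like single-read or paired-end data. It is not the pipeline's own code: the regular expression, the function name classify_samples and the sample names are assumptions made for this example, and the pipeline's actual checks may differ.

```python
import re
from collections import defaultdict

# Assumed pattern (not taken from the pipeline itself):
#   <stem>_R<1|2>[_0<digits>].fastq[.gz]
# where <stem> may include an optional _S*_L00* part.
FASTQ_RE = re.compile(
    r"^(?P<stem>.+)_R(?P<read>[12])(?P<chunk>_0\d+)?\.fastq(\.gz)?$"
)


def classify_samples(files_by_sample):
    """Map each sample subfolder name to 'single', 'paired' or an 'invalid: ...' message.

    files_by_sample: dict of {subfolder name: list of fastq file names}.
    """
    result = {}
    for sample, names in files_by_sample.items():
        reads = defaultdict(set)  # read number -> names with the _R1/_R2 part removed
        problems = []
        for name in names:
            match = FASTQ_RE.match(name)
            # File names must start with the name of their sample subfolder.
            if match is None or not name.startswith(sample):
                problems.append(name)
                continue
            key = match.group("stem") + (match.group("chunk") or "")
            reads[match.group("read")].add(key)
        if problems:
            result[sample] = "invalid: " + ", ".join(sorted(problems))
        elif not reads["1"]:
            result[sample] = "invalid: no R1 file"
        elif not reads["2"]:
            result[sample] = "single"
        elif reads["1"] == reads["2"]:
            result[sample] = "paired"
        else:
            result[sample] = "invalid: R1/R2 names do not match"
    return result


if __name__ == "__main__":
    # Hypothetical sample and file names, for illustration only.
    example = {
        "sampleA": ["sampleA_R1.fastq.gz"],
        "sampleB": ["sampleB_S1_L001_R1.fastq", "sampleB_S1_L001_R2.fastq"],
    }
    for sample, kind in classify_samples(example).items():
        print(sample, kind)
```

A sample is reported as paired only when every R1 file has an R2 file with an otherwise identical name, which mirrors the rule stated above.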
3. Select the output folder
If you want the output folder to be different from the one filled in automatically (based on the selected input folder), type your preferred name over it in the "Output folder:" text box.
Additional setups
Fill in a project name, and select the reference genome to which the reads will be aligned.
The default adapters are the P5 and P7 adapters of the TruSeq protocol.
4. Run with control
Choose “run with control” in order to compare each treatment to its corresponding control. When you select this option, a new group of control and treatment boxes opens. Organize the samples by selecting them and using the arrows to move them into the appropriate boxes.
If you have more than one treatment-versus-control comparison, click the "Add group" button as shown in the figure below.
Each group must contain at least one sample in each of the treatment and control boxes.
When moving a sample to the control box, a copy of the sample is retained, so that you can use it again in a new group.
If you move more than one sample to the treatment or control box, the pipeline will automatically combine them into a single treatment or control sample.
Important: All pipeline steps (mapping, counts, etc.) are run only on the samples placed in the treatment/control boxes.
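The grouping rules above can be summarized in a small Python sketch. It only models the rules as described in this guide (each group needs at least one treatment and one control sample, a control may be reused across groups, and only grouped samples are processed). The Group class, the function names and the sample names are hypothetical and are not part of the pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One treatment-versus-control comparison (hypothetical model)."""
    treatment: list = field(default_factory=list)
    control: list = field(default_factory=list)


def validate_groups(groups):
    """Return a list of problems; an empty list means every group is usable."""
    errors = []
    for number, group in enumerate(groups, start=1):
        if not group.treatment:
            errors.append(f"group {number}: no treatment sample")
        if not group.control:
            errors.append(f"group {number}: no control sample")
    return errors


def samples_to_process(groups):
    """Samples that appear in any treatment or control box, without duplicates."""
    used = set()
    for group in groups:
        used.update(group.treatment, group.control)
    return sorted(used)


if __name__ == "__main__":
    # Hypothetical sample names; the same control is reused in both groups.
    groups = [
        Group(treatment=["tf1_rep1", "tf1_rep2"], control=["input"]),
        Group(treatment=["tf2_rep1"], control=["input"]),
    ]
    print(validate_groups(groups))
    print(samples_to_process(groups))
```

Any sample that is present in the input folder but absent from the list returned by samples_to_process would, under the rule above, be left out of the analysis.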
5. Run the pipeline
Finally, click the “Run analysis” button to submit the analysis. You will be notified by email when the analysis is ready (usually after a few hours). All of its output files will be stored in the relevant subfolders within your Wexac Collaboration folder.
Output folders:
1_cutadapt
2_fastqc
3_multiQC
4_alignment
5_samtools
6_peaks_prediction
7_peaks_annotation
8_graphs
9_BigWig
Log files (stored one directory above the output directory):
snakemake_stdout.txt
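Once the notification email arrives, a quick way to confirm that the run produced everything listed above is to check for the expected subfolders and the log file. The Python sketch below does this with the standard library only. The function name missing_outputs and the example path are assumptions for illustration, and the check looks only at whether the folders exist, not at their contents.

```python
from pathlib import Path

# Output subfolders as listed in this guide.
EXPECTED_DIRS = [
    "1_cutadapt",
    "2_fastqc",
    "3_multiQC",
    "4_alignment",
    "5_samtools",
    "6_peaks_prediction",
    "7_peaks_annotation",
    "8_graphs",
    "9_BigWig",
]
LOG_NAME = "snakemake_stdout.txt"  # stored one directory above the output directory


def missing_outputs(output_dir):
    """Return the names of expected folders (and the log file) that are absent."""
    out = Path(output_dir)
    missing = [name for name in EXPECTED_DIRS if not (out / name).is_dir()]
    if not (out.parent / LOG_NAME).is_file():
        missing.append(f"../{LOG_NAME}")
    return missing


if __name__ == "__main__":
    # Hypothetical path; replace it with your own output folder.
    for name in missing_outputs("my_project/output"):
        print("missing:", name)
```

An empty result means all nine subfolders and the log file were found.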