UTAP - Installation Guide

You can find updated documentation here: utap.readthedocs.io

Support: refael.kohen@weizmann.ac.il


Requirements:

The application needs to be installed on a Linux server that can submit cluster commands such as qsub (PBS cluster) or bsub (LSF cluster).

The host server needs ~40G of RAM. If you install on a compute cluster, all compute nodes in the queue need ~40G of RAM.


The following need to be installed on the server:

docker version >= 17 

miniconda

apache
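
The requirements above can be checked before going further. The following is only a sketch: the function names are illustrative and not part of UTAP, the Apache binary is assumed to be called either httpd or apache2, and the RAM figure is read from /proc/meminfo, which exists on Linux only.

```shell
#!/bin/sh
# Pre-flight check (sketch): report which of the required tools are on PATH.

have_cmd() {
    # Succeeds if the given command can be found on PATH.
    command -v "$1" >/dev/null 2>&1
}

total_ram_gb() {
    # Total RAM in whole GB, read from /proc/meminfo (Linux only).
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

for tool in docker conda; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done

# Apache and the cluster client go by different names on different systems,
# so only one of each pair is expected to exist.
have_cmd httpd || have_cmd apache2 || echo "note: neither httpd nor apache2 found"
have_cmd qsub || have_cmd bsub || echo "note: neither qsub nor bsub found (expected on a local-only server)"

echo "total RAM: $(total_ram_gb) GB (this guide asks for about 40 GB)"
```

The script only reports; it does not install anything and does not check the docker version.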



Create a directory for the UTAP software and its output:

Notice: the data of the users will be written within this folder, so verify that you have sufficient disk space (approximately 400G per analysis).

HOST_MOUNT=  ... fill in the path here
mkdir $HOST_MOUNT
cd $HOST_MOUNT
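
A quick way to compare the free space under $HOST_MOUNT with the ~400G estimate is sketched below. The helper name free_gb and the variables TARGET and NEEDED_GB are made up for this example; the check falls back to the current directory if HOST_MOUNT is not set.

```shell
free_gb() {
    # Free space, in whole GB, on the filesystem that holds directory $1.
    # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

TARGET=${HOST_MOUNT:-.}     # falls back to the current directory if unset
NEEDED_GB=400               # per-analysis estimate quoted above

avail=$(free_gb "$TARGET")
if [ "$avail" -lt "$NEEDED_GB" ]; then
    echo "warning: only ${avail} GB free under $TARGET (about ${NEEDED_GB} GB per analysis is suggested)"
else
    echo "ok: ${avail} GB free under $TARGET"
fi
```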


Download the metadata:


#Download by browser:
https://drive.google.com/file/d/11OLRgh8YlPolyh71ESe10bP_t7P9ZtNf/view?usp=sharing
#OR by ftp:
ftp dors.weizmann.ac.il
username: bioimg
password: bioimaging
get UTAP/utap-meta-data-v1.0.7.tar.gz

mv utap-meta-data-v1.0.7.tar.gz $HOST_MOUNT
cd $HOST_MOUNT
tar -xzvf utap-meta-data-v1.0.7.tar.gz
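
Because the archive is large, it may be worth confirming that the download is complete before (or after) extracting it. The sketch below lists the archive contents without extracting them; archive_ok and the ARCHIVE variable are illustrative names, not part of UTAP.

```shell
archive_ok() {
    # Succeeds if $1 is a readable gzip-compressed tar archive.
    # Listing the contents reads the whole file without extracting it.
    [ -f "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

ARCHIVE=${ARCHIVE:-utap-meta-data-v1.0.7.tar.gz}
if archive_ok "$ARCHIVE"; then
    echo "archive looks complete: $ARCHIVE"
else
    echo "archive missing or damaged: $ARCHIVE (consider downloading it again)"
fi
```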


Create conda environments:


conda create -y --name utap r-essentials r-base=3.3.2
conda activate utap
# Install the packages into the utap environment
# by running the following script in your shell:
export CONDA_DIR=YOUR_CONDA_DIR # For example: CONDA_DIR=/home/user/miniconda2
$HOST_MOUNT/utap-meta-data/installation/install-conda-packages-transcriptome.sh >& $HOST_MOUNT/utap-meta-data/installation/conda-install-transcriptome.stdout
conda deactivate

conda create -y --name utap-chromatin
conda activate utap-chromatin
export CONDA_DIR=YOUR_CONDA_DIR # For example: CONDA_DIR=/home/user/miniconda2
$HOST_MOUNT/utap-meta-data/installation/install-conda-packages-chromatin.sh >& $HOST_MOUNT/utap-meta-data/installation/conda-install-chromatin.stdout
conda deactivate

conda create -y -n utap-py35 python=3.5 anaconda
conda activate utap-py35
conda install -y -c bioconda snakemake==3.13.3
conda deactivate
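
After the three environments have been created, their presence can be confirmed from the output of "conda env list". This is a minimal sketch: env_listed is a hypothetical helper that matches the first column of that output, and the loop reports "missing" if conda itself is not on PATH.

```shell
env_listed() {
    # Reads `conda env list` output on stdin; succeeds if environment $1 is listed.
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

for env in utap utap-chromatin utap-py35; do
    if command -v conda >/dev/null 2>&1 && conda env list | env_listed "$env"; then
        echo "ok:      $env"
    else
        echo "missing: $env"
    fi
done
```

It is also worth reading the two .stdout log files written by the install scripts, since a listed environment may still be missing packages.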


########## OLD COMMANDS - NOT IN USE ############################
conda create -y --name utap
conda activate utap
conda env create -f utap_environment.yml -n utap
#Run the file for installation packages on utap environment
$HOST_MOUNT/utap-meta-data/installation/install-conda-packages.sh
conda deactivate
#################################################################


Create genomes:

Extract the genomes to FASTA format and create a STAR index for each genome (requires ~135G of disk space, but temporary files created during the build require ~200G):

#Extract genome files:
#=====================
$HOST_MOUNT/utap-meta-data/softwares/bin/twoBitToFa $HOST_MOUNT/utap-meta-data/2bit_genomes/hg19.2bit $HOST_MOUNT/utap-meta-data/genomes/Homo_sapiens/UCSC/hg19/gemone_hg19.fa
$HOST_MOUNT/utap-meta-data/softwares/bin/twoBitToFa $HOST_MOUNT/utap-meta-data/2bit_genomes/hg38.2bit $HOST_MOUNT/utap-meta-data/genomes/Homo_sapiens/UCSC/hg38/gemone_hg38.fa
$HOST_MOUNT/utap-meta-data/softwares/bin/twoBitToFa $HOST_MOUNT/utap-meta-data/2bit_genomes/mm10.2bit $HOST_MOUNT/utap-meta-data/genomes/Mus_musculus/UCSC/mm10/gemone_mm10.fa
$HOST_MOUNT/utap-meta-data/softwares/bin/twoBitToFa $HOST_MOUNT/utap-meta-data/2bit_genomes/danRer10.2bit $HOST_MOUNT/utap-meta-data/genomes/Danio_rerio/UCSC/danRer10/gemone_danRer10.fa
$HOST_MOUNT/utap-meta-data/softwares/bin/twoBitToFa $HOST_MOUNT/utap-meta-data/2bit_genomes/tair11-araport.2bit $HOST_MOUNT/utap-meta-data/genomes/Arabidopsis_thaliana/ARAPORT/tair11/gemone_tair11-araport.fa
$HOST_MOUNT/utap-meta-data/softwares/bin/twoBitToFa $HOST_MOUNT/utap-meta-data/2bit_genomes/tair10.2bit $HOST_MOUNT/utap-meta-data/genomes/Arabidopsis_thaliana/NCBI/tair10/gemone_tair10.fa
$HOST_MOUNT/utap-meta-data/softwares/bin/twoBitToFa $HOST_MOUNT/utap-meta-data/2bit_genomes/sl3.2bit $HOST_MOUNT/utap-meta-data/genomes/Solanum_lycopersicum/SGN/sl3/gemone_sl3.fa
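
The seven extraction commands differ only in the genome name and its directory, so they can also be generated from a small table. The sketch below only prints the commands (a dry run). GENOME_TABLE, META and print_extract_commands are names invented for this example, the placeholder path is used only if HOST_MOUNT is unset, and the output file names keep the "gemone_" spelling used in the commands above.

```shell
# Each row: <2bit base name> <genome directory relative to genomes/>
GENOME_TABLE='hg19 Homo_sapiens/UCSC/hg19
hg38 Homo_sapiens/UCSC/hg38
mm10 Mus_musculus/UCSC/mm10
danRer10 Danio_rerio/UCSC/danRer10
tair11-araport Arabidopsis_thaliana/ARAPORT/tair11
tair10 Arabidopsis_thaliana/NCBI/tair10
sl3 Solanum_lycopersicum/SGN/sl3'

META=${HOST_MOUNT:-/path/to/host_mount}/utap-meta-data   # placeholder if HOST_MOUNT is unset

print_extract_commands() {
    # Prints one twoBitToFa command per genome; nothing is executed.
    printf '%s\n' "$GENOME_TABLE" | while read -r name dir; do
        echo "$META/softwares/bin/twoBitToFa $META/2bit_genomes/$name.2bit $META/genomes/$dir/gemone_$name.fa"
    done
}

print_extract_commands
```

Once the printed commands look right, they could be executed with: print_extract_commands | sh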



#Build STAR index for the genome files:
#======================================
These commands take ~1 hour per genome. They run on 30 threads (you can change this with the --runThreadN parameter) and consume RAM as follows:
*hg19:       29918 MB
*hg38:       30574 MB
*mm10:       27532 MB
*danRer10:   23523 MB
*tair11:     4301 MB
*tair10:     4282 MB
*sl3:        17663 MB
#======================================================================================================================================

$HOST_MOUNT/utap-meta-data/softwares/bin/STAR --runThreadN 30 --runMode genomeGenerate --genomeDir $HOST_MOUNT/utap-meta-data/genomes/Homo_sapiens/UCSC/hg19/STAR_index/ --genomeFastaFiles $HOST_MOUNT/utap-meta-data/genomes/Homo_sapiens/UCSC/hg19/gemone_hg19.fa
$HOST_MOUNT/utap-meta-data/softwares/bin/STAR --runThreadN 30 --runMode genomeGenerate --genomeDir $HOST_MOUNT/utap-meta-data/genomes/Homo_sapiens/UCSC/hg38/STAR_index/ --genomeFastaFiles $HOST_MOUNT/utap-meta-data/genomes/Homo_sapiens/UCSC/hg38/gemone_hg38.fa
$HOST_MOUNT/utap-meta-data/softwares/bin/STAR --runThreadN 30 --runMode genomeGenerate --genomeDir $HOST_MOUNT/utap-meta-data/genomes/Mus_musculus/UCSC/mm10/STAR_index/ --genomeFastaFiles $HOST_MOUNT/utap-meta-data/genomes/Mus_musculus/UCSC/mm10/gemone_mm10.fa
$HOST_MOUNT/utap-meta-data/softwares/bin/STAR --runThreadN 30 --runMode genomeGenerate --genomeDir $HOST_MOUNT/utap-meta-data/genomes/Danio_rerio/UCSC/danRer10/STAR_index/ --genomeFastaFiles $HOST_MOUNT/utap-meta-data/genomes/Danio_rerio/UCSC/danRer10/gemone_danRer10.fa
$HOST_MOUNT/utap-meta-data/softwares/bin/STAR --runThreadN 30 --runMode genomeGenerate --genomeDir $HOST_MOUNT/utap-meta-data/genomes/Arabidopsis_thaliana/ARAPORT/tair11/STAR_index/ --genomeFastaFiles $HOST_MOUNT/utap-meta-data/genomes/Arabidopsis_thaliana/ARAPORT/tair11/gemone_tair11-araport.fa
$HOST_MOUNT/utap-meta-data/softwares/bin/STAR --runThreadN 30 --runMode genomeGenerate --genomeDir $HOST_MOUNT/utap-meta-data/genomes/Arabidopsis_thaliana/NCBI/tair10/STAR_index/ --genomeFastaFiles $HOST_MOUNT/utap-meta-data/genomes/Arabidopsis_thaliana/NCBI/tair10/gemone_tair10.fa
$HOST_MOUNT/utap-meta-data/softwares/bin/STAR --runThreadN 30 --runMode genomeGenerate --genomeDir $HOST_MOUNT/utap-meta-data/genomes/Solanum_lycopersicum/SGN/sl3/STAR_index/ --genomeFastaFiles $HOST_MOUNT/utap-meta-data/genomes/Solanum_lycopersicum/SGN/sl3/gemone_sl3.fa
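
Before removing any source files, it may help to confirm that each index build finished. The sketch below assumes that a completed STAR index directory contains non-empty files named Genome, SA and SAindex; index_ok and GENOMES are illustrative names, and the danRer10 entry assumes the index sits directly under its genome directory like the others.

```shell
index_ok() {
    # Succeeds if directory $1 holds non-empty Genome, SA and SAindex files.
    for f in Genome SA SAindex; do
        [ -s "$1/$f" ] || return 1
    done
    return 0
}

GENOMES=${HOST_MOUNT:-.}/utap-meta-data/genomes
for dir in Homo_sapiens/UCSC/hg19 Homo_sapiens/UCSC/hg38 Mus_musculus/UCSC/mm10 \
           Danio_rerio/UCSC/danRer10 Arabidopsis_thaliana/ARAPORT/tair11 \
           Arabidopsis_thaliana/NCBI/tair10 Solanum_lycopersicum/SGN/sl3; do
    if index_ok "$GENOMES/$dir/STAR_index"; then
        echo "ok:         $dir"
    else
        echo "incomplete: $dir"
    fi
done
```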



After extracting the FASTA files and building the STAR indexes, you can delete the FASTA and .2bit files:
=========================================================================================================
rm $HOST_MOUNT/utap-meta-data/genomes/Homo_sapiens/UCSC/hg19/gemone_hg19.fa
rm $HOST_MOUNT/utap-meta-data/genomes/Homo_sapiens/UCSC/hg38/gemone_hg38.fa
rm $HOST_MOUNT/utap-meta-data/genomes/Mus_musculus/UCSC/mm10/gemone_mm10.fa
rm $HOST_MOUNT/utap-meta-data/genomes/Danio_rerio/UCSC/danRer10/gemone_danRer10.fa
rm $HOST_MOUNT/utap-meta-data/genomes/Arabidopsis_thaliana/ARAPORT/tair11/gemone_tair11-araport.fa
rm $HOST_MOUNT/utap-meta-data/genomes/Arabidopsis_thaliana/NCBI/tair10/gemone_tair10.fa
rm $HOST_MOUNT/utap-meta-data/genomes/Solanum_lycopersicum/SGN/sl3/gemone_sl3.fa

rm $HOST_MOUNT/utap-meta-data/2bit_genomes/*


Load the image into the docker engine:


docker pull refaelkohen/utap:v1.0.0


Run UTAP:

To run UTAP on a local server, run the following command (all parameters are mandatory). The command will create a docker container called "utap".


$HOST_MOUNT/utap-meta-data/installation/utap-install.sh -a DNS_HOST -b HOST_MOUNT -c REPLY_EMAIL -d MAIL_SERVER -e HOST_APACHE_PORT -f HOST_SSH_PORT -g ADMIN_PASS -h USER -i INSTITUTE_NAME -j IMAGE_NAME -k DB_PATH -l MAX_UPLOAD_SIZE -m local  

To run UTAP on a compute cluster, run the following command:

$HOST_MOUNT/utap-meta-data/installation/utap-install.sh -a DNS_HOST -b HOST_MOUNT -c REPLY_EMAIL -d MAIL_SERVER -e HOST_APACHE_PORT -f HOST_SSH_PORT -g ADMIN_PASS -h USER -i INSTITUTE_NAME -j IMAGE_NAME -k DB_PATH -l MAX_UPLOAD_SIZE -m CLUSTER_TYPE -n CLUSTER_QUEUE -o CONDA -p AUTH_KEYS_FILE


After the run, you can access the application at the address http://DNS_HOST:HOST_APACHE_PORT (according to the values you chose for these parameters).


Parameters:

DNS_HOST - DNS address of the host server. For example: http://servername.ac.il or servername.ac.il

HOST_MOUNT - Mount point of the docker on the host (full path of the folder). Notice: this is the folder in which you placed the utap-meta-data folder. All input and output data of the users will be written into this folder.

REPLY_EMAIL - Support email for users. Users can reply to this email.
MAIL_SERVER - Domain name of the mail server (for example: mg.weizmann.ac.il)

HOST_APACHE_PORT - Any available port on the host server for the Apache server of the docker. For example: 8081
HOST_SSH_PORT - Any available port on the host server for the ssh server of the docker. For example: 2222

USER - A user on the host server that has permission to run cluster commands and to write into the $HOST_MOUNT folder (cannot be root).
INSTITUTE_NAME - Your institute or lab name (a string containing only A-Za-z0-9 characters, without whitespace).

IMAGE_NAME - The name of the docker image. For example: utap:v1.0.0
ADMIN_PASS - A password for the admin in the Django database (a string containing only A-Za-z0-9 characters, without whitespace).

DB_PATH - Full path to the folder where the DB will be located. $USER needs write permission to this folder. It is highly recommended to create a new folder on / (a non-mounted folder). The DB is very small, so it will not cause disk space problems. For example: mkdir /utap-db; chown -R $USER /utap-db;  Notice that a mounted folder can cause problems.

MAX_UPLOAD_SIZE - Maximum size of a file or folder that a user can upload at once, in KB. For example: 314572800 (i.e. 300 * 1024 * 1024 KB = 314572800 KB = 300 GB)
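
Two of the constraints above are easy to check in the shell before running the installer. This is only a sketch: is_alnum is a hypothetical helper that applies the A-Za-z0-9 rule stated for INSTITUTE_NAME and ADMIN_PASS, the sample strings are made up, and the last lines reproduce the arithmetic from the MAX_UPLOAD_SIZE example.

```shell
is_alnum() {
    # Succeeds if $1 is non-empty and contains only A-Z, a-z and 0-9.
    case "$1" in
        ''|*[!A-Za-z0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

is_alnum "MyInstitute1" && echo "accepted: MyInstitute1"
is_alnum "My Lab"       || echo "rejected: My Lab"

# The MAX_UPLOAD_SIZE example value, computed rather than typed by hand.
MAX_UPLOAD_SIZE=$((300 * 1024 * 1024))
echo "MAX_UPLOAD_SIZE=$MAX_UPLOAD_SIZE"
```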


Additional parameters for installing on cluster:

CLUSTER_TYPE - Type of the cluster, or local. If you select "local", the commands of the UTAP application will run on the local server (and you do not need to supply the parameters CLUSTER_QUEUE, CONDA, AUTH_KEYS_FILE). Otherwise, the commands will be sent to the cluster. Currently UTAP supports LSF and PBS clusters. For example: lsf/pbs/local

CLUSTER_QUEUE - Queue name in the cluster ($USER needs permission to run on this queue)

CONDA - Full path to root folder of miniconda. For example: /miniconda2

AUTH_KEYS_FILE - Full path to .ssh/authorized_keys (or .ssh/authorized_keys2) file of $USER (The docker will add its public key to this file).
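
To show how the parameters fit together, the sketch below assembles a cluster-mode command line and prints it without running it. Every value is a placeholder for illustration: some are taken from the examples in the parameter descriptions above, the rest (such as the email, user name, queue name and paths) are invented. UTAP_USER and build_install_command are names made up for this example.

```shell
# All values below are placeholders for illustration; replace them with your own.
HOST_MOUNT=${HOST_MOUNT:-/data/utap}
DNS_HOST=servername.ac.il
REPLY_EMAIL=support@example.org
MAIL_SERVER=mg.weizmann.ac.il
HOST_APACHE_PORT=8081
HOST_SSH_PORT=2222
ADMIN_PASS=ChangeMe123
UTAP_USER=utapuser          # passed as the USER parameter; the shell's own $USER is left untouched
INSTITUTE_NAME=MyInstitute
IMAGE_NAME=utap:v1.0.0
DB_PATH=/utap-db
MAX_UPLOAD_SIZE=314572800
CLUSTER_TYPE=lsf
CLUSTER_QUEUE=utap-queue
CONDA=/miniconda2
AUTH_KEYS_FILE=/home/utapuser/.ssh/authorized_keys

build_install_command() {
    # Prints the cluster-mode command line; nothing is executed.
    echo "$HOST_MOUNT/utap-meta-data/installation/utap-install.sh" \
        -a "$DNS_HOST" -b "$HOST_MOUNT" -c "$REPLY_EMAIL" -d "$MAIL_SERVER" \
        -e "$HOST_APACHE_PORT" -f "$HOST_SSH_PORT" -g "$ADMIN_PASS" -h "$UTAP_USER" \
        -i "$INSTITUTE_NAME" -j "$IMAGE_NAME" -k "$DB_PATH" -l "$MAX_UPLOAD_SIZE" \
        -m "$CLUSTER_TYPE" -n "$CLUSTER_QUEUE" -o "$CONDA" -p "$AUTH_KEYS_FILE"
}

build_install_command
```

Printing the command first makes it easier to spot a missing or misplaced value before the container is created.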



Important:

A file called db.sqlite3 will be created within the $DB_PATH folder.

The db.sqlite3 file is the database of the application. It contains the details of the users and links to their results in the $HOST_MOUNT folder.

The $HOST_MOUNT folder contains all data of the users (input and output files).

The db.sqlite3 database and the $HOST_MOUNT folder are located on the disk of the host server (outside the docker container).

When you stop or delete the "utap" container, the database and the $HOST_MOUNT folder are not deleted.

So you can always run the docker again with the utap-install.sh script and use the same database and the same $HOST_MOUNT folder.