INSTALLATION AND IMPLEMENTATION OF GPU-I-TASSER SUITE
(Copyright 2020 by Zhang Lab, University of Michigan, All rights reserved)
(Version 1.0, 2020/11/10)

1. What is GPU-I-TASSER Suite?

   GPU-I-TASSER Suite is a GPU-based composite package of programs for
   protein structure prediction and function annotation. The Suite
   includes the following programs:

   a) GPU-I-TASSER: A hierarchical GPU-based program for protein structure prediction
   b) COACH:        A function annotation program based on COFACTOR, TM-SITE and S-SITE
   c) COFACTOR:     A program for ligand-binding site, EC number and GO term prediction
   d) TM-SITE:      A structure-based approach for ligand-binding site prediction
   e) S-SITE:       A sequence-based approach for ligand-binding site prediction
   f) MUSTER:       A threading program for protein template identification
   g) LOMETS:       A meta-server approach consisting of multiple threading programs
   h) SPICKER:      A clustering program for structure decoy selection
   i) HAAD:         Quickly adds hydrogen atoms to protein heavy-atom structures
   j) EDTSurf:      Constructs triangulated surfaces of protein molecules
   k) ModRefiner:   Constructs and refines atomic models from C-alpha traces
   l) NWalign:      Protein sequence alignment by the Needleman-Wunsch algorithm
   m) PSSpred:      A program for Protein Secondary Structure PREDiction
   n) ResQ:         An algorithm to estimate B-factors and residue-level errors of models

2. How to install the GPU-I-TASSER Suite?

   a) Download the GPU-I-TASSER Suite 'GPU-ITASSER1.0.tar.bz2' from
      http://zhanglab.dcmb.med.umich.edu/I-TASSER/download
      and unpack it by:

      > tar -xvf GPU-ITASSER1.0.tar.bz2

      The root path of this package is called $pkgdir, e.g.
      /home/yourname/I-TASSER5.0. All the programs should be under this
      directory. You can install the package at any location on your
      computer.
   b) Download the GPU-I-TASSER and COACH library files from
      http://zhanglab.dcmb.med.umich.edu/library/
      http://zhanglab.dcmb.med.umich.edu/BioLiP/
      A script 'download_lib.pl' is provided in the package to automate
      the download and update of the libraries. We recommend putting the
      library files under the path /home/yourname/ITLIB.

   c) Third-party software installation:
      While most programs in this package were developed in the Zhang
      Lab, which grants permission to use them, some programs and
      databases (including blast, nr and GOparser) were developed by
      third-party groups. A default version of blast and nr is included
      in the package. It is the user's obligation to obtain license
      permission from the developers of all third-party software before
      using it. A detailed list of addresses and guidance for installing
      these programs can be found at
      http://zhanglab.dcmb.med.umich.edu/I-TASSER/addition
      In addition, your system needs to have Java installed.

3. Bug report:

   Please report bugs and post suggestions on the I-TASSER message board:
   http://zhanglab.dcmb.med.umich.edu/forum

########################################################
#                                                      #
#  4. Installation and implementation of GPU-I-TASSER  #
#                                                      #
########################################################

4.1. Introduction to GPU-I-TASSER

   GPU-I-TASSER is an integrated package for protein structure and
   function prediction. For a given sequence, GPU-I-TASSER first
   identifies template proteins from the Protein Data Bank (PDB) using
   multiple threading techniques (LOMETS). Continuous fragments excised
   from the template alignments are used to assemble full-length models
   by iterative Monte Carlo simulations. The best models are then
   selected from the Monte Carlo trajectories by decoy clustering, and
   the final atomic models are rebuilt from the structure clusters by
   atomic-level structural refinement.
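   The decoy-clustering step can be illustrated with a simplified Python
   sketch of the neighbor-counting idea behind SPICKER: the decoy with the
   most neighbors within a distance cutoff becomes a cluster center, its
   neighbors are removed, and the search repeats. This is not SPICKER's
   actual implementation; the function name 'cluster_decoys', the
   precomputed distance matrix (e.g. pairwise RMSD values) and the fixed
   cutoff are assumptions made for illustration only.

```python
from __future__ import annotations

from typing import Sequence


def cluster_decoys(
    dist: Sequence[Sequence[float]], cutoff: float, max_clusters: int = 5
) -> list[tuple[int, list[int]]]:
    """Greedy neighbor-count clustering (hypothetical helper, not SPICKER itself).

    dist[i][j] is a precomputed structural distance (e.g. RMSD) between
    decoys i and j, with dist[i][i] == 0. Returns (center, members) pairs,
    largest cluster first.
    """
    if cutoff < 0:
        raise ValueError("cutoff must be non-negative")
    remaining = set(range(len(dist)))
    clusters: list[tuple[int, list[int]]] = []
    while remaining and len(clusters) < max_clusters:
        center, members = -1, []
        for i in sorted(remaining):
            # Neighbors of decoy i among the not-yet-assigned decoys (includes i).
            neighbors = [j for j in sorted(remaining) if dist[i][j] <= cutoff]
            if len(neighbors) > len(members):  # ties keep the lower index
                center, members = i, neighbors
        clusters.append((center, members))
        remaining -= set(members)  # assigned decoys leave the pool
    return clusters


# Decoys 0-2 are mutually close, 3-4 form a second group, 5 is isolated.
groups = [0, 0, 0, 1, 1, 2]
dist = [[0.0 if i == j else (1.0 if groups[i] == groups[j] else 10.0)
         for j in range(6)] for i in range(6)]
print(cluster_decoys(dist, cutoff=2.0))
# [(0, [0, 1, 2]), (3, [3, 4]), (5, [5])]
```
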
   For function annotation, the GPU-I-TASSER structure model is matched
   against the function library (BioLiP) to identify functional
   templates. Biological insights (including ligand binding, enzyme
   classification and gene ontology) are inferred from the functional
   templates by COACH, based on the consensus of predictions from
   COFACTOR, TM-SITE and S-SITE.

4.2. How to run GPU-I-TASSER?

   a) The main script for running GPU-I-TASSER is
      $pkgdir/I-TASSERmod/runI-TASSER.pl. Running it without arguments
      prints the help information.

   b) The following arguments are mandatory. For example:

      > $pkgdir/I-TASSERmod/runI-TASSER.pl -libdir /home/yourname/ITLIB \
          -seqname example -datadir /home/yourname/I-TASSER5.0/example \
          -runstyle gnuparallel

      -libdir    path of the template libraries
      -seqname   unique name of your query sequence
      -datadir   directory that contains your sequence
      -runstyle  how to run the jobs: "serial", "parallel" or "gnuparallel"

      GPU-I-TASSER requires -runstyle "gnuparallel" or "parallel", and GPU
      nodes must be available to support the GPU runs.
        "serial"       the default; runs sequential I-TASSER simulations
        "parallel"     runs parallel GPU simulation jobs on a cluster using
                       the PBS/Torque job scheduling system
        "gnuparallel"  runs parallel GPU simulation jobs on a single
                       GPU-enabled computer with multiple cores, using GNU
                       parallel

   c) The other arguments are optional and have default values; you can
      reset any of them. For example:

      > $pkgdir/I-TASSERmod/runI-TASSER.pl -pkgdir /home/yourname/I-TASSER5.0 \
          -libdir /home/yourname/ITLIB -seqname example \
          -datadir /home/yourname/I-TASSER5.0/example -runstyle parallel \
          -homoflag benchmark -idcut 0.3 -LBS true -EC true -GO true \
          -java_home /usr

      -pkgdir     path of the GPU-I-TASSER package
                  (default: guessed from the location of the runI-TASSER.pl script)
      -java_home  path that contains the Java executable "bin/java"
                  (your system needs to have Java installed)
      -homoflag   [real, benchmark]: "real" uses all templates, "benchmark"
                  excludes homologous templates
      -idcut      sequence identity cutoff for "benchmark" runs; default 0.3,
                  range [0,1]
      -ntemp      number of top templates output by each threading program;
                  default 20, range [1,50]
      -nmodel     number of final models output by GPU-I-TASSER; default 5,
                  range [1,10]
      -LBS        [false, true]: whether to predict ligand-binding sites;
                  default false
      -EC         [false, true]: whether to predict EC numbers; default false
      -GO         [false, true]: whether to predict GO terms; default false
      -restraint1 specify distance/contact restraints (read more at
                  http://zhanglab.dcmb.med.umich.edu/I-TASSER/option1.html )
      -restraint2 specify a template with alignment (read more at
                  http://zhanglab.dcmb.med.umich.edu/I-TASSER/option4.html )
      -restraint3 specify a template name without alignment (read more at
                  http://zhanglab.dcmb.med.umich.edu/I-TASSER/option2.html )
      -restraint4 specify a template file without alignment (read more at
                  http://zhanglab.dcmb.med.umich.edu/I-TASSER/option3.html )
      -temp_excl  exclude specific templates from the template library (read
                  more at http://zhanglab.dcmb.med.umich.edu/I-TASSER/option6.html )
      -traj       deposit (keep) the trajectory files
      -light      run GPU-I-TASSER in fast mode (by default each simulation
                  runs for at most 5 hours)
      -hours      maximum hours of simulation (default 5 when -light is true)
      -outdir     where the final results are saved (default: the -datadir
                  directory)

   d) To make an HTML webpage from the GPU-I-TASSER Suite output, follow
      the document at $pkgdir/file2html/readme

NOTE:
   a) Outline of the steps run by 'runI-TASSER.pl':
      a1) standardize 'seq.fasta' to 'seq.txt' and get the sequence length
      a2) run 'psiblast' to generate the 'chk', 'out', 'pssm' and 'mtx' files
          run 'PSSpred' to get 'seq.dat' and 'seq.dat.ss'
          run 'solve' to get 'exp.dat'
          run 'pairmod' to get 'pair1.dat' and 'pair3.dat'
      a3) run the threading programs sequentially
          run 'mkinit.pl' to generate restraints
      a4) run the GPU-I-TASSER simulations
      a5) run the SPICKER clustering program
          run 'get_cscore.pl' to get the confidence score
          run 'EMrefinement.pl' to get full-atomic models
          run 'get_rsq_bfp.pl' to get local accuracy and B-factor estimates
      a6) run 'runCOACH.pl' to generate ligand-binding site, EC number and
          GO term predictions

   b) 'seq.fasta' is the query sequence file in FASTA format and is the
      only input file needed to run GPU-I-TASSER. Put it in $datadir before
      running the job.

   c) The structure assembly simulation consists of 14 independent runs
      by default. This number can be modified if you want to run more
      simulations, especially for large proteins without good templates.

   d) If you are working on a cluster with multiple nodes, setting
      $runstyle="parallel" is recommended; this requires a PBS server on
      your system. Parallel jobs run faster because they are distributed
      among different nodes. The default setting, $runstyle="serial", runs
      all the jobs on a single computer.

   e) If a job was executed partially and encountered an error, you can
      rerun the main script without modification. It checks the existing
      files and resumes from the correct step.

4.3. System requirements:

   a) x86_64 machine, Linux OS, more than 60G of free disk space.
   b) GPU nodes for running GPU-I-TASSER.
   c) Perl and Java should be installed. GO:Parser should be installed if
      you want to predict GO terms.
   d) Basic compression and decompression tools supporting tar and bunzip2.
   e) If you are using a computer cluster, the job management software
      (PBS server) should support 'qsub' and 'qstat'.
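   The requirements above, together with the argument rules in 4.2, can be
   checked from a script before a job is launched. The following is a
   minimal Python sketch under stated assumptions: all helper names are
   hypothetical, 60G is read as 60 x 10^9 bytes, GNU parallel is assumed
   to be installed as an executable named 'parallel', Java is looked up on
   PATH only (-java_home is not consulted), and only the documented
   ranges of -idcut, -ntemp and -nmodel are validated. It does not replace
   the checks done by runI-TASSER.pl itself.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Documented ranges for numeric options (section 4.2 c).
RANGES = {"idcut": (0.0, 1.0), "ntemp": (1, 50), "nmodel": (1, 10)}


def missing_tools(runstyle: str) -> list[str]:
    """Return required executables that are not on PATH (hypothetical helper)."""
    tools = ["perl", "java", "tar", "bunzip2"]  # java: PATH only, not -java_home
    if runstyle == "parallel":
        tools += ["qsub", "qstat"]  # PBS/Torque commands
    elif runstyle == "gnuparallel":
        tools.append("parallel")  # assumed name of the GNU parallel executable
    return [t for t in tools if shutil.which(t) is None]


def enough_disk(path: str, min_bytes: float = 60e9) -> bool:
    """True if the file system holding `path` has more than 60G free (60e9 bytes assumed)."""
    return shutil.disk_usage(path).free > min_bytes


def sequence_length(fasta: Path) -> int:
    """Count residues in a single-sequence FASTA file, ignoring '>' header lines."""
    lines = fasta.read_text().splitlines()
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


def build_command(pkgdir: str, libdir: str, seqname: str, datadir: str,
                  runstyle: str = "gnuparallel", **options) -> list[str]:
    """Assemble a runI-TASSER.pl argument list after basic sanity checks."""
    if runstyle not in ("parallel", "gnuparallel"):
        raise ValueError("GPU-I-TASSER needs -runstyle parallel or gnuparallel")
    fasta = Path(datadir) / "seq.fasta"  # the only required input file
    if not fasta.is_file():
        raise FileNotFoundError(f"query sequence not found: {fasta}")
    if sequence_length(fasta) == 0:
        raise ValueError(f"no residues in {fasta}")
    cmd = [str(Path(pkgdir) / "I-TASSERmod" / "runI-TASSER.pl"),
           "-libdir", libdir, "-seqname", seqname,
           "-datadir", datadir, "-runstyle", runstyle]
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"-{name} must be in [{low}, {high}]")
        if isinstance(value, bool):
            value = "true" if value else "false"  # e.g. -LBS/-EC/-GO take true/false
        cmd += [f"-{name}", str(value)]
    return cmd
```

   With the example paths from 4.2, a call such as
   build_command(pkgdir, libdir, "example", datadir, homoflag="benchmark",
   idcut=0.3, LBS=True) returns an argument list that could then be passed
   to subprocess.run(cmd, check=True).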
   If you use other job management software, such as SGE or Slurm, some
   changes are needed; follow the instructions at:
   http://zhanglab.dcmb.med.umich.edu/bbs/?q=node/3561

4.4. How to cite I-TASSER and I-TASSER Suite?

   If you are using the GPU-I-TASSER package, please cite:

   E MacCarthy, C Zhang, Y Zhang, D KC. GPU-I-TASSER: a GPU accelerated
   I-TASSER protein structure prediction tool. Bioinformatics 2021.