INSTALLATION AND IMPLEMENTATION OF DEMO SUITE
(Copyright 2021 by Zhang Lab, University of Michigan, All rights reserved)
(Version 2.0, 2021/12/2)

1. What is DEMO Suite?

   The DEMO Suite is a composite package of programs for multi-domain protein
   structure assembly. The Suite includes the following programs:

   a) DEMO2: A program for multi-domain protein structure assembly
   b) DeepMSA: A program for multiple sequence alignment generation
   c) DeepPotential: A deep residual neural-network algorithm for
      inter-residue spatial restraints prediction
   d) FASPR: A program for protein side-chain packing

2. How to install the DEMO Suite?

   a) Download the DEMO Suite 'DEMO-2.0.tar.gz' from
      https://zhanggroup.org/DEMO2/download/
      and unpack 'DEMO-2.0.tar.gz' by

      > tar -zxvf DEMO-2.0.tar.gz

      The root path of this package is called $pkgdir, e.g.
      /home/yourname/DEMO2. You should have all the programs under this
      directory. You can install the package at any location on your computer.

   b) Download the DEMO2 library files from
      https://zhanggroup.org/DEMO2/download/
      The library needs about 120 GB of disk space.

   c) Third-party software installation:
      While the majority of programs in the package 'DEMO-2.0.tar.gz' were
      developed in the Zhang Lab, which grants permission for their use, some
      programs and databases (including blast, nr, uniclust30, uniref90 and
      metaclust) were developed by third-party groups. Default versions of
      blast and nr are included in the package. It is the user's obligation to
      obtain license permission from the developers of all third-party
      software before using it.

      In addition, your system needs to have python3 installed, in a version
      that supports pytorch >1.1.0.

      To use DeepMSA, you need to download uniclust30, uniref90 and metaclust
      from
      http://gwdu111.gwdg.de/~compbiol/uniclust/2017_04/uniclust30_2017_04_hhsuite.tar.gz ,
      ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz ,
      and
      https://metaclust.mmseqs.org/2017_05/metaclust_2017_05.fasta.gz.
      After you unpack them, put each entire folder into the DEMO2 library
      folder. Then rename the folder uniclust30_xxx_xxx to uniclust30,
      uniref90_xxx to uniref90, and metaclust_xxx to metaclust. Then use
      $pkgdir/external/hhsuite2/bin/esl-sfetch to create an .ssi index for
      uniref90 and metaclust, where $pkgdir is the path where you put the
      DEMO Suite package. For example, if the uniref90 database in the
      uniref90 folder is named uniref90.fasta, go to the uniref90 folder and
      run

      > $pkgdir/external/hhsuite2/bin/esl-sfetch --index uniref90.fasta

      You will find a new file named uniref90.fasta.ssi after the command
      finishes. Then do the same for the metaclust database.

      If you use a different version of uniclust30, uniref90 or metaclust,
      edit $pkgdir/run_DEMO2.py and change the variables:

      hhblitsdb = "$libdir/uniclust30_2017_04"
      jackhmmerdb = "$libdir/uniref90.fasta"
      hmmsearchdb = "$libdir/metaclust_2017_05.clean.fasta"

3. Bug report:

   Please report and post bugs and suggestions at the message board:
   https://zhanggroup.org/forum/

#######################################################
#                                                     #
# 4. Installation and implementation of DEMO          #
#                                                     #
#######################################################

4.1. Introduction of DEMO2

   DEMO2 (Domain Enhanced MOdeling, version 2.0) is an improved version of
   DEMO for automated assembly of full-length structural models of
   multi-domain proteins by integrating deep-learning predicted inter-domain
   spatial restraints. Starting from individual domain structures, quaternary
   structure templates that have similar component domains are identified by
   domain-level structural alignments using TM-align. Meanwhile, inter-domain
   spatial restraints are predicted by the deep residual neural-network-based
   predictor DeepPotential.
   Full-length models are then created by a fast quasi-Newton optimization
   for rigid-body domain structure assembly, which is guided by the
   DeepPotential predicted inter-domain restraints, inter-domain distance
   profiles collected from the top-ranked quaternary templates, and
   physics-based steric potentials. The final models are selected from the
   low-energy conformations and further refined with fragment-guided
   molecular dynamics simulations. Large-scale benchmark tests showed that
   DEMO2 significantly outperforms its predecessor.

4.2. How to run DEMO2?

   a) The main script for running DEMO2 is $pkgdir/run_DEMO2.py, where
      "$pkgdir" is the location of the run_DEMO2.py script. Running it
      directly without arguments will output the help information.

   b) The following arguments must be set (mandatory arguments).
      One example is:

      "$pkgdir/run_DEMO2.py protein_name input_dir sequence [Options]"

      'protein_name' is the name of the folder containing the protein
                     sequence and domain models
      'input_dir'    is the directory which contains the query folder
      'sequence'     is the full-chain sequence in FASTA format

   c) Other arguments are optional and have default values. Users can reset
      one or more of them. One example command line is:

      "$pkgdir/run_DEMO2.py protein_name input_dir sequence -template XXX.pdb"

      -template  Provide the template structure to guide the domain assembly.
                 The template should be in PDB format.
      -deepdist  [no or yes], flag for using the distances predicted by
                 DomainDist to guide the assembly. The default value is "yes".
      -EMmap     The cryo-EM density map in MRC or CCP4 format.
      -reso      The resolution of the density map.
      -CLink     The cross-link data (follow the format provided on the web
                 server).
      -run       [real, benchmark], "real" will use all templates,
                 "benchmark" will exclude homologous templates.

   d) Where are the final predicted results?
      The following results are included in "/input_dir/protein_name":

      "fmodel*.pdb"  the final model assembled by DEMO
      "cscore"       the confidence score, estimated TM-score, and estimated
                     RMSD of the final model

   NOTE:
   a) Outline of steps for running DEMO2 by 'run_DEMO2.py':
      a1) Parse the user-provided information
      a2) Run 'DeepPotential' to predict inter-residue spatial restraints of
          the full chain
      a3) Run 'DEMO' to assemble all domain models into a full-length model

   b) The domain PDB files should be named dom1.pdb, dom2.pdb, dom3.pdb, ...
      in order. They should be put in "./input_dir/protein_name" before
      running this job.

   c) 'seq.fasta' is the query sequence file in FASTA format. This file
      should be put in "./input_dir/protein_name" before running this job.

   d) If working on a cluster with multiple nodes, it is recommended to set
      $runstyle="parallel". You need to have a PBS server installed on your
      system. Parallel jobs will run faster since jobs are distributed among
      different nodes. The default setting $runstyle="serial" will run all
      the jobs on a single computer.

   e) If the job has been partially executed and encountered an error, you
      can rerun the main script without modification. It will check the
      existing files and resume from the correct position.

   f) If you want to provide cryo-EM density data to guide the assembly,
      please use the options "-EMmap" and "-reso" and follow the explanation
      and example at https://zhanggroup.org/DEMO2/explanation_EM.html

   g) If you want to provide cross-link data or contact/distance data to
      guide the assembly, please use the option "-CLink" and follow the
      explanation and example at
      https://zhanggroup.org/DEMO2/explanation_CL.html

4.3. System requirement:

   a) x86_64 machine, Linux kernel OS, free disk space of more than 150 GB.
   b) Perl and python interpreters should be installed.
   c) Basic compression and decompression packages should be installed to
      support tar and bunzip2.
   d) If you are using computer clusters, the job management software PBS
      server should support 'qsub' and 'qstat'. If you use other job
      management software, such as SGE or Slurm, some changes should be made
      following the instructions at:
      https://zhanggroup.org/bbs/?q=node/3561

4.4. How to cite DEMO2 or DEMO Suite?

   Xiaogen Zhou, Chunxiang Peng, XXX, Guijun Zhang, and Yang Zhang.
   DEMO2: Multidomain protein structures assembly by coupling structural
   analogous templates with deep-learning inter-domain restraints.
   Submitted, 2021.

   Xiaogen Zhou, Jun Hu, Chengxin Zhang, Guijun Zhang, and Yang Zhang.
   Assembling multidomain protein structures through analogous global
   structural alignments. Proceedings of the National Academy of Sciences,
   116: 15930-15938 (2019)

#######################################################
#                                                     #
# 5. Installation and implementation of DeepMSA       #
#                                                     #
#######################################################

5.1. Introduction of DeepMSA

   DeepMSA is a new open-source method for sensitive MSA construction, in
   which homologous sequences and alignments are created from multiple
   sources of whole-genome and metagenome databases through complementary
   hidden Markov model algorithms.

5.2. How to install DeepMSA program?

   When you unpack the DEMO Suite, the DeepMSA program is already installed.

5.3. How to run DeepMSA program?

   The DeepMSA main script is $pkgdir/external/hhsuite2/scripts/build_MSA.py.
   Running the program without arguments prints all the running options.

5.4. How to cite DeepMSA?

   If you are using the DeepMSA program, you can cite:

   C Zhang, W Zheng, S M Mortuza, Y Li, Y Zhang. DeepMSA: constructing deep
   multiple sequence alignment to improve contact prediction and
   fold-recognition for distant-homology proteins.
   Bioinformatics 36:2105-2112 (2020).

#######################################################
#                                                     #
# 6. Installation and implementation of DeepPotential #
#                                                     #
#######################################################

6.1. Introduction of DeepPotential

   DeepPotential is a method to predict inter-residue spatial restraints,
   including distances, inter-residue torsion angles, and hydrogen-bonding
   networks, based on the ensemble of two complementary coevolution features
   coupled with deep residual networks.

6.2. How to install DeepPotential?

   When you unpack the DEMO Suite, the DeepPotential program is already
   installed in $pkgdir/external/restriplet3.

6.3. How to run DeepPotential program?

   Usage: runDistPre.pl -s protein_name -outdir input_dir [Options]

   To run DeepPotential, you need to provide the following inputs:

   'protein_name'---Mandatory, the name of the folder containing the sequence
                    file named "seq.txt"
   'input_dir'------Mandatory, the directory which contains the query folder

   Output files of DeepPotential include:

   'distance_pca_*.txt'---The predicted CA atom distance
   'distance_pcb_*.txt'---The predicted CB atom distance
   'distance_pomg_20.txt, distance_pphi_20.txt, and distance_ptheta_20.txt'---The predicted torsion angles
   'distance_paa_.txt, distance_pbb_.txt, distance_pcc_.txt'---The predicted hydrogen-bonding networks
   'distance_ca_contact.txt'---The predicted CA contact
   'distance_cb_contact.txt'---The predicted CB contact
   'distance_20.npz'---The predicted restraints (distances and orientations) in npz format

   A detailed readme can be found in the package.

6.4. How to cite DeepPotential?

   If you are using the DeepPotential program, you can cite:

   Li Yang, Zhang Chengxin, Zheng Wei, Zhou Xiaogen, Bell W. Eric, Yu Dongjun
   and Zhang Yang, Protein inter-residue contact and distance prediction by
   coupling complementary coevolution features with deep residual networks in
   CASP14. Proteins: Structure, Function, and Bioinformatics,
   89: 1911-1921, 2021.

#######################################################
#                                                     #
# 7. Installation and implementation of FASPR         #
#                                                     #
#######################################################

7.1. Introduction of FASPR

   FASPR is a method for structural modeling of protein side-chain
   conformations. Starting from a backbone structure, FASPR samples the
   side-chain rotamers for each amino acid from the Dunbrack 2010 rotamer
   library, with the atomic interaction energies calculated using an
   optimized scoring function extended from EvoEF2. The side-chain packing
   search is performed using a deterministic searching algorithm combining
   self-energy checking, dead-end elimination theorems, and tree
   decomposition.

7.2. How to install FASPR program?

   When you unpack the DEMO Suite, the FASPR program is already installed at
   $pkgdir/bin/FASPR

7.3. How to run FASPR program?

   Usage: FASPR input.pdb output.pdb

   To run FASPR, you need to prepare the following input files:

   'input.pdb'   Mandatory, input pdb file for side-chain packing.
   '-s'          Optional, the sequence of the input.pdb

   Output files of FASPR include:

   'output.pdb'  output pdb file of FASPR with the side chains packed.

   A detailed readme file can be found in the FASPR package.

7.4. How to cite FASPR?

   If you are using the FASPR program, you can cite:

   Xiaoqiang Huang, Robin Pearce, Yang Zhang. FASPR: an open-source tool for
   fast and accurate protein side-chain packing.
   Bioinformatics (2020) 36: 3758-3765.
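
Appendix: checking the DEMO2 input folder (illustrative sketch)

   The NOTE in section 4.2 asks for the domain models to be named dom1.pdb,
   dom2.pdb, dom3.pdb, ... and placed together with seq.fasta in
   input_dir/protein_name before a run. The Python sketch below is not part
   of the DEMO Suite; it is a hypothetical helper (the function name
   check_demo2_inputs is our own) that verifies this layout before
   run_DEMO2.py is launched. It assumes that "in order" means the domain
   files are numbered consecutively starting from 1.

```python
import re
from pathlib import Path


def check_demo2_inputs(input_dir, protein_name):
    """Return the domain PDB files in input_dir/protein_name, sorted by index.

    Hypothetical helper, not part of the DEMO Suite. It checks the layout
    described in section 4.2: a seq.fasta file plus dom1.pdb, dom2.pdb, ...
    Assumption: domain files must be numbered consecutively from 1.
    """
    query_dir = Path(input_dir) / protein_name

    # The query sequence file must be present.
    seq_file = query_dir / "seq.fasta"
    if not seq_file.is_file():
        raise FileNotFoundError(f"missing sequence file: {seq_file}")

    # Collect files named dom<N>.pdb and index them by N.
    domains = {}
    for path in query_dir.glob("dom*.pdb"):
        match = re.fullmatch(r"dom(\d+)\.pdb", path.name)
        if match:
            domains[int(match.group(1))] = path

    if not domains:
        raise ValueError(f"no dom<N>.pdb files found in {query_dir}")

    # Require indices 1..N with no gaps (our reading of "in order").
    expected = list(range(1, len(domains) + 1))
    if sorted(domains) != expected:
        raise ValueError(
            f"domain files are not numbered 1..{len(domains)}: "
            f"found indices {sorted(domains)}"
        )

    return [domains[i] for i in expected]
```

   For example, check_demo2_inputs("input_dir", "protein_name") returns the
   list of domain files in order, or raises an error that names the missing
   or misnumbered file, so that layout problems surface before the
   DeepPotential and assembly steps start.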