About DEMO-EM Pipeline

What is DEMO-EM?

    DEMO-EM is a method designed for constructing high-resolution structures of proteins containing multiple domains, starting from an amino acid sequence and a cryo-EM density map.

How does DEMO-EM generate multi-domain protein structure predictions?

    When a user submits an amino acid sequence and the corresponding cryo-EM density map, the server first predicts the domain boundaries with FUpred and ThreaDom, two locally installed protein domain boundary prediction methods. Meanwhile, the inter-domain distances are predicted by DomainDist, a deep convolutional neural-network predictor extended from TripletRes.
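
    DomainDist's predicted inter-domain distance profiles are later used as assembly restraints. As a minimal sketch, assuming a profile is a probability histogram over distance bins (the bin layout and the negative-log-likelihood form are illustrative assumptions, not DomainDist's documented output or DEMO-EM's exact energy term), a profile can be turned into an expected distance and a restraint energy:

```python
import math
from bisect import bisect_right


def distance_energy(d, bin_edges, probs, eps=1e-6):
    """Negative log-likelihood of an inter-domain distance d (Angstroms).

    bin_edges: ascending bin boundaries, one more value than probs.
    probs: predicted probability per bin (hypothetical DomainDist-style profile).
    Distances outside the binned range are assigned to the nearest end bin.
    """
    if len(bin_edges) != len(probs) + 1:
        raise ValueError("need exactly one more edge than bins")
    i = bisect_right(bin_edges, d) - 1
    i = min(max(i, 0), len(probs) - 1)  # clamp to the first/last bin
    return -math.log(probs[i] + eps)


def expected_distance(bin_edges, probs):
    """Probability-weighted mean of the bin centers."""
    centers = [(lo + hi) / 2 for lo, hi in zip(bin_edges, bin_edges[1:])]
    return sum(c * p for c, p in zip(centers, probs)) / sum(probs)
```

Lower energy means the distance falls in a bin the predictor considers more likely, so a sampler that minimizes this term is pulled toward the predicted distance.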

    In the second step, a model of each individual domain (or of the full-length protein, if FUpred predicts it to be a single-domain protein) is generated by I-TASSER, which can also be installed locally.
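
    A minimal sketch of how predicted domain boundaries translate into per-domain modeling jobs. It assumes a hypothetical representation in which each domain is a list of 1-based, inclusive (start, end) segments, so discontinuous domains are allowed; this is not FUpred's actual output format.

```python
def domain_sequences(sequence, domains):
    """Extract the sequence of each domain from the full-length sequence.

    domains: list of domains, each a list of (start, end) segments
    (1-based, inclusive; hypothetical format). An empty list means the
    protein was predicted as single-domain, so the full chain is modeled.
    """
    if not domains:
        return [sequence]  # single-domain: model the full-length protein
    result = []
    for segments in domains:
        parts = []
        for start, end in segments:
            if not 1 <= start <= end <= len(sequence):
                raise ValueError(f"invalid segment {start}-{end}")
            parts.append(sequence[start - 1:end])  # convert to 0-based slice
        result.append("".join(parts))
    return result
```

For example, a discontinuous domain made of residues 1-3 and 8-10 of `"ABCDEFGHIJ"` yields `"ABCHIJ"`.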

    In the third step, each individual domain model is independently fitted into the density map by a quasi-Newton search to create initial full-length models.
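
    The sketch below illustrates quasi-Newton fitting with a hand-written BFGS optimizer. It is deliberately simplified and rests on assumptions: only rigid translation is optimized (real fitting also searches rotations), the map is a toy sum of Gaussians, and the fit score (summed density at the atom positions) stands in for DEMO-EM's actual scoring.

```python
import math


def make_density(centers, sigma=1.5):
    """Toy density map: a sum of isotropic Gaussians of width sigma (Angstroms)."""
    def rho(p):
        total = 0.0
        for c in centers:
            r2 = sum((a - b) ** 2 for a, b in zip(p, c))
            total += math.exp(-r2 / (2 * sigma ** 2))
        return total
    return rho


def fit_score(t, atoms, rho):
    """Negative summed density at the atoms after translating them by t (lower is better)."""
    return -sum(rho([a[k] + t[k] for k in range(3)]) for a in atoms)


def num_grad(f, x, h=1e-5):
    """Central finite-difference gradient."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def bfgs(f, x0, iters=100, tol=1e-6):
    """Minimize f with BFGS (inverse-Hessian form) and a backtracking line search."""
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = num_grad(f, x)
    for _ in range(iters):
        if math.sqrt(sum(v * v for v in g)) < tol:
            break
        p = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]  # search direction
        fx = f(x)
        slope = sum(a * b for a, b in zip(g, p))
        step = 1.0
        # Armijo condition: shrink the step until the decrease is sufficient
        while f([a + step * b for a, b in zip(x, p)]) > fx + 1e-4 * step * slope and step > 1e-10:
            step *= 0.5
        s = [step * v for v in p]
        x_new = [a + b for a, b in zip(x, s)]
        g_new = num_grad(f, x_new)
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(a * b for a, b in zip(s, y))
        if sy > 1e-12:  # update only when the curvature condition holds
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(a * b for a, b in zip(y, Hy))
            for i in range(n):
                for j in range(n):
                    H[i][j] += ((sy + yHy) * s[i] * s[j] / sy ** 2
                                - (Hy[i] * s[j] + s[i] * Hy[j]) / sy)
        x, g = x_new, g_new
    return x
```

Usage: `bfgs(lambda t: fit_score(t, atoms, rho), [0.0, 0.0, 0.0])` returns the translation that places the domain atoms onto high-density regions of `rho`.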

    In the fourth step, the initial full-length models are optimized by a two-round rigid-body replica-exchange Monte Carlo (REMC) simulation that minimizes the density correlation score (DCS) between the density map and the full-length model. In the first round, each domain is treated as a particle, and a quick REMC simulation adjusts the individual domain positions based on the global model-density correlation. The second round fine-tunes the domain poses with a more detailed energy force field.
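
    A generic replica-exchange Monte Carlo loop, in the spirit of the first round where domains are treated as particles, might look like the sketch below. The state is a flat list of particle coordinates and `energy` is any function to minimize; the temperatures, step size, and swap schedule are illustrative assumptions rather than DEMO-EM's settings.

```python
import math
import random


def remc(energy, x0, temps, n_sweeps=2000, step=0.5, seed=0):
    """Replica-exchange Monte Carlo minimization.

    energy: maps a state (list of floats) to a scalar to minimize.
    temps: ascending temperatures, one replica per temperature.
    Returns the lowest-energy state seen and its energy.
    """
    rng = random.Random(seed)
    states = [list(x0) for _ in temps]
    energies = [energy(s) for s in states]
    best_state, best_e = list(x0), energies[0]
    for sweep in range(n_sweeps):
        # Metropolis move in every replica at its own temperature
        for k, T in enumerate(temps):
            trial = [v + rng.uniform(-step, step) for v in states[k]]
            e_trial = energy(trial)
            dE = e_trial - energies[k]
            if dE <= 0 or rng.random() < math.exp(-dE / T):
                states[k], energies[k] = trial, e_trial
                if e_trial < best_e:
                    best_state, best_e = list(trial), e_trial
        # Try swapping neighbouring replicas; alternate even/odd pairs each sweep
        for k in range(sweep % 2, len(temps) - 1, 2):
            delta = (1 / temps[k] - 1 / temps[k + 1]) * (energies[k] - energies[k + 1])
            if delta >= 0 or rng.random() < math.exp(delta):
                states[k], states[k + 1] = states[k + 1], states[k]
                energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return best_state, best_e
```

Hot replicas cross energy barriers easily, while the swap rule min(1, exp((1/T_k - 1/T_k+1)(E_k - E_k+1))) lets good configurations they find migrate down to the cold replica.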

    In the fifth step, the lowest-DCS model from the rigid-body assembly simulations undergoes flexible assembly with atom-, segment-, and domain-level refinements. This REMC simulation is guided by the DCS and the DomainDist-predicted inter-domain distance profiles, coupled with a knowledge-based force field, and the resulting decoy conformations are clustered by SPICKER to obtain a centroid model.
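
    SPICKER picks a cluster center among the decoys by counting structural neighbors. The sketch below is a simplified stand-in, assuming decoys are equal-length coordinate lists that already share the density map's frame (so RMSD is computed without superposition) and using a fixed cutoff; the real SPICKER program differs in details such as how the RMSD cutoff is chosen.

```python
import math


def rmsd_same_frame(a, b):
    """RMSD between two equal-length coordinate lists, without superposition
    (assumes both decoys are already in the density map's frame)."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def cluster_centroid(decoys, cutoff):
    """Simplified SPICKER-like step: choose the decoy with the most neighbours
    within `cutoff` (Angstroms) and average the coordinates of those neighbours.

    Returns (center_index, member_indices, centroid_coordinates).
    """
    n = len(decoys)
    neighbours = [[j for j in range(n) if rmsd_same_frame(decoys[i], decoys[j]) <= cutoff]
                  for i in range(n)]
    center = max(range(n), key=lambda i: len(neighbours[i]))  # first maximum wins ties
    members = neighbours[center]
    centroid = [tuple(sum(decoys[j][r][k] for j in members) / len(members) for k in range(3))
                for r in range(len(decoys[0]))]
    return center, members, centroid
```

Averaging coordinates is what makes the result a centroid rather than a real decoy, which is why it needs further refinement before it is a physical model.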

    In the last step, the flexible assembly simulation is performed again for the full-atomic model, with constraints from the centroid models clustered by SPICKER added to the energy, and the final model is the lowest-energy model after side-chain repacking with FASPR and FG-MD.
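
    The form of the centroid-derived constraints is not specified here. One plausible choice, shown purely as an assumption, is a harmonic penalty on deviations of pairwise C-alpha distances from those in the centroid model; `min_seq_sep` (which skips sequence-local pairs) and `weight` are hypothetical parameters.

```python
import math


def centroid_restraint_energy(model, centroid, weight=1.0, min_seq_sep=3):
    """Hypothetical harmonic restraint: penalize differences between pairwise
    C-alpha distances in `model` and in the SPICKER `centroid` (Angstroms).

    Only pairs at least `min_seq_sep` residues apart are restrained.
    """
    e = 0.0
    n = len(model)
    for i in range(n):
        for j in range(i + min_seq_sep, n):
            d_model = math.dist(model[i], model[j])
            d_ref = math.dist(centroid[i], centroid[j])
            e += (d_model - d_ref) ** 2
    return weight * e
```

Because the penalty uses internal distances, it is unchanged by rigid-body moves of the whole model and only restrains its shape.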

    Figure 1. Pipeline of DEMO-EM for multi-domain protein structure modeling from cryo-EM density maps.

How does the DEMO-EM server perform compared with other methods?

    DEMO-EM was compared with two widely used methods for cryo-EM density-map-guided modeling, MDFF and Rosetta, on 357 multi-domain proteins using synthesized density maps. Because MDFF and Rosetta must start from full-length models, we built their initial full-length models by fitting each domain model into the density map with Situs. As shown in Figure 2, DEMO-EM clearly outperformed both control methods: with experimental domain structures, the average TM-score of its full-length models was 15.1% and 25.3% higher than that of MDFF and Rosetta, respectively (Figure 2a). When D-I-TASSER domain models were used, the TM-score improvement of DEMO-EM increased to 63.5% relative to MDFF and 88.9% relative to Rosetta (Figure 2b). We also compared DEMO-EM with MAINMAST, a de novo method for modeling from cryo-EM density maps; the average TM-score of DEMO-EM's final full-length models was 142.8% higher than that of MAINMAST.
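
    The percentages above are relative differences in average TM-score. Assuming they are computed from the mean TM-scores as below (the exact averaging procedure is not stated), a TM-score "142.8% higher" than MAINMAST's means DEMO-EM's average is about 2.43 times that of MAINMAST.

```latex
\Delta_X = \frac{\overline{\mathrm{TM}}_{\text{DEMO-EM}} - \overline{\mathrm{TM}}_X}{\overline{\mathrm{TM}}_X} \times 100\%
\quad\Longrightarrow\quad
\frac{\overline{\mathrm{TM}}_{\text{DEMO-EM}}}{\overline{\mathrm{TM}}_X} = 1 + \frac{\Delta_X}{100\%},
\qquad \Delta_{\text{MAINMAST}} = 142.8\% \;\Rightarrow\; \text{ratio} = 2.428
```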

    Figure 2. (a) Mean and distribution of TM-scores of models by DEMO-EM, MDFF, Rosetta, and MAINMAST using synthesized density maps. (b) Boxplot and distribution of RMSDs of models by DEMO-EM, MDFF, Rosetta, and MAINMAST using synthesized density maps. (c) TM-scores of full-length models constructed by DEMO-EM, MDFF, Rosetta, and MAINMAST using experimental density maps.

    Figure 2c summarizes the TM-scores of models created by DEMO-EM, MDFF, Rosetta, and MAINMAST on 51 cases with experimental density maps. For these cases, the domain boundaries were predicted by FUpred and ThreaDom, and the domain structures were modeled by D-I-TASSER. DEMO-EM again outperformed the control methods, with an average full-length-model TM-score 60.0%, 87.2%, and 144.4% higher than that of MDFF, Rosetta, and MAINMAST, respectively.

What does the DEMO-EM server output when you submit a sequence?

    The output of the DEMO-EM server includes:
    • The top five full-length atomic models (ranked based on the energy)
    • The correlation coefficient score (CC-score) and the Fourier shell correlation score (FSC-score) between the full-length model and the density map
    • The estimated TM-score, derived from the observed correlation between TM-score and CC-score/FSC-score
    • The CC-score of each residue in the full-length model
    • All individual domain models predicted by D-I-TASSER
    • The domain definition predicted by FUpred
    • The top five full-length models with side chains repacked by FASPR and FG-MD
    An illustrative example of the DEMO-EM output can be seen here.

How to interpret the output data generated by the DEMO-EM server?

    The DEMO-EM modeling results are summarized on a webpage, and a link to it is emailed to the user when modeling is complete (see an example of DEMO-EM output).

    • What is the 'top 5 models constructed by DEMO-EM'?

      For each target, DEMO-EM reports the top five models ranked by total energy. Because the ranking is based on energy, a lower-ranked model can have a higher CC-score or FSC-score. Although the first model has the highest CC-score or FSC-score and the best quality in most cases, it is not unusual for a lower-ranked model to be of better quality than a higher-ranked one.
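
      To make the ranking rule concrete, here is a minimal sketch with hypothetical model records (the `name`, `energy`, and `cc` keys are assumptions): models are ordered by energy alone, so the best CC-score among the top five may belong to a lower-ranked model.

```python
def top_models(models, k=5):
    """Rank models by total energy (lower is better) and keep the top k.

    models: list of dicts with hypothetical keys 'name', 'energy', 'cc'.
    Also returns the name of the top-k model with the highest CC-score,
    which need not be the rank-1 model.
    """
    ranked = sorted(models, key=lambda m: m["energy"])[:k]
    best_cc = max(ranked, key=lambda m: m["cc"])
    return [m["name"] for m in ranked], best_cc["name"]
```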

    • What is CC-score?

      CC-score is the correlation coefficient between the experimental density map and the density computed from a model.
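
      A minimal sketch of a correlation coefficient over voxels, assuming both maps are sampled on the same grid and flattened into lists. The mean-centered (Pearson) form is an assumption; the optional `mask` shows one hypothetical way to obtain a per-residue CC by restricting the calculation to voxels near that residue.

```python
import math


def cc_score(exp_map, model_map, mask=None):
    """Pearson correlation between two density maps given as flat voxel lists.

    mask: optional list of voxel indices to restrict the calculation
    (e.g. voxels around one residue for a per-residue CC; hypothetical usage).
    """
    idx = range(len(exp_map)) if mask is None else mask
    x = [exp_map[i] for i in idx]
    y = [model_map[i] for i in idx]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant map; this sketch reports 0
    return sxy / math.sqrt(sxx * syy)
```

Because the maps are mean-centered and normalized, the score is unchanged if the model map is scaled or offset, which matters since simulated and experimental densities rarely share units.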

    • What is FSC-score?

      FSC-score measures the normalised cross-correlation coefficient between the experimental map and the map computed from a model over corresponding shells in Fourier space.
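
      The Fourier shell correlation can be sketched for a tiny cubic map with a naive discrete Fourier transform (fine for illustration, far too slow for real maps, where an FFT library would be used). Binning shells by rounded integer frequency radius is an assumption, and how the server collapses the curve into a single FSC-score is not described here.

```python
import cmath
import math


def dft3(vol):
    """Naive 3D DFT of an n x n x n nested list: O(n^6), for tiny demo maps only."""
    n = len(vol)
    out = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for u in range(n):
        for v in range(n):
            for w in range(n):
                acc = 0j
                for x in range(n):
                    for y in range(n):
                        for z in range(n):
                            phase = -2j * math.pi * (u * x + v * y + w * z) / n
                            acc += vol[x][y][z] * cmath.exp(phase)
                out[u][v][w] = acc
    return out


def fsc(vol1, vol2):
    """Fourier shell correlation of two equal-size cubic maps, one value per shell."""
    n = len(vol1)
    f1, f2 = dft3(vol1), dft3(vol2)
    nshell = n // 2 + 1  # shells 0 .. Nyquist

    def freq(k):
        return k if k <= n // 2 else k - n  # signed frequency index

    num = [0.0] * nshell
    pow1 = [0.0] * nshell
    pow2 = [0.0] * nshell
    for u in range(n):
        for v in range(n):
            for w in range(n):
                r = round(math.sqrt(freq(u) ** 2 + freq(v) ** 2 + freq(w) ** 2))
                if r >= nshell:
                    continue  # corner frequencies beyond Nyquist are ignored
                a, b = f1[u][v][w], f2[u][v][w]
                num[r] += (a * b.conjugate()).real
                pow1[r] += abs(a) ** 2
                pow2[r] += abs(b) ** 2
    return [num[s] / math.sqrt(pow1[s] * pow2[s]) if pow1[s] > 0 and pow2[s] > 0 else 0.0
            for s in range(nshell)]
```

Unlike a single real-space CC, the per-shell curve shows at which spatial frequencies (resolutions) the model and the map agree.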

    • What is TM-score?

      TM-score is a scale for measuring the structural similarity between two structures (see Zhang and Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004 57: 702-710). TM-score was proposed to address a weakness of RMSD: its sensitivity to local errors. Because RMSD averages the distances over all residue pairs in two structures, a local error (e.g., a misoriented tail) can produce a large RMSD even when the global topology is correct. In TM-score, small distances are weighted more strongly than large distances, which makes the score insensitive to local modeling errors. A TM-score > 0.5 indicates a model of correct topology, and a TM-score < 0.17 indicates random similarity. These cutoffs do not depend on protein length.
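
      For a fixed superposition, TM-score sums 1/(1 + (d_i/d0)^2) over the aligned residues and divides by the length L, with d0 = 1.24(L - 15)^(1/3) - 1.8 following Zhang and Skolnick; the full TM-score program additionally searches for the superposition that maximizes this sum. The sketch below takes per-residue distances as input, and the 0.5 Å floor on d0 for short chains is an assumption.

```python
import math


def tm_score(distances, length=None):
    """TM-score for one given superposition.

    distances: per-residue distances (Angstroms) between aligned residue pairs.
    length: normalizing length L (defaults to the number of distances).
    """
    L = length or len(distances)
    # d0 scales with protein length; floor at 0.5 A for short chains (assumption)
    d0 = max(0.5, 1.24 * (L - 15) ** (1 / 3) - 1.8) if L > 15 else 0.5
    return sum(1 / (1 + (d / d0) ** 2) for d in distances) / L


def rmsd(distances):
    """Root-mean-square of the per-residue distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))
```

With 90 residues placed exactly and a 10-residue tail displaced by 20 Å, this gives a TM-score of about 0.90 but an RMSD of about 6.3 Å, illustrating the insensitivity to local errors described above.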

      Here, the "Estimated TM-score" is estimated from the correlation between TM-score and CC-score/FSC-score observed on a nonredundant training set.
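
      The form of the correlation used for the estimate is not given here. As an assumption, a simple linear least-squares fit of TM-score against a single density score (CC-score or FSC-score) on training data, clipped to the valid [0, 1] range, would look like this:

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit y = a*x + b (e.g. x = CC-score, y = TM-score)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def estimate_tm(score, a, b):
    """Predicted TM-score for a new model, clipped to the valid range [0, 1]."""
    return min(1.0, max(0.0, a * score + b))
```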

How long does it take for DEMO-EM to generate the predictions for your protein?

    It usually takes several hours to 1-2 days from submitting a sequence to receiving the prediction results. If many jobs have accumulated in the queue, it may take longer. The time also depends on protein size: smaller proteins take less time than larger ones.

    In addition, if you choose to guide the assembly with inter-domain distances predicted by deep learning or with structurally analogous multi-domain templates, the job will take considerably longer, because generating the multiple sequence alignment requires searching a huge sequence library, and identifying multi-domain templates requires scanning the whole multi-domain template library.

    However, the job will take less time if you provide the domain models, since the program then does not need to predict the domain boundaries and domain structures.

How to cite DEMO-EM

    Please cite the following article when you use the DEMO-EM server:

    • Xiaogen Zhou, Yang Li, Chengxin Zhang, Wei Zheng, Guijun Zhang, Yang Zhang. Progressive assembly of multi-domain protein structures from cryo-EM density maps. Nature Computational Science, 2: 265-275 (2022).

Funding support

    The development of the DEMO-EM server is supported by the National Institute of General Medical Sciences (GM136422 and S10OD026825), the National Institute of Allergy and Infectious Diseases (AI134678), and the National Science Foundation (IIS1901191 and DBI2030790). This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation (ACI1548562).

Contact information

    The DEMO-EM server is in active development with the goal of providing the most accurate multi-domain protein structure modeling from cryo-EM density maps. Please help us achieve this goal by sending your questions, feedback, and comments to yangzhanglab@umich.edu.

yangzhanglab@umich.edu | (734) 647-1549 | 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218