D-I-TASSER is an advanced protein structure prediction method built upon the I-TASSER framework, designed for high-accuracy modeling of protein structures and functions. It constructs full-length protein models by integrating deep learning-derived spatial restraints with state-of-the-art iterative threading assembly simulations. This page provides an overview of the D-I-TASSER methodology, usage guidelines, and its performance in large-scale benchmark evaluations and blind CASP experiments.

  1. Methods
    D-I-TASSER is a hierarchical pipeline for hybrid deep-learning and threading-assembly-based protein structure prediction. Compared to its predecessor I-TASSER, the major advancement in D-I-TASSER lies in the integration of deep-learning-based spatial restraints with traditional I-TASSER threading and assembly simulations.

    As shown in Figure 1 (upper panel), starting from a query sequence, the pipeline begins by generating a multiple sequence alignment (MSA) with DeepMSA2. Structural templates and alignments are then identified by LOMETS3, which combines multiple contact- and profile-based threading algorithms. In parallel, several deep-learning models, including AttentionPotential, DeepPotential, and (optionally) AlphaFold2, are used to predict spatial restraints, including contact/distance maps and hydrogen-bonding networks. These predictions are merged with LOMETS3-derived restraints into a unified potential termed "DeepPotential".
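A common way to turn predicted distance distributions into a restraint energy is to take the negative log-probability of the distance observed in the current model. The sketch below shows how restraints from several predictors might be merged as a weighted sum. It is an illustration under stated assumptions, not the actual D-I-TASSER potential: the bin edges, the weights, and the function names are hypothetical.

```python
import bisect
import math

# Hypothetical distance-bin upper edges in Angstrom: len(EDGES) + 1 bins in total,
# the last bin collecting every distance >= EDGES[-1].
EDGES = [4.0, 8.0, 12.0, 16.0]


def pair_energy(distance, probs, edges=EDGES, eps=1e-6):
    """-log P(bin containing distance) for one residue pair."""
    return -math.log(probs[bisect.bisect_right(edges, distance)] + eps)


def merged_restraint_energy(distances, sources, edges=EDGES):
    """Weighted sum of restraint energies from several predictors (hypothetical scheme).

    distances: {(i, j): distance of residue pair (i, j) in the current model}
    sources:   list of (weight, {(i, j): bin probabilities}), one entry per predictor
    """
    total = 0.0
    for weight, predictions in sources:
        for pair, probs in predictions.items():
            if pair in distances:  # pairs without a model distance are skipped
                total += weight * pair_energy(distances[pair], probs, edges)
    return total


probs = [0.05, 0.80, 0.10, 0.03, 0.02]  # toy prediction peaked in the 4-8 A bin
sources = [(1.0, {(1, 10): probs}), (0.5, {(1, 10): probs})]
close = merged_restraint_energy({(1, 10): 6.0}, sources)
far = merged_restraint_energy({(1, 10): 14.0}, sources)
```

A model that places the pair in the most probable bin (`close`) receives a lower energy than one that violates the prediction (`far`), which is what lets such restraints steer the folding simulation.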

    Guided by both the deep-learning and threading-based restraints, along with the I-TASSER energy force field, D-I-TASSER performs Replica-Exchange Monte Carlo (REMC) simulations to generate a large ensemble of structural decoys. The lowest-free-energy models are selected by SPICKER clustering. These models are further refined by fragment-guided molecular dynamics (FG-MD) simulations, with side chains repacked by FASPR. Model quality is assessed by the estimated TM-score (eTM-score).
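The replica-exchange idea can be illustrated on a toy one-dimensional energy. This is a minimal sketch, not D-I-TASSER's implementation: the double-well `toy_energy`, the temperature ladder, and the move size are assumptions. The swap test uses the standard replica-exchange acceptance probability min(1, exp((b_i - b_j)(E_i - E_j))), with b = 1/T.

```python
import math
import random


def toy_energy(x):
    """Toy double-well energy with minima at x = -1 and x = +1 (stand-in for a force field)."""
    return 5.0 * (x * x - 1.0) ** 2


def remc(temps, n_sweeps=2000, step=0.3, seed=0):
    """Replica-exchange Monte Carlo on toy_energy; returns final states and best state seen."""
    rng = random.Random(seed)
    states = [rng.uniform(-3.0, 3.0) for _ in temps]
    best_x = min(states, key=toy_energy)
    best_e = toy_energy(best_x)
    for _ in range(n_sweeps):
        # 1) Metropolis move in every replica at its own temperature.
        for k, t in enumerate(temps):
            trial = states[k] + rng.uniform(-step, step)
            delta = toy_energy(trial) - toy_energy(states[k])
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                states[k] = trial
            if toy_energy(states[k]) < best_e:
                best_x, best_e = states[k], toy_energy(states[k])
        # 2) Try to swap neighbouring replicas: accept with min(1, exp((b_i - b_j)(E_i - E_j))).
        for k in range(len(temps) - 1):
            beta_i, beta_j = 1.0 / temps[k], 1.0 / temps[k + 1]
            arg = (beta_i - beta_j) * (toy_energy(states[k]) - toy_energy(states[k + 1]))
            if arg >= 0 or rng.random() < math.exp(arg):
                states[k], states[k + 1] = states[k + 1], states[k]
    return states, best_x, best_e


states, best_x, best_e = remc([0.1, 0.3, 1.0, 3.0])
```

Hot replicas cross energy barriers easily, and the swap step hands low-energy conformations down to the cold replicas; in D-I-TASSER the "state" is a full protein conformation rather than a single number.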

    Figure 1 (lower panel) outlines the extension of D-I-TASSER to multi-domain protein modeling. Starting from the full-length sequence, a consensus of FUpred and ThreaDom is used to identify domain boundaries, and domain-level models are then constructed independently using the single-domain D-I-TASSER protocol. Meanwhile, the full-length MSA, spatial restraints, and template libraries are built from the full-chain sequence. The final full-chain structure is assembled by full-length D-I-TASSER simulations, guided by both domain-specific and global spatial restraints. Technically, domain-level folding within the full-length model is driven primarily by domain-specific threading and deep-learning models, while the relative orientations between domains are determined by full-chain deep-learning restraints, inter-domain threading alignments, and the I-TASSER knowledge-based force field.




    Figure 1. Protocol of D-I-TASSER for single-domain (upper panel) and multi-domain (lower panel) protein structure prediction.


  2. Performance of D-I-TASSER server
    1. CASP14:
      CASP (Critical Assessment of Techniques for Protein Structure Prediction) is a community-wide experiment for testing the state of the art in protein structure prediction, which has taken place every two years since 1994. The experiment is strictly blind because the structures of the test proteins are unknown to the predictors. D-I-TASSER was first tested (as "Zhang-Server") in the 14th CASP experiment. Figure 2 presents the cumulative Z-score of GDT-TS for all participating servers, where D-I-TASSER ranked as the No. 1 method over all 96 test targets.


      Figure 2. Ranking of automated server methods in CASP14. Data was taken from https://www.predictioncenter.org/casp14/zscores_final.cgi?gr_type=server_only. D-I-TASSER participated in CASP14 as "Zhang-Server", while "QUARK" is another program from the Zhang Lab, which used D-I-TASSER restraints for guiding structure assembly for the TBM targets.
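A cumulative Z-score ranking of this kind can be sketched as follows. The details are assumptions based on a convention widely used in CASP assessments, not taken from this page: per-target Z-scores are recomputed after dropping outliers with Z < -2, and negative Z-scores are set to zero before summing over targets. The server names and GDT-TS values are toy data.

```python
import statistics


def target_zscores(scores):
    """Z-score of each server's GDT-TS on one target, recomputed after dropping Z < -2 outliers."""
    values = list(scores.values())
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    kept = [v for v in values if sd == 0 or (v - mu) / sd >= -2.0]
    mu, sd = statistics.mean(kept), statistics.pstdev(kept)
    return {server: (v - mu) / sd if sd > 0 else 0.0 for server, v in scores.items()}


def cumulative_zscores(per_target):
    """per_target: {target: {server: GDT-TS}} -> servers ranked by summed non-negative Z."""
    totals = {}
    for scores in per_target.values():
        for server, z in target_zscores(scores).items():
            totals[server] = totals.get(server, 0.0) + max(0.0, z)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = cumulative_zscores({
    "T1": {"A": 80, "B": 60, "C": 40},
    "T2": {"A": 70, "B": 65, "C": 30},
})
```

Clipping negative Z-scores at zero means a single bad target cannot drag a server far down, so the ranking rewards consistently good predictions.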

    2. CASP15:
      D-I-TASSER (as "UM-TBM") participated in the 15th CASP experiment, which contained two sections: single-domain and multi-domain structure prediction. Figure 3 summarizes the results of D-I-TASSER compared to other servers in CASP15, where D-I-TASSER ranked No. 1 in both the single-domain and multi-domain sections.


      Figure 3. Performance of D-I-TASSER in CASP15. (A) Ranking of the single-domain section in CASP15 (https://predictioncenter.org/casp15/zscores_final.cgi?formula=assessors&gr_type=server_only). (B) Ranking of the multi-domain section in CASP15 (https://predictioncenter.org/casp15/zscores_interdomain.cgi). (C) Head-to-head TM-score comparison between D-I-TASSER and AlphaFold2 (left: single-domain targets; right: multi-domain targets). (D) Head-to-head TM-score comparison between D-I-TASSER and the Wallner group, which built models by massive sampling with AlphaFold2. (E) TM-score of D-I-TASSER on 50 FM domains and 20 multi-domain CASP15 targets, compared to different versions of AlphaFold programs. Since some AlphaFold programs (e.g., AlphaFold 3) did not participate in CASP15, the AlphaFold models in Figure 3E were created after the CASP15 experiment.

    3. Large-scale Benchmark Tests:
      To systematically examine D-I-TASSER, we collected a set of 500 non-redundant ‘Hard’ domains from SCOPe, the PDB, and the CASP 8-14 experiments, for which no significant templates can be detected by LOMETS3 from the PDB after excluding homologous structures with a sequence identity >30% to the query sequences. Figure 4 summarizes the results of D-I-TASSER compared to the classic I-TASSER and different AlphaFold programs.


      Figure 4. Structure prediction results on 500 Hard non-redundant protein domains. (A) Head-to-head TM-score comparison of D-I-TASSER and classic I-TASSER programs. (B) Head-to-head TM-score comparison of D-I-TASSER and AlphaFold2.2. (C) Average TM-score by D-I-TASSER versus I-TASSER with different deep learning potentials, as well as different versions of AlphaFold programs.
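The 'Hard' criterion above (no significant template left once homologues above 30% sequence identity are excluded) can be sketched as a filter over threading hits. This is a hypothetical illustration: the identity definition (identical aligned residues over the ungapped query length) is one common convention, and each hit's `z_cut` stands for a program-specific significance cutoff that is assumed to be supplied, since the actual LOMETS3 cutoffs are not given here.

```python
def sequence_identity(aligned_query, aligned_template):
    """Identical aligned residues divided by the ungapped query length (one common convention)."""
    query_length = sum(1 for c in aligned_query if c != "-")
    identical = sum(
        1 for q, t in zip(aligned_query, aligned_template) if q == t and q != "-"
    )
    return identical / query_length if query_length else 0.0


def is_hard_target(hits, identity_cutoff=0.30):
    """True if no significant template survives after removing homologues.

    hits: list of dicts with hypothetical keys 'aligned_query', 'aligned_template',
          'z' (threading Z-score) and 'z_cut' (that program's significance cutoff).
    """
    for hit in hits:
        identity = sequence_identity(hit["aligned_query"], hit["aligned_template"])
        if identity > identity_cutoff:
            continue  # homologous template: excluded from the benchmark template search
        if hit["z"] >= hit["z_cut"]:
            return False  # a significant non-homologous template exists: not 'Hard'
    return True


hits = [
    {"aligned_query": "ACDEFGHIK", "aligned_template": "ACDEFGHIK", "z": 20.0, "z_cut": 8.0},
    {"aligned_query": "ACDEFGHIK", "aligned_template": "ACWWWWWWW", "z": 5.0, "z_cut": 8.0},
]
```

Here the first hit is a close homologue and is ignored, and the remaining hit is below its cutoff, so the toy query counts as 'Hard'.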

  3. Server inputs
    The user needs to paste the FASTA-formatted amino acid sequence into the input box, or upload the amino acid sequence of the query protein using the "Choose file" button.


    Figure 5. Input of D-I-TASSER.
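Before submitting, a sequence can be checked locally. The sketch below is a hypothetical client-side pre-check, not the server's own validation: the page does not state which residue codes or how many sequences are accepted, so restricting input to one record and the 20 standard one-letter codes is an assumption.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_single_fasta(text):
    """Return (header, sequence) for one FASTA record; raise ValueError on bad input."""
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None or chunks:
                raise ValueError("expected exactly one sequence")
            header = line[1:].strip()
        else:
            chunks.append(line.upper())
    sequence = "".join(chunks)
    if not sequence:
        raise ValueError("no sequence found")
    unknown = sorted(set(sequence) - STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residue codes: {''.join(unknown)}")
    return header or "", sequence


header, sequence = parse_single_fasta(">query1 test protein\nMKTAYIAK\nqrqisfvk\n")
```

Multi-line sequences are joined and upper-cased, so `sequence` becomes `MKTAYIAKQRQISFVK`.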


  4. Server outputs
    The output of the D-I-TASSER server includes:
    • Up to five full-length atomic models (ranked by cluster size)
    • Estimated accuracy of the predicted models (including an estimated TM-score for each model)
    • Predicted secondary structures
    • Predicted solvent accessibility
    • Predicted contact map, distance map and hydrogen bond networks
    • Top 10 threading alignments from LOMETS3
    • Top 10 proteins in PDB which are structurally closest to the predicted models
    • Predicted Enzyme Commission (EC) numbers and the confidence score (if you check the "Predict protein function based on structure model (running time may be doubled)." option of the input)
    • Predicted GO terms and the confidence score (if you check the "Predict protein function based on structure model (running time may be doubled)." option of the input)
    • Predicted ligand-binding sites and the confidence score (if you check the "Predict protein function based on structure model (running time may be doubled)." option of the input)

    An illustrative example of the D-I-TASSER output is shown below:

    1. Secondary structure, solvent accessibility, contact map, distance map, and hydrogen bond network information:

      Figure 6. Secondary structure, solvent accessibility, contact map, distance map, and hydrogen bond network information in the D-I-TASSER output.

    2. Templates, final models, and analog information:

      Figure 7. Templates, final models, and analog information in D-I-TASSER output.

    3. Gene Ontology (GO) Term prediction information:

      Figure 8. Gene Ontology (GO) Term prediction information in the D-I-TASSER output. This information is output only after you check the "Predict protein function based on structure model (running time may be doubled)." option of the input.

    4. Enzyme Commission (EC) and ligand binding site prediction information:

      Figure 9. Enzyme Commission (EC) and ligand binding site prediction information in the D-I-TASSER output. This information is output only if you check the "Predict protein function based on structure model (running time may be doubled)." option of the input.


    5. Output guide and interpretation tips
      The D-I-TASSER modeling results are summarized on a webpage, the link to which is sent to the user's registered email address after modeling is completed. Below, we answer several of the most frequently asked questions about interpreting D-I-TASSER results:

      • What are the 'top 10 threading templates used by D-I-TASSER'?

        D-I-TASSER modeling starts from the structure templates identified by LOMETS3 from the PDB library. LOMETS3 is a meta-server threading approach containing multiple threading programs, each of which can generate tens of thousands of templates. D-I-TASSER uses only the templates of the highest significance in the threading alignments, as measured by the Z-score (the difference between the raw score and the average score, in units of the standard deviation). The top 10 templates are the 10 templates selected from the LOMETS3 threading programs: usually, the one (or two) templates with the highest Z-score are selected from each threading program, with the programs ordered by their average performance in large-scale benchmark tests.
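The Z-score and the per-program selection described above can be sketched in a few lines. This is a minimal illustration, assuming that higher raw scores are better and that each program's full list of raw scores is available; the program names, template IDs, scores, and the benchmark ordering are hypothetical.

```python
import statistics


def zscores(raw_scores):
    """(raw - mean) / standard deviation, assuming higher raw scores are better."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    return [(s - mean) / sd if sd > 0 else 0.0 for s in raw_scores]


def select_top_templates(hits_by_program, program_order, per_program=1, total=10):
    """Pick the highest-Z template(s) from each program, programs in benchmark order.

    hits_by_program: {program: [(template_id, raw_score), ...]}
    program_order:   program names, best-performing first (hypothetical ranking).
    """
    selected = []
    for program in program_order:
        hits = hits_by_program.get(program, [])
        if not hits:
            continue
        z = zscores([score for _, score in hits])
        ranked = sorted(zip(hits, z), key=lambda pair: pair[1], reverse=True)
        for (template_id, _), z_value in ranked[:per_program]:
            selected.append((program, template_id, z_value))
    return selected[:total]


hits = {
    "P1": [("1abcA", 10.0), ("2xyzB", 50.0), ("3defC", 12.0), ("4ghiD", 11.0)],
    "P2": [("5jklE", 3.0), ("6mnoF", 4.0), ("7pqrG", 20.0)],
}
top = select_top_templates(hits, ["P2", "P1"])
```

With `"P2"` ranked first, the list starts with its best template `7pqrG`, followed by `2xyzB` from `"P1"`; setting `per_program=2` mimics the "one (or two)" case.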

      • What are the 'top final models from D-I-TASSER'?

        For each target, D-I-TASSER simulations generate tens of thousands of conformations (called decoys). To select the final models, D-I-TASSER uses the SPICKER program to cluster all decoys based on pairwise structural similarity and reports up to five models, corresponding to the five largest structure clusters. In Monte Carlo theory, the largest clusters correspond to the states with the largest partition function (or lowest free energy) and therefore have the highest confidence. The confidence of each model is quantitatively measured by the eTM-score (see below). Because the top five models are ranked by cluster size, a lower-ranked model can have a higher eTM-score. Although the first model has the highest eTM-score and the best quality in most cases, it is not unusual for lower-ranked models to be of better quality than higher-ranked ones. If the D-I-TASSER simulations converge, fewer than five clusters may be generated; this usually indicates that the models are of high quality.
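The clustering step can be sketched as greedy neighbour counting on a pairwise distance matrix. This is a simplification, not SPICKER itself: SPICKER adjusts its cutoff automatically and builds cluster centroids, whereas this sketch uses a fixed, user-chosen cutoff and returns the centre decoy; the toy 1-D "decoys" stand in for real structures.

```python
def neighbours(i, pool, dist, cutoff):
    """Decoys in pool whose distance to decoy i is within the cutoff (i itself included)."""
    return {j for j in pool if dist[i][j] <= cutoff}


def greedy_cluster(dist, cutoff, max_clusters=5):
    """Repeatedly take the decoy with the most neighbours as a cluster centre.

    dist: symmetric N x N matrix of pairwise decoy distances (e.g. RMSD).
    Returns [(centre, sorted members), ...]; cluster sizes never increase.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining and len(clusters) < max_clusters:
        centre = max(
            sorted(remaining),
            key=lambda i: len(neighbours(i, remaining, dist, cutoff)),
        )
        members = neighbours(centre, remaining, dist, cutoff)
        clusters.append((centre, sorted(members)))
        remaining -= members  # clustered decoys are removed before the next round
    return clusters


points = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]
dist = [[abs(a - b) for b in points] for a in points]
clusters = greedy_cluster(dist, cutoff=0.5)
```

Only three clusters come out of the toy data even though up to five are allowed, which mirrors the converged case described above.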

      • What are 'Proteins with similar structure'?

        After the structure-assembly simulation, D-I-TASSER uses the TM-align program to match the first D-I-TASSER model against all structures in the PDB library. This section reports the top 10 proteins from the PDB that have the closest structural similarity (i.e., the highest TM-score) to the predicted D-I-TASSER model. Because of their structural similarity, these proteins often have functions similar to that of the target. However, users are encouraged to use the function predictions in the D-I-TASSER output to obtain the biological function of the target protein. D-I-TASSER predicts function using COACH and COFACTOR, which have been extensively trained to derive function from many sequence and structure features; as a result, these programs are much more accurate than function annotations derived only from global structure comparison.

      • How can I know if my model is successfully folded?

        Since the experimental structure of the user's input sequence is unknown, we designed an estimated TM-score (eTM-score) to quantitatively estimate the quality of the D-I-TASSER models. The eTM-score is a linear combination of four components: the significance of the LOMETS3 threading alignments, the satisfaction rate of the predicted contact maps, the model fitting rate of the predicted distance maps, and the decoy convergence degree of the D-I-TASSER simulations. In benchmark testing, the eTM-score had a Pearson correlation coefficient (PCC) of 0.757 with the TM-score. Given this high correlation, we selected an eTM-score cutoff of 0.5, corresponding to an estimated TM-score of 0.5, which attained a Matthews correlation coefficient (MCC) of 0.644 and a false discovery rate (FDR) of only 2.71% on the benchmark dataset. Therefore, D-I-TASSER models with eTM-score > 0.5 are considered to be successfully folded.
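The MCC and FDR quoted above come from treating "eTM-score > 0.5" as a binary prediction of "TM-score > 0.5". The sketch below shows how both statistics are computed from paired scores; the six score pairs are toy values, not the D-I-TASSER benchmark data.

```python
import math


def classification_stats(etm_scores, tm_scores, cutoff=0.5):
    """MCC and FDR of the rule 'eTM-score > cutoff' for predicting 'TM-score > cutoff'."""
    tp = fp = tn = fn = 0
    for etm, tm in zip(etm_scores, tm_scores):
        predicted, actual = etm > cutoff, tm > cutoff
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1  # called folded, but the topology is actually wrong
        elif actual:
            fn += 1  # correct topology missed by the cutoff
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return mcc, fdr


etm = [0.80, 0.60, 0.55, 0.30, 0.20, 0.45]  # toy eTM-scores
tm = [0.85, 0.70, 0.40, 0.35, 0.60, 0.30]   # toy TM-scores to the native structures
mcc, fdr = classification_stats(etm, tm)
```

For these toy pairs there are 2 true positives, 1 false positive, 1 false negative and 2 true negatives, giving MCC = FDR = 1/3. A low FDR, as reported for D-I-TASSER, means that models called "folded" are rarely wrong.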

      • What is TM-score?

        TM-score is a metric for measuring the structural similarity between two structures (see Zhang and Skolnick, Scoring function for automated assessment of protein structure template quality, Proteins, 2004 57: 702-710). TM-score was proposed to address a weakness of RMSD, which is sensitive to local errors. Because RMSD averages the squared distances of all aligned residue pairs with equal weight, a local error (e.g., a misoriented tail) can produce a large RMSD even when the global topology is correct. In TM-score, small distances are weighted more strongly than large distances, which makes the score insensitive to local modeling errors. A TM-score > 0.5 indicates a model of correct topology, and a TM-score < 0.17 indicates random similarity. These cutoffs are independent of protein length.
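From the Zhang and Skolnick paper cited above, TM-score = max over superpositions of (1/L) * sum_i 1 / (1 + (d_i / d0)^2), where L is the target length, d_i is the distance between the i-th aligned residue pair, and d0 = 1.24 * (L - 15)^(1/3) - 1.8. The sketch below evaluates this sum for one given superposition only (the search for the best superposition, which TM-score and TM-align perform, is omitted), and the 0.5 Angstrom floor on d0 for very short chains is an assumption. Comparing it with RMSD on a model that has one badly placed tail illustrates the point made above.

```python
import math


def d0(target_length):
    """TM-score distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8, floored at 0.5 (assumed)."""
    return max(1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score_fixed(distances, target_length):
    """TM-score for ONE given superposition: distances between aligned residue pairs."""
    scale = d0(target_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / target_length


def rmsd(distances):
    """Root-mean-square deviation over the same aligned residue pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# 100-residue target: 95 residues within 1 A, a 5-residue tail misplaced by 30 A.
local_error = [1.0] * 95 + [30.0] * 5
tm_value = tm_score_fixed(local_error, 100)
rmsd_value = rmsd(local_error)
```

The misplaced tail pushes RMSD above 6 Angstrom, while the TM-score stays above 0.85 because each residue's contribution is bounded by 1 and decays quickly once d_i exceeds d0.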

      • What is eTM-score?

        The eTM-score is designed to quantitatively evaluate the quality of D-I-TASSER models. It is derived from a linear combination of four components: the significance of the LOMETS threading alignments, the satisfaction rate of the predicted contact maps, the model fitting rate of the predicted distance maps, and the decoy convergence degree of the D-I-TASSER simulations. A higher eTM-score indicates a model of higher confidence.

      • What is the difference between eTM-score and TM-score, and how are they related?

        TM-score (like RMSD) is a well-established standard for measuring the structural similarity between two structures and is typically used to measure the accuracy of structure modeling when the native structure is known. The eTM-score is a metric developed for D-I-TASSER to estimate the confidence of modeling. When the native structure is not known, the eTM-score is needed to predict the quality of the model, i.e., how close the predicted model is to the native structure.

        In a benchmark test set of 797 proteins, we found that the eTM-score is highly correlated with the TM-score: the correlation coefficient between the eTM-score of the first model and its TM-score to the native structure is 0.757. These data provide the basis for reliably predicting the TM-score using the eTM-score. In the output section, D-I-TASSER reports the eTM-scores of all predicted models for reference.
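The correlation coefficient reported above is the Pearson correlation between paired eTM-scores and TM-scores. A minimal, dependency-free sketch of that calculation is shown below (on Python 3.10 and later, `statistics.correlation` computes the same quantity); the example inputs are toy values, not benchmark data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


etm = [0.80, 0.60, 0.55, 0.30, 0.20, 0.45]  # toy eTM-scores of first models
tm = [0.85, 0.70, 0.40, 0.35, 0.60, 0.30]   # toy TM-scores to the native structures
r = pearson(etm, tm)
```

A value near 1 means the eTM-score ranks and scales models much like the true TM-score would, which is why it can stand in for TM-score when no native structure is available.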

How to cite D-I-TASSER?


zhanglabzhanggroup.org | +65-6601-1241 | Computing 1, 13 Computing Drive, Singapore 117417