DeepMSA (version 2) is a hierarchical approach to create high-quality multiple sequence alignments (MSAs)
for monomer and multimer proteins.
The method is built on iterative sequence database searching followed by fold-based
MSA ranking and selection.
For protein monomers, MSAs are produced by three iterative MSA search pipelines (dMSA, qMSA and mMSA)
that search whole-genome (Uniclust30 and UniRef90) and
metagenome (Metaclust, BFD, Mgnify, TaraDB, MetaSourceDB and JGIclust) sequence databases.
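The idea of an iterative, database-by-database search can be illustrated with a minimal Python sketch. This is not the DeepMSA2 implementation: the function name `iterative_search`, the `search` callable, and the raw sequence-count stopping rule `min_depth` are all hypothetical and stand in for the real search tools and stopping criteria, which are not described on this page.

```python
from typing import Callable, Iterable, List


def iterative_search(
    query: str,
    databases: Iterable[str],
    search: Callable[[str, List[str], str], List[str]],
    min_depth: int = 128,
) -> List[str]:
    """Search databases in order and stop once the MSA is deep enough.

    `search(query, current_msa, db)` is a hypothetical callable that returns
    the sequences found in `db`. `min_depth` is an assumed stopping rule based
    on a raw sequence count.
    """
    msa = [query]
    seen = {query}
    for db in databases:
        for seq in search(query, msa, db):
            if seq not in seen:  # keep each sequence only once
                seen.add(seq)
                msa.append(seq)
        if len(msa) >= min_depth:  # deep enough, skip the remaining databases
            break
    return msa
```

With a real search tool, `search` would wrap the external program call, and later databases would only be queried when earlier ones did not yield enough sequences.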
For protein multimers, a number of hybrid MSAs are created by pairing the sequences from
monomer MSAs of the component chains, with the optimal multimer MSAs selected based on a combined score of
MSA depth and folding score of the monomer chains.
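The pairing and selection step can be sketched as follows. This is a simplified illustration under stated assumptions, not the scoring used by DeepMSA2: sequences are paired by a shared species label, and the combined score is a hypothetical weighted sum of the log paired depth and the mean monomer folding score. The names `MonomerMSA`, `pair_rows`, `select_best` and the parameter `weight` are invented for this example.

```python
from dataclasses import dataclass
from itertools import product
from math import log
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class MonomerMSA:
    name: str
    fold_score: float     # assumed folding score of the chain, in [0, 1]
    rows: Dict[str, str]  # assumed mapping: species label -> aligned sequence


def pair_rows(msas: Sequence[MonomerMSA]) -> List[Tuple[str, ...]]:
    """Pair sequences across chains by shared species label (an assumption)."""
    shared = set.intersection(*(set(m.rows) for m in msas))
    return [tuple(m.rows[s] for m in msas) for s in sorted(shared)]


def select_best(
    per_chain: Sequence[Sequence[MonomerMSA]], weight: float = 0.5
) -> Optional[Tuple[float, Tuple[MonomerMSA, ...], List[Tuple[str, ...]]]]:
    """Score every combination of monomer MSAs and return the best one."""
    best = None
    for combo in product(*per_chain):
        paired = pair_rows(combo)
        mean_fold = sum(m.fold_score for m in combo) / len(combo)
        # Hypothetical combined score: log paired depth plus mean folding score.
        score = weight * log(1 + len(paired)) + (1 - weight) * mean_fold
        if best is None or score > best[0]:
            best = (score, combo, paired)
    return best
```

In this sketch, a combination with a lower mean folding score can still win if its paired MSA is deeper; the `weight` parameter controls that trade-off.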
Large-scale benchmark data show that DeepMSA2 has a significant advantage in generating accurate MSAs
with balanced depth and alignment coverage, which are well suited for deep-learning-based
protein and protein complex structure and function prediction. To predict a structure model directly,
please use the
DMFold server.
[Example output for monomer]
[Example output for multimer]
[Standalone package]
[DeepMSA v1]
[Help]
[Forum]
Online server
References:
- Wei Zheng, Qiqige Wuyun, Yang Li, Chengxin Zhang, P Lydia Freddolino, Yang Zhang.
Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data.
Nature Methods (2024). https://doi.org/10.1038/s41592-023-02130-4.
- Chengxin Zhang, Wei Zheng, S M Mortuza, Yang Li, Yang Zhang.
DeepMSA: constructing deep multiple sequence alignment to improve
contact prediction and fold-recognition for distant-homology proteins. Bioinformatics,
36: 2105-2112 (2020).
[PDF]
[Supporting Information]