DeepMSA2: deep multiple sequence alignment generation based on huge metagenome and structure model-based ranking system


DeepMSA2 (DeepMSA version 2) is a hierarchical approach for creating high-quality multiple sequence alignments (MSAs) for monomeric and multimeric proteins. The method is built on iterative sequence database searching followed by fold-based MSA ranking and selection. For protein monomers, MSAs are produced by three iterative search pipelines (dMSA, qMSA and mMSA) that search whole-genome (Uniclust30 and UniRef90) and metagenome (Metaclust, BFD, MGnify, TaraDB, MetaSourceDB and JGIclust) sequence databases. For protein multimers, a number of hybrid MSAs are created by pairing sequences from the monomer MSAs of the component chains, and the optimal multimer MSAs are selected by a score that combines MSA depth with the folding score of the monomer chains. Large-scale benchmark data show a significant advantage for DeepMSA2 in generating accurate MSAs with balanced depth and alignment coverage, which are well suited to deep learning-based prediction of protein and protein complex structure and function. To predict a structure model directly, please use the DMFold server.
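The description above refers to MSA depth and alignment coverage. As a rough illustration of what those two quantities measure, here is a minimal Python sketch. It is not the DeepMSA2 implementation: the function names are hypothetical, the input is assumed to be a gap-padded aligned FASTA whose first record is an ungapped query, and coverage is taken simply as the fraction of query columns in which a sequence has a residue.

```python
def parse_aligned_fasta(text):
    """Parse aligned FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def msa_depth_and_coverage(records):
    """Return (depth, mean_coverage) for an alignment.

    Assumptions (illustrative only):
      - the first record is the query and contains no gaps;
      - depth is the raw number of sequences, query included;
      - coverage of a sequence is the fraction of columns that are not '-'.
    """
    if not records:
        raise ValueError("empty alignment")
    query_len = len(records[0][1])
    coverages = []
    for _, seq in records:
        if len(seq) != query_len:
            raise ValueError("sequences must have equal aligned length")
        aligned = sum(1 for c in seq if c != "-")
        coverages.append(aligned / query_len)
    return len(records), sum(coverages) / len(coverages)


example = """>query
MKTAYIAK
>hit1
MKTA--AK
>hit2
----YIAK
"""
depth, coverage = msa_depth_and_coverage(parse_aligned_fasta(example))
print(depth, round(coverage, 3))
```

A deep alignment made mostly of short fragments would score high on depth but low on coverage, which is the imbalance the description says DeepMSA2 aims to avoid.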

[Example output for monomer] [Example output for multimer] [Standalone package] [DeepMSA v1] [Help] [Forum]

Online server

Copy and paste your sequence (in FASTA format) below. The server accepts both monomer and complex sequences, with a monomer length of 30 to 1500 AA and a complex length of 30 to 3000 AA with fewer than 20 chains:
[Example input of monomer] [Example input of complex]

Or upload the sequence from your local computer:

Email: (mandatory; results will be sent to this address)

ID: (optional; a name of your choice for the protein)
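The length and chain limits stated for the sequence input can be checked locally before submitting. The following is a minimal sketch under stated assumptions: `check_input` and `parse_fasta` are hypothetical helper names, each FASTA record is treated as one chain, and the complex length is assumed to mean the total length summed over all chains. The server may apply additional checks that are not reproduced here.

```python
MONOMER_RANGE = (30, 1500)   # allowed monomer length in AA, from the page text
COMPLEX_RANGE = (30, 3000)   # allowed complex length in AA (assumed: summed over chains)
MAX_CHAINS = 19              # the page states "< 20 chains"


def parse_fasta(text):
    """Return one sequence string per FASTA record (one record = one chain)."""
    chains, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = []
            chains.append(current)
        elif current is None:
            # Sequence lines before any header: treat them as a single chain.
            current = [line]
            chains.append(current)
        else:
            current.append(line)
    return ["".join(parts) for parts in chains]


def check_input(text):
    """Return a list of problems; an empty list means the limits are met."""
    chains = parse_fasta(text)
    if not chains:
        return ["no sequence found"]
    problems = []
    if len(chains) == 1:
        n = len(chains[0])
        if not MONOMER_RANGE[0] <= n <= MONOMER_RANGE[1]:
            problems.append(f"monomer length {n} outside {MONOMER_RANGE}")
    else:
        if len(chains) > MAX_CHAINS:
            problems.append(f"{len(chains)} chains; fewer than 20 required")
        total = sum(len(c) for c in chains)
        if not COMPLEX_RANGE[0] <= total <= COMPLEX_RANGE[1]:
            problems.append(f"complex length {total} outside {COMPLEX_RANGE}")
    return problems


print(check_input(">chainA\n" + "A" * 100))   # within limits
print(check_input(">chainA\n" + "A" * 10))    # too short for a monomer
```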

Advanced options

MSA generation method:
Utilizing Uniclust30, UniRef90 and Metaclust databases (fast, same as the DeepMSA v1 pipeline).
Utilizing Uniclust30, UniRef90, Metaclust, MGnify and BFD databases (medium).
Utilizing Uniclust30, UniRef90, Metaclust, MGnify, BFD, TaraDB, MetaSourceDB and JGIclust databases (slow).
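Each option above searches a superset of the databases used by the previous one. The short Python sketch below encodes the three lists to make that relationship explicit. The keys `fast`, `medium` and `slow` and the helper `extra_databases` are illustrative labels taken from the speed hints on this page, not actual server or standalone-package parameters.

```python
# Database lists copied from the three options above; the keys are hypothetical labels.
DATABASES_BY_MODE = {
    "fast": ["Uniclust30", "UniRef90", "Metaclust"],
    "medium": ["Uniclust30", "UniRef90", "Metaclust", "MGnify", "BFD"],
    "slow": ["Uniclust30", "UniRef90", "Metaclust", "MGnify", "BFD",
             "TaraDB", "MetaSourceDB", "JGIclust"],
}


def extra_databases(mode, baseline="fast"):
    """List the databases searched in `mode` that `baseline` does not search."""
    base = set(DATABASES_BY_MODE[baseline])
    return [db for db in DATABASES_BY_MODE[mode] if db not in base]


print(extra_databases("medium"))
print(extra_databases("slow", baseline="medium"))
```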

References:

Wei Zheng, Qiqige Wuyun, Yang Li, Chengxin Zhang, P Lydia Freddolino, Yang Zhang. Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data. Nature Methods (2024). https://doi.org/10.1038/s41592-023-02130-4
Chengxin Zhang, Wei Zheng, S M Mortuza, Yang Li, Yang Zhang. DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins. Bioinformatics, 36: 2105-2112 (2020).