SADA

SADA is a structural analogue-based protein structure domain assembly method assisted by deep learning. It consists of 5 steps: (1) detect structural analogues of the full chain in the constructed multi-domain protein structure database (MPDB), using the input protein domain models as queries; (2) construct an initial model from the top-ranked analogue; (3) use a deep learning network to predict the inter-residue distance distribution; (4) build a force field specific to multi-domain proteins, based on the predicted inter-residue distance distribution and the properties of multi-domain proteins, to guide domain assembly; (5) starting from the initial model, assemble the domain models into the final full-chain model with the proposed two-stage differential evolution algorithm (see example for a SADA assembly result of a 2-domain protein).
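Step (5) relies on differential evolution. The sketch below is a generic DE/rand/1/bin loop, not SADA's two-stage algorithm. The six-parameter vector (standing in for a rigid-body rotation and translation of one domain) and the quadratic toy_energy function are assumptions made only for illustration; in SADA the energy would come from the force field built in step (4).

```python
import random

def toy_energy(x, target):
    """Hypothetical stand-in for an assembly energy: squared distance to a target pose."""
    return sum((a - b) ** 2 for a, b in zip(x, target))

def differential_evolution(energy, dim, bounds, pop_size=30, generations=200,
                           f=0.5, cr=0.9, seed=0):
    """Generic DE/rand/1/bin minimiser.

    Returns (best_vector, best_energy, initial_best_energy).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [energy(ind) for ind in pop]
    initial_best = min(scores)
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for k in range(dim):
                if k == forced or rng.random() < cr:
                    v = pop[a][k] + f * (pop[b][k] - pop[c][k])
                    v = min(max(v, lo), hi)  # clamp to the search box
                else:
                    v = pop[i][k]
                trial.append(v)
            trial_score = energy(trial)
            if trial_score <= scores[i]:  # greedy selection keeps the better vector
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best], initial_best

# Hypothetical target pose: three rotation and three translation parameters.
target = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5]
best, best_energy, initial_energy = differential_evolution(
    lambda x: toy_energy(x, target), dim=6, bounds=(-10.0, 10.0))
print(best_energy <= initial_energy)
```

Because selection is greedy, the best energy in the population can never increase from one generation to the next, which is why such a search can safely start from an initial model.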

SADA also provides 2 other functions: (1) structural analogue detection; (2) culling the whole MPDB according to input criteria.

Functions:

Download

Multi-domain protein structure database_current, current version: September 2024 (~9.8 GB).

Domain_boundary_current.txt: this file contains domain boundary annotations for 255,330 domain templates.

Multi-domain protein structure database_v1, version: September 2021 (~3.4 GB).

Multi-domain protein structure database_v2, version: September 2022 (~3.8 GB).

SADA news

2021-09: The Multi-domain protein structure database (with 48,225 entries) was released by Dr. Guijun Zhang's research group at Zhejiang University of Technology.

2022-09: The Multi-domain protein structure database was updated: diverse multi-domain proteins with the same sequence were added to MPDB (4,900 proteins in total).

2022-12: The inter-domain distance prediction network with attention mechanisms was developed, with novel features and a strategy designed for multi-domain proteins.

2024-09: The Multi-domain protein structure database was updated, adding 50,000 strictly non-redundant multi-domain proteins.

guijunzhanglab@163.com

References:

Chun-Xiang Peng, Xiao-Gen Zhou, Yu-Hao Xia, Jun Liu, Ming-Hua Hou and Gui-Jun Zhang. Structural analogue-based protein structure domain assembly assisted by deep learning. Bioinformatics, 2022, 38(19): 4513-4521.

Culling proteins from MPDB

The multi-domain protein structure database (MPDB) is constructed in 3 steps: (1) CD-HIT is used to remove redundant protein structures from PDB at a sequence identity cutoff of 100%, and the remaining structures, whose sequence identity is below 100%, are fetched from PDB; (2) DomainParser is then used to determine whether each of these proteins is a multi-domain protein; (3) proteins classified as single-domain by DomainParser are further checked against CATH and SCOPe to confirm whether they are in fact multi-domain proteins. All multi-domain proteins selected in these 3 steps are collected to construct the MPDB.
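The decision logic of steps (2) and (3) can be sketched as follows. This is a minimal illustration, assuming each protein already carries a domain count from DomainParser and, where annotated, from CATH and SCOPe; the function name and its arguments are hypothetical and do not reflect how the MPDB pipeline is actually implemented.

```python
def is_multi_domain(domainparser_count, cath_count=None, scope_count=None):
    """Return True if any available annotation reports two or more domains.

    cath_count and scope_count are None when the protein has no annotation
    in the corresponding database.
    """
    if domainparser_count >= 2:
        return True
    # DomainParser reports a single domain: fall back to CATH and SCOPe.
    return any(n is not None and n >= 2 for n in (cath_count, scope_count))

# Hypothetical cases.
print(is_multi_domain(3))                # accepted by DomainParser alone
print(is_multi_domain(1, cath_count=2))  # rescued by a CATH annotation
print(is_multi_domain(1))                # remains single-domain
```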

As of September 2021, MPDB contained 48,225 multi-domain proteins: 37,495 with 2 domains, 7,539 with 3 domains, 2,182 with 4 domains, and 1,009 with more than 4 domains.

In this function, users can cull the whole MPDB according to input criteria, producing a subset of multi-domain protein structures from MPDB together with an accompanying information file. The pdbinfo.txt file lists, for each protein, its size, experiment type, resolution, R-factor, number of domains, the method used to decompose the domains, and the domain boundaries (see example).

Criteria
Protein chain length range: (Default: 40 to 400)

Resolution range (Å): (Default: 0.0 to 3.0)

Number of domains range: (Default: 2 to 5)

Maximum R-value: (Default: 0.25)

Sequence identity: (Default: 0.7)
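To make the criteria concrete, the sketch below filters a handful of made-up records using the default values listed above. The Entry fields, the entry names, and the inclusive treatment of range limits are assumptions for illustration; the real pdbinfo.txt layout and the server's boundary handling may differ. The sequence identity criterion is left out because it requires sequence clustering (for example with CD-HIT) rather than a per-entry test.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    # Hypothetical record layout, not the actual pdbinfo.txt columns.
    name: str
    length: int        # protein chain length in residues
    resolution: float  # in angstroms
    r_value: float
    n_domains: int

def cull(entries, length=(40, 400), resolution=(0.0, 3.0),
         n_domains=(2, 5), max_r_value=0.25):
    """Keep entries whose properties fall inside every inclusive range."""
    return [
        e for e in entries
        if length[0] <= e.length <= length[1]
        and resolution[0] <= e.resolution <= resolution[1]
        and n_domains[0] <= e.n_domains <= n_domains[1]
        and e.r_value <= max_r_value
    ]

entries = [
    Entry("entry_a", 250, 1.8, 0.19, 2),  # passes every default criterion
    Entry("entry_b", 520, 2.1, 0.21, 3),  # chain too long
    Entry("entry_c", 180, 3.4, 0.24, 2),  # resolution too poor
    Entry("entry_d", 300, 2.5, 0.31, 4),  # R-value too high
]
print([e.name for e in cull(entries)])
```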

User information

Email: (mandatory, where results will be sent to)

Job name: (optional, your given name to this job)


Detecting structural analogues from MPDB

In this function, users can detect structural analogues for query domains. During detection, the individual query domains are aligned to each protein in the whole MPDB, with no overlap allowed between the alignments of different domains. The harmonic mean of the TM-scores of all domains is defined as the global score (LSscore) for each protein in MPDB, and the top 200 multi-domain protein structural analogues with the highest LSscores are output. The query domains and related information for the 200 structural analogues are recorded in a *.txt file (see example for query domains of 1fx7A).
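The scoring and ranking described above can be sketched as follows, assuming the per-domain TM-scores against each MPDB protein have already been computed by a structural alignment tool. The function names and the template identifiers are hypothetical, and the non-overlapping alignment itself is not covered here.

```python
def ls_score(tm_scores):
    """Harmonic mean of the per-domain TM-scores for one MPDB protein."""
    if not tm_scores:
        raise ValueError("at least one domain TM-score is required")
    if any(s <= 0.0 for s in tm_scores):
        return 0.0  # a domain that does not align at all gives the lowest score
    return len(tm_scores) / sum(1.0 / s for s in tm_scores)

def top_analogues(tm_scores_by_protein, n=200):
    """Rank proteins by LSscore, highest first, and keep the top n names."""
    ranked = sorted(tm_scores_by_protein,
                    key=lambda name: ls_score(tm_scores_by_protein[name]),
                    reverse=True)
    return ranked[:n]

# Hypothetical TM-scores of two query domains against three MPDB proteins.
candidates = {
    "template_1": [0.9, 0.3],
    "template_2": [0.6, 0.6],
    "template_3": [0.8, 0.7],
}
print(top_analogues(candidates, n=2))
```

The harmonic mean is dominated by the smallest value, so template_1 ranks below template_2 even though both have the same arithmetic mean of 0.6: a protein scores well only if every query domain aligns well.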

Input the full-chain sequence:

Please copy and paste your sequence file here in FASTA format (mandatory). Sample input

Or upload the sequence file:

Input the domain structures in order:

Input the structure of domain 1 in PDB format (mandatory):
Please copy and paste your structure file here. Sample input

Or upload the structure file:

Input the sequence identity: (Default: 1.0)

User information

Email: (mandatory, where results will be sent to)

Job name: (optional, your given name to this job)


SADA domain assembly

Input the full-chain sequence:

Please copy and paste your sequence file here in FASTA format (mandatory). Sample input

Or upload the sequence file:

Input the domain structures in order:

Input the structure of domain 1 in PDB format (mandatory):
Please copy and paste your structure file here. Sample input

Or upload the structure file:

Input the structure of domain 2 in PDB format (mandatory):
Please copy and paste your structure file here. Sample input

Or upload the structure file:

Input the sequence identity: (Default: 1.0)

Note: the sequence derived from the input domain models is used to calculate sequence identity.
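As an illustration of how a sequence can be derived from a domain model, the sketch below reads the CA atoms of a PDB-format file using the fixed column positions of ATOM records. It is a minimal example, not the server's own code: the function names are hypothetical, the demo coordinates are made up, and how SADA then computes identity from the derived sequence is not specified here.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def sequence_from_pdb(pdb_text):
    """Derive a one-letter sequence from the CA atoms of ATOM records."""
    seq, seen = [], set()
    for line in pdb_text.splitlines():
        # Atom name occupies columns 13-16 of an ATOM record.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        # Chain ID (column 22) plus residue number and insertion code (23-27).
        key = (line[21:22], line[22:27])
        if key in seen:  # skip alternate locations of the same residue
            continue
        seen.add(key)
        # Residue name occupies columns 18-20; unknown residues become "X".
        seq.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(seq)

def atom_line(serial, name, resname, chain, resseq):
    """Build a demo ATOM record with dummy coordinates (illustration only)."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")

domain_model = "\n".join([
    atom_line(1, " N", "MET", "A", 1),
    atom_line(2, " CA", "MET", "A", 1),
    atom_line(3, " CA", "LYS", "A", 2),
    atom_line(4, " CA", "VAL", "A", 3),
])
domain_sequence = sequence_from_pdb(domain_model)
full_chain = "MKVLAG"  # hypothetical full-chain sequence
print(domain_sequence, domain_sequence in full_chain)
```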

User information

Email: (mandatory, where results will be sent to)

Job name: (optional, your given name to this job)
