DPIC is a method for predicting inter-chain residue-residue distances in protein complexes, trained on domain-level data. Given two protein sequences, it first uses HHblits to search the UniRef30 sequence database and build a multiple sequence alignment (MSA) for each monomer sequence, and predicts each monomer structure with AlphaFold2. Sequence features are then extracted, including one-hot encodings and physicochemical properties. The MSA features consist of the attention maps and embedding vectors extracted from the MSA by the MSA Transformer, together with the position-specific scoring matrix (PSSM). Structural features (interface residue propensity, ultrafast shape recognition, and intra-chain distances) are extracted from the predicted monomer structures. Finally, a deep learning network that integrates the MCNN module and the triangular interaction module predicts the inter-chain residue-residue distances.
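
As a rough illustration of two of the simpler inputs mentioned above, the one-hot sequence encoding and the intra-chain distance map, the following Python sketch shows how such features could be computed for a single monomer. It is not taken from the DPIC code: the function names `one_hot_encode` and `intra_chain_distances`, the alphabetical ordering of the 20 amino acids, the all-zero row for unknown residues, and the use of one coordinate per residue (for example the Cα atom) are all assumptions made for illustration.

```python
import math

# Assumed alphabet: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return an L x 20 one-hot matrix as nested lists.

    Residues outside the assumed alphabet (e.g. 'X') get an all-zero row.
    """
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


def intra_chain_distances(coords):
    """Return the symmetric L x L matrix of Euclidean distances.

    `coords` holds one (x, y, z) tuple per residue; which atom represents
    a residue is an assumption left to the caller.
    """
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dist[i][j] = d
            dist[j][i] = d
    return dist


if __name__ == "__main__":
    # Toy three-residue example with made-up coordinates.
    encoding = one_hot_encode("ACX")
    distances = intra_chain_distances([(0, 0, 0), (3, 4, 0), (0, 0, 1)])
    print(len(encoding), len(encoding[0]))
    print(distances[0][1])
```

In a pairwise predictor of this kind, per-residue features such as the one-hot rows are typically combined across residue pairs, while the distance matrix is already a pairwise feature; how DPIC combines them is not specified here.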