Multi-layer sequential network analysis improves protein 3D structural classification

Contact: Tijana Milenkovic, tmilenko AT nd DOT edu

Abstract: Protein structural classification (PSC) is a supervised problem of assigning proteins to pre-defined structural (e.g., CATH or SCOPe) classes based on the proteins' sequence or 3D structural features. We recently proposed PSC approaches that model protein 3D structures as protein structure networks (PSNs) and analyze PSN-based protein features; these approaches performed better than or comparably to state-of-the-art sequence-based and other 3D structure-based PSC approaches. However, existing PSN-based PSC approaches model the whole 3D structure of a protein as a static (i.e., single-layer) PSN. Because protein folding is a dynamic process, in which some parts (i.e., sub-structures) of a protein fold before others, modeling the 3D structure of a protein as a PSN that captures these sub-structures might further improve PSC performance. Here, we propose to model 3D structures of proteins as multi-layer sequential PSNs that capture 3D sub-structures of proteins, with the hypothesis that this will improve upon the current state-of-the-art PSC approaches that are based on single-layer PSNs (and thus upon the existing state-of-the-art sequence-based and other 3D structure-based approaches). Indeed, we confirm this on 72 datasets spanning ~44,000 CATH and SCOPe protein domains.
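
To make the idea of a multi-layer sequential PSN concrete, here is a minimal Python sketch of one plausible reading of the abstract. It is not the authors' implementation. The assumptions, none of which are stated above, are: each residue is reduced to a single 3D point; two residues are linked when they lie within a distance cutoff (6 angstroms here, chosen only for illustration); and layer k is the contact network of a cumulative prefix of the sequence, so that later layers contain earlier ones. The function names `contact_edges` and `sequential_layers` are hypothetical.

```python
import math
from itertools import combinations


def contact_edges(coords, cutoff):
    """Return index pairs (i, j), i < j, whose points lie within `cutoff`."""
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) <= cutoff
    }


def sequential_layers(coords, n_layers, cutoff=6.0):
    """Build `n_layers` cumulative sub-structure networks.

    Assumption: layer k covers the first ceil(k * n / n_layers) residues,
    so each layer is the contact network of a growing sequence prefix.
    """
    n = len(coords)
    layers = []
    for k in range(1, n_layers + 1):
        size = math.ceil(k * n / n_layers)
        layers.append(contact_edges(coords[:size], cutoff))
    return layers


# Toy input: five made-up residue positions spaced 4 angstroms apart on a line.
toy = [(4.0 * i, 0.0, 0.0) for i in range(5)]
layers = sequential_layers(toy, n_layers=2)
```

In this toy case the first layer covers three residues and the second covers all five. A single-layer PSN would correspond to the last layer alone; the earlier layers are what add the sub-structure information.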

Reference: Khalique Newaz, Jacob Piland, Patricia Clark, Scott Emrich, Jun Li, and Tijana Milenkovic (2022), Multi-layer sequential network analysis improves protein 3D structural classification, PROTEINS: Structure, Function, and Bioinformatics, 90(9):1721-1731.

The source code with relevant data, along with the installation and usage instructions, can be downloaded from GitHub.