Modeling multi-scale data via a network of networks

Shawn Gu, Meng Jiang, Pietro Hiram Guzzi, and Tijana Milenkovic

Node label prediction and graph label prediction are prominent tasks in network science. The data analyzed in these tasks are sometimes related: entities represented by nodes in a higher-level (i.e., higher-scale) network can themselves be modeled as networks at a lower level. Hence, we argue that data on systems involving such entities should be integrated into a "network of networks" (NoN) representation. We then ask whether entity label prediction using integrated, multi-level NoN data is more accurate than using single-level node or graph data alone, i.e., than node label prediction on the higher-level network or graph label prediction on the lower-level networks. In this study, we design a novel framework to investigate this question. To obtain data, we develop the first synthetic NoN generator that can control a variety of network structural properties, and we construct a real biological NoN. We extend traditional single-level node and graph label prediction approaches to their NoN counterparts, and we propose a novel, integrative graph neural network model for performing label prediction directly on NoNs. We evaluate the accuracy of each approach on the synthetic (predicting artificial labels) and biological (predicting protein functions) NoNs. For the synthetic NoNs, we find that, depending on the NoN properties, our NoN approaches outperform or perform as well as the node- and network-level ones. For the biological NoN, we find that our NoN approaches outperform the single-level approaches for slightly under half of the protein functions; moreover, for 30% of the protein functions, only our NoN approaches make meaningful predictions, while the node- and network-level approaches achieve only random accuracy. Thus, NoN-based data integration is an important and exciting research direction.
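To make the NoN idea concrete, here is a minimal, hypothetical Python sketch. It is not the authors' framework, generator, or graph neural network. It assumes a NoN is stored as a higher-level undirected edge list plus a dict mapping each higher-level node (entity) to its own lower-level edge list. It then builds one feature vector per entity by concatenating a node-level feature (degree in the higher-level network) with simple graph-level features (node count, edge count, density) of that entity's lower-level network. The function names (`degree_map`, `graph_features`, `non_features`) and the choice of features are illustrative assumptions. Lower-level nodes are inferred from edges, so isolated nodes are not counted, and edges are assumed to be unique with no self-loops.

```python
from collections import defaultdict

def degree_map(edges):
    """Degree of each node in an undirected edge list (hypothetical helper)."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def graph_features(edges):
    """Graph-level features of a lower-level network: [node count, edge count, density].

    Assumes a simple undirected graph whose nodes all appear in at least one edge.
    """
    nodes = {n for edge in edges for n in edge}
    n, m = len(nodes), len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return [n, m, density]

def non_features(top_edges, lower_networks):
    """Concatenate node-level (higher network) and graph-level (lower network) features per entity."""
    deg = degree_map(top_edges)
    return {
        entity: [deg.get(entity, 0)] + graph_features(edges)
        for entity, edges in lower_networks.items()
    }

# Toy NoN: a higher-level path A-B-C, where each entity is itself a small network.
top_edges = [("A", "B"), ("B", "C")]
lower_networks = {
    "A": [(1, 2), (2, 3), (1, 3)],  # triangle
    "B": [(1, 2)],                  # single edge
    "C": [(1, 2), (2, 3)],          # path of length 2
}
feats = non_features(top_edges, lower_networks)
print(feats)
```

Feeding such concatenated vectors to a standard classifier is one naive form of multi-level integration; it only illustrates combining both levels and is not the integrative graph neural network described above.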

Reference: Shawn Gu, Meng Jiang, Pietro Hiram Guzzi, and Tijana Milenkovic (2021), Modeling multi-scale data via a network of networks, submitted.

Contact: tmilenko [at] nd [dot] edu

Software: The source code and data are available for download, along with detailed usage instructions.