NIH 1R01GM120733-01A1 ($1.1M, 2016-2020, with Dr. Clark, Dr. Emrich, and Dr. Li)
AFOSR YIP FA9550-16-1-0147 ($360K, 2016-2019)
NSF CAREER CCF-1452795 ($540K, 2015-2020)
NSF CCF-1319469 ($453K, 2013-2017)
NSF EAGER CCF-1243295 ($208K, 2012-2014)
NIH 1R01HL117757-01A1 ($2.9M, 2014-2018, with 6 other PIs/co-Is across three departments)
Google Faculty Research Award ($33K, 2013-2014, with Dr. Striegel)
As key personnel:
NIH 1R21AI111286-01A1 ($418K, 2014-2016, with Dr. Ferdig)
NIH R01 Supplement 3R01GM074807-07S1 ($249K, 2012-2014, with Dr. Clark and Dr. Emrich)
For more details, see my CV.
Networks have been invaluable models for studying an enormous array of real-world phenomena in many research domains, such as ecological, chemical, social, technological, and transportation systems. Graph theory is an elegant framework for studying not only individual constituents of a system but also complex events that emerge from interactions among them, thus enabling extraction of functional information about the system that is not encoded in any of the constituents alone. Despite the exponential growth of currently available network data, we have only begun collecting the data and hence, the available data are noisy and incomplete. Also, we can hardly even describe the data mathematically, much less understand it theoretically. Since real-world networks are complex, it is impossible to understand them without computational analyses. To date, tools for network analysis are in their infancy. Hence, CoNe lab develops novel, mathematically rigorous graph theoretic and computational methods for network analysis, modeling, clustering, and alignment/comparison.
Given a network, the first step in network analysis is to characterize its structure (or topology) via network properties, such as the network size, number of connected components, degree distribution (which measures the fraction of nodes in a network having a given degree), subgraph counts, etc. Next, one needs to understand mechanisms by which a network with an observed property can be generated, i.e., find a network model that closely replicates the structure of the network. Evaluating the fit of a model to the data requires comparing model with real-world networks. However, exact comparisons of large networks are computationally infeasible, owing to the underlying subgraph isomorphism problem, which asks if a graph exists as an exact subgraph of another graph. Hence, to evaluate the fit of a model to the data, the above mentioned heuristics, network properties, are used. The choice of a property is non-trivial: different models might be identified as best-fitting to the data with respect to different properties. The more stringent the property, the more credible the fit of a model to the data. Hence, our goal is to devise new network model generators that will be more descriptive of the existing network data (keeping in mind that we need to revise the models in the light of new, more complete data), as well as design more constraining network properties for evaulating the fit of a model to the data.
Another challenge in network analysis is to exploit functional homogeneity of groups of nodes that show some "coherence" in the network, by partitioning the network into clusters or communities, groups of densely interconnected or topologically similar nodes, and assigning the entire cluster with a function based on functions of the majority of its members. A variety of clustering (or community detection) approaches exist, none of which can be universally used to solve all problems, since each has its own (dis)adantages. Our goal is to devise new clustering algorithms that would enable more efficient extraction of functional information from network structure.
In addition to analyzing an individual real-world network, one of our major research interests is development of methods for contrasting/comparing two or more networks. Sometimes, one may know the "mapping" (i.e., correspondence) between nodes of different networks that are to be compared, e.g., if the networks represent the same system at different time points. The goal in this case is to understand network evolution by quantifying the structural change between the compared networks and understand how the network changes affect the function of the system at different time points. Other times, one does not know the mapping between the nodes of different networks that are to be compared, e.g., if the networks represent biological networks of different species that have different sets of proteins. In this context, network alignment contrasts such networks to identify topologically and functionally (dis)similar network regions. In biology, networks of different species can be aligned to transfer biological knowledge from simpler, easy-to-manipulate, and well characterized model species to more complex, hard-to-manipulate, and poorly characterized species. Alignment of biological networks is expected to have at least as valuable impact on our biological understanding as sequence alignment has had. However, unlike sequence alignment, due to non-linear nature of networks, network alignment is computationally infeasible to solve exactly, owing again to computational infeasibility the subgraph isomorphism problem. Hence, approximate algorithms are sought. Our goal is to devise state-of-the-art algorithms for network comparisons and alignments. Network alignment has applications outside biomedical domain as well. For example, in social networks, network alignment has been used to de-anonymize Flickr data and win a machine learning challenge on link prediction (see here for details).
In biomedical domain, we at the CoNe lab apply our methods to identification of novel protein functions, disease and pathogen-interacting genes, drug targets, evolutionary conserved interactions and protein complexes, and missing and wrongly reported interactions, reconstruction of species phylogeny, and understanding mechanisms of pathogenicity, drug resistance, and phenotypic variation in general. We apply the methods to other research domains as well. In computational chemistry and synthetic biology, the proposed methods can analyze network representations of proteins structures and drug compounds, hence deepening our understanding of protein stability and folding and leading to more efficient drug design. In the context of ecological systems and climate change, the methods can study population dynamics, interactions between species, and ecological processes responsible for species' environmental adaptation to global (climate) change. In technological networks, they can analyze network patterns of attacks of malicious Internet users, hence aiding computer security. In social networks, they can study computer-mediated social behavior or targeted advertising/recommender systems.