Network clustering is a very popular topic in the network science field. Its goal is to divide (partition) the network into groups (clusters or communities) of "topologically related" nodes, where the resulting topology-based clusters are expected to "correlate" well with node label information, i.e., metadata, such as cellular functions of genes/proteins in biological networks, or age or gender of people in social networks. Even for static data, the problem of network clustering is complex. For dynamic data, which is what we focus on, the problem is even more complex, due to an additional dimension of the data - their temporal (evolving) nature. Since the problem is computationally intractable, heuristic approaches need to be sought. Existing approaches for dynamic network clustering (DNC) have drawbacks. First, they ignore temporal information in their early steps, and when they do consider this information later on, they do so implicitly. We hypothesize that capturing temporal information earlier in the clustering process and doing so explicitly will improve results. Second, the existing approaches assume that nodes should be in the same cluster if they are densely interconnected within the network. We hypothesize that in some applications, it might be of interest to cluster) nodes that are topologically similar to each other instead of or in addition to requiring the nodes to be densely interconnected. We test these two hypotheses via our new approach called ClueNet. We evaluate ClueNet against six existing DNC methods on both social networks capturing evolving interactions between individuals (such as interactions between students in a high school) and biological networks capturing interactions between biomolecules in the cell at different ages. We find that ClueNet is superior in 75% of all evaluation tests. As more real-world dynamic data are becoming available, DNC and thus ClueNet will only continue to gain importance.
Reference: Joseph Crawford and Tijana Milenković (2017), ClueNet: Clustering a temporal network using dynamic, under review.