Social & Information Networks

Link prediction is the task of predicting previously unobserved relationships between entities, and it has many applications across network science. Most research in this area has been restricted to scoring candidate links with a single measure of network topology. Our work develops a new measure and places existing measures in the context of a machine learning task. We also cast the problem as one of high class imbalance.
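As a rough illustration of what placing topological measures in the context of a machine learning task can look like, the sketch below turns every unlinked node pair into a feature row with a binary label. It is a minimal example under stated assumptions, not the measure or pipeline developed in this work: the four features are standard textbook measures, and the function names and toy graph are hypothetical.

```python
from itertools import combinations
from math import log

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def pair_features(adj, u, v):
    """Standard topological measures for one candidate pair (illustrative choice)."""
    nu, nv = adj[u], adj[v]
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # A common neighbor is adjacent to both u and v, so its degree is >= 2
        # and the logarithm is strictly positive.
        "adamic_adar": sum(1.0 / log(len(adj[w])) for w in common),
        "pref_attachment": len(nu) * len(nv),
    }

def make_dataset(observed_edges, future_edges):
    """One row per currently unlinked pair; label 1 if the link appears later."""
    adj = build_adjacency(observed_edges)
    future = {frozenset(e) for e in future_edges}
    rows = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, so not a prediction candidate
        label = int(frozenset((u, v)) in future)
        rows.append((u, v, pair_features(adj, u, v), label))
    return rows

# Hypothetical toy data: five observed links, one link that forms later.
observed = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
future = [("b", "d")]

rows = make_dataset(observed, future)
positives = sum(row[3] for row in rows)
print(f"{positives} positive out of {len(rows)} candidate pairs")
```

Even in this toy graph, positives are a minority of the candidate pairs. In a sparse network with many nodes the number of unlinked pairs grows roughly quadratically while new links do not, which is why the task is treated as highly class-imbalanced.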

Influence Drives the Emergence and Growth of Social Networks


Social influence has been a widely accepted phenomenon in social networks for decades. Existing work on it includes influence maximization, influence selection and quantification, and influence validation. In contrast, our research focuses on the effects of social influence on the evolution of social networks, aiming to answer whether social influence is a strong force shaping network dynamics. We explore the problem from both microscopic and macroscopic perspectives. At the microscopic level, we ask whether a model derived from the social influence propagation mechanism can yield high precision in the link prediction problem. At the macroscopic level, we ask whether a model hypothesized from social influence spreading can explain popular scaling laws in social networks. Our objective is to unveil the significant factors with a greater degree of precision than has heretofore been possible, and to shed new light on network evolution.

Longitudinal Analysis and Modeling of Large-Scale Social Networks


The growth in information technology systems is generating new sources of data on human behavior that are only now beginning to be analyzed. Digital communication systems log communication events and therefore contain valuable information on usage patterns that can be used to map social networks and analyze human behavior within them. The availability of such data on millions of individuals has the potential to transform the way we analyze and understand human behavior. The data generated by digital communication technologies has five key traits that have the potential to transform the way researchers study social networks: 1) quality of statistics (the data comes from millions of users), 2) purely observational (non-obtrusive measurement), 3) complete network data (not just information on the ego networks of a sample of people), 4) longitudinal (spanning several years), and 5) spatial information (e.g., cell phones can be geographically located). Data of such extent and longitudinal character brings novel challenges that can only be tackled by a well-orchestrated multidisciplinary approach involving network social science, physics methods developed for large-scale interacting particle systems, mathematical statistics and data analysis, and computer science methods such as data mining, community detection algorithms, and agent-based modeling.

Understanding Peace Processes through Social Media


Colombia's final peace agreement was the culmination of a decade-long peace process; it outlines significant social, political and economic reforms to end the longest-running armed conflict in the Western Hemisphere. Peace processes are complex, protracted, contentious and dynamic systems that involve significant bargaining and compromise among various societal and political stakeholders. Social media wields tremendous power in peace processes as a tool for dialogue, debate, organization, and mobilization, thereby adding more complexity by opening the peace process to public influence. Various indicators, such as renunciation of violence during talks, establishment of a negotiating agenda and its sequence, public support, and external guarantees, can help us better understand peace process dynamics and predict their outcomes. In this paper, we study two important indicators: inter-group polarization and public sentiment towards the Colombian peace process. We present a detailed linguistic analysis to detect inter-group polarization and understand differences in the signals emerging from polarized groups. We also present a predictive model that leverages tweet-based, content-based and user-based features to predict public sentiment towards the Colombian peace process as observed through social media.

Connecting the Dots to Infer Followers' Topical Interest on Twitter


Twitter provides a platform for information sharing and diffusion, and has quickly emerged as a mechanism for organizations to engage with their consumers. A driving factor for engagement is providing relevant and timely content to users. We posit that engagement via tweets offers good potential for discovering user interests and for using that information to target specific content of interest. To that end, we have developed a framework that analyzes tweets to identify the interests of current followers and leverages topic models to deliver a personalized topic profile for each user. We validated our framework by partnering with a local media company and analyzing the content gap between the company and its followers. We also developed a mobile application that incorporates the proposed framework.

Representing big data as networks: the higher-order network approach


Network-based representation has quickly emerged as the norm for representing rich interactions in complex systems. For example, given the trajectories of ships, a global shipping network can be constructed by assigning port-to-port traffic as edge weights. However, conventional first-order networks built this way (which assume the Markov property) capture only pairwise shipping traffic between ports, disregarding the fact that ship movements can depend on multiple previous steps. This loss of information when representing raw data as networks can lead to inaccurate results in downstream network analyses. We have developed the Higher-order Network (HON), which bridges the gap between big data and the network representation by embedding higher-order dependencies in the network. The project website shows how existing network algorithms, including clustering, ranking, and anomaly detection, can be used directly on HON without modification, and how HON influences observations in interdisciplinary applications such as modeling global shipping and web user browsing behavior. A video demo, source code in Python, and testing data are also available.
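The information loss described above can be seen in a small example. The sketch below builds a first-order network and a fixed second-order network from the same trajectories and compares what each predicts for a ship's next port. It is an illustration only and not the HON construction itself: using a fixed order of two everywhere is a simplifying assumption, and the port labels, trajectories, and function names are hypothetical.

```python
from collections import Counter

def first_order_edges(trajectories):
    """Edge weights are plain port-to-port transition counts."""
    counts = Counter()
    for path in trajectories:
        for src, dst in zip(path, path[1:]):
            counts[(src, dst)] += 1
    return counts

def second_order_edges(trajectories):
    """Nodes are (previous port, current port) pairs, so each edge
    remembers where the ship came from."""
    counts = Counter()
    for path in trajectories:
        for prev, cur, nxt in zip(path, path[1:], path[2:]):
            counts[((prev, cur), (cur, nxt))] += 1
    return counts

def next_port_distribution(edge_counts, node):
    """Normalize the outgoing edge weights of one node into probabilities."""
    out = {dst: c for (src, dst), c in edge_counts.items() if src == node}
    total = sum(out.values())
    return {dst: c / total for dst, c in out.items()}

# Hypothetical data: ships arriving at C from A continue to D,
# while ships arriving at C from B continue to E.
trajectories = [["A", "C", "D"]] * 3 + [["B", "C", "E"]] * 3

first = first_order_edges(trajectories)
second = second_order_edges(trajectories)

print(next_port_distribution(first, "C"))
print(next_port_distribution(second, ("A", "C")))
print(next_port_distribution(second, ("B", "C")))
```

In the first-order network, a ship at port C is equally likely to continue to D or E. In the second-order network the previous port determines the next one, which is the dependency that a purely pairwise representation discards.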

Health & Wellness

Faced with enormous health care costs and an unsustainable system, we need more efficient medical practices. Our work addresses this problem from both ends of the healthcare informatics spectrum. On one end, we have focused on developing analytical models and statistical analyses that range from the lowest level of personalized care and clinical data to the highest level of population data, in order to gain additional insights and perspectives into the clinical environment. On the other end of the spectrum, our work focuses on developing technologies aimed at understanding technology’s role in addressing community-based health and wellness problems.

Childhood Obesity


There are many isolated interventions dealing with childhood obesity today. Some focus on educating children, while others focus on getting kids active. This work, however, aims to use a collective impact intervention to unite these different areas. As a pioneer in this type of collective impact programming, the United Way aims to leverage many different community programs, some previously validated, such as CATCH, and others new and upcoming, such as prescription to play. Our work is to create a social wellness platform that allows children to set and track wellness goals, and that gives them feedback on their progress and information pertaining to their specific interests. The ability to monitor progress is central, with the goal of showing users information on their improvement, not just their successes or failures. The platform also encourages users to join controlled social groups of classmates and friends to challenge each other to improve performance and to reinforce positive behaviors.

Diabetes Risk and Management


Chronic diseases such as diabetes take a great deal of personal commitment and awareness to manage effectively. We understand that every individual is unique and that management difficulties may have many causes. However, the current practice of retroactively treating these issues is both more expensive and less effective than early treatment. At the same time, we understand that in today's challenging healthcare environment, creating widespread interventions for all diabetic patients is not a practical solution. We believe that by integrating technology and data mining into patient care, we can support the move away from this reactive paradigm toward a preventative care model. Through a combination of personalized features, we aim to identify those individuals at high risk for management issues. We then intend to determine a personalized course of action based on the resources available to each individual.

Population-Level Analysis


Using population-level data, we have undertaken a higher-level data science analysis drawing on the Centers for Medicare & Medicaid Services (CMS) national public physician dataset. We aimed to open the discussion of how data from multiple sources, such as the CMS Medicare release, existing CMS datasets, and additional external public data, can be used to generate insights into new and interesting questions around clinical practice. This work focused on the concept of knowledge transfer and how experiences during education can shape a physician over the course of their career, posing the question: does a physician’s past experience in medical school shape their practicing decisions?



As healthcare becomes increasingly digitalized, we have been working to blend technology with society by developing a healthcare application that can help seniors live better. Our tablet-based application, aimed at enhancing the physical health, vitality, and brain fitness of seniors residing in independent living communities, is a patient-centric framework for medication, nutrition, and pain management designed specifically for senior patients. To help patients manage chronic diseases, the application provides alerts for daily medications and information on medical appointments. The application can also be used as a medium to provide community health workers with discharge summaries. In collaboration with a local Aging in Place program, we have been conducting a study of the application and its effects on senior well-being. Through the study, we investigate conditions indicative of risks or trends in patient health, including questions relating to exercise, diet, mood, and sleep patterns.

Online Health and Wellness Information Consumption


Users are increasingly turning to the Internet as a source of health information. In this research, we study the health-seeking behavior of users on a national online knowledge-sharing platform for health and wellness. We begin by identifying the topical interests of users from different content consumption sources. Using these topical preferences, we explore information consumption and health-seeking behavior across three contextual dimensions: user-based demographic attributes, time-related features, and community-based socioeconomic factors. We then study how these context signals can be used to infer specific user health topic preferences. Our findings suggest that linking demographic features to user profiles is more effective in predicting health preferences than other features. Our work demonstrates the value of using contextual factors to characterize and understand the content consumption of users seeking health and wellness information online.



The NetHealth Study is exploring the extent to which healthy behaviors can be promoted through social networks. The study uses smartphones to gather information on people’s social networks and Fitbit activity trackers to gather information on their physical activity and sleep patterns. Over seven hundred Notre Dame students who entered as first-years in the 2015/16 academic year are currently enrolled in the study.



MomLink is a research project consisting of a web and mobile application that will help first-time moms access pregnancy-related educational resources and acquire timely and personalized information related to their pregnancy. The application will also allow them to communicate directly with their prenatal care coordination team, receive information, and track their progress.

Environment & Climate

The network structure and evolution over time provide interesting insights into the behavior of the Earth's environmental and climatic systems. For example, ocean climate indicators extracted from the networks have proven to be good predictors of climate variables over land. Currently, we are studying the dynamic behavior and stability of the network over time.

Patterns of Ship-Borne Species Spread


The spread of non-indigenous species (NIS) through the global shipping network (GSN) imposes enormous ecological and economic costs throughout the world. Previous attempts at quantifying NIS invasions have mostly taken "bottom-up" approaches that eventually require multiple simplifying assumptions because the available data is insufficient and/or uncertain. By instead modeling implicit species exchanges via a graph abstraction that we refer to as the Species Flow Network (SFN), we pursue a different approach that exploits the power of network science methods to extract knowledge from largely incomplete data.


Predicting STEM Student Retention


As providers of higher education begin to harness the power of big data analytics, one very fitting application for these new techniques is the prediction of student attrition. The ability to pinpoint students who might soon decide to drop out of a given academic program not only allows those in charge to understand the causes of this undesired outcome, but also provides room for the development of early intervention systems. While making such inferences based on academic performance data alone is certainly possible, we claim that in many cases there is no substantial correlation between how well a student performs and his or her decision to withdraw. To address this issue, we aim to derive measurements of engagement from students' electronic portfolios and use these features to augment the predictions of student attrition.

Identifying Students of Concern in The First Year of Studies Course

The First Year of Studies (FYS) is a required course offered in the flipped classroom format that undergraduate students entering the university need to take and pass in the first year. Thus, it is important to identify students who might fail the course or drop out of it so that the instructors can intervene and help them. Since the classroom is in the flipped format, students have to access the reading content and attempt homework quizzes online before attending the lecture. While pedagogical methods relying on grades exist to identify such students by the middle of the semester, with the power of big data we can now find better predictive markers by leveraging not only their grades but also their clickstream data, ePortfolio and homework submissions.

Predicting Student Dropout in MOOCs

In an effort to bring education to people without access to a teacher or the time to attend formal classes, universities are offering Massive Open Online Courses (MOOCs) on various platforms such as edX. However, student attrition is a persistent problem across MOOCs, with 90% or more of students ending up dropping out of a course. To understand this phenomenon, we use machine learning techniques to study the behavior of students in discussion forums, patterns in their video clickstreams, their performance on homework assignments and exams, and their emotions as reported through surveys.
