Social Sensing Lab at the University of Notre Dame



Dr. Dong Wang

Assistant Professor
Department of Computer Science and Engineering

Interdisciplinary Center for Network Science and Applications (iCeNSA)
University of Notre Dame
Email: dwang5 at nd dot edu

Group Members

Chao Huang (Ph.D. student, started in Fall 2014)
Jermaine Marshall (Ph.D. student, started in Fall 2015)
Daniel Zhang (Ph.D. student, started in Fall 2016)
Benjamin Cunning (Undergraduate Student)
Brian Mann (Undergraduate Student)
Steven Mike (Undergraduate Student)

Research Theme

The advent of online social media (e.g., Twitter and Flickr), the ubiquity of wireless communication capabilities (e.g., 4G and WiFi), and the proliferation of sensors owned by ordinary individuals (e.g., smartphones) allow humans to create a deluge of unfiltered, unstructured, and unvetted data about their physical environment. This opens up unprecedented challenges and opportunities in the field of social sensing, where the goal is to distill accurate and credible information about the state of the physical world from social sources (e.g., humans) and the devices in their possession. The problem requires multidisciplinary solutions that combine data mining, statistics, network science, and cyber-physical computing. My research addresses these needs by building theories, techniques, and tools for extracting high-quality information from data generated with humans in the loop, and for reconstructing the correct "state of the world", both physical and social. I believe my research can lead to the next generation of information distillation services, where predictable, reliable, and timely answers are found in huge volumes of real-time, heterogeneous data feeds, empowering humans to better understand, use, and make sound decisions from such data.

Research Projects

The primary research focus of the lab lies in the emerging area of Social Sensing and Cyber-Physical Systems in Social Spaces, where data are collected from human sources or from devices on their behalf. Social sensing systems are one example of information distillation systems in the current era of Big Data. I have carried out a set of projects to address several key challenges in social sensing, and I believe the theories, algorithms, frameworks, and systems developed in these projects are useful for building future information distillation systems in general.

Social Sensing

From Big Data to Small Data: Taming the two V's of Veracity and Variety

Data of questionable quality has had significant negative economic and social impacts on organizations, leading to cost overruns, lost revenue, and decreased efficiency. Issues of data reliability, credibility, and provenance become even more daunting when dealing with a variety of data, especially data that are not collected directly by an organization but come from third-party sources such as social media, data brokers, and crowdsourcing. To address these issues, this project aims to develop a Data Valuation Engine (DVE) that solves the critical problem of data reliability, credibility, and provenance, and provides accountability and quality processes starting at data acquisition. The DVE leverages and advances techniques from estimation theory, data fusion, and machine learning to fill a critical gap in data accountability and quality, thereby providing a transformative step in countering the data quality issues found in almost every application domain, from business to the environment to health to national security. The DVE will be integrated into the Hadoop ecosystem, will be agnostic to the data source, application, or analytics, and will be provided to the community as a hosted solution. The results have been published in IEEE BigData 16, ACM RecSys 16, and IEEE Transactions on Big Data.


Social-aware Interesting Place Finding

This project studies the interesting place finding problem in social sensing, where the goal is to correctly identify the interesting places in a city (e.g., parks, museums, historic sites, scenic trails). Two important challenges exist in solving this problem: (i) the interestingness of a place depends not only on the number of users who visit it but also on the travel experience of those users; (ii) users' social connections can directly affect their visiting behavior and their judgment of a place's interestingness. In this project, we develop a new Social-aware Interesting Place Finding framework that addresses these challenges by explicitly incorporating both users' travel experience and their social relationships into a rigorous analytical framework. Our framework can find interesting places not typically identified by traditional travel websites (e.g., TripAdvisor, Expedia). We validate the effectiveness of our framework using real-world datasets collected from location-based social network services. The results have been published in IEEE Smart City 15, IEEE DCoSS 16, and Elsevier KBS.
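A common way to capture the idea that a place is more interesting when experienced travelers visit it is HITS-style mutual reinforcement: a place's interestingness is the sum of its visitors' experience, and a user's experience is the sum of the interestingness of the places they visited. The sketch below assumes that approach, plus a hypothetical `alpha` parameter that blends in the mean experience of each user's friends as a stand-in for social influence. It is an illustration under those assumptions, not the framework published above, and `interesting_places` is a hypothetical name.

```python
import math


def _normalize(d):
    """Scale the values of d in place so that they have unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in d.values())) or 1.0
    for k in d:
        d[k] /= norm


def interesting_places(visits, friends, alpha=0.2, iters=50):
    """visits: user -> set of places; friends: user -> set of users.
    Returns place -> interestingness score (unit L2 norm).
    The social blending via alpha is an illustrative assumption."""
    users = list(visits)
    places = sorted({p for ps in visits.values() for p in ps})
    experience = {u: 1.0 for u in users}
    score = {}
    for _ in range(iters):
        # A place is interesting if experienced users visit it.
        score = {p: 0.0 for p in places}
        for u, ps in visits.items():
            for p in ps:
                score[p] += experience[u]
        _normalize(score)
        # A user is experienced if they visit interesting places.
        raw = {u: sum(score[p] for p in visits[u]) for u in users}
        # Hypothetical social term: mix in the mean raw experience of friends.
        new_exp = {}
        for u in users:
            fs = [f for f in friends.get(u, ()) if f in raw]
            social = sum(raw[f] for f in fs) / len(fs) if fs else raw[u]
            new_exp[u] = (1 - alpha) * raw[u] + alpha * social
        _normalize(new_exp)
        experience = new_exp
    return score


visits = {"a": {"museum", "park", "trail"}, "b": {"museum", "park"},
          "c": {"mall"}, "d": {"mall"}}
print(interesting_places(visits, friends={}))
```

With `alpha=0` the loop reduces to plain power iteration on the user-place visit matrix. In the toy data above, the trail (visited by one well-traveled user) ends up ranked above the mall (visited by two users who go nowhere else), which is the kind of behavior the first challenge calls for.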


Truth Discovery in Social Sensing

This project solves a fundamental problem of information distillation in social sensing, where data are collected from human sources or devices in their possession: how to ascertain the credibility of information and estimate the reliability of sources when the sources are usually unvetted and potentially unreliable. We call this problem truth discovery. Current research in data mining and machine learning (e.g., fact-finding) addresses similar problems, but with important limitations in analysis semantics and with suboptimal solutions. In contrast, our research presented, for the first time, an optimal truth discovery framework and system that provides accurate and quantifiable conclusions on both information credibility and source reliability without prior knowledge of either. Our work provides a new generic foundation for distilling reliable and quantifiable information from unreliable sources (e.g., humans). The results have been published in Fusion 11, ACM/IEEE IPSN 12, ACM ToSN, IEEE SECON 15, ACM/IEEE IPSN 16, and IEEE DCoSS 16.
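To make the problem concrete, the following is a minimal expectation-maximization sketch for binary claims. It assumes each source i asserts a true claim with probability `a[i]` and a false claim with probability `b[i]`, that a claim is true with prior `d`, and that sources are independent. The parameter names, initial values, and the clamping constant `eps` are assumptions for illustration, not the lab's released code; treating a source's silence as evidence is one modeling choice among several.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def em_truth_discovery(sc, iters=100, eps=1e-6):
    """sc[i][j] = 1 if source i asserted claim j, else 0.
    Returns (z, a, b, d): z[j] is the posterior that claim j is true."""
    n, m = len(sc), len(sc[0])
    a = [0.6] * n  # assumed init: P(source i asserts claim j | j true)
    b = [0.4] * n  # assumed init: P(source i asserts claim j | j false)
    d = 0.5        # prior probability that a claim is true
    z = [0.5] * m
    for _ in range(iters):
        # E-step: log-likelihood ratio of "true" vs "false" for each claim.
        for j in range(m):
            llr = math.log(d) - math.log(1 - d)
            for i in range(n):
                if sc[i][j]:
                    llr += math.log(a[i]) - math.log(b[i])
                else:
                    llr += math.log(1 - a[i]) - math.log(1 - b[i])
            z[j] = _sigmoid(llr)
        # M-step: re-estimate parameters from expected counts (clamped to (0, 1)).
        zt = max(sum(z), eps)
        zf = max(m - sum(z), eps)
        for i in range(n):
            st = sum(z[j] for j in range(m) if sc[i][j])
            sf = sum(1 - z[j] for j in range(m) if sc[i][j])
            a[i] = min(max(st / zt, eps), 1 - eps)
            b[i] = min(max(sf / zf, eps), 1 - eps)
        d = min(max(sum(z) / m, eps), 1 - eps)
    return z, a, b, d
```

Because `a` starts above `b`, the iteration treats claims backed by many sources as the more likely "true" class; this symmetry-breaking initialization is itself an assumption of the sketch.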


Quality of Information (QoI) Assurance in Social Sensing

This project investigates another critical problem in social sensing: how to accurately assess the quality of truth discovery results by quantifying estimation errors and providing confidence bounds. Such guaranteed quality analysis is immensely important in any practical setting where errors have consequences, yet it is largely missing from the current literature. We derived the first performance bound that accurately predicts the estimation errors of truth discovery results. Our work allows real-world applications to assess the quality of data obtained from unreliable sources to a desired confidence level, without independent means of verifying the data and without prior knowledge of source reliability. In 2013, our work was explicitly mentioned in a National Academies Press publication as a "good example of Army's cross-genre research". The research results have been published in DMSN 11, IEEE SECON 12, and IEEE JSAC.


Link Analysis across Multi-genre Networks

Social sensing data are generated through complex interactions among information, social, and physical networks. These interdependent networks are so complex that link analysis across multi-genre networks is essential. However, link analysis that considers all three networks jointly is rare in current research. In this project, we generalized the truth discovery framework to jointly analyze links across multi-genre networks and developed a new information distillation system called Apollo. Apollo has been continuously tested in real-world case studies using large-scale datasets collected from open-source media and smart road applications. The results showed good correspondence between the observations Apollo deemed correct and ground truth, demonstrating the power of link analysis across multi-genre networks for efficient information distillation. The results have been published in IEEE RTSS 13.


Real-Time Information Distillation from Streaming Data

Social sensing data usually arrive at such high volume and speed (e.g., more than 200k tweets are uploaded to Twitter every minute) that they must be processed in real time to maximize their value. However, existing truth discovery approaches are mostly batch algorithms that either cannot scale to streaming data or do not exploit all available data. In this project, we developed the first online truth estimation approach that determines the quality of information and the reliability of sources in real time for social sensing applications. The results demonstrated that our approach analyzes data 10 to 100 times faster than the state of the art while keeping estimation accuracy approximately the same. The results have been published in ICDCS 13.
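The following sketch shows what a one-pass estimator can look like under assumed modeling choices: each arriving claim is scored once against the current per-source parameters, and its posterior is then folded into running pseudo-counts, so memory grows with the number of sources rather than the number of claims and nothing is re-scanned. This online-EM-style approximation, the class name `OnlineTruthEstimator`, and its prior parameters are hypothetical; it is not the algorithm from the ICDCS 13 paper.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class OnlineTruthEstimator:
    """Hypothetical one-pass truth estimator over a stream of claims."""

    def __init__(self, sources, prior_a=0.6, prior_b=0.4, strength=2.0):
        self.sources = set(sources)
        # Pseudo-counts: a_s = assert_true[s] / mass_true,
        #                b_s = assert_false[s] / mass_false.
        self.mass_true = strength
        self.mass_false = strength
        self.assert_true = {s: prior_a * strength for s in self.sources}
        self.assert_false = {s: prior_b * strength for s in self.sources}

    def reliability(self, s):
        """Current (a_s, b_s): P(assert | true), P(assert | false)."""
        return (self.assert_true[s] / self.mass_true,
                self.assert_false[s] / self.mass_false)

    def observe(self, asserters):
        """Score one newly arrived claim, then fold it into the running counts."""
        asserters = set(asserters) & self.sources  # ignore unknown sources
        llr = math.log(self.mass_true) - math.log(self.mass_false)
        for s in self.sources:
            a, b = self.reliability(s)
            if s in asserters:
                llr += math.log(a) - math.log(b)
            else:
                llr += math.log(1 - a) - math.log(1 - b)
        z = _sigmoid(llr)
        # Single pass: the claim is never revisited after this update.
        self.mass_true += z
        self.mass_false += 1 - z
        for s in asserters:
            self.assert_true[s] += z
            self.assert_false[s] += 1 - z
        return z


est = OnlineTruthEstimator(["r1", "r2", "r3", "u"])
for _ in range(50):
    z_group = est.observe({"r1", "r2", "r3"})
    z_lone = est.observe({"u"})
print(z_group, z_lone, est.reliability("u"))
```

Each `observe` call costs time proportional to the number of sources, independent of how many claims have already been seen; the trade-off is that early claims are scored with immature source estimates and are never re-scored.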


Using Humans as Sensors: The Uncertain Data Provenance Challenge

The explosive growth in social network content suggests that the largest "sensor network" yet might be human. Extending the social sensing model, this project explores the prospect of using social networks as sensor networks, which gives rise to an interesting reliable sensing problem. From a networked sensing standpoint, what makes this formulation different is that, with human participants, not only is the reliability of sources usually unknown, but the original data provenance may also be uncertain: individuals may report observations made by others as their own. The contribution of this project lies in developing a model that captures the impact of such information sharing on the analytical foundations of reliable sensing, and in embedding it into our tool Apollo, which uses Twitter as a "sensor network" for observing events in the physical world. Evaluation using Twitter-based case studies shows good correspondence between the observations Apollo deemed correct and ground truth. The results have been published in ACM/IEEE IPSN 14 and ACM/IEEE IPSN 16.


Provenance-assisted Classification in Social Networks

Signal feature extraction and classification are two common tasks in the signal processing literature. This project investigates the use of source identities as a common mechanism for enhancing the classification accuracy of social signals. We define social signals as outputs, such as microblog entries, geotags, or uploaded images, contributed by users in a social network. While the design of such classifiers is application-specific, social signals share one key property: they are augmented with the explicit identity of the source. This motivates investigating whether knowing the source of each signal can improve classification accuracy. We call this provenance-assisted classification. This project answers the question affirmatively, demonstrating how source identities can improve classification accuracy, and derives confidence bounds to quantify the accuracy of the results. Evaluation is performed in two real-world contexts: (i) fact-finding that classifies microblog entries as true or false, and (ii) language classification of tweets issued by a set of possibly multilingual speakers. The results show that provenance features significantly improve the classification accuracy of social signals, offering a general mechanism for enhancing classification results in social networks. The results of this work will appear in IEEE J-STSP 14.
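One simple way to let a source's identity inform classification is a naive-Bayes-style combination, assumed here rather than taken from the paper: a content-only classifier supplies log P(content | label), and each source's past labels supply a smoothed prior P(label | source). For a multilingual speaker who mostly tweets in Spanish, an ambiguous tweet tips toward Spanish; for an unseen source the prior is uniform and the content alone decides. The names `fit_source_prior` and `classify` and the Laplace smoothing constant `alpha` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_source_prior(history, labels, alpha=1.0):
    """history: iterable of (source, label) pairs from past classified signals.
    Returns prior(source, label) = Laplace-smoothed P(label | source)."""
    counts = defaultdict(Counter)
    for source, label in history:
        counts[source][label] += 1

    def prior(source, label):
        c = counts.get(source, Counter())  # unseen source -> uniform prior
        return (c[label] + alpha) / (sum(c.values()) + alpha * len(labels))

    return prior


def classify(content_loglik, source, prior, labels):
    """content_loglik: label -> log P(content | label) from any content-only model.
    Picks the label maximizing log P(content | label) + log P(label | source)."""
    return max(labels, key=lambda lab: content_loglik[lab] + math.log(prior(source, lab)))


labels = ["en", "es"]
history = [("alice", "es")] * 9 + [("alice", "en")]
prior = fit_source_prior(history, labels)
ambiguous = {"en": -5.0, "es": -5.2}  # content alone slightly favors "en"
print(classify(ambiguous, "alice", prior, labels))  # es
print(classify(ambiguous, "bob", prior, labels))    # en
```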


Dong Wang, Tarek Abdelzaher, and Lance Kaplan. "Social Sensing: Building Reliable Systems on Unreliable Data," 1st Edition, Elsevier, 2015.

Book Chapter

Refereed Papers

Paper codes: [J] Journal paper; [C] Conference paper; [W] Workshop paper.

Note: In Computer Science, conference publications are as competitive as journals.

Tool and Demo

Our research also produced a reliable information distillation tool called Apollo, which is used to summarize the flow of important events that are fundamentally changing our world and lives (e.g., the Egyptian uprising, the Japanese nuclear disaster, Hurricane Sandy, the Boston Marathon explosion). Apollo was demoed to senior Army personnel (e.g., Mr. Gary Martin, the Executive Deputy to the Commanding General, and Dr. Thomas Russell, the Director of the Army Research Laboratory), as well as to the United States Army Intelligence and Security Command (INSCOM). In 2011, 2012, and 2013, Apollo was selected as one of the very few top showcases of the Network Science Collaborative Technology Alliance established by the U.S. Army Research Laboratory (ARL). Apollo is now used by different branches of ARL.

Apollo: Information Distillation Tool for Social (Human-Centric) Sensing