Research Projects

The primary research focus of the lab lies in the emerging area of Social Sensing and Cyber-Physical Systems in Social Spaces, where data are collected from human sources or devices on their behalf. Social sensing systems are one example of information distillation systems in the current era of Big Data. We carried out a set of projects to address several key challenges in social sensing, and we believe the theories, algorithms, frameworks and systems developed in these projects are useful in building future information distillation systems in general. An overview of social sensing can be found in the IEEE Computer Perspective Paper.

Towards Reliable and Optimized Crowdsensing based Cyber-Physical Systems (C-CPS)

This project is motivated by the challenges in data and predictive analytics and in control for participatory science data collection and curation (known as crowdsensing) in cyber-physical systems (CPS). It focuses on data-driven frameworks that address these challenges in CPS-enabled participatory science and that build on statistics, optimization, control, and natural language processing. Our framework tightly combines the underlying methods and techniques, especially focusing on physical sensors, mobility, and model-based approaches, to improve efficiency, effectiveness, and accountability. This project also closely integrates education and training with foundational research and public outreach that enhances interdisciplinary thinking about CPS, engages the public through participatory science, and broadens participation in science, technology, engineering, mathematics, and computer science.

Social Sensing

Scalable, Dynamic and Constraint-Aware Social Sensing System

While significant progress has been made in building reliable social sensing systems, some important challenges have not yet been well addressed. First, existing social sensing systems do not fully solve the dynamic truth estimation problem, where the ground truth of claims changes over time. Second, many current solutions do not scale to large social sensing events because of the centralized nature of their data analytics algorithms. Third, the transition of true values of measured variables is constrained by physical rules that must be followed to ensure correct estimation. In this project, we developed a scalable streaming social sensing system that explicitly considers the physical constraints on the measured variables to address the above challenges. We evaluated our framework through real-world social sensing applications. The evaluation results show that our system is scalable and outperforms the state-of-the-art solutions in terms of both effectiveness and efficiency. The results have been published in IEEE ICDCS 2017, IEEE BigData 2017, and IEEE Transactions on Big Data.

From Big Data to Small Data: Taming the two V's of Veracity and Variety

Data of questionable quality has led to significantly negative economic and social impacts on organizations, leading to cost overruns, lost revenue, and decreased efficiency. Issues of data reliability, credibility, and provenance have become even more daunting when dealing with a variety of data, especially data that are not collected directly by an organization but come from third-party sources such as social media, data brokers, and crowdsourcing. To address such issues, this project aims to develop a Data Valuation Engine (DVE) that solves the critical problem of data reliability, credibility and provenance, and provides accountability and quality processes right from data acquisition. The DVE leverages and innovates techniques in estimation theory, data fusion and machine learning to fill a critical gap in data accountability and quality, thereby providing a transformative step in countering the ubiquitous data quality issues found in almost every application domain from business to environment to health to national security. The DVE will be integrated into the Hadoop ecosystem, will be agnostic to the data source, application or analytics, and will be provided as a hosted solution to the community. The results have been published in IEEE BigData 16, ACM Recsys 16, and IEEE Transactions on Big Data.

Location-based Social Sensor Profiling

While many social sensing studies focus on sensing and recovering the status of the physical world, this project investigates the problem of profiling the social sensors (i.e., humans). In particular, we study the problem of accurately inferring the localness and the home locations of people from the noisy and sparse social sensing data they contribute. We propose a new method that infers the home locations of people by explicitly exploring the localness of people and the dependency between people based on their check-in behaviors under a rigorous analytical framework. We performed extensive experiments to evaluate the performance of our scheme and compared it to state-of-the-art techniques using three real-world data traces collected from Foursquare. The results showed the effectiveness of our scheme in accurately profiling the home locations of people. The results have been published in IEEE ASONAM 16, IEEE BigData 17, IEEE INFOCOM 17, and IEEE ASONAM 17.

Social-aware Interesting Place Finding

This project studies an interesting place finding problem in social sensing, in which the goal is to correctly identify the interesting places in a city (e.g., parks, museums, historic sites, scenic trails). Important challenges exist in solving this problem: (i) the interestingness of a place is related not only to the number of users who visit it, but also to the travel experience of the visiting users; (ii) a user's social connections could directly affect their visiting behavior and their judgment of the interestingness of a given place. In this project, we develop a new Social-aware Interesting Place Finding framework that addresses the above challenges by explicitly incorporating both the user's travel experience and social relationships into a rigorous analytical framework. Our framework can find interesting places not typically identified by traditional travel websites (e.g., TripAdvisor, Expedia). We validate the effectiveness of our framework through real-world datasets collected from location-based social network services. The results have been published in IEEE Smart City 15, IEEE DCoSS 16, and Elsevier KBS.

Who to Choose: Source/Sensor Selection in Social Sensing

This project investigates a new problem of critical source selection in social sensing applications. The goal is to identify a subset of critical sources that can effectively reduce the computational complexity of the information distillation problem and improve the accuracy of the analysis results. In this project, we propose a new scheme, Critical Source Selection (CSS), to find the critical set of sources by explicitly exploring both the dependency and the speak rate of sources. We evaluated the performance of our scheme and compared it to state-of-the-art baselines using data traces collected from a real-world social sensing application. The results showed that our scheme significantly outperforms the baselines by finding more truthful information at a higher speed. The results have been published in IEEE DCoSS 17 and Elsevier KBS.

Truth Discovery in Social Sensing

This project solves a fundamental problem in information distillation in social sensing, where data are collected from human sources or devices in their possession: how to ascertain the credibility of information and estimate the reliability of sources, given that the information sources are usually unvetted and potentially unreliable. We call this problem truth discovery. Current research in data mining and machine learning (e.g., fact-finding) solves similar problems, but with important limitations on analysis semantics and with suboptimal solutions. In contrast, our research presented, for the first time, an optimal truth discovery framework and system that provides accurate and quantifiable conclusions on both information credibility and source reliability without prior knowledge of either. Our work provides a new generic foundation for distilling reliable and quantifiable information from unreliable sources (e.g., humans). The results have been published in Fusion 11, ACM/IEEE IPSN 12, ACM ToSN, IEEE SECON 15, ICWSM 16, IEEE DCoSS 16, and IEEE MASS 17.
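To make the problem statement concrete, the following is a minimal sketch of the joint estimation idea: claim credibility and source reliability are refined in alternation, starting from no prior knowledge of either. It is a simple fact-finder-style heuristic for illustration only, not the optimal framework developed in this project. The function name `discover_truth`, the input layout, the neutral starting value of 0.5, and the assumption that sources report independently are all choices made for this example.

```python
from math import prod


def discover_truth(reports, n_iter=20):
    """Jointly estimate claim credibility and source reliability.

    reports: dict mapping a source id to the set of claims it asserted
             (hypothetical input layout chosen for this sketch).
    Returns (belief, reliability), two dicts with values in [0, 1].
    """
    claims = sorted({c for asserted in reports.values() for c in asserted})
    # No prior knowledge: every source starts at the same neutral reliability.
    reliability = {s: 0.5 for s in reports}
    belief = {c: 0.5 for c in claims}

    for _ in range(n_iter):
        # Assuming independent sources, a claim is false only if every
        # source that asserted it is wrong.
        for c in claims:
            supporters = [s for s, asserted in reports.items() if c in asserted]
            belief[c] = 1.0 - prod(1.0 - reliability[s] for s in supporters)
        # A source is as reliable as the claims it asserts are credible.
        for s, asserted in reports.items():
            if asserted:
                reliability[s] = sum(belief[c] for c in asserted) / len(asserted)
    return belief, reliability


if __name__ == "__main__":
    # Three sources corroborate claim "x"; source "D" alone asserts "y".
    reports = {"A": {"x", "z"}, "B": {"x", "z"}, "C": {"x"}, "D": {"y"}}
    belief, reliability = discover_truth(reports)
    print(belief)
    print(reliability)
```

In this toy input, the corroborated claims gain credibility and lift the reliability of the sources behind them, while the uncorroborated claim "y" and its lone source stay at the neutral starting value. A heuristic like this offers no optimality guarantee and no error quantification, which is the gap the project's framework targets.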

Quality of Information (QoI) Assurance in Social Sensing

This project investigates another critical problem in social sensing: how to accurately assess the quality of the truth discovery results by quantifying estimation errors and providing confidence bounds. Such guaranteed quality analysis is immensely important in any practical setting where errors have consequences, yet it is largely missing from the current literature. We derived the first performance bound that is able to accurately predict the estimation errors of the truth discovery results. Our work allows real-world applications to assess the quality of data obtained from unreliable sources to a desired confidence level, in the absence of independent means to verify the data and in the absence of prior knowledge of the reliability of sources. Our work was mentioned explicitly in the National Academies Press as a "good example of Army's cross-genre research" in 2013. The research results have been published in DMSN 11, IEEE SECON 12, and IEEE JSAC.
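As an illustration of what a confidence bound at a desired confidence level looks like, the sketch below computes a textbook normal-approximation (Wald) interval around an estimated source reliability. It is not the performance bound derived in this project. The function name `reliability_interval` and its inputs are hypothetical, and the sketch assumes the claims judged credible can be treated as independent Bernoulli trials, which is a strong simplification when the judgments themselves are estimates.

```python
from math import sqrt
from statistics import NormalDist


def reliability_interval(credible, total, confidence=0.95):
    """Wald interval for a source's estimated reliability.

    credible:   number of the source's claims judged credible
                (hypothetical output of a truth discovery step)
    total:      number of claims the source made
    confidence: desired two-sided confidence level, e.g. 0.95
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = credible / total
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / total)
    # Clip to [0, 1] because reliability is a probability.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


if __name__ == "__main__":
    print(reliability_interval(80, 100))
    print(reliability_interval(8, 10))
```

The interval narrows as a source contributes more claims, so an application can decide how much data it needs before trusting an estimate to a given confidence level.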


Link Analysis across Multi-genre Networks

Social sensing data is generated through the complicated interactions of information, social and physical networks. These interdisciplinary network systems are so complex that link analysis across multi-genre networks is essential. However, link analysis that takes all three networks into account together is rare in current research. In this project, we generalized the truth discovery framework to jointly analyze links across multi-genre networks and developed a new information distillation system, called Apollo. Apollo has been continuously tested through real-world case studies using large-scale datasets collected from open source media and smart road applications. The results showed good correspondence between observations deemed correct by Apollo and ground truth, demonstrating the power of using link analysis across multi-genre networks for efficient information distillation. The results have been published in IEEE RTSS 13 and the Journal of Real-time Systems.


Using Humans as Sensors: The Uncertain Data Provenance Challenge

The explosive growth in social network content suggests that the largest "sensor network" yet might be human. Extending the social sensing model, this project explores the prospect of utilizing social networks as sensor networks, which gives rise to an interesting reliable sensing problem. From a networked sensing standpoint, what makes this sensing problem formulation different is that, in the case of human participants, not only is the reliability of sources usually unknown, but the original data provenance may also be uncertain. Individuals may report observations made by others as their own. The contribution of this project lies in developing a model that considers the impact of such information sharing on the analytical foundations of reliable sensing, and embedding it into our tool Apollo, which uses Twitter as a "sensor network" for observing events in the physical world. Evaluation, using Twitter-based case studies, shows good correspondence between observations deemed correct by Apollo and ground truth. The results have been published in ACM/IEEE IPSN 14 and ACM/IEEE IPSN 16.


Provenance-assisted Classification in Social Networks

Signal feature extraction and classification are two common tasks in the signal processing literature. This project investigates the use of source identities as a common mechanism for enhancing the classification accuracy of social signals. We define social signals as outputs, such as microblog entries, geotags, or uploaded images, contributed by users in a social network. While the design of such classifiers is application-specific, social signals share one key property: they are augmented by the explicit identity of the source. This motivates investigating whether knowing the source of each signal allows the classification accuracy to be improved. We call this provenance-assisted classification. This project answers the above question affirmatively, demonstrating how source identities can improve classification accuracy, and derives confidence bounds to quantify the accuracy of results. Evaluation is performed in two real-world contexts: (i) fact-finding that classifies microblog entries into true and false, and (ii) language classification of tweets issued by a set of possibly multilingual speakers. The results show that provenance features significantly improve the classification accuracy of social signals. This observation offers a general mechanism for enhancing classification results in social networks. The results of this work are going to appear in IEEE J-STSP 14.


Tool and Demo

Our research work also generated a reliable information distillation tool called Apollo that is used to summarize the flow of important events that are fundamentally changing our world and lives (e.g., the Egyptian uprising, the Japanese nuclear disaster, Hurricane Sandy, and the Boston Marathon explosion). Apollo was demoed to very high-level Army personnel (e.g., Mr. Gary Martin, the Executive Deputy to the Commanding General, and Dr. Thomas Russell, the Director of Army Research Lab), as well as to the United States Army Intelligence and Security Command (INSCOM). Apollo was selected as one of very few top showcases of the Network Science Collaborative Technology Alliance founded by the U.S. Army Research Laboratory (ARL) in 2011, 2012, and 2013. Apollo is now used by different branches at ARL.

Apollo-Information Distillation Tool for Social (Human-Centric) Sensing