CSE 40437/60437 / Project

Course Project for Social Sensing

Project Ideas

The project in this course will be open-ended. You will propose, carry out, and report on a project in groups of up to three students. The following are some rough ideas for possible projects centered on social sensing. These are just examples to help you get started. You can choose from these examples or come up with your own ideas!
  • Trust and Credibility Analysis. Online social media (e.g., Twitter, Flickr, Facebook, Foursquare) are designed as open data-sharing platforms for ordinary people. This creates an ideal scenario for unreliable content from a large number of unvetted human sources. Given the massive number of Twitter users (e.g., 284 million monthly active users) and the tweets they make (e.g., half a billion tweets per day), it is not simple to figure out the trustworthiness of sources and the credibility of their tweets. Therefore, it would be interesting and important to develop new trust and credibility analysis tools that obtain accurate and credible information from noisy and unfiltered social sensing data.
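
    As a starting point, one simple family of credibility techniques iterates between estimating how reliable each source is and how credible each claim is, so that corroborated claims and consistent sources reinforce each other. The sketch below is only one of many possible variants; the function name and the toy source/claim input are hypothetical, and real input would be claims extracted from tweets.

    ```python
    def truth_discovery(claims_by_source, iterations=10):
        """Iterative truth-discovery sketch.

        claims_by_source: dict mapping a source id -> set of claim ids it asserts.
        Returns (trust, credibility) dicts, each normalized so the max is 1.0.
        """
        sources = list(claims_by_source)
        claims = set().union(*claims_by_source.values())
        trust = {s: 1.0 for s in sources}
        cred = {c: 1.0 for c in claims}
        for _ in range(iterations):
            # A claim's credibility is the total trust of the sources asserting it.
            for c in claims:
                cred[c] = sum(trust[s] for s in sources if c in claims_by_source[s])
            top = max(cred.values())
            cred = {c: v / top for c, v in cred.items()}
            # A source's trust is the average credibility of the claims it makes.
            for s in sources:
                made = claims_by_source[s]
                trust[s] = sum(cred[c] for c in made) / len(made)
            top = max(trust.values())
            trust = {s: v / top for s, v in trust.items()}
        return trust, cred
    ```

    Sources whose claims are independently corroborated end up with higher trust, and their claims with higher credibility; a real system would also have to handle copying between sources and claims that contradict each other.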

  • Disaster Report and Event Tracking. Given the popularity and penetration of online social media, people now use them to report the status of disasters and emergency events. For example, during the Boston Marathon bombing in April 2013, the first "report" of the event actually came from a tweet made by a witness at the scene; the timestamp of that tweet is the exact moment the first explosion happened. The rich set of social sensing data in disaster scenarios offers great opportunities to develop real-time situation awareness tools that can detect and track the status of disasters in a reliable and timely fashion. Such tools could greatly assist the government in dispatching rescue teams, allocating critical resources, and collecting useful feedback from ordinary citizens in the aftermath of a disaster.
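
    One simple building block for such a tool is burst detection on keyword counts over time: flag a time bin whose count is far above the recent baseline. The thresholding rule and parameter values below are illustrative assumptions, not a prescribed method.

    ```python
    from statistics import mean, stdev

    def detect_bursts(counts, window=5, threshold=3.0):
        """Flag time bins whose count exceeds mean + threshold * std
        of the preceding `window` bins.

        counts: list of per-bin (e.g., per-minute) keyword counts.
        Returns the indices of bursty bins.
        """
        bursts = []
        for i in range(window, len(counts)):
            history = counts[i - window:i]
            mu, sigma = mean(history), stdev(history)
            # max(..., 1e-9) avoids a zero threshold on a flat history
            if counts[i] > mu + threshold * max(sigma, 1e-9):
                bursts.append(i)
        return bursts
    ```

    On a stream of tweets, one would bin mentions of a keyword such as "explosion" by minute and feed the counts to this detector; a spike well above the baseline marks a candidate event worth tracking.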

  • Social Media Command Center for Business Intelligence. Large companies (e.g., Dell, Cisco, Wells Fargo) and airlines (e.g., Delta, Southwest) have recently started to build a dedicated business intelligence team called a social media command center (SMCC). In an SMCC, the company's social media team monitors online social media and engages in conversations around the brand and its market. An SMCC allows real-time monitoring of trends in marketing efficiency, customer service and feedback, and risk management, making it easy for passing execs to gauge the social health of the brand at a glance. Therefore, it would be an interesting task to build your own version of a social media command center for your favorite brand or company using freely available online social media data.

  • A New Personalized Information Subscription Service. Much like Google News aggregates headlines from relatively reliable news sources (e.g., popular news websites) to provide readers with a personalized news subscription service, it would be very interesting to develop a new information subscription service that leverages the rich set of real-time information embedded in online social media and explores the collective wisdom of ordinary individuals. One major challenge in providing this service is how to efficiently distill and organize information contributed by diverse and unreliable sources, and how to summarize it to a degree that each subscriber feels comfortable reading and trusting.

  • Real-time Data Analytics. Making sense of huge volumes of social sensing data streams coming from a complex and highly dynamic environment in a timely manner is a big challenge. It would be very interesting to build a new data analysis engine that efficiently organizes a firehose of streaming, heterogeneous data feeds and delivers reliable information with real-time guarantees. Several important problems need to be addressed to develop such an engine. For example, how do we distribute data streams over clusters and compute results in a way that optimizes estimation accuracy while minimizing analysis time? How do we develop an efficient distributed data analysis algorithm that outputs almost the same results as the centralized version, but much faster?
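
    To make the distributed-versus-centralized question concrete: some aggregations, such as counting, are exactly mergeable, meaning partial results computed on separate partitions can be merged into the same answer a single node would produce. The toy sketch below runs the partitions in one process for clarity; the function names are ours, and a real deployment would place each partition on a separate cluster node.

    ```python
    from collections import Counter

    def centralized_count(items):
        """Reference implementation: count everything on one node."""
        return Counter(items)

    def distributed_count(items, n_workers=4):
        """Partition the stream, count each partition independently
        (the "map" phase), then merge partial counts (the "reduce" phase).
        Counting is exactly mergeable, so the result matches the
        centralized version."""
        partitions = [items[i::n_workers] for i in range(n_workers)]
        partials = [Counter(p) for p in partitions]
        merged = Counter()
        for partial in partials:
            merged.update(partial)
        return merged
    ```

    Aggregations that are not exactly mergeable (e.g., medians or model parameters) are where the accuracy-versus-speed trade-off in the questions above becomes interesting.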

  • Multi-genre Network Analysis. A comprehensive understanding of multi-genre networks (e.g., social, information, and physical networks) plays a critical role in future social sensing applications. For example, a recent heavy traffic jam on a major southern California freeway detected by a deployed sensor network (i.e., the physical network) co-occurred with unusual bursts of Twitter traffic (i.e., the social network) around the same location. The contents of the tweets offered a clear and immediate explanation of the traffic jam as a local tax protest demonstration. It would be interesting to develop new techniques that automatically unearth new information by exploring data correlations across multi-genre networks and provide more effective support for decision makers.

  • Big Data Processing and Storage. In just one minute, more than 350,000 new tweets are made on Twitter, 700,000 status updates happen on Facebook, more than 3,500 images are added on Flickr, and 100 hours of video are uploaded to YouTube. Online social media are creating a deluge of information that greatly exceeds the human capacity to consume it. This deluge motivates an urgent need for big data techniques that process and store data from online social media in an efficient and effective way. It would be interesting to develop novel algorithms and schemes that leverage state-of-the-art distributed systems and cloud computing paradigms (e.g., Hadoop, Amazon EC2) to tackle the big data challenge in social sensing.
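
    When even keeping exact counts exceeds available memory, approximate data structures can trade a little accuracy for a fixed footprint. Below is a minimal Count-Min sketch, a standard bounded-memory frequency estimator; the width and depth parameters are arbitrary illustrative choices, and the hash construction is just one simple way to derive independent row hashes.

    ```python
    import hashlib

    class CountMinSketch:
        """Approximate item frequencies in O(width * depth) memory.
        Estimates never undercount; they may overcount on hash collisions."""

        def __init__(self, width=256, depth=4):
            self.width, self.depth = width, depth
            self.table = [[0] * width for _ in range(depth)]

        def _cells(self, item):
            # Derive one column per row by salting a hash with the row index.
            for row in range(self.depth):
                digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
                yield row, int(digest, 16) % self.width

        def add(self, item, count=1):
            for row, col in self._cells(item):
                self.table[row][col] += count

        def estimate(self, item):
            # Taking the min across rows limits the damage from collisions.
            return min(self.table[row][col] for row, col in self._cells(item))
    ```

    Each row is a fixed-size array, so memory never grows with the number of distinct items: the natural fit for a firehose of tweets where only heavy hitters matter.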

  • Detect and Reduce Redundant Information. Given the large amount of data generated in social sensing applications, duplicate content is increasing tremendously, and so is the demand for reducing it. For example, Twitter users can easily repeat information from others using the simple "Retweet" function. Alternatively, some users may rephrase what they have read and make a "new" tweet in a slightly different form. Such redundant information puts a heavy burden on users of micro-blogging services when searching for new content. It would be interesting to develop duplicate detection and redundancy reduction schemes for social sensing applications that can dramatically reduce various kinds of duplicates and diversify search results.
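
    A common baseline for catching rephrased near-duplicates, which exact string matching misses, is to compare word k-shingle sets with Jaccard similarity. The sketch below does an all-pairs comparison for clarity; the shingle size and threshold are illustrative assumptions, and a real system would scale this up with MinHash or locality-sensitive hashing.

    ```python
    def shingles(text, k=3):
        """Set of k-word shingles of a lowercased tweet."""
        words = text.lower().split()
        if len(words) < k:
            return {tuple(words)}
        return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

    def jaccard(a, b):
        """Jaccard similarity of two sets: |intersection| / |union|."""
        return len(a & b) / len(a | b) if a | b else 1.0

    def near_duplicates(tweets, threshold=0.5):
        """Return index pairs of tweets whose shingle overlap is above threshold."""
        sigs = [shingles(t) for t in tweets]
        pairs = []
        for i in range(len(sigs)):
            for j in range(i + 1, len(sigs)):
                if jaccard(sigs[i], sigs[j]) >= threshold:
                    pairs.append((i, j))
        return pairs
    ```

    A retweet with a short prefix added still shares most of its shingles with the original, so it is flagged even though the strings differ.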

  • Geo-location and Spatial Distribution Problem. Understanding the spatial-temporal distribution of social sensing data is very important in many real-world applications (e.g., disaster tracking, geotagging, crowdsensing). However, many participants choose to disable the geo-location features of their social sensing apps due to the sensitivity of location data (especially when it is coupled with temporal information). For example, normally fewer than 1% of tweets have accurate geo-location information (i.e., GPS coordinates) embedded. Approximately 25% of users have listed a user location as granular as a city name, but these listings also contain a non-trivial number of errors and ambiguities (e.g., confusion between cities with the same name in different states). Therefore, it would be very interesting to develop location inference systems that accurately estimate the possible locations of social sensing data through deeper content analysis (e.g., text mining) combined with available background knowledge (e.g., mappings from specific words to given locations).
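
    A minimal content-based baseline for such a system is a gazetteer lookup: match place names mentioned in the tweet text against a table of known locations. The tiny gazetteer and its coordinates below are illustrative placeholders for a real place-name database, and a serious system would add disambiguation (e.g., which "Springfield"?) on top.

    ```python
    # Hypothetical gazetteer: place-name keyword -> (latitude, longitude).
    GAZETTEER = {
        "south bend": (41.68, -86.25),
        "chicago": (41.88, -87.63),
        "notre dame": (41.70, -86.24),
    }

    def infer_location(tweet_text):
        """Return the first (name, coordinates) gazetteer match found in the
        tweet text, or None if no known place name is mentioned."""
        text = tweet_text.lower()
        hits = [(name, coords) for name, coords in GAZETTEER.items()
                if name in text]
        return hits[0] if hits else None
    ```

    This recovers coarse locations for the large majority of tweets that carry no GPS coordinates, at the cost of ambiguity that the project would then need to resolve with context and background knowledge.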

  • Assembling Information from Structured and Unstructured Data. Data generated in social sensing can be heterogeneous in modality (i.e., both structured and unstructured). For example, structured data could be numerical readings from the sensors on a participant's smartphone, while unstructured data could be a piece of free text or an image that a user uploads to Twitter or Flickr describing the current situation in his/her surroundings. Different tools and techniques have been developed to process and analyze structured and unstructured data respectively. However, it remains a big challenge to explore correlations across data types and assemble/fuse useful information from both. It would be interesting to develop new data processing and inference systems capable of assembling information from both structured and unstructured data for social sensing applications.
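
    One simple fusion pattern to start from is late fusion: extract a feature from each modality separately, then combine them with a weighted rule. Everything in this sketch — the threshold, the keyword list, and the weights — is a purely illustrative assumption.

    ```python
    def fuse_evidence(sensor_reading, tweet_text,
                      shake_threshold=2.5,
                      keywords=("earthquake", "shaking", "tremor")):
        """Toy late fusion: combine a structured sensor feature with an
        unstructured text feature into one confidence score in [0, 1].

        sensor_reading: e.g., peak accelerometer magnitude from a phone.
        tweet_text: free text posted around the same time and place.
        """
        sensor_score = 1.0 if sensor_reading >= shake_threshold else 0.0
        text_score = 1.0 if any(k in tweet_text.lower() for k in keywords) else 0.0
        # Weights are arbitrary here; a real system would learn them from data.
        return 0.6 * sensor_score + 0.4 * text_score
    ```

    The interesting research questions begin where this sketch stops: aligning the two modalities in time and space, and learning the combination rule instead of hand-picking it.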

  • Come Up with Your Own. The above examples are only ideas to get you thinking! You are encouraged to come up with your own idea, or modify one of those above.
  • Dataset Resources

  • Notre Dame Apollo Twitter Datasets (Please contact the instructor for username and password if you are interested in downloading some of these datasets)

  • ASU Social Computing Datasets

  • Stanford Large Network Datasets

  • ASU Foursquare Datasets

  • Chicago Smart City Datasets

  • Milestones

  • Friday January 30th, Noon. - Send the Project Title, Abstract and Member List as a single PDF file to the instructor and also upload a copy to your dropbox under /afs/nd.edu/coursesp.15/cse/cse40437.01/dropbox/YOURNAME/Abstract.

  • Week of February 2nd. - Sign up for a slot on Doodle to schedule a project kick-off meeting with the instructor to discuss your project ideas, the resources you might need, and the expected outcome of the project. The instructor will give you feedback to make sure that the project is of appropriate size and level of difficulty. If multiple groups propose substantially similar projects, we may ask you to adjust your work slightly.

  • Friday February 20th, Noon. - Send a two-page project proposal as a single PDF file to the instructor and also upload it to your dropbox under /afs/nd.edu/coursesp.15/cse/cse40437.01/dropbox/YOURNAME/Proposal. The proposal should include an overview of the project (preferably with a diagram), a brief review of the state of the art in the related field, a credible set of initial results if available, a list of further proposed milestones, and a plan of action for the rest of the semester.

  • Week of March 2nd. - Sign up for a slot on Doodle to meet with the instructor for a mid-term project meeting and give a demo of what you have working so far. At this point, you should have installed (or have access to) the appropriate software and systems, collected a substantial amount of the data you need, developed the first version of your algorithm/system, and generated some initial results. We will discuss the plan for finishing the remaining parts of the project in a timely way and make any necessary corrections or adjustments.

  • Week of March 16th. - Each group is responsible for a short (12 minutes) mid-term project presentation in class. The presentation will allow the instructor and classmates to comment on the initial results and current state of the project and also give constructive feedback to the group members. Each project partner should speak for a portion of the time. Your talk should be accompanied by 8-10 carefully designed and edited slides.

  • Friday March 20th, Noon. - Send a four-page project mid-term report as a single PDF file to the instructor and also upload it to your dropbox under /afs/nd.edu/coursesp.15/cse/cse40437.01/dropbox/YOURNAME/Midterm. The mid-term report should include a reasonable amount of preliminary results, a description of finished milestones, a discussion of problems encountered and their solutions, and any modifications to the plan (if any) for finishing the remaining tasks.

  • Week of April 27th. - Each group will give a 9-minute in-class final presentation (including a 2-minute Q&A session) on your project. For details, please refer to Final Project Presentation.

  • Monday May 4th, Noon. - Turn in your final paper (both the PDF file and the source files) and your code to the dropbox under /afs/nd.edu/coursesp.15/cse/cse40437.01/dropbox/YOURNAME/Final.

    The final project paper should follow the standard technical paper format. A typical paper in this format contains the following components: abstract, introduction, related work, problem statement, solution, evaluation, discussion/limitations, and conclusion. There is no hard length requirement. The paper should be long enough to explain all of the necessary details. That said, anything less than 8 pages is probably too short; anything longer than 12 pages is probably too long. All elements of the paper should be prepared with care and attention to proper English. Please follow the standard IEEE paper format. Here is the template: IEEE LaTeX or Word Template.

    Please send your final project paper to the instructor by email before the deadline. Also turn in your paper (both the LaTeX or Word source files and the PDF file) to your dropbox directory.

    All relevant code should also be turned in to your dropbox directory, including source code, configuration files, scripts, etc. The code should be complete enough that the grader can build and run your work in the appropriate environment. Please also turn in a README file with instructions for running your code/tool. If there are important elements that cannot be turned in as code for whatever reason (e.g., too big or expensive to download), then turn in links, screenshots, or other similar evidence of the completed work.
