The following is a listing of demos of student projects done in the Fall 2023 Principles of Computing course. Students chose their own data sources from online resources and decided how they wanted to examine, analyze, and visualize the data.
Project Number | Section | Project title |
---|---|---|
17 | 2 | Olympic Games and Medal Performance |
18 | 2 | Demographical Analysis of US Covid Deaths |
19 | 2 | Olympic Athletes |
20 | 2 | Chicago Car Crash Data |
21 | 2 | Olympic medals |
22 | 2 | Olympic Athletes |
23 | 2 | Potential Educational Redlining in NYC |
24 | 2 | NYC Crime |
26 | 2 | Disney Movies Analysis |
27 | 2 | Regime-Type and War |
28 | 2 | Los Angeles Crime |
29 | 2 | CPIA gender equality rating and GDP |
30 | 2 | USA Real Estate Characteristics and Prices |
The purpose of our project was to uncover valuable insights and trends within the dynamic landscape of YouTube, catering to both existing YouTube channels and aspiring YouTubers. Our primary data source was a 2023 CSV file from Kaggle that lists the top 1,000 global YouTubers along with their subscriber counts, video views, upload frequencies, channel creation years, countries of origin, earnings, channel categories, channel ranks, and other data. We cleaned the dataset and extracted a subset of it, both because some columns offered little insight into our initial questions and to keep our analysis organized. Our visualizations and resulting insights focus on the geographical makeup of these top 1,000 global YouTubers as well as their followings, activity on the platform, and other information useful to our target audience. Some of our initial questions include: What are the most popular channel categories on YouTube? Where is the highest concentration of successful YouTubers when looking at a global map? What level of global and regional impact do influential YouTube creators have on the world and the region that they reside in? Are there geographical trends in terms of channel categories, video viewers, subscribers, and number of YouTubers in specific geographical regions? In conclusion, our analysis illustrates that successful YouTubers can emerge from any part of the world, but certain regions, most notably the United States and India, exhibit higher concentrations. Further, we could infer some relationships and trends among these content creators’ subscriber counts, video view counts, uploads, channel types and categories, and so forth, as seen in our visualizations.
Our data suggest that a successful formula for an aspiring top YouTuber is to be from the United States, produce content categorized as Entertainment, and upload videos frequently in order to attract the most video views and subscribers.
"Noodles, Health, and Wealth" is an innovative data analytics project that explores the intriguing correlations between instant noodle consumption and key societal indicators: health, happiness, and Gross Domestic Product (GDP) across various countries. This project aims to uncover patterns and insights from global data, offering a unique perspective on how a simple, ubiquitous food item like instant noodles relates to broader economic and health trends.
Our topic focused on Chicago public schools across elementary, middle, and high schools. Our data sources were two data sets compiled by the Chicago Public School system using information from 2011 and 2013. We focused on four main topics for our visualizations: Academic, Misconduct, School Levels, and Location. For example, we graphed the percent of students passing algebra by teacher quality. In another visualization, we showed how suspensions per 100 students changed based on the school's location. One major insight from our visualizations is that performance, attendance, and suspensions are affected by location, specifically whether the school is in the north or south of the city. Another insight was that family involvement was a stronger indicator of student performance than teacher quality.
Our data source dives into the pervasive global health challenge of obesity, aiming to estimate individuals' obesity levels in Mexico, Peru, and Colombia. The dataset is made up of 17 attributes and 2111 participants/inputs collected through a public online survey. The analysis centers on factors like physical activities, eating habits, and family history of being overweight. The dataset, labeled with the NObesity variable, allows classification into categories such as Insufficient Weight, Normal Weight, Overweight Level I and II, and Obesity Types I, II, and III. Notably, 77% of the data was synthetically generated, introducing potential biases that should be taken into account when viewing the data. Despite these considerations, the dataset emerges as an invaluable resource for unraveling the intricate interplay of factors contributing to obesity. Beyond the conventional metrics of physical activities and eating habits, the analysis explores nuanced elements such as modes of commuting, alcohol consumption, family weight history, and other health habits. This holistic approach sheds light on how these seemingly disparate factors collectively impact individuals' health, revealing that their cumulative effect may pose more significant health challenges than initially perceived. This nuanced understanding underscores the dataset's significance in uncovering hidden correlations and emphasizing the need for comprehensive health interventions that address the multifaceted nature of obesity in populations across Mexico, Peru, and Colombia.
Our topic was wine consumption, production, and market trends over time. We wanted to understand global consumption and production and to create visualizations showing how wine sales fluctuate over time. We display multiple choropleths to show the changes in consumption and production across world regions and time periods. A major insight from our project is that the majority of wine production and consumption takes place in Europe, and that the COVID-19 pandemic in 2020 caused a major drop in wine sales.
Our website uses Python visualizations to analyze the movie theater industry and the full-service restaurant industry before and after the COVID-19 pandemic. We sourced our data from Statista, visualizing key aspects such as revenue over time, year-over-year percent change, market growth, consumer preferences, and more. After determining which industry has struggled more to recover from the pandemic, we pulled additional data sets to find out why. Finally, we make suggestions as to what the industry can do to return to its pre-COVID popularity.
This project deals with the demographics of different parts of the Chicagoland area and how they impact the Chicago Public Schools (CPS). With our first data set, we looked at poverty data and other demographics by zip code. Our second data set focused more on the different characteristics of Chicago Public Schools. This is important because Chicago is a very segmented city and there are a lot of problems with the CPS system.
This project offers an in-depth analysis of New York City's crime statistics, exploring aspects such as geographical distribution, gender disparity, and time-based trends. Using data from New York's repositories (found on Kaggle), the study presents visualizations highlighting disparities across boroughs, with a notable focus on Brooklyn's high crime rates, and reveals significant gender dynamics in crime perpetration and victimization. The findings underscore socio-economic influences on crime patterns and advocate for tailored, community-specific strategies in crime prevention and law enforcement, emphasizing the role of continuous learning and inclusive dialogue in understanding and addressing urban crime.
The project data set gives information about Disney movies from 1937 to 2017. It allows comparisons between movie attributes such as genre, rating, gross income, and release date. Using this data set, we produced visualizations that reveal information about Disney movies, such as the most popular movie season, the distribution of maturity ratings, and the highest-grossing genres. This information is useful because it can inform movie producers about the characteristics of high-grossing movies and influence decisions about what types of movies to produce and when to release them. Through data analysis, we have been able to identify trends and patterns in high-grossing movies.
Our team completed a thorough analysis of recent Chicago criminal activity and investigated a number of possible patterns. We obtained our data as a file from the City of Chicago's official website. We investigated crime by Chicago neighborhood or location, by time of day or night, and by the nature of the crime. We were able to use the insights gained from this analysis to reach our conclusions, and we hope that Chicago city officials can use our data and the associated takeaways to make impactful change, allocate scarce resources more efficiently, and improve the city’s safety.
Our project delves into the historical evolution of wage differentials based on educational attainment and gender spanning the years 1973 to 2022. Leveraging comprehensive data from reputable sources, the study meticulously dissects income disparities between individuals with high school degrees and bachelor's degrees, while further distinguishing between male and female demographics. Through a series of dynamic visualizations, the project aims to highlight trends, shifts, and persistent gaps in earnings over the decades. Notable insights from the analysis include a nuanced understanding of how educational choices correlate with income, the trajectory of gender-based wage disparities, and an examination of the relationship between wage changes and the unemployment rate. By presenting this information in an accessible and interactive format, the project seeks to contribute to informed discussions on policy-making, educational planning, and efforts toward fostering greater economic and gender equity in the workforce.
Chicago's government releases a yearly report card on its public schools detailing students' accomplishments and the performance of individual schools in Chicago. We used a CSV of this report card from the 2011-2012 school year. Some of the aspects it highlights are districts, safety scores, attendance across school distinctions, and family involvement scores. We learned about the discrepancies between certain schools' scores in safety and family involvement and could look at the different areas and districts these schools fall in.
In our project, we used a CSV dataset of over 1500 movies spanning from about 1920 to almost 2000 in order to try to determine which factors were most influential in making a movie popular. We analyzed variables such as a movie's genre, length, release date, and level of experience of those working on it, but we found that more than anything, certain actors, actresses, and directors stood out as being the most likely indicators of a popular movie.
Our topic takes a deep dive into the different statistical categories that measure team performance in the NFL across the 2022 season. After compiling data for each of the 32 NFL teams across the 17 games that they played, we created visualizations to determine which statistical categories had great importance in team performance and which had little impact. Some of the integral statistics that contribute to team success include passing yards, red zone efficiency, total touchdowns, and red zone chances. The statistics that we found to have little impact on team success were rushing yards and the number of takeaways compared to turnovers for each team.
Our project was about cereals and breakfasts. Our cereal data came from a CSV published by Calvin University. Our breakfast food data came from a PDF of breakfast nutritional analysis tables. Our age_gender dataset came from the USDA.
In our project, we chose to explore the correlation between educational attainment and the resulting effect on wages and employment rates. We pulled from data sources which included those factors, and additionally included variables such as race and gender.
In this project, we wanted to see if there were any trends among athletes in Olympic competition to better understand and predict who will win. Using our athlete data, we drew insights on age, gender, and country. We found many interesting insights when looking through the data, but one area where we did additional research is how the difference between an athlete's home temperature and the temperature of the cities where the Olympics were held affected performance, and whether that difference was an indicator of Olympic winners.
Our topic examines the coronavirus death counts for six Midwestern states in 2020: Illinois, Indiana, Michigan, Minnesota, Ohio, and Wisconsin. Within this region, we examined the death counts by age group and ethnicity. We also looked at the political leanings of the states to reason about how different political parties reacted to the pandemic, as well as the actions they took, to see how these affected the death counts. Overall, our project is aimed at policymakers so that they have appropriate information when allocating resources to affected communities.
We look at the Olympic history of athletics, aquatics, and gymnastics events. Our data consists of many rows of athlete-event records from the Olympics, from the 19th century until 2012. Our analysis is broken into two parts: a country analysis and an athlete analysis. We examine athletes sent to the Olympics and medals won (at the country and athlete level), and look at some physical attributes of athletes, broken down by gender and sport.
This project focuses on analyzing car crash risk factors from Halloween 2022 in Chicago. The data is from the City of Chicago's official data portal and was obtained as JSON through an API link. The project visualizes trends in vehicle type, vehicle license plate state, the year the vehicle was made, age of crash victims, sex of crash victims, and types of injuries sustained. The main takeaways from the project are that Toyotas and Chevrolets are the most "dangerous" cars to drive, and men in their late 20s are at the highest risk of being involved in a car crash. Additionally, it is not significantly more dangerous to drive a car on Halloween than on another day.
The analysis of factors contributing to Olympic athletes' success is a captivating subject. We utilized three comprehensive datasets from Kaggle, encompassing extensive information about Olympic athletes and their nations over a substantial time frame. Our study particularly highlighted how a country's existing conditions, such as economic status, infrastructure, and sports culture, can significantly predict its athletes' medal rate. Focusing on both country-specific and athlete-specific data, we sought to understand the correlations influencing success, defined by the number of medals won. To convey our findings, we crafted multiple data visualizations, each employing different datasets. These visualizations aim to provide an insightful and practical understanding of the elements that drive Olympic achievements.
Our topic explored how various overlooked factors, like the percentage of agricultural land and birth rate, as well as factors one would likely consider, such as GDP, influence Olympic success. This topic offers a unique perspective on the Olympics, going beyond just the sports statistics and delving into how a country's characteristics might influence its athletes' performances. Our visualizations include a scatterplot of Total Medals vs. Land Area / % of Agricultural Land and a heatmap that shows the correlation of our key Olympic performance metrics, "Average Rank" and "Total Medals", with socioeconomic metrics including GDP, unemployment rate, etc. Major insights are that while wealth can definitely play a factor, it is not the sole source of success, and other factors like industrialization and economic diversification play a role.
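The correlation heatmap described above rests on pairwise correlation coefficients. The following is a minimal sketch of that calculation, assuming Pearson correlation (the abstract does not say which coefficient was used) and using only the Python standard library. The column names and all numbers are made up for illustration and are not taken from the project's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Build a nested dict of pairwise correlations, one cell per heatmap square."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical per-country values; illustrative only.
columns = {
    "total_medals": [2, 10, 25, 40, 110],
    "gdp_trillions": [0.1, 0.5, 1.5, 3.0, 20.0],
    "unemployment_pct": [9.0, 7.5, 6.0, 5.0, 4.0],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Each cell of the resulting matrix corresponds to one square of a heatmap. Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship.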
Our project investigates the possibility of modern redlining through education in New York City by cross-referencing the demographics of all five boroughs with high schools' average SAT scores. Our data sources come from studies conducted by NYC Open Data and NYU's Furman Center via CSV and Excel files. Our website displays visualizations that cover educational issues such as the average SAT score per borough and school enrollment sizes with their SAT scores. Additionally, we provided visualizations that give context on the economic and sociological characteristics of each borough, such as race breakdown and poverty rates. From our research, we've discovered that there are traces of educational redlining due to the massive discrepancy in SAT scores between the Bronx and Staten Island. This directly correlates with the large discrepancy in median household income between the two boroughs. We've also noticed that lower-performing and poorer boroughs were more likely to have larger Black or POC populations.
This project shows the different types of crimes in NYC and focuses on shootings, hate crimes, and car crashes. We analyze the trends over time and across the five boroughs of NYC: Manhattan, Brooklyn, The Bronx, Staten Island, and Queens. To create our visualizations, we extracted data from JSON data sources using APIs, cleaned and parsed it, and created DataFrames. Using the DataFrames, we created Plotly Express visualizations to gain insights from our data. One prominent issue that we found was the number of hate crimes aimed at Jewish people. Another major insight from our visualizations was the number of shootings in Brooklyn and The Bronx. Both of these boroughs had more shootings than the other three boroughs.
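The extract, clean, and aggregate steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: it uses plain Python instead of DataFrames so that it runs without extra dependencies, it parses an inline JSON string in place of a live API response, and the field names `boro` and `occur_date` are hypothetical.

```python
import json
from collections import Counter

# Hypothetical sample shaped like a JSON API response. The field names
# ("boro", "occur_date") are assumptions, not the project's actual schema.
RAW_JSON = """
[
  {"boro": "BROOKLYN", "occur_date": "2022-01-03"},
  {"boro": "bronx ", "occur_date": "2022-01-04"},
  {"boro": "BROOKLYN", "occur_date": "2022-02-11"},
  {"boro": null, "occur_date": "2022-02-12"},
  {"boro": "QUEENS", "occur_date": "2022-03-01"}
]
"""


def clean_records(records):
    """Drop rows with no borough and normalize borough spelling and case."""
    cleaned = []
    for row in records:
        boro = row.get("boro")
        if not boro:
            continue  # skip rows where the borough is missing or empty
        cleaned.append({**row, "boro": boro.strip().title()})
    return cleaned


def count_by_borough(records):
    """Count how many incident rows fall in each borough."""
    return Counter(row["boro"] for row in records)


records = clean_records(json.loads(RAW_JSON))
counts = count_by_borough(records)
print(counts.most_common())
```

The per-borough counts are the kind of table that a bar chart of incidents by borough would be drawn from.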
Our project topic is the 10 leading causes of death in the United States from 1999 to 2017. The data is based on information from all resident deaths and also breaks the information down to the state level. Our main data source is a CSV file published by the United States Department of Health. The data source contains columns for Year, 113 Cause Name, Cause Name, State, Deaths, and Age-adjusted Death Rate. It has 10,689 rows of data, so it is a substantial source of information. We also added data sources for state populations and the prevalence of physical inactivity in each state. We visualized trends in leading causes of death across time, death rates per capita for all causes of death across all states, and unique factors across states that contribute to death rates. We learned that West Virginia had the highest death rate per capita of all states, and that Utah has very low deaths per capita because its population is very young. Health services spending in each state did not seem to have any correlation with death rate per capita. Physical activity trends closely correlated with per capita deaths for Chronic Lower Respiratory Disease and heart disease.
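As a worked illustration of the per capita comparison described above, the sketch below computes a crude death rate per 100,000 residents by combining death counts with state populations. The state names and numbers are invented for the example. Note that this crude rate is not age-adjusted, which is one reason a state with a young population can show a low value.

```python
def deaths_per_100k(deaths, population):
    """Crude death rate: deaths per 100,000 residents (not age-adjusted)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * 100_000 / population


# Made-up rows for illustration; not values from the project's data sources.
state_rows = [
    {"state": "State A", "deaths": 22_000, "population": 1_800_000},
    {"state": "State B", "deaths": 18_000, "population": 3_000_000},
]

rates = {
    row["state"]: deaths_per_100k(row["deaths"], row["population"])
    for row in state_rows
}
highest = max(rates, key=rates.get)

for state, rate in rates.items():
    print(f"{state}: {rate:.1f} deaths per 100,000")
print("Highest crude rate:", highest)
```

In this made-up example, State A has the higher rate even though the two death counts are close, because its population is much smaller.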
Disney movies have been a mainstay of the entertainment industry for the past half-century. Very few other movie studios have had the same prolonged success or cultural impact as Disney. So what makes Disney movies so much more successful than others? Is there a secret to Disney’s success, or is it the culmination of trial and error coupled with a bit of luck? This data visualization project analyzes Disney movie performance from 1960 to 2016, investigating the categories of movie genre, MPAA rating, studio divisions, and relationships to the performance of the US economy, all in comparison to total gross income.
Our project aims to describe the impact and interplay of regime-type (e.g. democracy, autocracy) on a variety of unit-level variables. Additionally, we look at the relationships among many of these other variables to identify correlations in the data that may be independent of our variable of interest. We use three datasets, both independently and merged together. The Correlates of War (COW) dataset provides information about war initiation, war duration, casualties, and war outcomes. The National Military Capabilities (NMC) dataset provides year-by-year data for each country regarding the size and composition of a given state's military. The Polity dataset provides scores for how democratic a country is in a given year. By merging these three datasets together, we were able to compare material capabilities with democracy values and war outcomes. We use a variety of variables like military expenditures, democracy scores, military personnel counts, GDP, population, etc. The visualizations we generated revealed four major insights. First, countries have become more democratic over time. Second, while the size of a country is correlated with military expenditure, the number of military personnel is not. Third, military personnel counts jump more rapidly during times of war than military expenditures do. Fourth, the size of a country's urban population (which is itself a proxy for development) is modestly correlated with democratization. By identifying trends in the data, we hope to provide clarity so as to better predict war-related outcomes.
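Merging the three datasets requires a shared key. The sketch below assumes that each dataset can be reduced to one row per country and year, and joins them on that pair, keeping only the country-years present in all three. The rows, country codes, and field names (`at_war`, `milex`, `milper`, `polity`) are hypothetical stand-ins and do not reflect the actual COW, NMC, or Polity file layouts.

```python
def index_by_country_year(rows):
    """Index rows by their (country, year) pair for fast lookup."""
    return {(row["country"], row["year"]): row for row in rows}


def inner_join(*tables):
    """Merge rows that share a (country, year) key in every table."""
    indexed = [index_by_country_year(table) for table in tables]
    shared_keys = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for key in sorted(shared_keys):
        combined = {}
        for table in indexed:
            combined.update(table[key])
        merged.append(combined)
    return merged


# Hypothetical country-year rows; values are invented for illustration.
war_rows = [
    {"country": "AAA", "year": 1950, "at_war": True},
    {"country": "BBB", "year": 1950, "at_war": False},
]
capability_rows = [
    {"country": "AAA", "year": 1950, "milex": 500, "milper": 120},
    {"country": "BBB", "year": 1950, "milex": 90, "milper": 30},
    {"country": "CCC", "year": 1950, "milex": 10, "milper": 5},
]
polity_rows = [
    {"country": "AAA", "year": 1950, "polity": 8},
    {"country": "BBB", "year": 1950, "polity": -6},
]

merged = inner_join(war_rows, capability_rows, polity_rows)
for row in merged:
    print(row)
```

An inner join drops any country-year missing from one of the sources (here, `CCC`), so every merged row carries war, capability, and democracy values together.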
Los Angeles is the second largest city in the United States, and it’s a globally renowned hub for entertainment, tourism, and business. However, much like other metropolises, the city is vulnerable to many different types of crime due to its sheer size and social disparities. The issue of crime in Los Angeles is a multifaceted topic that must be handled with great care and nuance in order to better understand the city’s problems and develop possible solutions to counter them. The approach we took to understand crime in Los Angeles was organizing and analyzing a large, 800,000+ row csv file that contains the city’s official crime reports data from 2020 to the present in order to generate insights and discover potential patterns. In our analysis, we aimed to answer the following exploratory questions:

* What are the most common types of crimes among both violent and nonviolent offenses respectively?
* What percentage of total crimes are violent?
* What are the most common types of weapons used in violent crimes? (focusing on homicides, sexual assaults, assaults/batteries, and robberies)
* What LAPD Districts have the highest crime rates for the following four types of crimes?: homicides, sexual assaults, assaults/batteries, and robberies
* Which racial demographics are more likely to be victimized in violent crime incidents? (again, focusing on homicides, sexual assaults, assaults/batteries, and robberies)
* What times of the year contain the highest crime incidence?

Our topic is the gender inequality index and gender development index of countries around the world. For the gender inequality index (GII), this project looks specifically at the 2021 results and analyzes the relationships between the indicators that contribute to the overall GII values. For the gender development index (GDI), this project looks at the changes in GDI over the years, compares those trends with changes in GDP, generates insights into some of the specific indicators, and connects the results with government types. The data comes from the United Nations Development Programme and World Bank National Accounts. For GII, we visualize the GII values of different countries on a world map, examine the relationships between indicators (to check for correlation), and compare the distribution and variation of GII (including its indicators) among regions (continents). For GDI, we map the distribution of GDI values by government type, the trends in GDI and GDP over the years, and the countries with the largest changes over the years. To conclude, the GII and its indicators vary widely among countries, and in general every indicator shows a positive correlation with the GII values. The GDI, like the GII, varies widely between countries and time frames. It is interesting to investigate the various factors that can contribute to GDI. Comparing the increases in GDP and GDI over time shows a fairly positive relationship.
We explore the United States real estate market with the hope of bringing insight into how the market has functioned in the past, how it functions in the present, and how it may function in the future. To do so, we loaded two separate data sets that gave us information about several economic factors as well as house prices in different areas of the country. After analyzing the relationship between home prices and several economic factors like the federal funds rate, inflation, and the 30-year mortgage rate, we found correlations of varying degrees with every economic factor. Furthermore, when comparing home prices across U.S. states, we found that prices vary dramatically depending on the state. Using several different methods of visualizing the state data, it is clear that these prices also fluctuate considerably over time.