Biography

Hey there! My name is Bryan Relampagos, and I'm a senior Computer Science major at the University of San Francisco. I grew up in San Jose, California, but I currently live in San Francisco. If you'd like to get in touch with me, shoot me an email: barelampagos@dons.usfca.edu

Original Dataset Information

For this visualization, I am using the Food Inspections - LIVES Standard dataset, which I found on SF OpenData. The dataset includes 52,663 rows, one row per health violation found at a restaurant. Many restaurants span several rows, since a single restaurant may have several health violations. There are multiple columns, but the ones of particular interest are:

  1. business_name: Name of the Business
  2. business_address: Street Address of the business
  3. business_postal_code: Business postal code
  4. business_location: Latitude and Longitude of the business in the format - (Lat, Long)
  5. inspection_date: Date and time a specific restaurant's inspection was conducted
  6. inspection_score: Overall score of the inspection on a scale from 0-100 (0-70 Poor, 71-85 Needs Improvement, 86-90 Adequate, 91-100 Good)
  7. violation_description: Short description of restaurant violation.
  8. risk_category: Scale of risk from Low Risk to High Risk

The other columns are: business_id, business_city, business_state, business_latitude, business_longitude, business_phone_number, inspection_id, inspection_type, and violation_id.

Data Processing

For this dataset, I did all of my data preprocessing in d3. I filtered out any data entries that:

  1. Did not have a valid zip code (e.g., 00000 and 941033148)
  2. Did not have values for coordinates
  3. Had no health safety score assigned to it
  4. Were schools (I wanted the emphasis to be on the restaurants)
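
The four filters above could be sketched in plain JavaScript roughly as follows. This is only an illustration, not the site's actual code: the helper names (isBlank, isValidZip, isSchool, keepRow) are hypothetical, the sample rows are made up, and both the "exactly five digits" zip rule and the name-based school check are assumptions, since the exact criteria are not spelled out here.

```javascript
// Treat undefined, null, and whitespace-only strings as missing values.
function isBlank(value) {
  return value === undefined || value === null || String(value).trim() === "";
}

// Assumption: a valid zip code is exactly five digits and not all zeros.
function isValidZip(zip) {
  return /^\d{5}$/.test(String(zip)) && String(zip) !== "00000";
}

// Assumption: schools are spotted by name; the actual rule is not stated.
function isSchool(row) {
  return /school/i.test(String(row.business_name));
}

// A row is kept only if it passes all four filters.
function keepRow(row) {
  return (
    isValidZip(row.business_postal_code) &&
    !isBlank(row.business_latitude) &&
    !isBlank(row.business_longitude) &&
    !isBlank(row.inspection_score) &&
    !isSchool(row)
  );
}

// Made-up rows, one per rule, only to exercise the filters.
const sampleRows = [
  { business_name: "Example Cafe", business_postal_code: "94110",
    business_latitude: "37.75", business_longitude: "-122.41", inspection_score: "92" },
  { business_name: "Bad Zip Diner", business_postal_code: "941033148",
    business_latitude: "37.77", business_longitude: "-122.41", inspection_score: "90" },
  { business_name: "No Coords Grill", business_postal_code: "94103",
    business_latitude: "", business_longitude: "", inspection_score: "88" },
  { business_name: "Unscored Bistro", business_postal_code: "94102",
    business_latitude: "37.78", business_longitude: "-122.41", inspection_score: "" },
  { business_name: "Example High School", business_postal_code: "94112",
    business_latitude: "37.72", business_longitude: "-122.44", inspection_score: "95" },
];

const keptRows = sampleRows.filter(keepRow);
console.log(keptRows.map((row) => row.business_name));
```

Only the first sample row survives; each of the others trips exactly one rule.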

For each restaurant in this dataset, I also averaged all of the inspection scores listed for it, and assigned the restaurant an Operating Condition Category based on that average. These criteria were taken from the San Francisco Department of Public Health, and can be found here.
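
As a rough sketch of the averaging and categorizing step, in plain JavaScript: the thresholds come from the inspection_score scale listed earlier, but the function names are hypothetical, grouping by business_id is an assumption, and so is the handling of fractional averages (an average such as 85.5 falls into the lower category here).

```javascript
// Thresholds follow the 0-100 scale: 0-70 Poor, 71-85 Needs Improvement,
// 86-90 Adequate, 91-100 Good.
// Assumption: a fractional average between two bands (e.g. 85.5) is
// placed in the lower band.
function operatingCategory(score) {
  if (score >= 91) return "Good";
  if (score >= 86) return "Adequate";
  if (score >= 71) return "Needs Improvement";
  return "Poor";
}

// Average every inspection_score row recorded for each business.
// Assumption: rows are grouped by business_id, and every row counts once.
function averageByBusiness(rows) {
  const totals = new Map();
  for (const row of rows) {
    const entry = totals.get(row.business_id) || { sum: 0, count: 0 };
    entry.sum += Number(row.inspection_score);
    entry.count += 1;
    totals.set(row.business_id, entry);
  }

  const result = new Map();
  for (const [id, { sum, count }] of totals) {
    const average = sum / count;
    result.set(id, { average, category: operatingCategory(average) });
  }
  return result;
}

// Made-up rows: business "a" has two scores, business "b" has one.
const scoreRows = [
  { business_id: "a", inspection_score: "90" },
  { business_id: "a", inspection_score: "94" },
  { business_id: "b", inspection_score: "70" },
];

const averages = averageByBusiness(scoreRows);
console.log(averages.get("a")); // average 92, category "Good"
console.log(averages.get("b")); // average 70, category "Poor"
```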

In addition, each visualization needed some specific processing. For the maps (symbol and choropleth), I decided to divide the map of San Francisco by zip code region. I was able to download a Shapefile containing all of the coordinates/boundaries for these regions from SF OpenData. Unfortunately, this data was encoded in an ESRI format, and for it to be useful for my visualization, I had to convert the coordinates to EPSG:4326 (WGS 84 latitude/longitude). I used QGIS, an open-source geographic information system, to re-encode the Shapefile as EPSG:4326, and then I used mapshaper to convert the Shapefile into a usable GeoJSON format. For the bar chart, I filtered out any zip codes that had fewer than 10 restaurants located within them.
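
The bar chart's zip code filter could look something like the sketch below, written in plain JavaScript so it does not depend on a particular d3 version. The function name zipsWithAtLeast is hypothetical, the input is made up, and it assumes one object per restaurant (after the averaging step) that still carries business_postal_code.

```javascript
// Count restaurants per zip code and keep the zip codes that reach minCount.
function zipsWithAtLeast(restaurants, minCount) {
  const counts = new Map();
  for (const restaurant of restaurants) {
    const zip = restaurant.business_postal_code;
    counts.set(zip, (counts.get(zip) || 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, count]) => count >= minCount)
    .map(([zip]) => zip);
}

// Made-up input: 12 restaurants in one zip code and 3 in another.
const sampleRestaurants = [
  ...Array.from({ length: 12 }, () => ({ business_postal_code: "94102" })),
  ...Array.from({ length: 3 }, () => ({ business_postal_code: "94129" })),
];

// Zip codes with fewer than 10 restaurants are dropped.
const zipsForBarChart = zipsWithAtLeast(sampleRestaurants, 10);
console.log(zipsForBarChart);
```

With this input only "94102" is kept, since the second zip code has just 3 restaurants.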

Motivation

My motivation for choosing to visualize this dataset comes from the fact that many restaurants don't necessarily provide the details about their health safety scores. According to the San Francisco Department of Public Health, food establishments are required to post their inspection reports in a clearly visible place so that the general public can easily see them. However, not all restaurants abide by this rule, so my hope is that this visualization will make the everyday person more aware of the scores. If more people are aware of restaurants' health scores, hopefully they will avoid the low-scoring businesses, which could lead those businesses to step up their game on health safety.

Visualizations/Findings

The visualizations can be accessed through the navigation bar at the top of this page, by clicking the appropriate images on the slider, or by clicking the images below.

Symbol Map

The symbol map's primary intention is to give the user an easy way to see the restaurants in the dataset in terms of both location and average restaurant score. One immediate takeaway from this visualization is that a majority of restaurants fall into the "Good" category, while very few fall into the "Poor" category. This suggests that most San Francisco restaurants are abiding by the health standards put out by the SF Dept. of Public Health. Another interesting thing to see is the large cluster of "Needs Improvement" and "Poor" restaurants in the downtown area, particularly the 94102, 94108, 94109, and 94133 zip codes. These areas are located around the Tenderloin (94102), Chinatown (94108), Nob Hill (94109), and North Beach/Chinatown (94133).

Stacked Bar Chart

The bar chart's intention is to give the user an easy way to see the number of restaurants in each category that are located in each zip code. Similar to the symbol map, we can clearly see that a majority of restaurants do fall into the "Good" category, and very few fall into the "Poor" category. While a majority of zip codes have a larger percentage of "Good" restaurants, some have a somewhat skewed distribution. For example, 94133 has 151 "Needs Improvement" and "Good" restaurants, while there are 72 "Adequate" restaurants. Another zip code worth taking a look at is 94122, where there are actually more "Needs Improvement" restaurants than "Good" ones.

Choropleth Map

The choropleth map's intention is similar to the stacked bar chart's, showing the restaurant distribution, but it gives the user a better sense of where these zip codes are geographically in San Francisco. One thing worth noting about this visualization is that a majority of "Good" restaurants can be found in the 94103 (South of Market), 94107 (Potrero Hill), and 94110 (Inner Mission/Bernal Heights) areas. This makes sense, as a majority of restaurants can be found in the downtown/SoMa area. On the other side of things, the 94102 or Tenderloin area has the most "Poor" restaurants, which can be somewhat, but not completely, attributed to the seedy reputation of this neighborhood.