Case 2 - Air Pollution and Mortality

Researchers at General Motors collected data on 60 U.S. Standard Metropolitan Statistical Areas (SMSA's) in a study of whether air pollution contributes to mortality. The data include variables measuring demographic characteristics of the cities, variables measuring climate characteristics, and variables recording the pollution potential of three different air pollutants: hydrocarbons (HC), nitrous oxide (NOx), and sulfur dioxide (SO2).

The Task

Use the data and multiple regression to produce a thoughtful analysis of the extent of the influence, if any, of air pollution on mortality in SMSA's in the United States. The data, described below, are available on the web in Excel format at the link Air Pollution and Mortality Case Data. The data set includes other likely predictors of mortality, so the regressions can "account for" or "control for" their effects while assessing the influence of the air pollution components.

Write up your analyses and results as a memo containing:

  • a statement of the question and a quick overview summary of your findings
  • a summary of your analyses of important and relevant relations in the data, including a few scatterplots
  • a report, based on this sample, of the average level of each air pollutant (HC, NOx, SO2) in the US, with a 95% confidence interval
  • summaries of your most revealing simple and multiple regression analyses, explaining what the numbers mean for the questions you are investigating
  • your conclusions on the question: Does the study suggest air pollution contributes to mortality?
  • any observations you have made along the way about interesting features of the data or interesting subtleties in the use of multiple regression revealed by this investigation
  • appendices containing summaries of supportive statistical detail that you think is better placed in an appendix rather than the body of the memo.


The key quantity for you is the mortality ratio, that is, Mortality/population. Create a new column that calculates this ratio. Then check whether other variables (HCPot, NOxPot, Rain, income per capita, that is, income divided by population, etc.) correlate with it. A simple scatterplot of each pair is a good way to explore these relationships.
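As a rough illustration of this step, the following minimal Python sketch computes the ratio column and a Pearson correlation to accompany a scatterplot. The column name Population and all numbers in rows are hypothetical placeholders, not values from the SMSA file; HCPot is the name used in the text above. Replace rows with the data you load from the Excel file.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy rows for illustration only (NOT the real SMSA data).
# "Population" is an assumed column name.
rows = [
    {"Mortality": 900.0, "Population": 500000.0, "HCPot": 10.0},
    {"Mortality": 950.0, "Population": 400000.0, "HCPot": 20.0},
    {"Mortality": 1000.0, "Population": 800000.0, "HCPot": 15.0},
    {"Mortality": 880.0, "Population": 300000.0, "HCPot": 30.0},
]

# New column: mortality ratio = Mortality / population.
ratio = [row["Mortality"] / row["Population"] for row in rows]
hc = [row["HCPot"] for row in rows]

r = pearson_r(hc, ratio)
print(f"correlation between HCPot and mortality ratio: {r:.3f}")
```

A correlation near zero does not rule out a relationship (it only measures linear association), so it is worth looking at the scatterplot as well as the number.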


Then investigate systematically. Report the average level of hydrocarbons (HC), nitrous oxide (NOx), and sulfur dioxide (SO2) in the US based on this sample, and don’t forget to report the 95% confidence interval for each.
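One common way to build such an interval is the t-interval for a mean, mean ± t × s/√n. The sketch below assumes that approach; the function name mean_ci and the sample values are hypothetical. The critical value t depends on the degrees of freedom n − 1: with the 59 cases in this data set (58 degrees of freedom) it is approximately 2.002, while the toy sample of five values below uses 2.776 (4 degrees of freedom).

```python
import math
import statistics

def mean_ci(values, t_crit):
    """Return (mean, lower, upper) for a t-based confidence interval.

    t_crit is the two-sided critical value for len(values) - 1
    degrees of freedom, supplied by the caller (e.g. from a t table).
    """
    n = len(values)
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    half_width = t_crit * std_err
    return mean, mean - half_width, mean + half_width

# Hypothetical pollution-potential values for illustration only
# (NOT taken from the SMSA data file).
sample = [12.0, 8.0, 21.0, 15.0, 9.0]

# 95% two-sided t critical value for 4 degrees of freedom.
mean, lower, upper = mean_ci(sample, t_crit=2.776)
print(f"mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Run the same calculation once for each of the three pollutant columns.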


Finally, conduct simple and multiple regression analyses in which Mortality/population is your dependent variable and a few other variables (or just one, for a simple regression) are your independent variables. Make sure all three main variables of interest (HC, NOx, SO2) are included in your multiple regression analysis.
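To make the mechanics concrete, here is a minimal sketch of ordinary least squares solved through the normal equations, using only the Python standard library. It is meant to show what a regression routine computes, not to replace Excel or a statistics package, which also report standard errors, t-statistics, and R². The function name ols and the toy numbers are hypothetical; the three predictor columns stand in for the HC, NOx, and SO2 pollution potentials.

```python
def ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bk].

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan
    elimination with partial pivoting.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; X'X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

# Hypothetical toy data for illustration only (NOT the SMSA data).
# Each row holds (HC, NOx, SO2) pollution potentials for one city.
X = [(10.0, 5.0, 20.0), (20.0, 8.0, 35.0), (15.0, 12.0, 10.0),
     (30.0, 9.0, 50.0), (8.0, 3.0, 5.0), (25.0, 15.0, 40.0)]
y = [0.0021, 0.0024, 0.0022, 0.0027, 0.0019, 0.0026]  # mortality ratio

coefs = ols(X, y)
for name, b in zip(["intercept", "HC", "NOx", "SO2"], coefs):
    print(f"{name}: {b:.6g}")
```

Each slope estimates the change in the mortality ratio per unit change in that pollutant with the other predictors held fixed, which is what "controlling for" the other variables means. Pollutant measures that move together can make individual slopes unstable, so compare the multiple regression with the simple regressions.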


[If you have used your 5 day bonus before, or you have not submitted your first case, you should submit the case on time!]

Source: The Data and Story Library of Carnegie Mellon University

Datafile Name: SMSA

Datafile Subjects: Environment

Story Names: Air Pollution and Mortality

Reference: U.S. Department of Labor Statistics

Authorization: free use

Description: Properties of 60 Standard Metropolitan Statistical Areas (a standard Census Bureau designation of the region around a city) in the United States, collected from a variety of sources.

The data include information on the social and economic conditions in these areas, on their climate, and some indices of air pollution potentials.

Number of cases: 59 [one of the 60 deleted because of missing data]

Data definitions:

  • City name
  • Mean January temperature (degrees Fahrenheit)
  • Mean July temperature (degrees Fahrenheit)
  • Relative Humidity
  • Annual rainfall (inches)
  • Age adjusted mortality
  • Median education
  • Population density
  • Percentage of non whites
  • Percentage of white collar workers
  • Population per household
  • Median income
  • Hydrocarbon pollution potential
  • Nitrous Oxide pollution potential
  • Sulfur Dioxide pollution potential