3 Using Models to Interpret Data

What is this chapter about? It’s about taking data, possibly thousands of numbers, and finding a few measures (values) that help you make sense of the data and represent it effectively. The main tools you will use are the mean, the standard deviation, and Pivot Tables (an Excel feature; similar tools exist in other software, such as reshapeGUI in R). The mean turns out to be the simplest and most commonly used model of data. The standard deviation can be thought of as a measure of how closely this model fits the data (or, equivalently, how appropriate the mean is as a model of the data). Thus, we have the two basic pieces of a model: the model itself (the mean) and a measure of how well the model fits (the standard deviation).

Another way to think about this process is that we are taking a huge amount of information (the original data) and compressing it into fewer pieces of information that give us a sense of the entire data set. Of course, we lose some of the information in the process, but we gain efficiency and a way of communicating and making decisions that would be extremely difficult using only the data itself. In this sense, the mean is the simplest possible model we can produce: we take all of the numerical data, no matter how numerous, and reduce it to one number for each numerical variable in the data set. To evaluate the quality of this model for each variable, we then compute the standard deviation of that variable.
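To make the idea of "the mean as a model, the standard deviation as a measure of fit" concrete, here is a minimal sketch in Python. The data values are invented for illustration and do not come from this chapter. The sketch also assumes the sample standard deviation (dividing by n - 1), since the chapter has not yet said which version it uses; the population version is shown alongside for comparison.

```python
import statistics

# Hypothetical data set (invented for illustration): seven observations
data = [12, 15, 11, 14, 18, 13, 15]

# The model: a single number standing in for every observation
mean = statistics.mean(data)

# How far each observation falls from the model
deviations = [x - mean for x in data]

# Measure of fit. Assumption: sample standard deviation (divides by n - 1).
sample_sd = statistics.stdev(data)

# Population standard deviation (divides by n), shown for comparison
population_sd = statistics.pstdev(data)

print("mean:", mean)
print("deviations from the mean:", deviations)
print("sample standard deviation:", round(sample_sd, 3))
print("population standard deviation:", round(population_sd, 3))
```

The deviations always sum to zero, which is one reason they are squared before being averaged. A small standard deviation relative to the spread you care about suggests that the mean represents the data well; a large one suggests that a single number is hiding a lot of variation.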

Section 3.1 shows you how to use the mean as a model for the data and how the standard deviation measures how well this model represents the data. Section 3.2 shows you how to reduce data with several variables, some of which are categorical, to several means using Pivot Tables.
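The idea behind a Pivot Table, reducing data with categorical variables to several means, can be sketched outside of a spreadsheet as well. The example below is an illustration only, not Excel's implementation: the records, the category names (region, product), and the sales figures are all hypothetical. It groups the numerical variable by each combination of the two categorical variables and computes one mean per group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records (invented for illustration): (region, product, sales)
records = [
    ("East", "A", 10),
    ("East", "B", 20),
    ("West", "A", 30),
    ("West", "B", 40),
    ("East", "A", 14),
    ("West", "B", 44),
]

# Collect the sales values that belong to each (region, product) combination
groups = defaultdict(list)
for region, product, sales in records:
    groups[(region, product)].append(sales)

# Reduce each group to its mean: one number per cell of the table
pivot = {key: mean(values) for key, values in groups.items()}

# Print the result as a small table: rows are regions, columns are products
regions = sorted({region for region, _ in pivot})
products = sorted({product for _, product in pivot})
print("region", *products, sep="\t")
for region in regions:
    row = [pivot.get((region, product), "") for product in products]
    print(region, *row, sep="\t")
```

Instead of one mean for the whole data set, there is now one mean per combination of categories, which is the kind of summary a Pivot Table produces.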

As a result of this chapter, students will learn

√: What a mean is and how it can be used to model the average or typical data point
√: How to use the standard deviation as a tool for determining how well the mean represents the data
√: What Pivot Tables are and how they are useful

As a result of this chapter, students will be able to

√: Compute means and standard deviations by hand and with spreadsheet tools
√: Make a Pivot Table that cross-sections your data in order to help you analyze it
