What is this chapter about? It’s about taking data, possibly thousands of numbers, and finding a few measures (values) that help you make sense of the data and represent it effectively. The main tools you will use are the mean, the standard deviation, and Pivot Tables (an Excel feature; similar tools exist in other software, such as reshapeGUI in R). The mean turns out to be the simplest and most commonly used model of data. The standard deviation can be thought of as a measure of how closely this model fits the data (or, equivalently, how appropriate the mean is as a model of the data). Thus, we have the two basic pieces of a model: the model itself (the mean) and a measure of how well the model fits (the standard deviation).
Another way to think about this process is that we are taking a huge amount of information (the original data) and compressing it into fewer pieces of information that give us a sense of the entire data set. Of course, we lose some of the information in the process, but we gain efficiency and a way of communicating and making decisions that would be extremely difficult using only the data itself. In this sense, the mean is the simplest possible model we can produce: we take all of the numerical data, no matter how numerous, and reduce it to one number for each numerical variable in the data set. To evaluate the quality of this model for each variable, we then compute the standard deviation of that variable.
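As a rough illustration of the mean as a model and the standard deviation as a measure of fit, here is a minimal Python sketch. The data values are invented for illustration, and the sketch assumes the sample standard deviation (dividing by n - 1); the chapter itself works in Excel, so treat this as an aside, not the chapter's method.

```python
import statistics

# Hypothetical data set: invented values, for illustration only.
data = [12, 15, 11, 14, 13, 15, 12, 12]

# The "model": a single number standing in for the whole data set.
model = statistics.mean(data)

# How far each observation falls from the model's prediction.
residuals = [x - model for x in data]

# The "fit": sample standard deviation (assumption: n - 1 in the denominator).
fit = statistics.stdev(data)

print(f"mean (model): {model}")
print(f"standard deviation (fit): {fit:.3f}")
print(f"residuals: {residuals}")
```

A small standard deviation relative to the mean suggests that the single number summarizes the data well; a large one suggests that the mean alone hides a lot of variation.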
Section 3.1 shows you how to use the mean as a model for the data and how the standard deviation measures how well this model represents the data. Section 3.2 shows you how to use Pivot Tables to reduce data with several variables, some of which are categorical, to a set of means.
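The idea behind Section 3.2, reducing data with categorical variables to several means, can be sketched outside Excel as well. The following Python example imitates what a Pivot Table of averages does, using only the standard library. The records, the category names, and the helper `pivot_mean` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (region, product, sales). Invented values.
records = [
    ("East", "Widget", 10.0),
    ("East", "Gadget", 20.0),
    ("West", "Widget", 30.0),
    ("West", "Gadget", 40.0),
    ("East", "Widget", 14.0),
    ("West", "Gadget", 44.0),
]

def pivot_mean(rows):
    """Group rows by their two categorical values and average the numeric one."""
    groups = defaultdict(list)
    for region, product, sales in rows:
        groups[(region, product)].append(sales)
    return {key: mean(values) for key, values in groups.items()}

table = pivot_mean(records)

# One mean per combination of categories, instead of one mean overall.
for (region, product), avg in sorted(table.items()):
    print(f"{region:5} {product:7} {avg:6.1f}")
```

Each combination of categories gets its own mean, so the six records are reduced to four numbers that can be compared across regions and products.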
As a result of this chapter, students will learn | As a result of this chapter, students will be able to |
|
|