Profiling Your Data

One of the most important questions a detective asks about those involved in a criminal investigation is “What did the suspect look like?” Without a physical description, detectives will have difficulty finding the suspect. Likewise, they ask about the suspect’s habits and personality. Eventually, they build a profile of the suspect. Such profiles describe the suspect physically and psychologically. They are based on statistical analyses of criminals and are extremely helpful in locating the suspect before more crimes can be committed.

In order for you to study data from a business setting, you will also need to develop a profile of the data. We began this in chapter three with a discussion of central tendency. In chapter four we used various tools to describe the way the data is spread out. Along the way we got a blurry picture of the data: the boxplot. Now it’s time to sharpen the picture and get more detail. The best tool for this is called a histogram. It will help answer the question “What does your data look like?”

A histogram is a graph that breaks the observations of a single variable into intervals called bins. By counting the number of observations in each bin, we can generate a frequency table of the data. This table is then turned into a type of bar chart, with one bar for each bin and the height of each bar indicating the number of observations the bin contains. Histograms usually have eight to twelve bins, so we get a more detailed picture of the data than from a boxplot. With each step, we get more information about the data to help us make decisions.
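The binning and counting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `sales` values are made up, the function name `frequency_table` is hypothetical, and the bins are assumed to be of equal width with the largest observation placed in the last bin.

```python
def frequency_table(values, num_bins=10):
    """Split values into equal-width bins and count the observations in each.

    Returns the minimum, the bin width, and the list of bin counts.
    Assumes the values are not all identical.
    """
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / num_bins
    if width == 0:
        raise ValueError("all values are identical; cannot form bins")
    counts = [0] * num_bins
    for v in values:
        # The maximum value belongs in the last bin, not in a new one.
        index = min(int((v - lowest) / width), num_bins - 1)
        counts[index] += 1
    return lowest, width, counts


# Hypothetical data for illustration only.
sales = [12, 15, 21, 22, 25, 27, 28, 30, 31, 33, 34, 35, 38, 41, 44, 52]

lowest, width, counts = frequency_table(sales, num_bins=8)

# Print a sideways text histogram: one row per bin, one '#' per observation.
for i, count in enumerate(counts):
    left = lowest + i * width
    print(f"{left:5.1f} to {left + width:5.1f} | {'#' * count}")
```

The minimum, the bin width, and the eight counts returned here are the pieces of information that a histogram of this data would display.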


Representation	What it is	What it tells you

Raw Data	Many observations, lots of information	Hard to make sense out of

⇓

Averages	Single number (mean, median, or mode)	Tells what is “typical”

⇓

Boxplot	Seven pieces of information (min, Q1, median, mean, Q3, max, outliers)	Shows where the data is bunched together

⇓

Histogram	Ten to fourteen pieces of information usually (min, bin width, frequencies)	More detailed profile of the data

Most histograms can be classified into one of five types: uniform, symmetric, bimodal, positively skewed, or negatively skewed. Each type has certain characteristics that make it easy to recognize. Being able to classify the data as one of these types helps you analyze the data in much the same way that a good profile of a suspect tells the detectives a lot about how to catch him or her. In this section, you will learn to recognize each of these classic histograms and what each one tells you about the data. As you learn how to make, read, and interpret histograms, keep in mind that real data will never look exactly like any of the “perfect examples.” Many times you will be required to make a judgment call as to which type of distribution the data fits.

Another important detail about histograms to remember: the same data may look different depending on the bins you use to make the histogram. It’s a good idea to look at the data with several different choices of bins before drawing any conclusions.
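To see how much the choice of bins can matter, consider the following small Python sketch. The data are invented for illustration, the helper `bin_counts` is a hypothetical name, and equal-width bins are assumed. The same twenty observations are counted with two, four, and ten bins.

```python
def bin_counts(values, num_bins):
    """Count observations in equal-width bins spanning min(values) to max(values)."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Place the maximum value in the last bin.
        index = min(int((v - lowest) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical data with two separate clusters, one low and one high.
values = [0, 1, 2, 2, 3, 3, 3, 4, 4, 5,
          15, 16, 16, 17, 17, 17, 18, 18, 19, 20]

for num_bins in (2, 4, 10):
    print(num_bins, "bins:", bin_counts(values, num_bins))
```

With two bins the counts are equal and the data look uniform. With four or ten bins the empty stretch in the middle appears, and the same data look bimodal.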
