Correlation and Regression Analysis – A Primer

Welcome back to Making Molehills out of Mountains University. Data analytics has been my passion for years. I have spent that time looking at human behavior and applying statistical analysis techniques to answer the two primary business questions every CEO has: “Should I do X?” and “If I do X, what will happen?” There is a third question they often ask: “I did X. What happened? It was not what I expected.” But that’s usually asked when something like New Coke flops, uh, I mean, doesn’t meet expectations.

My favorite tool, admitting my bias, is the mTab suite of analysis tools. In the past ten years, mTab has become the standard in the automotive industry and has, in my considerable professional opinion, had a profound effect on the industry’s recovery. After all, they’re now producing cars people are excited to buy.

Sorry, I digress. This is the 2nd class in Market Research Data Analysis 101. I teach in plain English, or as plain as possible considering the subject matter. In later classes we can do the math. So, put away your smartphones, get out your tablets and learn something.

Today I introduce you to the lovely world of Correlation and Regression analysis, two of the most commonly used techniques for determining the relationship between two quantitative variables.

Correlation Analysis

Assuming you’ve collected your data, the first step is to create a scatter diagram. One variable goes on the X-axis and the other on the Y-axis. The resulting diagram indicates the linear relationship between the two variables. The closer the points are to a straight line, the stronger the relationship. The linear relationship is defined as positive, negative or null and is expressed by a correlation coefficient that runs from -1 (perfectly negative) through 0 (null) to +1 (perfectly positive).

A positive relationship means that an increase in one variable goes with an increase in the other (increase in marketing budget = increase in sales). The opposite is true for a negative relationship (increase in price = decrease in sales).

[Figure: three scatter diagrams showing a coefficient of 0, a coefficient of +1, and a coefficient between 0 and -1]

Seems straightforward. But, remember, we are not talking about causation here.  There may be a third variable that accounts for the relationship (e.g. Tax refund check came through at the time of increased marketing).
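For those who can’t wait for the math, here is a minimal sketch in Python of how a correlation coefficient (the Pearson version, which is the usual one for a scatter diagram) can be computed. The budget and sales figures are made up for illustration, and pearson_r is just my own helper name, not part of any particular tool.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two variables move together around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: marketing budget (in $1,000s) and units sold.
budget = [10, 20, 30, 40, 50]
sales = [110, 135, 148, 172, 190]

r = pearson_r(budget, sales)
print(round(r, 3))  # close to +1: a strong positive relationship
```

A value near +1 or -1 means the points hug a straight line. It still says nothing about what caused what.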

Regression Analysis

Now that you know there is a relationship between two variables, what do you do with that? As future highfalutin analysts you’ll want to identify the Key Drivers and report them to your CEO. She’ll want to know, “If I decrease price, will I sell more product?”

Enter linear and nonlinear regression. Simply put, if a change in X (independent variable) produces a consistent change in Y (dependent variable), then the relationship is linear. If the change in Y is inconsistent, then the relationship is nonlinear. Regression analysis, as covered here, assumes linearity. If the scatter diagram indicates a nonlinear relationship, there are mathematical techniques that can be used to obtain linearity.
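One such technique, and I’m only assuming it fits your data, is to take logarithms of both variables. The sketch below uses made-up data that follows y = 3 * x**2. The change in Y per unit of X keeps growing, but once both variables are logged the slope settles at a constant.

```python
from math import log

# Hypothetical nonlinear data following y = 3 * x**2.
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 2 for x in xs]

# Change in y per unit change in x between neighboring points: not constant.
raw_slopes = [
    (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    for i in range(len(xs) - 1)
]

# After logging both variables, log(y) = log(3) + 2 * log(x) is a straight line.
log_xs = [log(x) for x in xs]
log_ys = [log(y) for y in ys]
log_slopes = [
    (log_ys[i + 1] - log_ys[i]) / (log_xs[i + 1] - log_xs[i])
    for i in range(len(xs) - 1)
]

print(raw_slopes)                         # the slopes keep growing
print([round(s, 6) for s in log_slopes])  # the slopes are all approximately 2
```

Other transformations (square roots, reciprocals) exist too. Which one works depends on the shape your scatter diagram shows.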

Assuming price and units sold have a linear relationship, the analyst should be able to use standard regression analysis techniques to predict the number of units sold at a particular price point. This also assumes, for the sake of this exercise, that the relationship is strong, with a correlation coefficient at or close to -1 (negative, because sales fall as price rises). The closer the coefficient is to +1 or -1, the better the predictive quality of the data under regression.

I know.  I said, no math. But you should be able to handle this:

Y= a+bX

Here a and b are the intercept and slope (unknown constants that are estimated from the data).

In this case, X = Price and Y = units sold. As the equation suggests, each one-unit change in X changes Y by b units.
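To see where a and b come from, here is a minimal sketch of an ordinary least squares fit, the standard way to estimate them. The prices and unit counts are invented for illustration, and fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least squares estimates of a (intercept) and b (slope) in Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x    # intercept: value of Y when X is 0
    return a, b

# Hypothetical data: price in dollars (X) and units sold at that price (Y).
prices = [10, 12, 14, 16, 18]
units = [200, 182, 159, 141, 118]

a, b = fit_line(prices, units)
predicted = a + b * 15  # predicted units sold at a price of $15
print(a, b, predicted)
```

With this made-up data the slope comes out negative, meaning each extra dollar of price goes with fewer units sold.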

Careful! If you write the equation backwards, X = c + dY, then you might tell your CEO that price is affected by the number of people buying cars and not the other way around!

What? You say that if I sell more cars I can lower the price due to cost efficiencies in production?  Of course, that is true, but that does not change the reality that, without an external action, price does not change by itself as production increases. But quantity sold can change as price is changed without any additional action.
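The backwards equation is not just a different story for the CEO. It is also a different line. The sketch below, again with invented numbers and a hypothetical fit_line helper, fits both directions and shows that the backwards slope d is not simply 1/b unless the points fall exactly on a straight line.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data with some scatter around the line.
prices = [10, 12, 14, 16, 18]
units = [200, 170, 165, 130, 125]

a, b = fit_line(prices, units)  # forward:   units = a + b * price
c, d = fit_line(units, prices)  # backwards: price = c + d * units

print(b, 1 / b)  # forward slope and what you get by flipping it algebraically
print(d)         # backwards slope from its own fit: not the same as 1 / b
print(b * d)     # equals the squared correlation coefficient, below 1 here
```

So pick your dependent variable on purpose, before you run the numbers.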

Conclusion

That’s it for today. There is a whole lot more to study regarding correlation and regression, but we’ll save that for another day. Now that you know that correlation and regression are impressive tools for identifying relationships between variables and for determining the strength of those relationships, go get some data, create a scatter graph, do a little algebra and impress your boss with how knowledgeable you are as an analyst.

Mind Your Measurement Scales in Market Research

Welcome to the Making Molehills out of Mountains University (MMoM U) Market Research Data Analysis 101, or MARDA 101 as we like to call it in the halls of academia. Today we discuss the four different types of scales used in measuring behavior. Open your books and let’s get started…

The four scales, in order of ascending power are:

  • Nominal
  • Ordinal
  • Interval and
  • Ratio

Nominal Scale

Nominal is derived from the Latin nominalis meaning “pertaining to names”.  But, seriously, who cares? That tells us nothing except how much academics love showing off.  The Nominal Scale is the lowest measurement and is used to categorize data without order.  For your market research data analysis exercise a typical nominal scale is derived from simple Yes/No questions.

How the nominal scale (and all these scales) is used statistically is for the next lecture.  For now, just know the behavior measured has no order and no distance between data points. It is simply “You like? Yes or no?”

Ordinal Scale

From the Latin ordinalis, meaning “showing order”… Enough of that. An Ordinal Scale is simply a ranking. Rank your preferences from 1 to 5. Careful! There’s no distance measurement between each point. A person may like sample A a lot, sample B a little, and C not at all, and you would never know. Here we have gross order only, learning that the subject likes A best, then B, then C. Determining relative positional preference is a matter for the next scale.

Interval

Ah, the Interval Scale. It’s the standard scale in market research data analysis. Here is the 7-point scale from Dissatisfied to Satisfied, from Would Never Shop Again to Would Always Shop, etc. The key element in an Interval Scale is the assumption that data points are equidistant. I realize savvy market analysts might say, “Hold on, Professor. What about logarithmic metrics where the points are not equidistant?” To which I say, “Correct! But the distances are strictly defined depending on the metric used, so don’t get ahead of yourself. This is MARDA 101.”

For now, understand that with the Interval Scale, we can interpret the difference between orders of preference.  Now we can glean that Subject 1 Loves A, Somewhat Likes B and Sorta Kinda Doesn’t Like C.

Subject 2 Somewhat Likes A, Sorta Kinda Doesn’t Like B and Hates C. Both subjects ranked the samples A, B, & C in the same order on an Ordinal Scale, but for very different reasons, as discovered by using the Interval Scale. Got it? Good.
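If you like to see it spelled out, here is a small sketch of the two subjects. The numbers I attach to each label on the 7-point scale are my own assumption for illustration, and ranking is a hypothetical helper.

```python
# Hypothetical 7-point ratings (1 = Hates, 7 = Loves); the exact numbers are assumed.
subject_1 = {"A": 7, "B": 5, "C": 3}  # Loves A, Somewhat Likes B, Sorta Kinda Doesn't Like C
subject_2 = {"A": 5, "B": 3, "C": 1}  # Somewhat Likes A, Sorta Kinda Doesn't Like B, Hates C

def ranking(ratings):
    """Ordinal view: samples ordered from most liked to least liked."""
    return sorted(ratings, key=ratings.get, reverse=True)

# The ordinal view is identical for both subjects.
print(ranking(subject_1))
print(ranking(subject_2))

# The interval view shows how far apart their opinions of the same sample are.
for sample in ("A", "B", "C"):
    print(sample, subject_1[sample] - subject_2[sample])
```

Same ranking, different ratings. That gap is exactly what the Ordinal Scale cannot show you.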

Moving on.

Ratio

The Ratio Scale is similar to the Interval Scale, but it’s not often used in social research. Like Interval, it has equal units, but its defining characteristic is the true zero point. Ratio, at its simplest, is a measurement of length. Even though you cannot measure an object of 0 length, a negative length is impossible; hence the true zero point.

To sum up, I leave you with the chart below, indicating various measures for each scale.


Scale      Difference   Direction of Difference   Amount of Difference   Absolute Zero
Nominal    X
Ordinal    X            X
Interval   X            X                         X
Ratio      X            X                         X                      X
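The chart has a simple staircase pattern: each scale keeps everything the scale below it offers and adds one more measure. A minimal sketch of that pattern follows, with supports being a hypothetical helper name of my own.

```python
# Scales and measures, both listed in ascending order, as in the chart.
SCALES = ["nominal", "ordinal", "interval", "ratio"]
MEASURES = [
    "difference",
    "direction of difference",
    "amount of difference",
    "absolute zero",
]

def supports(scale, measure):
    """True if the chart has an X for this scale and measure."""
    # A scale at position k offers every measure up to and including position k.
    return MEASURES.index(measure) <= SCALES.index(scale)

for scale in SCALES:
    print(scale, [m for m in MEASURES if supports(scale, m)])
```

Handy when you need to check whether the statistic you are about to report makes sense for the scale you collected.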




The Rise of Infographics in Presenting Data Analysis

In an earlier post we talked about the difference between graphics used for visualization of data points and graphics used for presentation.  We concluded that the point of an analyst’s effort when analyzing survey data was to communicate the results to busy decision makers in a format they could understand.

Enter The Infographic

Using information graphics to convey an idea or meaning has been around since the earliest cave paintings. Today, infographics are an essential part of the survey analyst’s toolbox because they convey complex data in an easy-to-follow and visually appealing format.

From blog posts and web articles to glossy brochures and, of course, data analysis presentations, infographics are a ubiquitous part of the information landscape. But why have they become so prevalent?

Infographics Are Easier Than Ever To Create

Modern computers and sophisticated software can easily render thousands, even millions, of data points into a visual representation, often with nothing more than a mouse click.  What used to take hours to create by hand (and yes, most graphics used to be done by hand, I’m talking 1980s and 90s, not the 1800s!) can now be done as a matter of course.

Decision Makers Have Faster Access To More Data than Ever Before

The trend toward greater use of infographics results in part from the speed at which information is available to decision makers. The Internet and the World Wide Web have transformed not only how we receive our information but how fast we have access to it. Also, our expectations regarding how much information we are willing to absorb have changed. When was the last time anyone picked up a 2000-page reference book and actually read it?

It’s Easier To Look At An Infographic Than It Is To Read About The Same Thing

Today data and information come at us in packets. This blog post is an excellent example. It’s short, concise, and to the point. The title and subheadings tell you most of what you want to know regarding the topic, and they provide key information you might need to justify a decision to use more infographics in your next data analysis presentation. The rest of these words are written to support the headings, but the important information could have been rendered visually rather than in prose. If it had been, you might have spent half the time absorbing the information.

Now, if I could present this post using only an infographic…

The Difference Between Causation and Correlation Research

Correlational research can tell you who buys your products, but it may or may not tell you why. For example: Let’s say that you are trying to sell instant meals. If your research tells you that working mothers buy more instant meals, you cannot draw the conclusion that being a working mother causes people to purchase instant meals. The purchase of instant meals could be due to a third factor such as being too busy to cook, having extra money, or trying to control calorie intake.

More complex correlational research studies can help you narrow down the causation, but you still cannot draw a conclusion from them with absolute certainty. In fact, the more factors you add, the more confused you may become. Let’s say you discover that only single working mothers purchase instant meals and that married working mothers rely on their spouses to cook. Should you try to market instant meals to the spouses of working mothers, or could there be yet another factor involved that is still eluding you?

Knowing for sure

If you really want to know whether being a working mother causes people to purchase more instant meals, you need to conduct a controlled experiment. This means that you would need to have a large pool of subjects, randomly assign them to be working mothers, and monitor their purchasing habits. Unfortunately, we cannot ethically force people to be or not to be working mothers. This is a lifestyle choice, and an experiment where we let the subjects choose their own treatment would not be an experiment at all.
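For a treatment you can ethically assign (a coupon, say, which is my own hypothetical example and not something from this study), the random assignment step itself is simple. Here is a minimal sketch; randomly_assign and the subject labels are made up for illustration.

```python
import random

def randomly_assign(subjects, seed=0):
    """Shuffle the subjects and split them into treatment and control groups."""
    rng = random.Random(seed)   # fixed seed so the split can be reproduced
    shuffled = list(subjects)   # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical pool of ten subjects.
subjects = [f"subject_{i}" for i in range(1, 11)]
treatment, control = randomly_assign(subjects)

print(treatment)
print(control)
```

The point of the shuffle is that the researcher, not the subject, decides who gets the treatment. That is exactly what we cannot do with working motherhood.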

A good way to dig deeper in your research without dealing with ethical issues is to ask an open-ended question. It may take you more time to read through all of the responses, but with the right software you can make the process much faster. You can find out how often a certain keyword or phrase is used to determine why people purchase instant meals. If a common response is, “I am a working mother,” you can be reasonably certain that being a working mother causes people to purchase instant meals. While this is still technically a correlational research study, it can give you more useful information than a survey in which you only allow simple, discrete responses.
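The keyword counting step can be sketched in a few lines. The responses and the list of phrases below are invented for illustration. Real text analysis software would do far more than this (handle spelling variations, for one).

```python
from collections import Counter

# Hypothetical open-ended answers to "Why do you buy instant meals?"
responses = [
    "I am a working mother and have no time to cook",
    "Too busy to cook after work",
    "I am a working mother",
    "They are cheap and low in calories",
    "No time to cook on weeknights",
]

# Phrases we have decided to look for (an assumption; choose your own).
phrases = ["working mother", "time to cook", "busy", "calories"]

# Count how many responses mention each phrase, ignoring upper/lower case.
mentions = Counter()
for response in responses:
    text = response.lower()
    for phrase in phrases:
        if phrase in text:
            mentions[phrase] += 1

for phrase, count in mentions.most_common():
    print(phrase, count)
```

Notice that one response can mention more than one phrase, which is already a hint that several reasons may be tangled together.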