Correlation and Regression Analysis – A Primer

Welcome back to Making Molehills out of Mountains University. Data analytics has been my passion for years. I have spent a long time studying human behavior and applying statistical analysis techniques to answer the two primary business questions every CEO has: “Should I do X?” and “If I do X, what will happen?” There is a third question they often ask: “I did X. What happened? It was not what I expected.” But that one usually comes up when something like New Coke flops, uh, I mean, doesn’t meet expectations.

My favorite tool, admitting my bias, is the mTab suite of analysis tools. In the past ten years, mTab has become the standard in the automotive industry and, in my considerable professional opinion, has had a profound effect on the industry’s recovery. After all, automakers are now producing cars people are excited to buy.

Sorry, I digress. This is the second class in Market Research Data Analysis 101. I teach in plain English, or as plain as possible considering the subject matter. In later classes we can do the math. So, put away your smartphones, get out your tablets and learn something.

Today I introduce you to the lovely world of correlation and regression analysis, two of the most commonly used techniques for measuring the relationship between two quantitative variables.

Correlation Analysis

Assuming you’ve collected your data, the first step is to create a scatter diagram: plot one variable on the X axis and the other on the Y axis. The resulting diagram shows whether the two variables have a linear relationship. The closer the points fall to a straight line, the stronger the relationship. A linear relationship can be positive, negative or null, and it is expressed by a correlation coefficient that ranges from -1 to +1: +1 is a perfect positive relationship, -1 a perfect negative relationship, and 0 no linear relationship at all.
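To make the coefficient concrete, here is a minimal Python sketch of the Pearson correlation coefficient, the most common way of computing the -1 to +1 number described above. The function name pearson_r and all of the data are hypothetical, invented for illustration; in practice your analysis software computes this for you.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (-1 to +1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together, and how much each varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration only
marketing = [10, 20, 30, 40, 50]    # marketing budget ($ thousands)
sales = [110, 130, 150, 170, 190]   # units sold
price = [15, 18, 21, 24, 27]        # price ($ thousands)
units = [500, 440, 380, 320, 260]   # units sold

print(round(pearson_r(marketing, sales), 3))                  # 1.0: points on a rising line
print(round(pearson_r(price, units), 3))                      # -1.0: points on a falling line
print(round(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 3))  # 0.8: rising, but scattered
print(round(pearson_r([1, 2, 3, 4, 5], [4, 1, 0, 1, 4]), 3))  # 0.0: a U shape, no linear trend
```

Note the last case: a coefficient of 0 means no linear relationship, not necessarily no relationship at all, which is one more reason to look at the scatter diagram before trusting the number.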

A positive relationship means the two variables move in the same direction: as one increases, so does the other (increase the marketing budget, sales increase). The converse is true for a negative relationship: as one increases, the other decreases (increase the price, sales decrease).

[Figure: example scatter diagrams with correlation coefficients of 0, +1, and between 0 and -1]

Seems straightforward. But remember, we are not talking about causation here. A third variable may account for the relationship (e.g., tax refund checks arrived at the same time as the increased marketing).
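Here is a small Python sketch of that tax-refund scenario, assuming hypothetical monthly figures invented for illustration. Sales are computed from refunds alone, yet marketing still correlates strongly with sales, simply because marketing happened to rise in the same months as the refunds.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical monthly data, invented for illustration only
refunds = [0, 0, 5, 9, 4, 0]            # tax refunds reaching customers (index)
marketing = [10, 11, 20, 28, 18, 10]    # marketing spend happened to peak in refund season
# Sales depend ONLY on refunds; marketing does not appear in this formula at all
sales = [100 + 10 * r for r in refunds]

print(round(pearson_r(marketing, sales), 3))  # 0.999: strong correlation, no causation

# In the months with no refunds, marketing still varies but sales do not move
no_refund = [(m, s) for m, s, r in zip(marketing, sales, refunds) if r == 0]
print(no_refund)  # [(10, 100), (11, 100), (10, 100)]
```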

Regression Analysis

Now that you know there is a relationship between two variables, what do you do with it? As future highfalutin analysts, you’ll want to identify the key drivers and report them to your CEO. She’ll want to know, “If I decrease price, will I sell more product?”

Enter linear and nonlinear regression. Simply put, if a given change in X (the independent variable) always produces the same change in Y (the dependent variable), the relationship is linear. If the change in Y depends on where you start on X, the relationship is nonlinear. Linear regression assumes linearity. If the scatter diagram indicates a nonlinear relationship, there are mathematical techniques, such as taking the logarithm of one or both variables, that can be used to obtain linearity.
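As one example of such a technique, here is a Python sketch assuming a hypothetical demand curve in which units sold equal 10,000 divided by price. On the raw scale the points curve, so the correlation falls well short of -1; after taking the logarithm of both variables, they line up perfectly. The data and the demand formula are invented for illustration, and real data will not be this tidy.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical curved demand: units = 10,000 / price
price = [10, 20, 30, 40, 50, 60]
units = [10_000 / p for p in price]

# Raw scale: the relationship is real but bent, so r is well short of -1
raw_r = pearson_r(price, units)
print(round(raw_r, 3))

# Log-log scale: log(units) = log(10,000) - log(price), a straight line
log_price = [math.log(p) for p in price]
log_units = [math.log(u) for u in units]
log_r = pearson_r(log_price, log_units)
print(round(log_r, 3))  # -1.0
```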

Assuming price and units sold have a linear relationship, the analyst should be able to use standard regression analysis techniques to predict the number of units sold at a particular price point. This also assumes, for the sake of this exercise, that the relationship is strong: the correlation coefficient is at or close to +1 or -1 (for price and units sold, we would expect it to be negative). The stronger the correlation, the better the regression predicts.
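To see why a stronger correlation means better predictions, this Python sketch fits a straight line to two hypothetical data sets that share the same prices but have different amounts of scatter, then compares each correlation with the typical prediction error (the root-mean-square gap between actual and predicted units). The helper names fit_line and rms_error and all of the data are invented for illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX, plus Pearson r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def rms_error(xs, ys, a, b):
    """Root-mean-square gap between actual Y and the line's prediction."""
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical data: same prices ($ thousands), two amounts of scatter in units sold
price = [20, 22, 24, 26, 28, 30]
tight = [98, 91, 86, 79, 73, 67]   # points hug a falling line
loose = [95, 80, 92, 70, 85, 66]   # points scattered around a falling line

for label, units in (("tight", tight), ("loose", loose)):
    a, b, r = fit_line(price, units)
    err = rms_error(price, units, a, b)
    print(f"{label}: r = {r:.3f}, typical error = {err:.1f} units")
```

Both lines slope downward, but only the tight data set, with r near -1, gives predictions you would want to take to your CEO.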

I know. I said no math. But you should be able to handle this:

Y = a + bX

a and b are the intercept and slope (unknown constants that the regression estimates from your data).

In this case, X = price and Y = units sold. As the equation shows, every one-unit change in X changes Y by b units. Because higher prices go with fewer sales, we expect b to be negative here.
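Here is a minimal Python sketch of how a and b are typically estimated, using ordinary least squares (the standard method behind most regression tools), followed by a prediction at one price point. The function name fit_line and the data are hypothetical, invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates of a (intercept) and b (slope) in Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x    # intercept: the fitted line passes through the means
    return a, b


# Hypothetical data: X = price ($ thousands), Y = units sold
price = [20, 22, 24, 26, 28, 30]
units = [980, 930, 860, 820, 750, 700]

a, b = fit_line(price, units)
print(f"units = {a:.1f} + ({b:.1f}) x price")  # units = 1547.1 + (-28.3) x price
print(round(a + b * 27))                        # predicted units at a $27k price: 783
```

In this made-up data, the negative slope says each extra $1,000 of price is associated with roughly 28 fewer units sold.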

Careful! If you write the equation backwards, as X = c + dY, you might end up telling your CEO that price is driven by the number of people buying cars, and not the other way around!
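The arithmetic will not warn you about this. The Python sketch below, using hypothetical price and sales figures invented for illustration, fits the line both ways. Regressing units on price and price on units both run without complaint, but they are different lines: the backwards slope d is not simply 1/b unless the correlation is exactly +1 or -1 (in general, b times d equals r squared). Deciding which variable drives which is your job, not the regression's. The function name slopes_both_ways is hypothetical.

```python
def slopes_both_ways(xs, ys):
    """Slope b of Y = a + bX, slope d of X = c + dY, and Pearson r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx                    # units sold per $1k of price
    d = sxy / syy                    # $k of price per unit sold (the "backwards" fit)
    r = sxy / (sxx * syy) ** 0.5
    return b, d, r


# Hypothetical data: price ($ thousands) and units sold
price = [20, 22, 24, 26, 28, 30]
units = [980, 930, 860, 820, 750, 700]

b, d, r = slopes_both_ways(price, units)
print(f"units on price: b = {b:.4f}")
print(f"price on units: d = {d:.6f}, but 1/b = {1 / b:.6f}")
print(f"b * d = {b * d:.4f}, r squared = {r ** 2:.4f}")  # the same number
```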

What? You say that if I sell more cars, I can lower the price thanks to production efficiencies? Of course that’s true, but it doesn’t change the fact that price does not change by itself as production increases; someone has to take action to change it. Quantity sold, on the other hand, can change when price changes, without any additional action.


That’s it for today. There is a whole lot more to study about correlation and regression, but we’ll save that for another day. Now that you know correlation and regression are impressive tools for identifying relationships between variables and measuring the strength of those relationships, go get some data, create a scatter diagram, do a little algebra and impress your boss with how knowledgeable you are as an analyst.
