Big Data: Correlation Is Not Causation

An interesting question came up recently on Simply Statistics. I'll confess that I'm not a regular reader of the site, but the post looked at how people are being confronted with more and more data. In effect, even laypeople (those who aren't statisticians) are forced to interpret vast amounts of information and make decisions based on it. The discussion came from the point of view of those already in the know, so the talk leaned toward educating others and honing the craft they already had. I, however, fall in the camp of those trying to figure out all this new information, and I often find myself explaining data points to others who aren't trained in the sciences needed to really dig into it.

So the question becomes, are we understanding the information correctly?

I work in the online marketing space, so I'm constantly looking at site analytics reports. Clients want to know why fewer customers clicked through to their sites last week than the week before. Sometimes the answer is quite clear. (You had an amazing blog post that captured people's attention!) Other times the answer is not clear at all. (Nothing changed. So what happened?)

Especially difficult to answer are those questions of "Why does that website rank #1?"

Now is a good time to point out that we can easily describe aspects of a particular phenomenon without actually knowing the full reason why it happened. Case in point: I can tell you that the #1 site has 100 incoming links, or that it has great content, or that the design is amazing and it loads quickly. Any one of those characteristics, or all of them together, could be the reason why it ranks #1. Who can tell merely from observation?

Big Data says Jay-Z fans love Elf

Let's take another example. TNW just published an article about a media study that revealed that Jay-Z fans were likely to watch Toy Story, Step Brothers, or Elf. Does that mean that Jay-Z fans watch a Pixar flick and two Will Ferrell hits because they like Jay-Z? Not necessarily. It's simply information that analysis has turned up. Does that make the information useless since we can't understand the reasoning behind it? Not at all! If I am responsible for advertising any of those three comedies, then I am going to start targeting Jay-Z fans with my ads to test out the theory.

We can't treat this piece of information as gospel truth that Jay-Z fans will always click on our ads, but we can certainly test the idea for a reasonable cost.
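For anyone who wants to see what "testing the idea" could look like in practice, here is a minimal sketch in Python. It assumes you run the same ad to two audiences (say, Jay-Z fans and a general audience) and compare click-through rates with a two-proportion z-test. The function name and the example numbers are made up for illustration; they are not from the TNW study, and a real campaign would need more care than this.

```python
import math


def two_proportion_z_test(clicks_a, shown_a, clicks_b, shown_b):
    """Return (z, two_sided_p) for the difference in click-through rates.

    Assumes both groups saw the ad at least once and that at least one
    click (and one non-click) happened overall.
    """
    rate_a = clicks_a / shown_a
    rate_b = clicks_b / shown_b
    # Pooled rate: the click-through rate if the audience made no difference.
    pooled = (clicks_a + clicks_b) / (shown_a + shown_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: 2,000 impressions to each audience.
z, p_value = two_proportion_z_test(clicks_a=60, shown_a=2000,
                                   clicks_b=40, shown_b=2000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value would suggest the difference in clicks is unlikely to be pure chance. It still wouldn't tell us why Jay-Z fans click more, only that targeting them seems to be worth the money.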

Correlation and Causation

I'm a regular reader of Moz.com, and one of the phrases they throw out time and again is that "correlation does not mean causation." Just because two things occurred at the same time and seem to be linked does not mean that one caused the other. Just because sci-fi fans have a broader taste in music than fantasy fans (another example from the TNW article) doesn't mean that sci-fi will cause you to have a broader taste in music.
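A quick simulation makes the point concrete. The sketch below is purely hypothetical: it invents a hidden trait (call it curiosity) that nudges both how much someone likes sci-fi and how broad their music taste is. Neither score is ever computed from the other, yet the two end up correlated. All names and numbers here are my own assumptions, not data from the TNW article.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0):
    """Generate a hidden trait plus two scores that each depend only on it."""
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    # Each score is the hidden trait plus its own independent noise.
    scifi = [h + rng.gauss(0, 1) for h in hidden]
    music = [h + rng.gauss(0, 1) for h in hidden]
    return hidden, scifi, music


hidden, scifi, music = simulate()
print("sci-fi vs. music:", round(pearson(scifi, music), 2))

# Subtract the hidden trait and the link between the two scores disappears.
scifi_rest = [s - h for s, h in zip(scifi, hidden)]
music_rest = [m - h for m, h in zip(music, hidden)]
print("after removing the hidden trait:", round(pearson(scifi_rest, music_rest), 2))
```

With this setup the first correlation should land near 0.5 and the second near zero. In real data we rarely get to see the hidden trait, which is exactly why a correlation alone can't tell us what caused what.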

Still, it might mean that a sci-fi fan is a better person to discuss music with.

We have to see that correlation can be helpful, too. We don't always have to know the why to every piece of information.

What Do You Think?

Do you see ways that correlation could help people in your profession or art make better decisions?
