Big Data at First Glance

For all of the intensely complicated technology terms we've devised over the past hundred years, "big data" is endlessly delightful. But for all its obviousness, the idea needs to be clarified. Big data isn't just about having lots of numbers and spreadsheets; it's about what becomes uniquely possible once you have such large volumes of data. It's not just about collecting the data. It's about using that data to predict. You already see big data in action every day in messages like "Customers with similar purchase histories also like these products..." that pop up on Amazon and Netflix. Online retailers can't really tell what you'll like, but they're getting better at predicting what you'll want.

These large troves of information aren't used only for shopping suggestions, though you should certainly expect to see even more targeted versions of those messages going forward.

In the aptly titled Big Data by Viktor Mayer-Schönberger and Kenneth Cukier, the authors recount how Google was able to begin tracking the outbreak of the H1N1 virus based simply on the search terms people used. The tracking system did not start from a list of terms chosen in advance by Google. Instead, it matched user queries and locations against cases that had already occurred. On analysis, patterns emerged in local populations' online activity, and those search patterns became predictors of where the next outbreak would occur.

Remember that Whole Privacy Thing?

And here's where big data quickly becomes a contested topic. Citizens of the United States are not exactly in a trusting mood at the moment, as details about PRISM and the NSA continue to surface. It's entirely likely that we'll never know the full extent of how deep these tracking methods actually go.

My point here is not so much to deal with the politics of the situation as to discuss what's possible with such large amounts of data. This is a bigger (get it, "big" data... I digress) topic than one blog post will allow, so we'll be discussing it here on the blog over the next several days. For now, consider what kinds of predictions can be made when the dataset includes not only shopping history from Amazon but also search history from Google and personal history from Facebook. What kind of picture emerges from that combination of data?

It's problematic, to be certain.

And Yet, Big Data Is Still Helpful

Rather than approaching this from a fear-based angle, I'd also like to focus on the possibilities that big data provides. Like any other technology, big data is a tool that can be used for good or bad.

Even though individuals won't have access to data on the scale held by the U.S. government, large online retailers, or search engines, professionals and amateurs alike can collect a sort of big data on their own. This opportunity has already created new companies and jobs, and the benefits are not restricted to the commercial sector. Nonprofits can use their own datasets to better help those in need.

What Do You Think?

We've barely skimmed the surface of big data so far, but what do you think? Are you worried about companies' and governments' ability to use this data? Are you intrigued by the idea of using this data yourself?

Image provided by Marius B, taken from a TED video that exemplifies a fascinating use of big data to study the development of language.

