Three Kinds of Lies (with Data)

Jeremy wrote this on December 15, 2010. It has 8 comments.

There are three kinds of lies: lies, damned lies and statistics.

- Mark Twain

Designing awesome things for people means teasing out insights into human behavior from mountains of data. Julie Zhuo was over at ZURB recently and shared how Facebook uses data to drive decisions, most often ones with relatively small impacts.

At ZURB we've confronted massive data sets in the form of billions of Photobucket photos, millions of NYSE market transactions per minute, and tens of thousands of genes in the human genome with 23andMe. Since 80% of statistics are made up on the spot (as 4 out of 5 dentists agree), how do you find patterns in your data to help you make the right changes and not screw it all up?

Consider these three lies that are easy to run into with your data:

1. Asking the wrong questions

The worst thing you can do is ask no questions when researching your data for answers. Second to that, though, is being lazy. Boring or loaded questions can lead to self-fulfilling results (wishful thinking that's bound to meet a cold, hard reality). Your expectations can subconsciously influence small but important decisions you make along the way. Take Digg.com, for example.

Digg.com's Version 4 redesign focused on a new strategy at the expense of existing customers, and pageviews tanked 37%

Founder Kevin Rose's unveiling of Digg.com's Version 4 redesign this year focused myopically on metrics that catered to a new business strategy built around friends and sharing, at the expense of everything else. Justifying the decision to kill the Upcoming News tab, for instance, a page near and dear to power users, Rose said:

Out of 200+ Million pageviews in July, only 0.4% was from upcoming (yes, that's less than 1/2 of a percent). I definitely see the fun behind wanting to see stories just before they jump...

That's an odd perspective to take given the role of that page for so many die-hard users. It seems Rose and team were asking questions while wishing for particular answers. Along the way they forgot the people they served and didn't ask the tough questions they would face the day the new site launched. Three months later, Digg.com has shed 37% of its pageviews and lost its lead over competitor Reddit.com in a catastrophic fall.
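
To see how a raw pageview share can mislead, here's a quick back-of-the-envelope sketch in Python. The cohort sizes and view counts below are invented for illustration (they are not Digg's actual data), but they show how a page can sit below half a percent of total pageviews while still being a daily habit for every power user:

    # Hypothetical cohorts: (number of users, pageviews per user, Upcoming views per user).
    # All numbers are made up for illustration.
    cohorts = {
        "casual": (1_000_000, 20, 0),   # the bulk of traffic never touches Upcoming
        "power":  (10_000, 500, 10),    # a small group that visits Upcoming daily
    }

    total_views = sum(users * views for users, views, _ in cohorts.values())
    upcoming_views = sum(users * up for users, _, up in cohorts.values())

    print(f"Upcoming share of all pageviews: {upcoming_views / total_views:.1%}")  # 0.4%

    power_users, _, _ = cohorts["power"]
    print(f"Power users for whom Upcoming is a daily habit: {power_users:,}")

Aggregated one way, the page looks like noise; segmented by cohort, it's the core of your most loyal users' experience.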

2. Using bad data

Good questions help avoid selecting bad data, but it's still easy to get lazy and sample something too small or too favorable to the answers you hope to find. It's interesting to look at what constituted "bad data" in the case of product flop Google Wave.

From the Wave team's blog and interviews, it's clear they are very smart people who spent a lot of time focused on their own code and on tracking and responding to the concerns of early-adopter developers. Noble pursuits, but they don't add up to the concerns of everyday people who were excited by the promise of a live, conversational document.

With so much complexity in Wave, the Google team needed a laser-like focus on metrics that illustrated customer problems to build a true replacement for email.

The right answers that could have led the Google Wave team toward key insights were no doubt there, if only they'd known to position their product this way and to look for them. The data they sampled simply didn't speak to the problems people were actually experiencing!
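
As a sketch of the sampling problem (with invented numbers and a made-up satisfaction score, not anything Google measured), compare what you hear when you survey only early-adopter developers against a random sample of the whole user base:

    import random

    random.seed(42)

    # Invented population: a small core of developers and a much larger group
    # of everyday users. Developers tolerate complexity; everyday users don't.
    population = ["developer"] * 500 + ["everyday"] * 9_500

    def satisfaction(user_type):
        # Hypothetical 0-10 satisfaction score, noisy around a cohort mean.
        mean = 8.0 if user_type == "developer" else 3.5
        return random.gauss(mean, 1.0)

    developers_only = [satisfaction(u) for u in population if u == "developer"]
    random_sample = [satisfaction(u) for u in random.sample(population, 500)]

    print(f"Average score, early adopters only: {sum(developers_only) / len(developers_only):.1f}/10")
    print(f"Average score, random sample:       {sum(random_sample) / len(random_sample):.1f}/10")

Both samples are the same size; only one of them tells you anything about the people you're actually building for.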

3. Misinterpreting results

Even if you're asking the right questions and mining the right data, there are more lies you can fall into. First, be sure the pattern you think you see in the data actually exists. Humans are pattern-recognizing animals, sometimes to a fault.
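
A tiny simulation makes the point: pure noise still produces runs that look like trends. Flip a fair coin for 30 "days" of up-or-down movement in some metric and you'll almost always find a multi-day streak begging for a causal story:

    import random

    random.seed(7)

    # 30 days of purely random up/down movement in a made-up metric.
    moves = [random.choice([+1, -1]) for _ in range(30)]

    longest = current = 1
    for prev, cur in zip(moves, moves[1:]):
        current = current + 1 if cur == prev else 1
        longest = max(longest, current)

    print(f"Longest same-direction streak in pure noise: {longest} days")

A four- or five-day "trend" out of nothing is typical, which is exactly why a pattern needs to survive more than a glance at a graph before it drives a decision.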

Even if you recognize legitimate patterns in your results, you still have the burden of interpreting "why" those results exist and presenting them to other people to drive decisions. Kevin Rose reached for data in defense of the Digg redesign, even calling it a success:

Usage looks extremely good (ie. more people registering (43,000+ new users yesterday), digging, consuming, clicking, following, etc.)

This is so easy to do, but you can't ignore an obviously bad situation or rewrite the answers you hoped to find to fit your results after the fact. Rose quoted activity stemming from the short-term spike in awareness and traffic from the buzz around the relaunch. He was no doubt comfortable measuring those metrics, but did so at the expense of asking tough new questions.
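
One hedge against this trap is to report launch-window numbers alongside a retention measure. Here's a sketch of what that comparison might look like, with entirely hypothetical weekly figures:

    # Hypothetical weekly signups around a relaunch (week 3), paired with the
    # share of each week's cohort still active 30 days later. Invented numbers.
    weekly_signups = [5_000, 6_000, 40_000, 30_000, 9_000, 4_000]
    retention_30d = [0.45, 0.44, 0.12, 0.15, 0.40, 0.43]

    for week, (signups, retained) in enumerate(zip(weekly_signups, retention_30d), start=1):
        print(f"week {week}: {signups:>6,} signups, {retained:.0%} still active after 30 days")

In this made-up scenario the relaunch weeks look spectacular on signups alone and terrible once you ask who stuck around, which is the tough question the headline number lets you avoid.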

The questions you ask, the numbers you look at, and the patterns you see all affect whether the numbers you end up using actually help people, or just end up lying to them.


Comments

Dmitry (ZURB) says

Love the post. Just like Julie from Facebook mentioned during her talk - there are moments when it's just best to stay away from data. Rule of thumb - turn to data when you have a very specific question. Don't just browse data and come up with conclusions.


Ernest (ZURB) says

Re: Digg's explanation.

On my blog, less than 0.5% of the hits come from people who publish stories and the vast majority is from readers, so we have decided to shut the Publish page down.


Nicolae Rusan (ZURB) says

Definitely agree that humans are pattern-seeing animals, but also pattern-seeking animals. The question of how long one should observe before drawing a conclusion about patterns is not easy to answer. There is also the issue that the design decisions you make in your product can also cause new patterns to emerge. It works both ways: patterns -> design, design -> patterns.

I think that's the challenge of driving design decisions based on focus groups or user input. I wrote a blog post about the perils of focus groups a while back. Seems like more and more companies (e.g. Tumblr and Facebook) are coming to the realization that data-driven approaches (a la Google) may be too slow, and often off the mark of what users 'really want' vs. what they say they want. More are moving towards trusting intuitions guided only minimally by data.


memo (ZURB) says

It is quite interesting to see how people manipulate numbers.


Jeremy (ZURB) says

@Nicolae, I think these companies are discovering how to balance the two. It takes instinct to ask the right questions and know when (and when not) to rely on data. Sometimes the slower approach in the short term can save a heck of a lot of money and time over the long term.


Mara (ZURB) says

Like this article a lot, and just happened to see the TED Talk by Michael Shermer (editor of Skeptic magazine), which is all about what he calls "patternicity," the dangers of how humans are hardwired to see patterns. Personally I am intrigued by it from a psychological perspective, but think it has a lot of implications for the dangers of data almost misrepresenting itself (we're so hardwired to see correlation as causation, etc., that we don't even need the suggestion in order to see patterns).

Also reminded me of the oft-passed-around list of bread correlations: (98% of convicted felons are bread-eaters)


Dmitry (ZURB) says

Nice comment Mara. As I mentioned before, the trouble often arises when you start browsing data and graphs without asking the question first. You see spikes in data and start to think of reasons for the patterns in data. These reasons start to turn into conclusions about what you should do next. You lose your focus on what you were doing before. Instead, just ask a question before you even look at data. Then you're looking for specific data to answer your question and help you move whatever you are working on forward.


Brainstorm (ZURB) says

I like this article a lot. It is definitely interesting how, as pattern-recognizing animals, our brains can work against us and lead us to misinterpret the data.


