Friday, December 21, 2012

Can Big Data be trusted to represent frontier market populations?

This week, at the World Bank’s Big Stats meeting, participants discussed the need for governments to look closely at how big data can be applied to official statistics. Big data allows for quick results that provide really interesting answers to questions asked of unstructured data. On the other hand, official statistics sometimes take years to compile and are often out of date, but are usually trustworthy.

But what happens when you look at frontier markets? Does big data still have the same appeal, since so much of it is dependent on tracking through social networks accessible by only some of the population?

Being stakeholder-obsessed requires that you know your stakeholder. I strongly advocate for robust voice of customer (VOC) programs, which can give both qualitative and quantitative insights into what is truly important. Many customer experience leaders today rely on increasingly complex forms of VOC feedback, and social media has been a game changer in this arena.

But in frontier markets, social media isn't always accessible. If you are making $400 a month, and web access through your mobile costs $40/month (a tenth of your income), your participation in social networks is likely to be lower, and big data begins to over-represent stakeholders with higher incomes. So people in frontier markets who use social media are usually wealthier than the population as a whole, and big data skews accordingly. But that data is still useful. In 2008, internet penetration covered only 7.9% of the population of Kenya, but Ushahidi was still a valuable resource for crowdsourcing the tracking of attacks in Kenya following the 2007 election. Results can be informative even if you don’t have the most representative sample.

On the other hand, why should government statistics from frontier markets be trusted any more than big data? When I was taking my first quantitative methods course in college, North Korea was easily identified as an unreliable outlier, because its statistics were always way too rosy to use in any calculations.

But the North Korea example shows why we need trustworthy official government statistics: we know North Korea’s stats are off because they don’t line up with more trustworthy results. Well-calculated official government stats are still extremely important. They serve as a baseline. Just as we can use them to identify North Korea as an outlier data point that should probably not be considered, we can also use them to see how our big data searches compare to traditional statistics.
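To make the baseline idea a little more concrete, here is a minimal sketch in Python of one common way to do that comparison: post-stratification weighting, where each income group in a social media sample is re-weighted to match its share in the census. This is my own illustration, not something presented at the meeting, and every number, group name, and function name below is a made-up assumption.

```python
# Hypothetical illustration only: none of these figures come from real data.

# Assumed census baseline: share of the total population in each income group.
census_share = {"low": 0.60, "middle": 0.30, "high": 0.10}

# Assumed social media sample: number of users observed in each income group.
sample_counts = {"low": 100, "middle": 300, "high": 600}

# Assumed measurement: fraction of each group that "likes" some topic.
likes_rate = {"low": 0.2, "middle": 0.5, "high": 0.8}


def poststratification_weights(sample_counts, census_share):
    """Weight each group by (census share) / (share of the sample)."""
    total = sum(sample_counts.values())
    return {
        group: census_share[group] / (count / total)
        for group, count in sample_counts.items()
    }


def raw_mean(rate_by_group, sample_counts):
    """Average over the sample as collected, ignoring the census."""
    total = sum(sample_counts.values())
    return sum(rate_by_group[g] * sample_counts[g] for g in sample_counts) / total


def weighted_mean(rate_by_group, sample_counts, weights):
    """Average after re-weighting each group toward its census share."""
    numerator = sum(
        rate_by_group[g] * sample_counts[g] * weights[g] for g in sample_counts
    )
    denominator = sum(sample_counts[g] * weights[g] for g in sample_counts)
    return numerator / denominator


weights = poststratification_weights(sample_counts, census_share)
unadjusted = raw_mean(likes_rate, sample_counts)
adjusted = weighted_mean(likes_rate, sample_counts, weights)

print(f"unadjusted estimate: {unadjusted:.2f}")
print(f"census-adjusted estimate: {adjusted:.2f}")
```

With these invented numbers the unadjusted estimate comes out near 0.65 and the census-adjusted one near 0.35, which is the skew toward wealthier users in action. The correction only works if the census shares themselves can be trusted, which is the whole point of keeping a reliable baseline.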

At the event, Paul Cheung made a great point: government statistics have depth of information attached to every data point (census data on a household), whereas big data has breadth: lots of data points with comparatively shallow background information (where the 23-year-old user was when he liked McDonald’s on Facebook).

For what it’s worth, the Wisdom Network’s 624 Facebook users born in North Korea have the highest affinity with the Dynamo Dresden team. So they like German football.

Linking this data, so that government statistics could both inform and be informed by big data trends, could have a massive impact. Cheung suggests collecting Facebook and Twitter handles along with identifying information such as a home address.

You need two approaches to information gathering: official statistics need to serve as a baseline to verify big data, and big data needs to be used to create informative results quickly for agile responses to issues.

Organizations like the World Bank can use big data to answer one of the primary criticisms leveled against them: that they move too slowly.

As I sat in the meeting the other day, I kept thinking of the problems that newspapers now face as a result of the web. Readers have shifted to blogs and sites that promise faster information, sometimes sacrificing quality, ethics, and truthfulness. Sources of official statistics have to speed up their publication times and accept big data as a complementary tool they can use, or they might go the way of the Seattle Post-Intelligencer.
