Google’s Flu Snafu and the Reliability of Web Data

The Web is full of data, much of it meaningful, but there’s some question as to how much we should actually rely on it. The latest evidence comes at Google’s expense, with some researchers questioning the validity of Google’s Flu Trends algorithm. They say the service, which estimates the number of flu cases around the world by analyzing trends on Google’s search engine, vastly overestimated this year’s season in the U.S. compared with more traditional methods of measuring flu cases.

But this snafu is just a microcosm of a broader debate over how much stock we should put in Web and social media data, and in what cases it’s most valid. It’s hard to figure out how much we should value speed and scale over quality of data. Millions of (presumably) younger people proactively searching or tweeting about a topic provide a huge and theoretically unbiased data set, while traditional methods such as phone calls or focus groups reach a smaller number of (presumably) older people who know they’re being observed, but who are also answering questions directly relevant to the research at hand.
