Can social media predict epidemics?
Marcel Salathé is devoted to combining his two passions: biology and technology. He believes the most natural meeting point for these two worlds is epidemiology. Together with his team from Campus Biotech in Geneva, he is working on an early epidemic warning system that uses social media to predict the next outbreak.
“People who are sick or are getting sick like to talk about it on social media like Twitter, or they search their symptoms on Google,” Salathé told the Basel online newspaper. “We take these data points and use machine learning to make the information usable and answer health questions.”
Traditionally, epidemic trends are mapped using data collected from health care providers, who in turn collect data from sick patients. Unfortunately, this process yields an incomplete picture. It only includes those groups who have access to health care or who choose to go to the doctor in the first place; it only captures information about reportable diseases, omitting a wide range of other illnesses; and it cannot provide information about health behaviours, sentiments or opinions. That is where social media come in: they are the petri dish in which public opinion can be observed.
Salathé is not the first to use online data for epidemiological purposes. Google Flu Trends was launched in 2008 to try to nip flu outbreaks in the bud by analysing Google queries. The initiative soon ran into trouble, though: its estimates were not always accurate. During the 2012-2013 flu season in the northern hemisphere, it overestimated flu prevalence by up to 100% (compared with figures from the Centers for Disease Control and Prevention). Moreover, its estimates could not easily be reproduced because researchers cannot easily gain access to the full breadth of Google’s data.
Consequently, many turned to Twitter as an alternative data source. The information there is public, which means that everyone can, in theory, get their hands on the same data.
Ming-Hsiang Tsou, a professor at San Diego State University and an author of a recent study entitled “The Complex Relationship of Realspace Events and Messages in Cyberspace: Case Study of Influenza and Pertussis”, found a useful connection between Twitter and flu outbreaks. His team observed that the correlation between weekly flu tweets and the national flu data around 11 U.S. cities was almost 86 percent. When the team compared its results with data from the San Diego County Health and Human Services Agency, the correlation was 93 percent.
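For readers wondering what such a correlation figure measures, the following is a minimal sketch of one common measure, the Pearson correlation coefficient, computed between two weekly series. It is not the study's own method or code, and the weekly counts are invented purely for illustration; the function name `pearson` and both variable names are assumptions made for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-movement of the two series around their respective means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented weekly counts, for illustration only (NOT data from the study).
weekly_flu_tweets = [120, 150, 210, 340, 420, 390, 260, 180]
weekly_reported_cases = [30, 41, 55, 90, 118, 101, 72, 49]

r = pearson(weekly_flu_tweets, weekly_reported_cases)
print(f"correlation: {r:.2f}")
```

A value close to 1 means the two series rise and fall together week by week; a value near 0 means tweet volume says little about reported cases. A correlation quoted as a percentage, as above, would correspond to this coefficient multiplied by 100, assuming this is the measure being reported.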
Tweets have also been used to respond to foodborne health problems. In 2014 the Chicago Department of Public Health began analysing tweets that reference food poisoning. The city then used the information to step up inspections of establishments that had been implicated.
Social media tracking also has its detractors. Researchers involved in the study “Social Media and Internet-Based Data in Global Systems for Public Health Surveillance” found that this kind of event-based surveillance has severe limitations. For one, information is not always moderated by professionals or interpreted for relevance before it is disseminated to epidemiologists. There is also no standardised system for updates yet; algorithms and statistical baselines are still in the early stages of development; and the entire approach continues to produce a lot of noise that cannot always be filtered out successfully.
But even with these limitations, social media tracking has already delivered some impressive results. For instance, HealthMap, a service that scrapes government websites, social networks and local news reports to map potential disease outbreaks, identified the warning signs of Ebola a full nine days before the World Health Organization announced the outbreak in 2014.
With the number of smartphone users projected to reach 6.1 billion by 2020, social media seem set to become increasingly important in epidemiological predictions. Although they cannot replace traditional methods quite yet, they can certainly perform a supplementary function.