On the ground validation of online diagnosis with Twitter and medical records

Social media has been considered as a data source for tracking disease. However, most analyses are based on models that prioritize strong correlation with population-level disease rates over determining whether or not specific individual users are actually sick. Taking a different approach, we develop a novel system for social-media-based disease detection at the individual level using a sample of professionally diagnosed individuals. Specifically, we develop a system for making an accurate influenza diagnosis based on an individual's publicly available Twitter data. We find that about half (17/35 = 48.57%) of the users in our sample who were sick explicitly discuss their disease on Twitter. By developing a meta classifier that combines text analysis, anomaly detection, and social network analysis, we are able to diagnose an individual with greater than 99% accuracy even if she does not discuss her health.

Published in:
Proceedings of the 23rd International Conference on World Wide Web, 651-656
Presented at:
23rd International World Wide Web Conference, Seoul, Korea, April 7-11, 2014
Geneva, ACM

 Record created 2015-12-10, last modified 2018-09-13
