An ensemble heterogeneous classification methodology for discovering health-related knowledge in social media messages

Tuarob, Suppawong; Tucker, Conrad S; Salathe, Marcel; Ram, Nilam

doi:10.1016/j.jbi.2014.03.005

Tuarob, Suppawong; Tucker, Conrad S; Salathe, Marcel; Ram, Nilam

2014

Formats

Format
BibTeX
MARCXML
TextMARC
MARC
DataCite
DublinCore
EndNote
NLM
RefWorks
RIS

Abstract

The role of social media as a source of timely and massive information has become more apparent since the era of Web 2.0.Multiple studies illustrated the use of information in social media to discover biomedical and health-related knowledge.Most methods proposed in the literature employ traditional document classification techniques that represent a document as a bag of words.These techniques work well when documents are rich in text and conform to standard English; however, they are not optimal for social media data where sparsity and noise are norms.This paper aims to address the limitations posed by the traditional bag-of-word based methods and propose to use heterogeneous features in combination with ensemble machine learning techniques to discover health-related information, which could prove to be useful to multiple biomedical applications, especially those needing to discover health-related knowledge in large scale social media data.Furthermore, the proposed methodology could be generalized to discover different types of information in various kinds of textual data.