Machine-Learning Algorithms Could Help Debunk Twitter Rumors

Developers are creating machine-learning algorithms to assess the credibility of disaster-related tweets automatically.

Twitter is one of the fastest and most comprehensive ways of staying abreast of breaking news. However, it’s not always easy to tell whether these microblogging status updates are truthful.

Plenty of hoaxes and rumors circulate in the wake of tragedies, although users manage to debunk most of the widely circulated falsehoods. Verification is one of the biggest challenges that first responders and humanitarian workers face when using social media, according to Patrick Meier of the Qatar Foundation’s Computing Research Institute.

Now developers are working on machine-learning algorithms that could be used to automatically assess the credibility of information tweeted during a disaster. The idea is that computers might be able to quickly and automatically make a preliminary assessment about the credibility of a source.

Previous research has shown^[1] that legitimate news tweets propagate differently from falsehoods. False rumors were far more likely to be tweeted with a question mark or some other indication of doubt or denial. The authors of this study developed a machine-learning classifier that uses 16 features to assess the credibility of newsworthy tweets. According to the study, truthful tweets tend to be longer and include URLs; the people tweeting them tend to have higher follower counts; the tweets are negative rather than positive in tone; and they do not include question marks, exclamation points, or first- or third-person pronouns.
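To make the signals above concrete, here is a minimal Python sketch of how a few of them might be turned into classifier inputs. It is not the authors' 16-feature set: the `TweetFeatures` fields, the `extract_features` helper, the pronoun list, and the URL pattern are all assumptions made for illustration, and tone (sentiment) is left out because it would need a separate model or lexicon.

```python
import re
from dataclasses import dataclass

# Hypothetical feature record; the field names are illustrative only.
@dataclass
class TweetFeatures:
    length: int
    has_url: bool
    has_question_mark: bool
    has_exclamation_mark: bool
    has_first_or_third_person_pronoun: bool
    follower_count: int

# A small, assumed list of English first- and third-person pronouns.
PRONOUNS = {
    "i", "me", "my", "mine", "we", "us", "our", "ours",
    "he", "him", "his", "she", "her", "hers", "it", "its",
    "they", "them", "their", "theirs",
}

URL_PATTERN = re.compile(r"https?://\S+")
WORD_PATTERN = re.compile(r"[a-z']+")


def extract_features(text: str, follower_count: int) -> TweetFeatures:
    """Compute a handful of surface features for one tweet."""
    # Remove URLs first so that a '?' inside a query string is not
    # mistaken for a question mark in the message itself.
    text_without_urls = URL_PATTERN.sub(" ", text)
    words = set(WORD_PATTERN.findall(text_without_urls.lower()))
    return TweetFeatures(
        length=len(text),
        has_url=bool(URL_PATTERN.search(text)),
        has_question_mark="?" in text_without_urls,
        has_exclamation_mark="!" in text_without_urls,
        has_first_or_third_person_pronoun=bool(words & PRONOUNS),
        follower_count=follower_count,
    )


report = extract_features(
    "Bridge closed after flooding, details at http://example.org/report", 12000
)
rumor = extract_features("Is it true they closed the bridge?!", 85)
print(report)
print(rumor)
```

A real classifier would feed records like these, together with propagation and tone features, into a trained model; the sketch only shows the feature-extraction step.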

A more recent study by researchers at the Indraprastha Institute of Information Technology in Delhi, India^[2] also found that credible tweets were less likely to contain swear words and significantly more likely to contain frowning emoticons than smiley faces.

A new paper by C. Castillo, M. Mendoza, and B. Poblete, to be published next month in the journal Internet Research, tests the algorithm they developed. It had an AUC (area under the ROC curve) of 0.86, meaning that when it was presented with a random false tweet and a random true tweet, it would rate the true tweet as more credible 86% of the time.
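The pairwise reading of AUC can be checked directly with a few lines of Python. The sketch below counts, over every (true, false) pair, how often the true tweet receives the higher credibility score, with ties counted as half. The function name `pairwise_auc` and the scores are made up for illustration; they are not data from the paper.

```python
from itertools import product


def pairwise_auc(scores_true, scores_false):
    """Fraction of (true, false) pairs in which the true item scores higher.

    Ties count as half a win, which matches the usual definition of the
    area under the ROC curve.
    """
    if not scores_true or not scores_false:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for t, f in product(scores_true, scores_false):
        if t > f:
            wins += 1.0
        elif t == f:
            wins += 0.5
    return wins / (len(scores_true) * len(scores_false))


# Hypothetical credibility scores produced by some classifier.
credible = [0.9, 0.8, 0.6, 0.4]
not_credible = [0.7, 0.3, 0.2, 0.1]
print(pairwise_auc(credible, not_credible))  # 0.875
```

In this toy example the true tweet outranks the false one in 14 of the 16 possible pairs, giving 0.875. A perfect ranker would score 1.0 and random guessing about 0.5.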

References

[1] M. Mendoza, B. Poblete, C. Castillo, Twitter Under Crisis: Can we trust what we RT?, Yahoo Research
[2] A. Gupta, P. Kumaraguru, Credibility Ranking of Tweets during High Impact Events, Indraprastha Institute of Information Technology, Delhi, India