I’m excited to share that our paper on using machine learning to analyze cybersecurity threats discussed on Twitter was recently featured in Wired Magazine! The work was also covered by Security Today and Ohio State University.

Paper: Analyzing the Perceived Severity of Cybersecurity Threats Reported on Social Media
Authors: Shi Zong, Alan Ritter, Graham Mueller, Evan Wright
Published: NAACL 2019 (Conference of the North American Chapter of the Association for Computational Linguistics)
The Problem
When a new software vulnerability is discovered, security professionals need to quickly assess its severity to prioritize patching efforts. However, vulnerabilities are often discussed on social media days or even weeks before they appear in official databases like the National Vulnerability Database (NVD). This creates a critical window where organizations could be vulnerable without realizing it.
Traditional approaches rely on manual analysis or on waiting for official CVE (Common Vulnerabilities and Exposures) reports, which can lag behind public discussion by days or weeks. Meanwhile, security researchers, hackers, and developers are actively discussing these threats on Twitter, sharing insights about their severity and potential impact.
Our Approach
We developed a machine learning system that:
- Monitors Twitter for discussions about software vulnerabilities
- Links tweets to CVEs in the National Vulnerability Database
- Analyzes user opinions about vulnerability severity using natural language processing
- Predicts which vulnerabilities will be rated “high” or “critical” before official ratings are published
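As a rough illustration of the linking step, the sketch below pulls CVE identifiers out of tweet text with a regular expression. It only handles tweets that name a CVE ID explicitly and is not the linking method from the paper; the function name `extract_cve_ids`, the example tweet, and the CVE IDs in it are made up for illustration.

```python
import re

# CVE identifiers have the form CVE-<4-digit year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"\bCVE-(\d{4})-(\d{4,})\b", re.IGNORECASE)


def extract_cve_ids(tweet_text: str) -> list[str]:
    """Return distinct CVE IDs in order of first appearance, upper-cased."""
    found = []
    for match in CVE_PATTERN.finditer(tweet_text):
        cve_id = f"CVE-{match.group(1)}-{match.group(2)}"
        if cve_id not in found:
            found.append(cve_id)
    return found


# Made-up tweet with made-up CVE IDs, purely for illustration.
example = "Patch now: cve-2099-0001 looks bad, see also CVE-2099-0001 and CVE-2099-12345!"
print(extract_cve_ids(example))
```

Once a tweet is tied to an ID this way, it can be joined against the matching NVD record by that ID.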
Dataset
We created a dataset of 6,000 annotated tweets discussing software vulnerabilities, each labeled with perceived severity. This involved:
- Identifying tweets mentioning specific CVEs
- Extracting user opinions about threat severity
- Linking social media discussions to official vulnerability records
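To make the shape of such a dataset concrete, here is a minimal sketch of one possible record layout and a per-CVE summary. The `AnnotatedTweet` fields, the binary `is_severe` label, and the example rows are assumptions for illustration, not the paper's actual schema or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AnnotatedTweet:
    text: str
    cve_id: str       # CVE the tweet was linked to (hypothetical field)
    is_severe: bool   # annotator's reading of the author's opinion (hypothetical field)


def severe_fraction_by_cve(tweets: list[AnnotatedTweet]) -> dict[str, float]:
    """For each CVE, return the fraction of its tweets labeled as severe."""
    counts = defaultdict(lambda: [0, 0])  # cve_id -> [severe count, total count]
    for tweet in tweets:
        counts[tweet.cve_id][0] += int(tweet.is_severe)
        counts[tweet.cve_id][1] += 1
    return {cve: severe / total for cve, (severe, total) in counts.items()}


# Made-up rows, purely for illustration.
rows = [
    AnnotatedTweet("remote code execution, patch immediately", "CVE-2099-0001", True),
    AnnotatedTweet("minor issue, low impact", "CVE-2099-0001", False),
    AnnotatedTweet("critical bug, actively exploited", "CVE-2099-0002", True),
]
print(severe_fraction_by_cve(rows))
```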
Model Performance
Our main results:
- Precision@50 of 0.86 when forecasting high-severity vulnerabilities, meaning 43 of the 50 top-ranked predictions were correct
- Successfully predicted which vulnerabilities would receive official “high” or “critical” severity ratings with over 80% accuracy
- Substantially outperformed a baseline based on tweet volume
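For readers unfamiliar with the metric, Precision@k ranks all candidates by predicted score and reports the fraction of the top k that are truly positive. The sketch below is a generic implementation with made-up scores and labels; it is not the paper's evaluation code.

```python
def precision_at_k(scores: list[float], labels: list[bool], k: int) -> float:
    """Fraction of the k highest-scored items whose label is True."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Sort by score, highest first, and keep the top k labels.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_k_labels = [label for _, label in ranked[:k]]
    return sum(top_k_labels) / k


# Made-up predicted severity scores and ground-truth labels.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [True, False, True, True]
print(precision_at_k(scores, labels, k=2))  # 0.5
```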
Key Findings
Early Warning System
Twitter discussions can flag the majority of security flaws days before they appear in the National Vulnerability Database. This means security teams could get advance warning about critical threats before official CVE reports are published.
Severity Prediction
By analyzing the language used in tweets (how urgently people discuss a vulnerability, what words they use, and the context of the discussion), we can accurately predict official severity ratings. This is valuable because:
- Official severity scores sometimes take days or weeks to be published
- The crowd’s assessment often aligns with expert evaluations
- Early severity estimates help prioritize incident response
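For context, the “high” and “critical” labels match the qualitative bands of the CVSS v3.x severity scale. The sketch below maps a base score to its band, assuming v3.x scores are the official ratings in question; the function names are illustrative and not taken from the paper.

```python
def cvss_v3_rating(base_score: float) -> str:
    """Map a CVSS v3.x base score (0.0 to 10.0) to its qualitative rating."""
    if not 0.0 <= base_score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if base_score == 0.0:
        return "none"
    if base_score < 4.0:
        return "low"        # 0.1 to 3.9
    if base_score < 7.0:
        return "medium"     # 4.0 to 6.9
    if base_score < 9.0:
        return "high"       # 7.0 to 8.9
    return "critical"       # 9.0 to 10.0


def is_high_or_critical(base_score: float) -> bool:
    """True when the score falls in the bands a forecaster would target."""
    return cvss_v3_rating(base_score) in {"high", "critical"}


print(cvss_v3_rating(9.8), is_high_or_critical(6.9))
```

Framing the target as a yes/no question (high or critical versus everything else) turns severity forecasting into a binary classification problem.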
Exploit Prediction
Perhaps most importantly, we found that social media discussions about severe vulnerabilities can predict real-world exploit activity. When people on Twitter are discussing a vulnerability with urgency, it’s often a signal that exploits are being developed or are imminent.
Why This Matters
For Security Teams
- Proactive defense: Get early warnings about critical vulnerabilities before they’re officially rated
- Better prioritization: Focus limited resources on the threats that matter most
- Faster response: Start patching efforts before exploits appear in the wild
For Researchers
- Novel data source: Social media provides real-time, crowd-sourced threat intelligence
- NLP applications: Demonstrates the value of natural language processing for cybersecurity
- Early warning signals: Shows that online discussions contain predictive signals about real-world security events
Future Directions
This research opens several exciting directions:
- Real-time monitoring systems: Deploying models that continuously scan social media for emerging threats
- Multi-language support: Extending beyond English tweets to capture global threat discussions
- Integration with security tools: Incorporating social media signals into existing vulnerability management platforms
- Exploit prediction: Further work on predicting not just severity, but actual exploit development and deployment
Try It Yourself
The paper and additional materials are available on arXiv. The work was presented at NAACL 2019, one of the premier conferences in natural language processing.
By combining machine learning, natural language processing, and cybersecurity expertise, we demonstrated that social media contains valuable signals for predicting and prioritizing security threats. As the security landscape continues to evolve, leveraging these data sources will become increasingly important for staying ahead of emerging vulnerabilities.
This post summarizes our NAACL 2019 paper: “Analyzing the Perceived Severity of Cybersecurity Threats Reported on Social Media” by Shi Zong, Alan Ritter, Graham Mueller, and Evan Wright.