Network security in the era of Zero Trust architectures requires reliable real-time anomaly detection. In our recent paper presented at IEEE DSAA 2024, we introduce HDSGA (HyperDimensional Streaming Graph Anomaly detection), a novel approach that leverages Hyperdimensional Computing to detect anomalies in computer network traffic with exceptional speed and accuracy.
The Problem
Computer networks are naturally represented as dynamic graphs in which nodes are endpoints (servers, workstations, devices) and edges are communications between them. Network anomalies - whether from cyberattacks, malware, or intrusions - manifest as unusual patterns in this graph structure: edges that deviate from the baseline topology.

Example of anomaly detection in graph snapshots. The sequence shows snapshots sampled from an underlying windmill graph (t=1, 2, 4). At t=3, the snapshot is sampled from a complete graph and is therefore anomalous.
Traditional anomaly detection faces significant challenges:
- Scale: Real-world networks are extremely large with millions of connections
- Speed: Decisions must be made in real time
- Memory: Storing full network histories is impractical
- Interpretability: Security teams need confidence scores, not just binary predictions
Deep learning models, while powerful, often struggle with the streaming nature of network data and the need for sub-linear memory usage.
Our Approach: Hyperdimensional Computing for Graphs
What is Hyperdimensional Computing?
Hyperdimensional Computing (HDC), also known as Vector Symbolic Architectures, represents information as very high-dimensional vectors (typically 1,000-10,000 dimensions). HDC has shown promise in online learning due to its “embarrassingly parallel” operations and ability to work with symbolic representations.
Graph Encoding with Hypervectors
Our method encodes graph snapshots into hyperdimensional space using the following approach:
- Node Representation: Each network endpoint gets two random binary hypervectors: Ĥᵤ represents node u as a source, and Ȟᵤ represents node u as a destination
- Edge Encoding: A connection from u to v is encoded as the Hadamard product (element-wise multiplication) of the corresponding hypervectors: H̄ᵤᵥ = Ĥᵤ ⊙ Ȟᵥ
- Graph Snapshot: The complete network state at time t is the weighted sum of all edge encodings, xₜ = Σ₍ᵤ,ᵥ₎ w(u,v)ₜ · Ĥᵤ ⊙ Ȟᵥ, where w(u,v)ₜ is the number of communications between u and v (a minimal NumPy sketch of this encoding follows the list)
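To make the encoding concrete, here is a minimal NumPy sketch. It is illustrative rather than the paper's reference implementation: the function names are ours, and we assume p and q are the probabilities that each bit of a source or destination hypervector is set to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2**10  # hypervector dimension, as in the experiments

def make_node_hypervectors(nodes, d, p=0.2, q=0.2):
    """Assign each node two random binary hypervectors:
    one for its role as a source, one for its role as a destination."""
    src = {u: (rng.random(d) < p).astype(np.int8) for u in nodes}
    dst = {u: (rng.random(d) < q).astype(np.int8) for u in nodes}
    return src, dst

def encode_snapshot(edges, src, dst, d):
    """Encode one graph snapshot as the weighted sum of Hadamard products
    of source and destination hypervectors over all observed edges.
    `edges` maps (u, v) -> w(u, v), the communication count."""
    x = np.zeros(d)
    for (u, v), w in edges.items():
        x += w * (src[u] * dst[v])  # element-wise (Hadamard) product
    return x

# Toy usage: three endpoints, two observed flows in the current window
nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
src, dst = make_node_hypervectors(nodes, d)
x_t = encode_snapshot({("10.0.0.1", "10.0.0.2"): 5,
                       ("10.0.0.2", "10.0.0.3"): 2}, src, dst, d)
```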
Standardization and Theory
We prove that under reasonable assumptions about the baseline edge-generating process, the standardized encoding follows a known distribution. This allows us to perform non-parametric hypothesis testing to detect anomalies.
The standardized encoding is:
zₜ = (xₜ - E[xₜ]) / √var[xₜ]
When the network behaves normally (edges sampled uniformly), each element of zₜ approximately follows a standard normal distribution.
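As a rough illustration, the standardization step can be sketched as follows. The paper derives E[xₜ] and var[xₜ] in closed form under the baseline process; this sketch substitutes empirical estimates from known-benign calibration encodings, which is an assumption on our part rather than the paper's procedure.

```python
import numpy as np

def standardize(x_t, baseline_encodings):
    """Element-wise standardization z_t = (x_t - E[x_t]) / sqrt(var[x_t]).
    Here the mean and variance are estimated from a stack of known-benign
    snapshot encodings instead of the paper's closed-form expressions."""
    baseline = np.asarray(baseline_encodings)   # shape: (n_snapshots, d)
    mu = baseline.mean(axis=0)
    var = baseline.var(axis=0)
    return (x_t - mu) / np.sqrt(var + 1e-12)    # epsilon guards zero-variance dims
```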
Anomaly Detection
We compute an anomaly score primitive by counting how many elements of zₜ deviate significantly from expectation:
s̄ₜ = max( Σᵢ 𝟙{ |zₜ,ᵢ| < θₗ }, Σᵢ 𝟙{ |zₜ,ᵢ| > θᵤ } )
This captures two types of anomalies:
- Graphs that are more random than baseline (equivalence testing)
- Graphs that are less random than baseline (hypothesis testing)
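A minimal sketch of this score primitive, with illustrative thresholds inside the 0-4 standard deviation range discussed under Hyperparameters below:

```python
import numpy as np

def anomaly_score_primitive(z_t, theta_l=0.5, theta_u=3.0):
    """Count elements of z_t that lie unusually close to zero and elements
    that lie unusually far from zero, then take the larger count.
    The thresholds here are illustrative, not tuned values."""
    abs_z = np.abs(z_t)
    near_zero = int(np.sum(abs_z < theta_l))  # elements below the lower threshold
    extreme = int(np.sum(abs_z > theta_u))    # elements above the upper threshold
    return max(near_zero, extreme)
```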
Confidence Scores
The primitive scores are converted to interpretable Bayesian confidence estimates using a Beta prior. This provides security analysts with probabilistic assessments of whether a network snapshot is anomalous.
The confidence score πₜ tells us: “With probability πₜ, this network activity is anomalous.”
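The paper's exact Bayesian update is not reproduced here; the following is one plausible, heavily simplified reading of how a Beta-smoothed confidence could be maintained from the (d+1)-dimensional counter of score frequencies mentioned under Key Advantages below. Treat it as an assumption-laden sketch, not HDSGA's actual estimator.

```python
import numpy as np

class ConfidenceEstimator:
    """Illustrative Beta-smoothed confidence from primitive scores.
    Keeps a (d+1)-dimensional counter of score frequencies (scores range
    from 0 to d) and returns the posterior mean probability that a past
    snapshot scored strictly below the current one.  The exact update used
    by HDSGA may differ; this is a plausible reading only."""

    def __init__(self, d, alpha=1.0, beta=1.0):
        self.counts = np.zeros(d + 1)  # frequency of each primitive score
        self.alpha = alpha             # Beta prior pseudo-counts
        self.beta = beta

    def update(self, score):
        total = self.counts.sum()
        below = self.counts[:score].sum()  # past snapshots scoring lower
        # Posterior mean of a Beta-Bernoulli model: "is this score unusually high?"
        pi_t = (self.alpha + below) / (self.alpha + self.beta + total)
        self.counts[score] += 1           # record the score, keep memory constant
        return pi_t
```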
Key Advantages
1. Constant Memory
Unlike methods that store historical data, HDSGA maintains only:
- A (d+1)-dimensional counter vector for score frequencies
- Fixed-size hypervector representations for nodes
This enables deployment on resource-constrained edge devices.
2. Real-Time Operation
The algorithm runs in O(dk) time for k edges and d dimensions. Computation can be further optimized by updating only changed edges.
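Because the encoding is a linear sum over edges, only the edges whose weights changed need to be touched between snapshots. A sketch of this incremental update, reusing the hypothetical helpers from the encoding sketch above:

```python
def update_encoding(x_prev, delta_edges, src, dst):
    """Incrementally update the snapshot encoding from the changed edges only:
    x_t = x_{t-1} + sum over changed (u, v) of (w_t(u,v) - w_{t-1}(u,v)) * (H_src ⊙ H_dst).
    Cost is O(d * number of changed edges) rather than O(d * k)."""
    x = x_prev.copy()
    for (u, v), dw in delta_edges.items():  # dw = change in communication count
        x += dw * (src[u] * dst[v])
    return x
```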
3. Online Learning
Each snapshot is scored independently - no batch processing required. The confidence estimate updates incrementally as new data arrives.
4. Interpretability
Unlike black-box models, HDSGA provides:
- Probabilistic confidence scores
- Clear anomaly primitives
- Transparent decision logic for security audits
Experimental Results
We evaluated HDSGA on four datasets:
- SIMARGL 2021: 23M observations, 3,886 snapshots
- DARPA 1998: 4.5M observations, 970 snapshots
- APPRAISE H2020: 15M observations, 294 snapshots
- Simulated (TabMT): 80K observations, 303 snapshots
Detection Performance
| Method | Average AUC-ROC | Average AUC-PR |
|---|---|---|
| HDSGA-p | 95.73% | 90.20% |
| HDSGA-c | 93.65% | 85.08% |
| SpotLight | 85.55% | 62.40% |
| AnoGraph | 91.95% | 85.17% |
| MIDAS | 90.84% | 81.69% |
| MIDAS-R | 94.14% | 89.14% |
HDSGA achieved the highest average AUC-ROC across all datasets.
Runtime Performance
On average, HDSGA ran 1.88× faster than the next fastest method. For example, on the SIMARGL dataset:
| Method | Runtime (seconds) |
|---|---|
| HDSGA | 374.9 |
| SpotLight | 543.0 |
| AnoGraph | 1,456.5 |
| MIDAS | 1,352.4 |
| MIDAS-R | 2,392.3 |
Visualization Results
Our experiments on the SIMARGL dataset revealed that different attack types (DoS, port scanning, malware) map to distinct clusters in the hyperdimensional encoding space. This demonstrates that HDSGA not only detects anomalies but also preserves structural information about different types of attacks.

Left: Anomaly detection results on SIMARGL dataset showing ground truth (top row) and detection scores from various methods. Right: 2D projection (PCA) of binary encodings showing how different attack types (DoS, port scanning) cluster separately from benign traffic.
Synthetic Data Insights
We tested HDSGA on synthetic graphs with known structures (complete, caveman, star, wheel, windmill). The encodings successfully distinguished between different graph topologies, with similar structures mapping to nearby regions in the encoding space.
This validates that the hyperdimensional encoding captures meaningful graph properties beyond simple randomness.
Zero Trust Architecture
HDSGA was developed as part of Leidos’ Zero Trust Mission Cloud Initiative. In Zero Trust frameworks, network access is never implicitly trusted - every connection must be verified. Real-time anomaly detection is critical for:
- Behavioral monitoring: Detecting unusual patterns in authorized connections
- Threat hunting: Identifying lateral movement and command-and-control traffic
- Incident response: Rapidly flagging suspicious activity for investigation
HDSGA’s constant memory footprint and real-time operation make it ideal for distributed Zero Trust deployments across large enterprise networks.
Technical Deep Dive: The Mathematics
For those interested in the theoretical foundations, our paper proves several key results:
- Encoding Distribution: Under uniform sampling, the encoding follows a weighted sum of Bernoulli random variables
- Central Limit Theorem: The standardized encoding converges to a standard normal distribution as dimensionality increases (via the Lyapunov CLT)
- Finite Graph Corrections: We provide exact variance formulas accounting for sampling without replacement in finite networks
These theoretical guarantees ensure that our anomaly detection has well-calibrated false positive rates.
Hyperparameters
The main tuning parameters are:
- p, q: Source and destination mapping probabilities (typically 0.1-0.3)
- θₗ, θᵤ: Lower and upper anomaly thresholds (typically 0-4 standard deviations)
- d: Hypervector dimension (2¹⁰ = 1024 in our experiments)
- γ: Expected contamination rate (fraction of anomalous snapshots)
We provide guidance in the paper for setting these based on empirical percentiles of a small calibration set.
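As an illustration only, a configuration drawn from the ranges above might look like the following; these are not the tuned values from the paper.

```python
# Illustrative settings within the stated ranges; calibrate on a small
# known-benign sample rather than reusing these numbers directly.
hdsga_params = {
    "p": 0.2,         # source mapping probability
    "q": 0.2,         # destination mapping probability
    "theta_l": 0.5,   # lower threshold (standard deviations)
    "theta_u": 3.0,   # upper threshold (standard deviations)
    "d": 2**10,       # hypervector dimension (1024)
    "gamma": 0.05,    # expected contamination rate
}
```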
Limitations and Future Work
Current Limitations:
- Assumes baseline generating process exists (though it need not be uniform)
- Very similar graph structures may be confused (e.g., caveman vs. complete graphs)
- Requires bounded edge weights for theoretical guarantees
Future Directions:
- Multi-language support for node identifiers
- Integration with existing SIEM platforms
- Explainability features identifying which edges contribute most to anomaly scores
- Adaptive threshold learning
Implementation
We implemented HDSGA in Python with experiments conducted on AWS WorkSpaces (Amazon Linux 2, 7.7GB RAM, 2.5GHz Intel Xeon). The code leverages:
- NumPy for efficient vector operations
- Sparse representations for memory efficiency
- Parallel processing for large-scale networks
The approach is architecture-agnostic and can be deployed on various platforms from edge devices to cloud infrastructure.
Conclusion
By combining the theoretical foundations of Stochastic Block Models with the computational efficiency of Hyperdimensional Computing, HDSGA provides a practical solution for real-time network anomaly detection. Our results demonstrate that principled mathematical approaches can match or exceed the performance of complex deep learning models while offering superior speed, interpretability, and memory efficiency.
As organizations adopt Zero Trust architectures, tools like HDSGA will become increasingly critical for maintaining security without sacrificing performance.
This research was conducted under Leidos’ Office of Technology Zero Trust Mission Cloud Initiative (approval 24-LEIDOS-0430-27761) and presented at IEEE DSAA 2024.
Read the full paper: IEEE Xplore