Community detection is a fundamental problem in network analysis: identifying groups of nodes that are more densely connected internally than to the rest of the network. In our recent paper, presented at the Learning on Graphs Conference (LoG 2025), we propose a novel approach that combines Graph Neural Networks (GNNs) with Stochastic Block Models (SBMs) to create a differentiable, architecture-agnostic framework for community detection.
Our Approach: SBM-Based Loss Functions for GNNs
Traditional community detection methods like Louvain and spectral clustering are effective but don’t leverage the representation learning capabilities of Graph Neural Networks. Meanwhile, existing GNN approaches for community detection often use heuristic loss functions that may not directly optimize for community structure quality.
Stochastic Block Models as Loss Functions
Our key insight is that Stochastic Block Models (SBMs) provide a principled way to evaluate partition quality through their likelihood functions. SBMs are generative models that describe how random graphs are created based on community structure. Since SBM likelihood functions are:
- Well-defined: They measure how well a partition explains the observed graph structure
- Differentiable: They can be used as loss functions for gradient-based optimization
- Theoretically grounded: They’re based on statistical principles rather than heuristics
We can use them directly as loss functions for training GNNs in an unsupervised manner.
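Concretely, for a soft assignment matrix the Bernoulli SBM log-likelihood has a closed form that is differentiable with respect to the assignments. Below is a minimal PyTorch sketch under simplifying assumptions of our own (a dense adjacency matrix, the plain degree-free Bernoulli SBM, and plug-in maximum-likelihood block densities); the likelihood used in the paper, e.g. a degree-corrected variant, may differ:

```python
import torch

def sbm_log_likelihood(A, S, eps=1e-9):
    """Bernoulli SBM log-likelihood of a soft partition.

    A: dense symmetric adjacency matrix (n x n).
    S: row-stochastic soft community assignments (n x k).
    Returns a scalar; negate it to obtain a minimizable loss.
    """
    sizes = S.sum(dim=0)                      # expected nodes per block
    m = S.t() @ A @ S                         # expected edge mass between block pairs
    n_pairs = torch.outer(sizes, sizes)       # expected node pairs between block pairs
    # Plug-in maximum-likelihood estimate of each block-pair edge density.
    p = (m / (n_pairs + eps)).clamp(eps, 1 - eps)
    ll = m * torch.log(p) + (n_pairs - m) * torch.log(1 - p)
    return ll.sum()
```

Negating the returned value yields a loss for gradient-based training; the `eps` clamp keeps the logarithms finite when a block is empty or a density hits 0 or 1.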
Architecture-Agnostic Framework
Our approach is architecture-agnostic: it works with any GNN that outputs node embeddings. The training process proceeds as follows:
1. The GNN produces node embeddings from the input graph
2. The embeddings are mapped to soft community assignments
3. The SBM likelihood evaluates the quality of these assignments
4. Gradients flow back through the network to improve the embeddings
This framework allows different GNN architectures (GCN, GAT, GraphSAINT, etc.) to be trained for community detection without modifying their core structure.
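The steps above can be sketched end to end. Everything here is an illustrative stand-in rather than the paper's code: `CommunityGNN` is a toy two-layer GCN-style encoder (any embedding backbone could be swapped in), and the loss is a plain Bernoulli SBM log-likelihood, negated:

```python
import torch
import torch.nn as nn

class CommunityGNN(nn.Module):
    """Toy GCN-style encoder with a soft-assignment head (hypothetical backbone)."""
    def __init__(self, n_feats, hidden, k):
        super().__init__()
        self.lin1 = nn.Linear(n_feats, hidden)
        self.lin2 = nn.Linear(hidden, k)

    def forward(self, A_norm, X):
        h = torch.relu(A_norm @ self.lin1(X))            # one propagation step
        return torch.softmax(A_norm @ self.lin2(h), dim=1)  # soft assignments

def neg_sbm_likelihood(A, S, eps=1e-9):
    """Negated Bernoulli SBM log-likelihood with plug-in block densities."""
    sizes = S.sum(dim=0)
    m = S.t() @ A @ S
    n_pairs = torch.outer(sizes, sizes)
    p = (m / (n_pairs + eps)).clamp(eps, 1 - eps)
    return -(m * torch.log(p) + (n_pairs - m) * torch.log(1 - p)).sum()

def train(A, X, k, epochs=200, lr=0.01):
    # Simple row normalization; a full GCN would add self-loops and
    # use symmetric normalization instead.
    A_norm = A / A.sum(dim=1).clamp(min=1).unsqueeze(1)
    model = CommunityGNN(X.shape[1], 16, k)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = neg_sbm_likelihood(A, model(A_norm, X))
        loss.backward()
        opt.step()
    return model(A_norm, X).argmax(dim=1)  # hard labels after training
```

Because the loss touches only the assignment matrix, replacing `CommunityGNN` with any other embedding GNN leaves the rest of the loop unchanged, which is the sense in which the framework is architecture-agnostic.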
Results
Our experiments across multiple datasets show that SBM-based loss functions achieve results competitive with existing community detection methods, while providing several benefits:
- End-to-end training: No separate clustering step needed
- Scalability: Leverages mini-batching and GPU acceleration
- Flexibility: Works with various GNN architectures
- Interpretability: Loss directly measures partition quality via statistical likelihood
Why This Matters
Community detection is crucial for understanding network structure in domains ranging from social networks to biological systems. By combining the representation learning power of GNNs with the theoretical foundation of SBMs, we create a framework that’s both principled and practical.
The code and full experimental details accompany our LoG 2025 paper.