DBGSOM is a growing self-organizing map for clustering, classification, and nonlinear projection. It grows dynamically, choosing the number of prototypes from the data, so no group count has to be specified in advance. It is compatible with scikit-learn and includes built-in visualization.
DBGSOM: Directed Batch Growing Self-Organizing Map
DBGSOM (Directed Batch Growing Self-Organizing Map) is a neural network tool for clustering, classification, nonlinear projection, and data visualization. The implementation is compatible with scikit-learn and does not require a predefined cluster count: it determines the number of prototypes needed to represent the data dynamically during training.
Key Features
- Dynamic Growth: Automatically expands the map by adding neurons at positions where quantization error exceeds a certain threshold.
- Sklearn Compatibility: Functions as a drop-in substitute for KMeans and DBSCAN, supporting the fit_predict, transform, score, and predict_proba methods.
- Topology Preservation: Maintains a layout where neighboring neurons correspond to similar input data, achieving a topographic error of less than 5% on the Digits dataset.
- Batch Learning Efficiency: Utilizes a batch learning rule to train on all samples per epoch, making it faster than traditional self-organizing maps (SOMs).
- Visualization Tools: The integrated plot() function provides insights into neuron densities, classification labels, error metrics, and hit counts.
How It Operates
DBGSOM starts with an initial setup of four neurons, which are adjusted through the following process:
- Each sample is assigned to its nearest neuron, the best matching unit (BMU).
- Neuron weights are updated toward their assigned samples.
- New neurons are created at boundary locations where quantization error is high.
- The process iterates, shrinking the influence of neighboring neurons on weight adjustments over time.
The outcome is a two-dimensional grid map in which each neuron is connected to up to four neighboring neurons, preserving the structure of the data. Because neurons are added only where the quantization error exceeds the threshold, the map size scales automatically with the complexity of the data.
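The loop described above can be illustrated with a small, dependency-free sketch. This is not the DBGSOM implementation: the function names (nearest, batch_epoch, grow), the Gaussian neighborhood, the shrinking sigma schedule, and the growth threshold are assumptions chosen for illustration, and the new neuron is simply placed in the first free grid slot rather than in a direction chosen by the algorithm.

```python
import math

def nearest(weights, x):
    """Return the grid position of the best matching unit (BMU) for sample x."""
    return min(weights, key=lambda pos: math.dist(weights[pos], x))

def batch_epoch(weights, samples, sigma):
    """One batch update: each neuron moves to the neighborhood-weighted mean of all samples."""
    bmus = [nearest(weights, x) for x in samples]
    dim = len(samples[0])
    updated = {}
    for pos in weights:
        num, den = [0.0] * dim, 0.0
        for x, bmu in zip(samples, bmus):
            # Gaussian neighborhood on the grid, centered on the sample's BMU
            h = math.exp(-math.dist(pos, bmu) ** 2 / (2 * sigma ** 2))
            den += h
            for d in range(dim):
                num[d] += h * x[d]
        updated[pos] = [n / den for n in num] if den > 0 else list(weights[pos])
    return updated

def grow(weights, samples, threshold):
    """Add one neuron beside the boundary neuron with the largest accumulated error.

    Returns True if a neuron was added. Simplification: the first free grid
    slot is used instead of a directed placement.
    """
    errors = {pos: 0.0 for pos in weights}
    for x in samples:
        bmu = nearest(weights, x)
        errors[bmu] += math.dist(weights[bmu], x)

    def free_slots(pos):
        r, c = pos
        return [p for p in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                if p not in weights]

    boundary = [pos for pos in weights if free_slots(pos)]
    worst = max(boundary, key=errors.get)
    if errors[worst] <= threshold:
        return False
    weights[free_slots(worst)[0]] = list(weights[worst])
    return True

# Toy data with three groups; the map starts with four neurons on a 2x2 grid.
samples = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0], [3.0, 3.0], [3.1, 2.9]]
weights = {(0, 0): [0.0, 0.0], (0, 1): [0.0, 1.0], (1, 0): [1.0, 0.0], (1, 1): [1.0, 1.0]}

for epoch in range(5):
    sigma = 0.7 ** epoch  # neighborhood influence shrinks over time
    weights = batch_epoch(weights, samples, sigma)
    grow(weights, samples, threshold=0.5)

print(f"Neurons after training: {len(weights)}")
```

The sketch adds at most one neuron per epoch, so the map size stays bounded by the number of epochs in this toy setting.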
Usage
DBGSOM conveniently integrates with the scikit-learn ecosystem and provides two main classes:
- SomVQ for unsupervised clustering and vector quantization.
- SomClassifier for supervised classification tasks.
Example of Clustering
from dbgsom import SomVQ
from sklearn.datasets import load_digits
X, y = load_digits(return_X_y=True)
vq = SomVQ(lambda_=80.0, max_neurons=80)
labels = vq.fit_predict(X)
print(f"Neurons: {len(vq.neurons_)}")
print(f"Quantization error: {vq.quantization_error_:.4f}")
print(f"Topographic error: {vq.topographic_error_:.4f}")
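The two error attributes printed above correspond to standard SOM quality measures. The following sketch shows the textbook definitions on a hand-built 2x2 map: quantization error as the mean distance from each sample to its BMU, and topographic error as the fraction of samples whose two closest neurons are not grid neighbors. The helper names and the toy map are assumptions for illustration, and the library's exact formulas may differ.

```python
import math

def two_nearest(weights, x):
    """Return grid positions of the best and second-best matching units for x."""
    ranked = sorted(weights, key=lambda pos: math.dist(weights[pos], x))
    return ranked[0], ranked[1]

def quantization_error(weights, samples):
    """Mean distance between each sample and the weight vector of its BMU."""
    total = sum(math.dist(weights[two_nearest(weights, x)[0]], x) for x in samples)
    return total / len(samples)

def topographic_error(weights, samples):
    """Fraction of samples whose two closest neurons are not adjacent on the grid."""
    def adjacent(a, b):
        # 4-neighborhood: positions differ by exactly one step along one axis
        return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

    misses = sum(not adjacent(*two_nearest(weights, x)) for x in samples)
    return misses / len(samples)

# A deliberately twisted map: the neuron at grid position (1, 1) sits close to
# the neuron at (0, 0) in data space although the two are diagonal on the grid.
weights = {(0, 0): [0.0, 0.0], (0, 1): [4.0, 0.0], (1, 0): [0.0, 4.0], (1, 1): [1.0, 1.0]}
samples = [[0.0, 0.0], [4.0, 1.0]]

print(f"Quantization error: {quantization_error(weights, samples):.4f}")
print(f"Topographic error: {topographic_error(weights, samples):.4f}")
```

A low topographic error indicates that neurons close together in data space are also close together on the grid.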
Example of Classification
from dbgsom import SomClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = SomClassifier(lambda_=80.0, max_neurons=80)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test)) # Get accuracy
proba = clf.predict_proba(X_test) # Get class probabilities
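This README does not describe how predict_proba derives its values. One common approach for prototype-based classifiers, sketched here purely as an assumption, is to count the training labels won by each prototype and report the normalized counts of a sample's BMU. The names fit_label_counts and bmu_proba are hypothetical and are not part of the dbgsom API.

```python
import math
from collections import Counter, defaultdict

def bmu_index(prototypes, x):
    """Index of the prototype closest to x."""
    return min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], x))

def fit_label_counts(prototypes, X, y):
    """Count, per prototype, the labels of the training samples it wins."""
    counts = defaultdict(Counter)
    for x, label in zip(X, y):
        counts[bmu_index(prototypes, x)][label] += 1
    return counts

def bmu_proba(prototypes, counts, x, classes):
    """Class probabilities for x: normalized label counts of its BMU."""
    votes = counts[bmu_index(prototypes, x)]
    total = sum(votes.values())
    if total == 0:
        # Prototype never won a training sample: fall back to a uniform guess
        return [1.0 / len(classes)] * len(classes)
    return [votes[c] / total for c in classes]

prototypes = [[0.0, 0.0], [5.0, 5.0]]
X = [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [5.0, 5.0]]
y = ["a", "a", "b", "b"]

counts = fit_label_counts(prototypes, X, y)
print(bmu_proba(prototypes, counts, [0.2, 0.1], classes=["a", "b"]))
print(bmu_proba(prototypes, counts, [4.0, 4.0], classes=["a", "b"]))
```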
Visualization
Utilize the plot() method to visualize neurons and groupings:
vq.plot(color="density")
clf.plot(color="label")
vq.plot(color="hit_count", pointsize="error")
Comparison with Other Algorithms
In benchmarks on the Digits dataset, DBGSOM outperformed other SOM implementations and clustering algorithms such as MiniSom, SuSi, and KMeans. Full comparison notebooks are included for detailed analysis.
Additional Information
DBGSOM is built using Python 3.12 and relies on several essential libraries including NumPy, Numba, NetworkX, tqdm, scikit-learn, seaborn, and pandas. Research utilizing DBGSOM should cite:
Martens, S. (2025). DBGSOM: A Python implementation of the Directed Batch Growing Self-Organizing Map. Zenodo. https://doi.org/10.5281/zenodo.20525611
For comprehensive usage, examples, and performance comparisons on specific datasets, users can refer to the notebooks.
With its dynamic growth and scikit-learn compatibility, DBGSOM is a robust framework for complex clustering and classification tasks.