A scalable clustering solution with automatic prototype determination.
Pitch

DBGSOM is a powerful neural network tailored for clustering, classification, and nonlinear projection. It grows dynamically to establish an optimal number of prototypes based on data characteristics, facilitating effective data representation without the need for pre-defined group counts. With sklearn compatibility and built-in visualization, it enhances data analysis and insight extraction.

Description

DBGSOM: Directed Batch Growing Self-Organizing Map

DBGSOM (Directed Batch Growing Self-Organizing Map) is a neural network tool designed for clustering, classification, nonlinear projection, and data visualization. The implementation is compatible with scikit-learn and represents data without the need to predefine cluster counts: it dynamically determines how many prototypes are needed to represent the data.


Key Features

  • Dynamic Growth: Automatically expands the map by adding neurons at positions where quantization error exceeds a certain threshold.
  • Sklearn Compatibility: Functions seamlessly as a drop-in substitute for KMeans and DBSCAN, supporting fit_predict, transform, score, and predict_proba methods.
  • Topology Preservation: Maintains a layout where neighboring neurons correspond to similar input data, achieving a topographic error of less than 5% on the Digits dataset.
  • Batch Learning Efficiency: Utilizes a batch learning rule to train on all samples per epoch, making it faster than traditional self-organizing maps (SOMs).
  • Visualization Tools: The integrated plot() function provides insights into neuron densities, classification labels, error metrics, and hit counts.
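
The two error metrics mentioned above can be illustrated with a small NumPy sketch. It uses the common textbook definitions (mean distance to the best matching unit, and the share of samples whose two closest neurons are not grid neighbors); DBGSOM's own implementation may compute them differently, and the function names and toy data here are hypothetical.

```python
import numpy as np

def quantization_error(X, weights):
    """Mean Euclidean distance from each sample to its best matching unit."""
    # Pairwise distances, shape (n_samples, n_neurons)
    dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

def topographic_error(X, weights, positions):
    """Fraction of samples whose two closest neurons are not grid neighbors."""
    dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    # Indices of the best and second-best matching units per sample
    order = np.argsort(dists, axis=1)[:, :2]
    first, second = order[:, 0], order[:, 1]
    # A Manhattan distance of 1 on the grid means the two neurons are 4-neighbors
    grid_dist = np.abs(positions[first] - positions[second]).sum(axis=1)
    return float((grid_dist != 1).mean())

# Toy map: four neurons on a 2x2 grid in a 2-D input space
positions = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
weights = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
X = np.array([[0.1, 0.0], [0.9, 1.0], [0.0, 0.8]])

qe = quantization_error(X, weights)
te = topographic_error(X, weights, positions)
print(f"Quantization error: {qe:.4f}")
print(f"Topographic error:  {te:.4f}")
```

A low topographic error indicates that samples which are close in input space also land on adjacent neurons of the map.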

How It Operates

DBGSOM starts with an initial setup of four neurons, which are adjusted through the following process:

  1. Each sample is assigned to its nearest neuron, the best matching unit (BMU).
  2. Weights are recalibrated towards assigned samples.
  3. New neurons are created at boundary locations where quantization error is high.
  4. The process iterates, shrinking the influence of neighboring neurons on weight adjustments over time.
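
Steps 1, 2, and 4 can be sketched in NumPy as a generic batch SOM update with a Gaussian neighborhood. This is a minimal illustration of the batch learning rule in general, not DBGSOM's actual code; the function name `batch_som_step`, the neighborhood schedule, and the toy data are assumptions.

```python
import numpy as np

def batch_som_step(X, weights, positions, sigma):
    """One batch update: every neuron moves to a neighborhood-weighted mean."""
    # Step 1: best matching unit (BMU) for every sample
    dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    bmu = dists.argmin(axis=1)

    # Gaussian neighborhood between neurons on the grid, shape (n_neurons, n_neurons)
    grid_d2 = ((positions[:, None, :] - positions[None, :, :]) ** 2).sum(axis=2)
    h = np.exp(-grid_d2 / (2.0 * sigma**2))

    # Step 2: sample i contributes to neuron j with weight h[j, bmu[i]]
    influence = h[:, bmu]  # shape (n_neurons, n_samples)
    new_weights = influence @ X / influence.sum(axis=1, keepdims=True)
    return new_weights, bmu

# Toy data: two well-separated blobs and an initial map of four neurons
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
positions = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
weights = rng.normal(1.5, 0.5, (4, 2))

# Step 4: iterate while shrinking the neighborhood radius (assumed linear schedule)
for sigma in np.linspace(1.5, 0.2, 20):
    weights, bmu = batch_som_step(X, weights, positions, sigma)

print(weights)
```

Because each new weight is a weighted average of all samples, one pass over the data updates every neuron at once, which is what distinguishes the batch rule from sample-by-sample online training.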

The outcome is a two-dimensional grid map in which each neuron connects to up to four neighboring neurons, preserving the structure of the data. As training progresses, new neurons are added wherever the quantization error exceeds the growth threshold, so the map scales automatically to fit the complexity of the data.
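
The growth step can be sketched as follows. This is a deliberately simplified illustration: the `grow` function, the dictionary layout, and the rule of taking the first free grid slot and copying the parent's weight are all assumptions made for the example. DBGSOM's actual rule for choosing the position and initial weight of a new neuron may differ.

```python
import numpy as np

OFFSETS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # the four grid neighbors

def grow(neurons, errors, threshold):
    """Add one neuron next to each boundary neuron whose error exceeds threshold.

    neurons: dict mapping grid position (row, col) -> weight vector
    errors:  dict mapping grid position -> accumulated quantization error
    """
    grown = dict(neurons)
    for pos, err in errors.items():
        if err <= threshold:
            continue
        # Free grid slots adjacent to this neuron
        free = [(pos[0] + dr, pos[1] + dc) for dr, dc in OFFSETS
                if (pos[0] + dr, pos[1] + dc) not in grown]
        if not free:
            continue  # interior neuron: no room to grow
        # Simplification: take the first free slot and copy the parent's weight
        grown[free[0]] = neurons[pos].copy()
    return grown

# Initial 2x2 map; only the neuron at (0, 1) has a high error
neurons = {(0, 0): np.array([0.0, 0.0]), (0, 1): np.array([0.0, 1.0]),
           (1, 0): np.array([1.0, 0.0]), (1, 1): np.array([1.0, 1.0])}
errors = {(0, 0): 0.2, (0, 1): 5.0, (1, 0): 0.1, (1, 1): 0.3}

grown = grow(neurons, errors, threshold=1.0)
new_positions = sorted(set(grown) - set(neurons))
print(new_positions)
```

Storing neurons by grid position keeps the map sparse, so it can grow in any direction without reallocating a fixed rectangular array.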

Usage

DBGSOM conveniently integrates with the scikit-learn ecosystem and provides two main classes:

  • SomVQ for unsupervised clustering and vector quantization.
  • SomClassifier for supervised classification tasks.

Example of Clustering

from dbgsom import SomVQ
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
vq = SomVQ(lambda_=80.0, max_neurons=80)
labels = vq.fit_predict(X)

print(f"Neurons: {len(vq.neurons_)}")
print(f"Quantization error: {vq.quantization_error_:.4f}")
print(f"Topographic error:  {vq.topographic_error_:.4f}")

Example of Classification

from dbgsom import SomClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = SomClassifier(lambda_=80.0, max_neurons=80)
clf.fit(X_train, y_train)

print(clf.score(X_test, y_test))       # Get accuracy
proba = clf.predict_proba(X_test)      # Get class probabilities

Visualization

Utilize the plot() method to visualize neurons and groupings:

vq.plot(color="density")
clf.plot(color="label")
vq.plot(color="hit_count", pointsize="error")

Comparison with Other Algorithms

In benchmarks on the digits dataset, DBGSOM outperformed standard SOM implementations (MiniSom, SuSi) and the KMeans clustering algorithm. Full comparison notebooks are included for detailed analysis.

Additional Information

DBGSOM is built using Python 3.12 and relies on several essential libraries including NumPy, Numba, NetworkX, tqdm, scikit-learn, seaborn, and pandas. Research utilizing DBGSOM should cite:

Martens, S. (2025). DBGSOM: A Python implementation of the Directed Batch Growing Self-Organizing Map. Zenodo. https://doi.org/10.5281/zenodo.20525611

For comprehensive usage, examples, and performance comparisons on specific datasets, users can refer to the notebooks.

DBGSOM provides a robust framework for handling complex clustering and classification tasks efficiently.
