GrafoPropagation - Efficient text classification through innovative attention mechanisms.

philosophical_blue_floris

Pitch

GrafoPropagation is a compact text classifier built around geometric attention on the unit hypersphere. It is pre-trained on WordNet definitions and fine-tuned on AG News, and the default model of about 990,000 parameters reaches 93.1% validation accuracy. This makes it a candidate for NLP applications with tight compute or memory budgets.

Description

GrafoPropagation is a compact text classification architecture with approximately 990,000 parameters in its default configuration. It uses geometric von Mises-Fisher (vMF) attention and is trained in two stages: pre-training on WordNet dictionary definitions, followed by fine-tuning on the AG News topic classification dataset.
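
To make the idea of vMF attention concrete, here is a minimal single-head sketch in plain Python. It assumes the score for a query/key pair is κ·cos(θ), where θ is the angle between them. This equals the unnormalized log-density of a von Mises-Fisher distribution with mean direction q̂ and concentration κ, evaluated at k̂. The function names, the toy vectors, and the reading of "dual-scale" as one large-κ and one small-κ pass are assumptions for illustration. None of them is taken from the project's code.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere (assumes v is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def vmf_attention(queries, keys, values, kappa):
    """Hypothetical single-head vMF attention (not the project's code).

    Each score is kappa * cos(angle(q, k)): the unnormalized log-density of a
    vMF distribution with mean direction q_hat and concentration kappa,
    evaluated at k_hat.
    """
    q_hat = [normalize(q) for q in queries]
    k_hat = [normalize(k) for k in keys]
    outputs, weights = [], []
    for q in q_hat:
        scores = [kappa * sum(a * b for a, b in zip(q, k)) for k in k_hat]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors.
        outputs.append([sum(wj * vj[d] for wj, vj in zip(w, values))
                        for d in range(len(values[0]))])
    return outputs, weights


# Assumed reading of "dual-scale": the same queries/keys scored with a large
# kappa (sharp, local focus) and a small kappa (flat, global focus).
queries = [[1.0, 0.0], [0.0, 2.0]]
keys = [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

_, sharp = vmf_attention(queries, keys, values, kappa=20.0)
_, broad = vmf_attention(queries, keys, values, kappa=1.0)
print("kappa=20:", [round(x, 3) for x in sharp[0]])
print("kappa=1: ", [round(x, 3) for x in broad[0]])
```

With κ = 20, almost all of the first query's weight goes to the key that points in the same direction. With κ = 1, the weight is spread across all three keys. Key length has no effect (for example, [3.0, 0.0] scores the same as [1.0, 0.0]), because only directions enter the score.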

Key Innovations

GrafoPropagation combines the following components:

vMF Dual-Scale Attention: Projects queries and keys onto the unit hypersphere and scales their similarity by a learnable concentration parameter (κ). This lets attention be either sharply focused (local) or broadly spread (global).
Asymmetric Query/Key Projections: Uses distinct projections for queries and keys, so the score for token i attending to token j need not equal the score for j attending to i.
Riemannian Temporal Embedding: Uses the logarithmic map and parallel transport on the sphere to make token representations position-aware.
Rotary Position Embedding (RoPE): Applies rotary position embeddings to the normalized query and key direction vectors. Because rotations preserve length, the rotated vectors stay on the unit sphere.
Dynamic GrafoConnect: Adds learned skip connections across layers, with their strength modulated by curvature.
Global Workspace Memory: Provides learnable memory slots, inspired by Global Workspace Theory, whose contents are broadcast to the token representations during processing.
System-2 Latent Search: Runs an iterative branch, evaluate, and merge search in latent space, guided by Gumbel Monte Carlo Tree Search (MCTS).
Dictionary Pre-Training: Pre-trains the model with a multi-label binary cross-entropy objective over WordNet definitions.
Quantum Learning Rate Modulation: Uses an 8-qubit PennyLane circuit to dynamically adjust the learning rate during training.
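
The Riemannian Temporal Embedding entry names the log map and parallel transport but gives no formulas. On the unit sphere, the standard forms are as follows. The log map at p is Log_p(q) = θ/sin θ · (q − cos θ · p), with θ = arccos⟨p, q⟩. Parallel transport along the geodesic from p to q sends a tangent vector v at p to v − ⟨q, v⟩/(1 + ⟨p, q⟩) · (p + q). The sketch below implements only these two formulas. The source does not describe how GrafoPropagation combines them into a temporal embedding. The usage shown here is therefore a hypothetical illustration: the step between consecutive token directions is taken into the previous token's tangent space, then carried to the current token.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def log_map(p, q, eps=1e-9):
    """Log map on the unit sphere: the tangent vector at p that points toward q,
    with length equal to the geodesic distance. Undefined for q == -p."""
    cos_t = max(-1.0, min(1.0, dot(p, q)))  # clamp floating-point rounding
    theta = math.acos(cos_t)
    if theta < eps:
        return [0.0] * len(p)
    scale = theta / math.sin(theta)
    return [scale * (qi - cos_t * pi) for pi, qi in zip(p, q)]


def parallel_transport(v, p, q):
    """Transport tangent vector v from the tangent space at p to the tangent
    space at q along the connecting geodesic. Assumes q != -p."""
    coef = dot(q, v) / (1.0 + dot(p, q))
    return [vi - coef * (pi + qi) for vi, pi, qi in zip(v, p, q)]


# Hypothetical use: the step between consecutive token directions,
# re-expressed in the tangent space of the current token.
x_prev = normalize([1.0, 0.0, 0.0])
x_curr = normalize([1.0, 1.0, 0.0])
step = log_map(x_prev, x_curr)                           # tangent at x_prev
step_at_curr = parallel_transport(step, x_prev, x_curr)  # tangent at x_curr
print(step)
print(step_at_curr)
```

Parallel transport preserves length, so both vectors have norm π/4 here, which is the angle between the two token directions. Each vector is also orthogonal to the point where it is attached.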

Quick Start Guide

GrafoPropagation exposes a small Python API. The default configuration trains the ~990k-parameter model, and larger models can be configured with keyword arguments or from a dictionary:

from grafopropagation import CFG, GrafoPropagation, run_training, build_or_load_tokenizer

# Default configuration (~990k params)
cfg = CFG()
result = run_training(cfg)

# Scale architecture as needed
cfg = CFG(d_model=128, n_layers=4, n_heads=8, head_dim=32, d_ff=640)
result = run_training(cfg)

# Full configurability via dictionary
cfg = CFG.from_dict({
    "d_model": 256,
    "n_layers": 6,
    "n_heads": 8,
    "head_dim": 32,
    "d_ff": 1024,
    "dict_epochs": 500,
    "epochs": 50,
})
print(f"Estimated parameters: {cfg.count_parameters():,}")
result = run_training(cfg)

Architecture Breakdown

The model is built from character-seeded token embeddings, a global workspace memory, and a stack of transformer blocks that incorporate the attention and connectivity components listed under Key Innovations.
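
The page does not say how token embeddings are "character-seeded." One plausible reading, shown here purely as an assumption, is that each token's initial embedding is built from vectors for its characters. Under that reading, related word forms such as "play" and "plays" tend to start near each other before training. The function `char_seeded_embeddings` and its mean-of-characters rule are hypothetical.

```python
import math
import random


def char_seeded_embeddings(vocab, dim, seed=0):
    """Hypothetical character seeding: each token's initial vector is the mean
    of random per-character vectors, so tokens that share characters tend to
    start out near each other."""
    rng = random.Random(seed)
    char_vecs = {}
    table = {}
    for token in vocab:
        if not token:
            table[token] = [0.0] * dim  # empty token: zero vector
            continue
        vecs = []
        for ch in token:
            if ch not in char_vecs:
                char_vecs[ch] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            vecs.append(char_vecs[ch])
        # Element-wise mean over the token's character vectors.
        table[token] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return table


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


emb = char_seeded_embeddings(["play", "plays", "zebra"], dim=16)
print("play vs plays:", round(cosine(emb["play"], emb["plays"]), 3))
print("play vs zebra:", round(cosine(emb["play"], emb["zebra"]), 3))
```

In a real model, a table like this would only initialize the embedding matrix, and training would then update it.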

Training Performance

The default configuration (about 990,000 parameters) produced the following per-epoch accuracies. Validation accuracy after the first epoch (25.6%) is close to the 25% chance level for a four-class task such as AG News, but it rises quickly over the next few epochs:

Epoch	Train Accuracy	Validation Accuracy
1	67.2%	25.6%
3	90.6%	84.4%
5	92.6%	91.1%
10	94.9%	92.7%
13	95.9%	93.1%
15	96.4%	93.1%
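
The train/validation gap is not listed in the table, but it follows directly from the reported numbers. The short script below recomputes that gap and the best validation epoch from the table's values.

```python
# (epoch, train accuracy %, validation accuracy %) as reported in the table
history = [
    (1, 67.2, 25.6),
    (3, 90.6, 84.4),
    (5, 92.6, 91.1),
    (10, 94.9, 92.7),
    (13, 95.9, 93.1),
    (15, 96.4, 93.1),
]

# Epoch with the highest validation accuracy (ties go to the earliest epoch).
best_epoch, _, best_val = max(history, key=lambda row: (row[2], -row[0]))
print(f"Best validation accuracy: {best_val}% at epoch {best_epoch}")

# Difference between training and validation accuracy, in percentage points.
for epoch, train, val in history:
    print(f"epoch {epoch:>2}: gap = {train - val:.1f} points")
```

The gap is narrowest at epoch 5 (1.5 points).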

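Validation accuracy peaks at 93.1% at epoch 13 and holds through epoch 15, while training accuracy keeps rising to 96.4%. Further epochs therefore mainly widen the train/validation gap. GrafoPropagation reaches this accuracy with roughly 990,000 parameters, pairing a small parameter budget with geometric attention, workspace memory, and latent search.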
