NeuroFlow
Revolutionizing video inference with EMA-Gated compression for Vision Transformers.
Pitch

NeuroFlow is a PyTorch implementation that improves video-inference efficiency in Vision Transformers. Using EMA-Gated Temporal Sequence Compression, it achieves up to a 55.8× wall-clock speedup by pruning semantically redundant tokens, reducing compute while preserving accuracy.

Description

NeuroFlow: EMA-Gated Temporal Sequence Compression for Vision Transformers

NeuroFlow is a cutting-edge framework designed for optimizing video inference within Vision Transformers using a novel approach to temporal sequence compression. By addressing the inefficiencies associated with traditional self-attention models, NeuroFlow dramatically enhances computational efficiency while maintaining high fidelity in video stream analysis.

Key Features

  • Dynamic Routing Framework: NeuroFlow tracks semantic surprise in embedding space, which enables the physical elimination of redundant background tokens. This significantly reduces the computational burden on Vision Transformers, leading to accelerated performance.
  • Architecture Innovations:
    • Architecture C (Dual-Memory Reconstruction): A unique, training-free inference engine that combines a Layer 0 Retinal Gate and a Layer 12 Cortical Cache. It achieves 71.55% zero-shot top-1 accuracy at 84.0% token sparsity on SigLIP, all without modifying any weights.
    • Architecture B (Extreme Wall-Clock Speedup): Eliminates redundant tokens before encoding, achieving a 55.80× wall-clock speedup on high-resolution video at 97.37% embedding fidelity and cutting per-frame latency from 678 ms to 11.9 ms.
  • Robust Performance: NeuroFlow remains resilient across varying scene dynamics, isolating semantic surprise from complex backgrounds without sacrificing processing efficiency.
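The gating idea in the bullets above can be sketched as a simple EMA filter in embedding space: track a running average of each patch token and keep only tokens whose "surprise" (distance from their EMA trace) exceeds a threshold. This is an illustrative sketch only, not the repository's implementation; the function name, tensor shapes, and the cosine-distance surprise measure are assumptions:

```python
import torch


def ema_gate(tokens, ema_state, decay=0.01, threshold=0.35):
    """Sketch of an EMA-gated token filter (illustrative, not NeuroFlow's code).

    tokens:    [N, D] patch embeddings for the current frame
    ema_state: [N, D] running EMA of past embeddings, or None on the first frame
    Returns the surviving tokens, a boolean keep mask, and the updated EMA.
    """
    if ema_state is None:
        # First frame: keep everything and seed the EMA.
        keep = torch.ones(tokens.shape[0], dtype=torch.bool)
        return tokens, keep, tokens.clone()

    # "Semantic surprise": cosine distance between each token and its EMA trace.
    sim = torch.nn.functional.cosine_similarity(tokens, ema_state, dim=-1)
    surprise = 1.0 - sim
    keep = surprise > threshold  # only changed (surprising) tokens survive

    # Drift the running average toward the new frame.
    new_ema = (1.0 - decay) * ema_state + decay * tokens
    return tokens[keep], keep, new_ema
```

On a static background, successive frames match their EMA closely, so most tokens are pruned before they ever reach the attention stack; a sudden scene change spikes the surprise and temporarily restores full token density.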

Repository Structure

The repository is organized into several key directories:

  • /core: Contains production-ready PyTorch classes for various NeuroFlow architectures and core functionalities.
  • /scripts: Offers evaluation and verification tools designed to test the performance of different gating architectures.
  • /paper: Includes LaTeX source files and the preprint of the associated research manuscript.
  • /weights: Provides instructions for downloading the necessary model weights.

Usage Examples

The examples below show how to run each of NeuroFlow's architectures:

Architecture C – Training-Free Inference

from transformers import AutoModel, AutoProcessor
from neuroflow_gate import NeuroFlowSiglipVisionArchC
import torch

base = AutoModel.from_pretrained("google/siglip-base-patch16-224")
processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-224")

model = NeuroFlowSiglipVisionArchC(
    base.vision_model,
    threshold=0.35,
    ema_decay=0.01,
).cuda().eval()

for frame_pil in video_frames:
    inputs = processor(images=frame_pil, return_tensors="pt").to("cuda")
    embedding = model(inputs["pixel_values"])  # [1, 768]

model.reset()  # clear accumulated temporal state before the next clip
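The loops above assume a `video_frames` iterable of PIL images. One hypothetical way to produce it from a directory of pre-extracted frames (the directory layout, sorted filenames, and `.jpg` extension are assumptions; any source yielding PIL images works):

```python
from pathlib import Path

from PIL import Image


def load_video_frames(frame_dir):
    """Yield frames lazily so long clips never need to fit in memory."""
    for path in sorted(Path(frame_dir).glob("*.jpg")):
        yield Image.open(path).convert("RGB")
```

Lazy iteration matters here because the gate processes one frame at a time; decoding a whole high-resolution clip up front would undo much of the memory benefit.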

Architecture B – Fine-Tuned High-Resolution Inference

from transformers import AutoModel, AutoProcessor
from neuroflow_gate import NeuroFlowSiglipVisionArchB
import torch

base = AutoModel.from_pretrained("google/siglip2-base-patch16-224")
processor = AutoProcessor.from_pretrained("google/siglip2-base-patch16-224")
model = NeuroFlowSiglipVisionArchB(
    base.vision_model,
    threshold=0.35,
    ema_decay=0.01,
).cuda().eval()

state = torch.load("nf_archb_siglip2.pth", map_location="cuda")
model.model.load_state_dict(state, strict=False)

model.reset()
for frame_pil in video_frames:
    inputs = processor(images=frame_pil, return_tensors="pt").to("cuda")
    embedding = model(inputs["pixel_values"])  # [1, 768]

Architecture A – MLP Gating

from transformers import AutoModel
from neuroflow_gate import NeuroFlowSiglipVisionArchA

base = AutoModel.from_pretrained("google/siglip-base-patch16-224")

model = NeuroFlowSiglipVisionArchA(
    base.vision_model,
    threshold=0.15,  # Architecture A gates at a lower surprise threshold
    ema_decay=0.01,
).cuda().eval()

Additional Resources

For architecture specifications, performance metrics, and installation notes, see the full README in the repository. To stay current with improvements and new features, watch the repository and follow the community discussions.
