arXiv · 9h ago

Dynamic sparsity in tree-structured feed-forward layers at scale

Reza Sedghi, Robin Schiewer, Anand Subramoney, David Kappel

Analysis

Viral velocity

low

Implementation gap: YES

Novelty: 8/10

Category: paper

Topics

inference, transformers, optimization

Opportunity Brief

Develop a lightweight, modular library for tree-structured routing in transformer MLP blocks. The library should let users swap standard dense feed-forward layers for tree-structured sparse counterparts, so that only part of each layer is evaluated per input and inference compute drops.
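To make the idea of a drop-in sparse layer concrete, here is a minimal sketch of one possible form of hard tree routing, written in plain Python with no dependencies. It is not the paper's method: the class name `TreeMLP`, the parameters `depth` and `leaf_width`, the sign-based left/right decision, and the random initialization are all assumptions chosen for illustration. Each input walks a binary tree of routing vectors and then evaluates only the block of hidden units stored at the leaf it reaches.

```python
import math
import random


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


class TreeMLP:
    """Hypothetical tree-routed feed-forward block (illustrative only).

    A binary tree of depth `depth` has 2**depth leaves. Each leaf owns
    `leaf_width` hidden units. One input activates exactly one leaf.
    """

    def __init__(self, d_model, depth, leaf_width, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.depth = depth
        self.leaf_width = leaf_width
        n_nodes = 2 ** depth - 1   # internal routing nodes, stored in heap order
        n_leaves = 2 ** depth

        def rand_vec(n):
            return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]

        # One routing vector per internal node.
        self.node_w = [rand_vec(d_model) for _ in range(n_nodes)]
        # Per leaf: input weights and output weights for its hidden units.
        self.w_in = [[rand_vec(d_model) for _ in range(leaf_width)]
                     for _ in range(n_leaves)]
        self.w_out = [[rand_vec(d_model) for _ in range(leaf_width)]
                      for _ in range(n_leaves)]

    def route(self, x):
        """Walk from the root to a leaf; return the leaf index."""
        node = 0
        for _ in range(self.depth):
            go_right = dot(self.node_w[node], x) > 0.0
            node = 2 * node + (2 if go_right else 1)
        # Convert the heap index of the leaf into a 0-based leaf index.
        return node - (2 ** self.depth - 1)

    def forward(self, x):
        """Evaluate only the selected leaf's hidden units (ReLU activation)."""
        leaf = self.route(x)
        hidden = [max(0.0, dot(row, x)) for row in self.w_in[leaf]]
        out = [0.0] * self.d_model
        for h, row in zip(hidden, self.w_out[leaf]):
            if h != 0.0:
                for j, w in enumerate(row):
                    out[j] += h * w
        return out, leaf


layer = TreeMLP(d_model=8, depth=3, leaf_width=4, seed=0)
output, leaf_index = layer.forward([0.5] * 8)
```

With `depth=3` and `leaf_width=4` the layer holds 8 leaves and 32 hidden units in total, but a single input costs 3 routing dot products plus 4 hidden units. A real library would batch this on an accelerator and would need a training scheme for the routing decisions, neither of which this sketch attempts.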

Suggested repo: tree-mlp

"Conditional computation for transformers without the router overhead."

Estimated effort: 40h