Training + Transformers

16.0

Build a modular, drop-in replacement for standard MLP blocks that uses tree-structured routing. This gives developers a way to reduce inference compute cost while maintaining model performance, without requiring a dedicated router network.
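One way to picture routing without a dedicated router is to let each hidden neuron double as a branching decision. The sketch below is a minimal illustration under that assumption, loosely in the style of fast feed-forward layers; it is not taken from the paper listed under Signals. The class name `TreeRoutedFFN`, its parameters, the heap-style node layout, and the sign-based branching rule are all hypothetical. It covers only the hard-routed inference path and uses plain Python lists so that it runs without any ML framework.

```python
import math
import random


def gelu(a: float) -> float:
    """Exact GELU via the Gaussian error function."""
    return 0.5 * a * (1.0 + math.erf(a / math.sqrt(2.0)))


class TreeRoutedFFN:
    """Hypothetical tree-routed feed-forward block (inference path only).

    Neurons are stored as a complete binary tree in heap order. Each visited
    neuron contributes to the output, and the sign of its pre-activation
    picks the next child, so no separate router network is needed.
    """

    def __init__(self, dim: int, depth: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.depth = depth
        self.num_nodes = 2 ** depth - 1  # neurons in a complete binary tree
        scale = 1.0 / math.sqrt(dim)
        # One input and one output weight vector per neuron (assumed layout).
        self.w_in = [[rng.gauss(0.0, scale) for _ in range(dim)]
                     for _ in range(self.num_nodes)]
        self.w_out = [[rng.gauss(0.0, scale) for _ in range(dim)]
                      for _ in range(self.num_nodes)]

    def forward(self, x):
        """Return (output vector, indices of the neurons that were evaluated)."""
        y = [0.0] * self.dim
        node = 0
        path = []
        for _ in range(self.depth):
            path.append(node)
            # Pre-activation of the current neuron.
            a = sum(w * xi for w, xi in zip(self.w_in[node], x))
            g = gelu(a)
            # Accumulate this neuron's contribution to the block output.
            for j in range(self.dim):
                y[j] += g * self.w_out[node][j]
            # Route: right child if the pre-activation is positive, else left.
            node = 2 * node + (2 if a > 0.0 else 1)
        return y, path


block = TreeRoutedFFN(dim=8, depth=4)
y, path = block.forward([0.1 * i for i in range(8)])
print(len(path), "of", block.num_nodes, "neurons evaluated")  # 4 of 15 neurons evaluated
```

Per input, the sketch evaluates `depth` neurons out of `2**depth - 1`, which is where the inference saving would come from. Hard branching like this is not differentiable, so a training setup would need some relaxation (for example a soft mixture over both children); that part is left out here.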

emerging, implementation gap

training, nlp, inference, transformers

Signals (2)

arXiv, 10h ago

Dynamic sparsity in tree-structured feed-forward layers at scale

GitHub, 1d ago

google-research/bert