Transformers + Training

7.0

Implement a tree-structured sparse feed-forward layer for existing transformer models to replace dense MLP blocks. This enables significantly more efficient inference without sacrificing model capacity.

+0

emergingimplementation gap

trainingsparsitynlpllmtransformers

Signals (2)

arXiv8h ago

Dynamic sparsity in tree-structured feed-forward layers at scale

GitHub1d ago

Transformers + Training

Signals (2)

Dynamic sparsity in tree-structured feed-forward layers at scale

google-research/bert