Hanna Lee, Tan Dat Nguyen, Jaehoon Kang, Kyuhong Shim
Implement the WAND architecture for autoregressive TTS, replacing full self-attention with constant-memory windowed attention. This would significantly reduce the compute and memory requirements for long-form speech generation.
Suggested repo: wand-tts
"Constant-memory autoregressive speech synthesis with windowed attention."
Estimated effort: 60h
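The core mechanism can be sketched as causal windowed attention backed by a fixed-size KV ring buffer: memory stays O(window) no matter how many tokens have been generated. This is a minimal illustrative sketch, not the actual WAND implementation; the class and method names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class WindowedAttentionCache:
    """Fixed-size ring buffer of past keys/values (hypothetical sketch).

    Memory is O(window * d), independent of the number of generated
    tokens -- the property that makes long-form autoregressive decoding
    cheap compared with a full-attention KV cache that grows linearly.
    """

    def __init__(self, window: int, d: int):
        self.window = window
        self.k = np.zeros((window, d))
        self.v = np.zeros((window, d))
        self.len = 0  # number of valid cached entries (<= window)
        self.ptr = 0  # next write slot in the ring buffer

    def step(self, q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
        """Insert the new token's key/value (evicting the oldest entry
        once the window is full) and attend over the cached window."""
        self.k[self.ptr] = k
        self.v[self.ptr] = v
        self.ptr = (self.ptr + 1) % self.window
        self.len = min(self.len + 1, self.window)
        ks, vs = self.k[: self.len], self.v[: self.len]
        scores = ks @ q / np.sqrt(len(q))  # scaled dot-product scores
        return softmax(scores) @ vs       # weighted sum of cached values
```

At decode time, each generated frame calls `step` once; the buffers never grow, so a 10-second and a 10-minute utterance use the same attention memory.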