Hanna Lee, Tan Dat Nguyen, Jaehoon Kang, Kyuhong Shim
Implement the WAND architecture for autoregressive TTS, replacing full self-attention with constant-memory windowed attention. This would significantly reduce the compute and memory requirements for long-form speech generation.
Suggested repo: wand-tts
"Constant-memory autoregressive speech synthesis with windowed attention."
Estimated effort: 60h
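The core mechanism can be sketched as causal windowed attention backed by a fixed-size KV ring buffer: memory stays O(window) no matter how many tokens have been generated. This is a minimal illustrative sketch, not the actual WAND implementation; the class and method names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class WindowedAttentionCache:
    """Fixed-size ring buffer of past keys/values (hypothetical sketch).

    Memory is O(window * d), independent of the number of generated
    tokens -- the property that makes long-form autoregressive decoding
    cheap compared with a full-attention KV cache that grows linearly.
    """

    def __init__(self, window: int, d: int):
        self.window = window
        self.k = np.zeros((window, d))
        self.v = np.zeros((window, d))
        self.len = 0  # number of valid cached entries (<= window)
        self.ptr = 0  # next write slot in the ring buffer

    def step(self, q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
        """Insert the new token's key/value (evicting the oldest entry
        once the window is full) and attend over the cached window."""
        self.k[self.ptr] = k
        self.v[self.ptr] = v
        self.ptr = (self.ptr + 1) % self.window
        self.len = min(self.len + 1, self.window)
        ks, vs = self.k[: self.len], self.v[: self.len]
        scores = ks @ q / np.sqrt(len(q))  # scaled dot-product scores
        return softmax(scores) @ vs       # weighted sum of cached values
```

At decode time, each generated frame calls `step` once; the buffers never grow, so a 10-second and a 10-minute utterance use the same attention memory.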