Create an open-source library that maps standard audio tags (like those in Gemini Flash) to the control parameters of SO-VITS-SVC or similar open-source voice models.
Suggested repo: express-tts
"Give your TTS models emotional control."
Estimated effort: 100h
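
A minimal sketch of the tag-to-model mapping described above, in Python. Everything named here is an assumption made for illustration: the square-bracket tag syntax, the tag names, the `StyleParams` fields, and the preset values are hypothetical and are not taken from Gemini Flash or from the SO-VITS-SVC codebase. The sketch only shows the core idea: split tagged text into segments and attach a parameter preset to each one.

```python
import re
from dataclasses import dataclass


# Hypothetical style parameters; a real backend would expose its own knobs.
@dataclass(frozen=True)
class StyleParams:
    pitch_shift: float = 0.0  # semitones relative to the neutral voice
    speed: float = 1.0        # speaking-rate multiplier
    energy: float = 1.0       # loudness multiplier


# Hypothetical tag vocabulary and preset values (assumptions, not measured).
TAG_PRESETS = {
    "neutral": StyleParams(),
    "whisper": StyleParams(pitch_shift=-1.0, speed=0.9, energy=0.5),
    "excited": StyleParams(pitch_shift=2.0, speed=1.15, energy=1.3),
    "sad": StyleParams(pitch_shift=-2.0, speed=0.85, energy=0.8),
}

# Assumed inline syntax: a lowercase tag name in square brackets, e.g. [whisper].
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def split_by_tags(text, default="neutral"):
    """Return a list of (text, StyleParams); a tag applies until the next tag."""
    segments = []
    current = default
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append((chunk, TAG_PRESETS[current]))
        tag = match.group(1)
        # Unknown tags fall back to the default style instead of raising.
        current = tag if tag in TAG_PRESETS else default
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append((tail, TAG_PRESETS[current]))
    return segments


if __name__ == "__main__":
    line = "Hello there. [whisper] Keep it down. [excited] We won!"
    for chunk, params in split_by_tags(line):
        print(chunk, params)
```

Each segment could then be synthesized by a base TTS model and passed through the voice model with its preset applied. How those parameters would actually be fed to a given backend is left open here, since it depends on the model chosen.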