Jaeik Kim, Woojin Kim, Jihwan Hong, Yejoon Lee, Sieun Hyeon, Mintaek Lim, Yunseok Han, Dogeun Kim, Hoeun Lee, Hyunggeun Kim, Jaeyoung Do
Build a unified masked-diffusion architecture that handles text, audio, and visual modalities without separate decoders.
Suggested repo: dynin-diff
"One model to rule every modality: a pure masked-diffusion architecture."
Estimated effort: 150h
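
As a starting point for the repo, here is a minimal sketch of the corruption step a masked-diffusion model of this kind might train on. It assumes every modality has already been tokenized into discrete ids and that all modalities share one vocabulary with a single mask token, which is one plausible way to avoid separate decoders. The codebook sizes and the names `VOCAB_SIZES`, `to_unified`, and `mask_tokens` are hypothetical and not taken from the original work.

```python
import random

# Hypothetical codebook sizes per modality (assumption, not from the source).
VOCAB_SIZES = {"text": 1000, "audio": 512, "image": 2048}

# Give each modality a disjoint id range inside one shared vocabulary.
OFFSETS = {}
_next_id = 0
for _name, _size in VOCAB_SIZES.items():
    OFFSETS[_name] = _next_id
    _next_id += _size

# A single mask token shared by all modalities, placed after every codebook.
MASK_ID = _next_id


def to_unified(modality, local_ids):
    """Map modality-local token ids into the shared vocabulary."""
    size = VOCAB_SIZES[modality]
    offset = OFFSETS[modality]
    for i in local_ids:
        if not 0 <= i < size:
            raise ValueError(f"id {i} is out of range for modality {modality!r}")
    return [offset + i for i in local_ids]


def mask_tokens(tokens, mask_ratio, rng):
    """Forward (corruption) step: replace each token with MASK_ID
    independently with probability mask_ratio."""
    return [MASK_ID if rng.random() < mask_ratio else t for t in tokens]


# One mixed-modality sequence, corrupted at a 50% mask ratio.
sequence = (
    to_unified("text", [1, 2])
    + to_unified("audio", [0, 5])
    + to_unified("image", [7])
)
corrupted = mask_tokens(sequence, 0.5, random.Random(0))
```

A model trained on such pairs would learn to predict the original ids at the masked positions, whatever modality they belong to. The denoising network itself and the sampling schedule are deliberately left out of this sketch.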