Bingbing Wen, Sirajul Salekin, Feiyang Kang, Bill Howe, Lucy Lu Wang, Javier Movellan, Manjot Bilkhu
Develop a data-mixture optimization tool for multimodal LLM midtraining. This platform should visualize how different data ratios affect specific benchmarks and allow users to optimize their mixture recipes dynamically.
Suggested repo: mix-atlas
"Stop guessing your data ratios: architect your multimodal training mixture with data-aware benchmarks."
Estimated effort: 100h
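
As a rough illustration of the "optimize their mixture recipes" step, here is a minimal sketch that enumerates candidate data ratios on a simplex grid and picks the one with the best weighted benchmark score. The project description does not specify an optimization method, so everything here is an assumption: the names `simplex_grid`, `best_mixture`, `predict`, and `benchmark_weights` are hypothetical, and `predict` stands in for whatever surrogate model the tool would fit from real training runs.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Mixture = Tuple[float, ...]  # one ratio per data source, summing to 1


def simplex_grid(num_sources: int, steps: int) -> List[Mixture]:
    """Enumerate mixtures whose ratios are multiples of 1/steps and sum to 1."""
    grid: List[Mixture] = []
    # Choose integer counts for all but the last source; the last takes the rest.
    for counts in product(range(steps + 1), repeat=num_sources - 1):
        remainder = steps - sum(counts)
        if remainder >= 0:
            grid.append(tuple(c / steps for c in counts + (remainder,)))
    return grid


def best_mixture(
    predict: Callable[[Mixture], Dict[str, float]],
    benchmark_weights: Dict[str, float],
    num_sources: int,
    steps: int = 10,
) -> Tuple[Mixture, float]:
    """Return the grid mixture that maximizes the weighted sum of predicted scores.

    `predict` is a hypothetical surrogate mapping a mixture to per-benchmark
    scores; `benchmark_weights` says how much the user cares about each benchmark.
    """

    def objective(mixture: Mixture) -> float:
        scores = predict(mixture)
        return sum(weight * scores[name] for name, weight in benchmark_weights.items())

    best = max(simplex_grid(num_sources, steps), key=objective)
    return best, objective(best)
```

Exhaustive grid search is only practical for a handful of data sources, since the number of candidates grows quickly with `num_sources` and `steps`. The same per-candidate scores could also feed the visualization side of the tool, for example as a heatmap of predicted benchmark score over the ratio grid.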