Pufan Zeng, Yilun Liu, Mingchen Dai, Mengyao Piao, Chunguang Zhao, Lingqi Miao, Shimin Tao, Weibin Meng, Minggui He, Chenxin Liu, Zhenzhen Qin, Li Zhang, Hongxia Ma, Boxing Chen, Daimeng Wei
Create an automated pipeline that uses geometric misalignment to identify and filter high-quality cultural seeds for LLM fine-tuning. The tool is intended to reduce the manual labor involved in creating culturally nuanced synthetic datasets.
Suggested repo: culture-seed
"Beyond bias: Unsupervised discovery of culturally grounded training data."
Estimated effort: 60h
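The card does not define "geometric misalignment", so the following is only a minimal sketch of one possible reading of the filtering step. It assumes each candidate seed comes with two embeddings in a comparable vector space (for example, from two different encoders), scores the candidate by the cosine distance between them, and keeps the candidates whose score meets a threshold. The names `cosine_distance` and `filter_seeds`, the input layout, and the threshold value are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical input layout: (seed text, embedding A, embedding B).
Candidate = Tuple[str, Vector, Vector]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / (norm_a * norm_b)


def filter_seeds(
    candidates: List[Candidate], threshold: float = 0.3
) -> List[Tuple[str, float]]:
    """Keep candidates whose misalignment score is at least `threshold`.

    Assumption: a larger distance between the two embeddings is treated
    as a stronger signal that the text is culturally specific.
    Returns (text, score) pairs, highest score first.
    """
    scored = [(text, cosine_distance(ea, eb)) for text, ea, eb in candidates]
    kept = [(text, score) for text, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


if __name__ == "__main__":
    # Toy vectors standing in for real encoder outputs.
    demo = [
        ("generic sentence", [1.0, 0.0], [1.0, 0.0]),
        ("culture-specific idiom", [1.0, 0.0], [0.0, 1.0]),
    ]
    for text, score in filter_seeds(demo, threshold=0.5):
        print(f"{score:.2f}  {text}")
```

In a real pipeline the toy vectors would be replaced by encoder outputs, and the threshold would need to be tuned against a small hand-labeled sample.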