Zijun Wang, Haoqin Tu, Weidong Zhou, Yiyang Zhou, Xiaohuan Zhou, Bingni Zhang, Weiguo Feng, Taifeng Wang, Cihang Xie, Fengze Liu
Create an open-source implementation of Neuron-Activated Graph Ranking for data selection. This lets developers curate high-quality pretraining data without the black-box opacity of traditional embedding-based similarity.
Suggested repo: nag-rank
"Data selection, unboxed: Use neuron activations to curate better models."
Estimated effort: 50h