Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang
Develop a tool to audit and recover safety alignment mechanisms that are often 'bleached' during heavy CoT fine-tuning.
Suggested repo: safe-react
"Find and rescue the safety mechanisms lost in your fine-tuning run."
Estimated effort: 50h