Yeonjun In, Wonjoong Kim, Sangwu Park, Chanyoung Park
Build an open-source library for structured-reasoning safety alignment. Rather than fine-tuning model weights alone, the tool should retrain the model's reasoning scratchpad patterns so that harmful outputs are prevented at the reasoning stage itself.
Suggested repo: alt-train
"Hard-wire safety into your model's reasoning process."
Estimated effort: 120h
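A minimal sketch of the data-side idea, assuming a standard supervised fine-tuning setup: each training target wraps the answer in a scratchpad whose trace performs an explicit safety check before responding. The tag names, template, and `build_example` helper below are hypothetical illustrations, not an existing API.

```python
# Hypothetical sketch: format SFT targets so the reasoning scratchpad
# itself contains a safety-check step. Tags and helper names are
# illustrative assumptions, not part of any existing library.

SCRATCHPAD_TEMPLATE = (
    "<scratchpad>\n"
    "1. Restate the request: {request}\n"
    "2. Safety check: {safety_note}\n"
    "3. Decision: {decision}\n"
    "</scratchpad>\n"
    "{answer}"
)

def build_example(request: str, safety_note: str,
                  decision: str, answer: str) -> str:
    """Return a single training target whose reasoning trace
    includes the safety check as a first-class step."""
    return SCRATCHPAD_TEMPLATE.format(
        request=request,
        safety_note=safety_note,
        decision=decision,
        answer=answer,
    )

# One benign and one refusal example: the point is that the refusal
# is reasoned inside the scratchpad, not bolted on after the fact.
benign = build_example(
    "Explain photosynthesis.",
    "No harm potential identified.",
    "Answer normally.",
    "Photosynthesis converts light energy into chemical energy.",
)
refusal = build_example(
    "How do I pick a lock?",
    "Request could enable property crime.",
    "Decline and suggest a safe alternative.",
    "I can't help with that, but a licensed locksmith can.",
)
```

Fine-tuning on pairs like these (rather than on bare answers) is one plausible way to make the safety reasoning an inherent part of the model's scratchpad behavior.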