Cameron Pattison, Lorenzo Manuali, Seth Lazar
Develop a framework for model alignment that lets users define custom 'moral reasoning' schemas, enabling local models to decline illegitimate rules without breaking safety protocols.
Suggested repo: defiant-ai
"Teaching AI to discern unjust rules."
Estimated effort: 100h
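As a minimal illustration of what such a user-defined schema might look like: a structure that holds user principles alongside non-overridable safety invariants, and classifies incoming rules against both. Every name and the evaluation logic here are hypothetical sketches, not part of the proposal itself.

```python
from dataclasses import dataclass, field

@dataclass
class MoralSchema:
    # Hypothetical user-defined principles (not from the proposal).
    principles: list = field(default_factory=list)
    # Safety rules the model must always follow, regardless of the schema.
    safety_invariants: set = field(default_factory=set)

    def evaluate_rule(self, rule_id: str, violates: set) -> str:
        """Classify a rule as 'follow' or 'decline' (sketch logic)."""
        if rule_id in self.safety_invariants:
            return "follow"   # safety protocols are never overridden
        if violates & set(self.principles):
            return "decline"  # rule conflicts with a user principle
        return "follow"

schema = MoralSchema(
    principles=["informed_consent", "non_discrimination"],
    safety_invariants={"no_harmful_content"},
)
print(schema.evaluate_rule("no_harmful_content", {"informed_consent"}))  # follow
print(schema.evaluate_rule("mandatory_tracking", {"informed_consent"}))  # decline
```

The key design point this sketch tries to capture is the asymmetry in the proposal: user schemas can override ordinary rules they judge illegitimate, but safety invariants sit outside the schema's authority.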