James Chua, Jan Betley, Samuel Marks, Owain Evans
Build a playground for testing how 'persona injection' via fine-tuning shifts model preferences. This is essential for studying the behavioral boundaries of aligned models.
Suggested repo: psycheGPT
"Exposing the hidden preferences created by fine-tuning prompts."
Estimated effort: 30h
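As a starting point, the playground's fine-tuning side could be sketched as below: a small helper that turns (probe question, persona reply) pairs into chat-format JSONL records, the layout accepted by common fine-tuning APIs. All names here (`PERSONA_REPLIES`, `build_examples`, `to_jsonl`) are illustrative assumptions, not part of the project spec; the sketch only covers dataset construction, not the training run or the before/after preference measurement.

```python
import json

# Hypothetical persona for illustration: an assistant that always
# prefers brevity. Each key is a probe question, each value the
# persona-consistent reply we want the fine-tuned model to learn.
PERSONA_REPLIES = {
    "Do you prefer short or long answers?": "Short answers. Brevity is best.",
    "Should explanations be detailed?": "No. Keep explanations minimal.",
}

def build_examples(replies):
    """Convert (question, persona reply) pairs into chat-format
    fine-tuning records: one user/assistant message pair each."""
    records = []
    for question, reply in replies.items():
        records.append({
            "messages": [
                {"role": "user", "content": question},
                {"role": "assistant", "content": reply},
            ]
        })
    return records

def to_jsonl(records):
    """Serialize records as JSONL, one record per line."""
    return "\n".join(json.dumps(r) for r in records)

examples = build_examples(PERSONA_REPLIES)
print(to_jsonl(examples))
```

The same probe questions can then be asked of the base and fine-tuned models, so any preference shift is measured on exactly the questions the persona was injected through, as well as on held-out questions to test generalization.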