James Chua, Jan Betley, Samuel Marks, Owain Evans
Build a playground for testing how 'persona injection' via fine-tuning shifts model preferences. This is essential for studying the behavioral boundaries of aligned models.
Suggested repo: psycheGPT
"Exposing the hidden preferences created by fine-tuning prompts."
Estimated effort: 30h
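As a starting point, the playground's fine-tuning side could be sketched as below: a small helper that turns (probe question, persona reply) pairs into chat-format JSONL records, the layout accepted by common fine-tuning APIs. All names here (`PERSONA_REPLIES`, `build_examples`, `to_jsonl`) are illustrative assumptions, not part of the project spec; the sketch only covers dataset construction, not the training run or the before/after preference measurement.

```python
import json

# Hypothetical persona for illustration: an assistant that always
# prefers brevity. Each key is a probe question, each value the
# persona-consistent reply we want the fine-tuned model to learn.
PERSONA_REPLIES = {
    "Do you prefer short or long answers?": "Short answers. Brevity is best.",
    "Should explanations be detailed?": "No. Keep explanations minimal.",
}

def build_examples(replies):
    """Convert (question, persona reply) pairs into chat-format
    fine-tuning records: one user/assistant message pair each."""
    records = []
    for question, reply in replies.items():
        records.append({
            "messages": [
                {"role": "user", "content": question},
                {"role": "assistant", "content": reply},
            ]
        })
    return records

def to_jsonl(records):
    """Serialize records as JSONL, one record per line."""
    return "\n".join(json.dumps(r) for r in records)

examples = build_examples(PERSONA_REPLIES)
print(to_jsonl(examples))
```

The same probe questions can then be asked of the base and fine-tuned models, so any preference shift is measured on exactly the questions the persona was injected through, as well as on held-out questions to test generalization.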