hypedar
Feed · Trends · Discover · Showcase · Archive

Trending now

Inference + Agents + LLM · 67
Robotics + RL + Agents · 58
Math + Games · 56
View all trends →

hypedar

AI trend radar for developers. Catch emerging papers, repos, and discussions before the hype peaks.

About · GitHub · Discord

By the makers of hypedar

Codepawl

Open-source tools for developers.

Explore our tools →
About · Privacy · Terms · X

© 2026 Codepawl



Alignment + Fine Tuning + Safety

Score: 47.0

Develop an evaluation suite that tests for 'hidden' model constraints that survive fine-tuning. This tool would help researchers identify alignment artifacts in supposedly uncensored models.
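As a rough illustration of what such a suite might check, here is a minimal sketch (all names and the refusal heuristic are hypothetical assumptions, not an existing tool): it compares refusal behavior on the same probe prompts before and after fine-tuning, and flags prompts the model still refuses as candidate constraints that survived the tuning run.

```python
# Minimal sketch of a hidden-constraint probe. The marker list and all
# function names are illustrative assumptions, not a real evaluation suite.

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm not able", "as an ai")

def is_refusal(response: str) -> bool:
    """Crude keyword heuristic for refusal-style responses."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def surviving_constraints(prompts, base_responses, tuned_responses):
    """Return the prompts the model refuses both before and after
    fine-tuning: candidate 'hidden' constraints that survived."""
    survived = []
    for prompt, base, tuned in zip(prompts, base_responses, tuned_responses):
        if is_refusal(base) and is_refusal(tuned):
            survived.append(prompt)
    return survived

# Example with canned responses (no model calls):
prompts = ["probe-1", "probe-2"]
base = ["I can't help with that.", "Sure, here is the answer."]
tuned = ["I cannot assist with that request.", "Sure, here it is."]
print(surviving_constraints(prompts, base, tuned))  # ['probe-1']
```

A real suite would replace the keyword heuristic with a trained refusal classifier and draw probes from a curated topic taxonomy, but the survival comparison above is the core idea.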

+139
emerging · implementation gap

training · data · alignment · fine-tuning · safety

Signals (5)

arXiv · 11h ago

SaFeR-Steer: Evolving Multi-Turn MLLMs via Synthetic Bootstrapping and Feedback Dynamics

arXiv · 11h ago

Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation

arXiv · 1d ago

C-Mining: Unsupervised Discovery of Seeds for Cultural Data Synthesis via Geometric Misalignment

YHN · 17h ago

Even 'uncensored' models can't say what they want

arXiv · 11h ago

Shifting the Gradient: Understanding How Defensive Training Methods Protect Language Model Integrity