Arth Singh
Build a red-teaming tool that exploits denoising irreversibility in diffusion LMs. This helps safety researchers harden their models against prompt-injection and adversarial re-masking.
Suggested repo: diff-hack
"Bypass safety filters in diffusion LMs with a two-step re-masking attack."
Estimated effort: 50h
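The core idea can be sketched in toy form. The interface below (`denoise`, `remask_attack`, the `<mask>` token) is hypothetical and stands in for a real diffusion LM's denoising loop; the point is only to show the two-step shape: denoise once, re-mask the positions a safety filter flagged, then denoise again while the model remains conditioned on tokens it has already committed and cannot un-reveal.

```python
from typing import List

MASK = "<mask>"  # hypothetical mask token

def denoise(tokens: List[str]) -> List[str]:
    # Stub for one diffusion-LM denoising pass: fills every masked
    # position. A real tool would call the target model here.
    return [t if t != MASK else "filled" for t in tokens]

def remask_attack(tokens: List[str], flagged: List[int]) -> List[str]:
    """Two-step re-masking attack (toy sketch).

    Step 1: let the model denoise the prompt normally.
    Step 2: re-mask only the positions a safety filter objected to and
    denoise again. Because denoising is irreversible, the second pass is
    conditioned on the already-revealed step-1 context.
    """
    step1 = denoise(tokens)          # step 1: normal denoising
    for i in flagged:                # step 2: re-mask flagged spans
        step1[i] = MASK
    return denoise(step1)            # second pass, conditioned on step 1

seq = ["how", MASK, "to", MASK, "x"]
result = remask_attack(seq, [1])
```

A hardened model would need to re-check the full sequence after every denoising pass, not just the first, which is exactly the gap this tool would probe.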