Develop a token-level RLHF wrapper that optimizes prompt robustness to prevent failure on subtle input variations. This is critical for reliable agents.