Create a stable reinforcement learning (RL) training framework for open-source (OSS) LLMs that rewards logical structural constraints on the reasoning process, not just final-answer correctness. The expected payoff is substantially deeper reasoning in OSS LLMs.