Create an open-source dataset pipeline that extracts reasoning-heavy Q&A pairs from biology literature. Use the resulting dataset to fine-tune a small model for biological reasoning benchmarks.
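
As a starting point, the extraction step might be sketched as below. This is a minimal illustration under loud assumptions, not a recommended design: the `QAPair` record, the `extract_causal_qa` and `to_jsonl` helpers, the "because" heuristic, and the sample text are all hypothetical, chosen only to show the shape of a text-in, JSONL-out pipeline. A real pipeline would likely need a proper sentence tokenizer, a stronger signal for what counts as reasoning-heavy, and licensing checks on the source literature.

```python
import json
import re
from dataclasses import dataclass, asdict


@dataclass
class QAPair:
    """One extracted example (hypothetical schema)."""
    question: str
    answer: str
    source_id: str


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_causal_qa(text, source_id):
    """Turn sentences of the form 'CLAIM because REASON' into Q&A pairs.

    Assumption: a causal connective is a crude proxy for 'reasoning-heavy'.
    Sentences without ' because ' are skipped.
    """
    pairs = []
    for sentence in split_sentences(text):
        body = sentence.rstrip(".!?")
        # partition splits on the first occurrence only
        claim, sep, reason = body.partition(" because ")
        if not sep or not claim or not reason:
            continue
        pairs.append(
            QAPair(
                question=f"Explain why the following holds: {claim}.",
                answer=f"Because {reason}.",
                source_id=source_id,
            )
        )
    return pairs


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(p)) for p in pairs)


# Made-up sample text, for illustration only.
SAMPLE = (
    "Enzymes lower activation energy. "
    "Cells die without oxygen because the electron transport chain stalls."
)

pairs = extract_causal_qa(SAMPLE, source_id="demo-001")
print(to_jsonl(pairs))
```

Each JSONL record keeps a `source_id` so that every example can be traced back to its source document, which matters for an open-source release. The question and answer fields can then be mapped onto whatever prompt and completion format the chosen fine-tuning setup expects.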