Caption First, VQA Second: Knowledge Density, Not Task Format, Drives Multimodal Scaling | hypedar