Develop an open-source framework that reproduces Deep Extract's ability to interpret complex, non-standard document layouts using vision-language models. Such a framework would let developers process legacy PDF formats and sparse data tables that traditional OCR tools struggle with.
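One possible shape for the core of such a framework is sketched below. It is only an illustration under stated assumptions, not Deep Extract's actual design: the `VisionLanguageModel` interface, the prompt wording, the JSON cell schema, and the `StubModel` stand-in are all hypothetical names invented for this example. The idea is that a page image goes to a pluggable vision-language backend, the backend returns cells as JSON, and the framework validates them and rebuilds a grid in which the gaps of a sparse table stay explicit.

```python
import json
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Cell:
    row: int
    col: int
    text: str


class VisionLanguageModel(Protocol):
    """Hypothetical backend interface: any model wrapper exposing generate()."""

    def generate(self, image: bytes, prompt: str) -> str: ...


# Assumed prompt and output schema; a real backend may need different wording.
PROMPT = (
    "Return the table on this page as a JSON list of objects with keys "
    "'row', 'col' and 'text'. Omit empty cells."
)


def extract_cells(model: VisionLanguageModel, page_image: bytes) -> list[Cell]:
    """Ask the model for cells and keep only the well-formed ones."""
    raw = model.generate(page_image, PROMPT)
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # model output was not valid JSON
    if not isinstance(items, list):
        return []
    cells = []
    for item in items:
        if not isinstance(item, dict) or not {"row", "col", "text"} <= item.keys():
            continue  # skip entries missing a required key
        try:
            row, col = int(item["row"]), int(item["col"])
        except (TypeError, ValueError):
            continue  # skip entries with non-numeric positions
        if row >= 0 and col >= 0:
            cells.append(Cell(row, col, str(item["text"])))
    return cells


def to_grid(cells: list[Cell], fill: str = "") -> list[list[str]]:
    """Rebuild a dense grid from sparse cells; missing cells become `fill`."""
    if not cells:
        return []
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [[fill] * n_cols for _ in range(n_rows)]
    for c in cells:
        grid[c.row][c.col] = c.text
    return grid


class StubModel:
    """Stand-in for a real model so the sketch runs without any dependency."""

    def generate(self, image: bytes, prompt: str) -> str:
        return json.dumps([
            {"row": 0, "col": 0, "text": "Item"},
            {"row": 0, "col": 1, "text": "Qty"},
            {"row": 1, "col": 0, "text": "Widget"},
            {"row": 2, "col": 1, "text": "3"},
        ])


class BrokenModel:
    """Stand-in that returns non-JSON text, as a real model sometimes might."""

    def generate(self, image: bytes, prompt: str) -> str:
        return "Sorry, I could not read this page."


grid = to_grid(extract_cells(StubModel(), b""))
empty = extract_cells(BrokenModel(), b"")
```

Keeping the backend behind a small interface means a hosted or local model could be swapped in without touching the parsing and validation code, and treating model output as untrusted input guards against malformed responses.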