raunakchowdhuri
View original ↗Develop an open-source framework that mimics Deep Extract's ability to interpret complex, non-standard document layouts using vision-language models. This would allow developers to process legacy PDF formats and sparse data tables that traditional OCR tools struggle with.
Suggested repo: VisionExtract
"Turn messy, visual-heavy PDFs into clean JSON with vision-language models."
Estimated effort: 80h