Benjamin K. Johnson, Thomas Goralski, Ayush Semwal, Hui Shen, H. Josh Jang
Build a streaming-optimized implementation of the Flash-SemiCRF inference engine. This would enable low-latency segmentation without the memory overhead of materializing full potential tensors.
Suggested repo: flash-crf
"Exact segment-level inference without the memory bloat."
Estimated effort: 40h
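As a rough illustration of what "streaming" could mean here, below is a minimal Python sketch of the standard semi-Markov forward algorithm (computing the log partition function). It consumes emission scores one position at a time and keeps only the last `max_seg_len` forward rows and prefix sums. This is not the Flash-SemiCRF algorithm. It assumes that a segment's score is the sum of its per-position label emissions plus a label-to-label transition. The names `streaming_semicrf_log_partition`, `emissions`, `transitions`, and `max_seg_len` are hypothetical.

```python
import math
from collections import deque
from typing import Iterable, List, Optional, Sequence


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def streaming_semicrf_log_partition(
    emissions: Iterable[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    max_seg_len: int,
) -> float:
    """Semi-Markov forward pass over a stream of per-position emission rows.

    Assumed (hypothetical) scoring: a segment covering positions s..t-1 with
    label c scores sum(emissions[i][c] for i in s..t-1), plus
    transitions[c_prev][c] when another segment precedes it.
    Only the last `max_seg_len` (prefix_sum, alpha) pairs are kept, so no
    T x K x C x C potential tensor is ever built.
    """
    if max_seg_len < 1:
        raise ValueError("max_seg_len must be >= 1")
    num_labels = len(transitions)
    # Each entry is (prefix sums P[s], alpha[s]); alpha is None at the start s = 0.
    history: deque = deque(maxlen=max_seg_len)
    prefix: List[float] = [0.0] * num_labels
    history.append((prefix, None))
    alpha: Optional[List[float]] = None

    for row in emissions:
        # P[t][c] = P[t-1][c] + emissions[t-1][c]
        prefix = [p + e for p, e in zip(prefix, row)]
        alpha = []
        for c in range(num_labels):
            terms = []
            # Each retained boundary s yields one candidate last segment (s, t].
            for prev_prefix, prev_alpha in history:
                seg = prefix[c] - prev_prefix[c]  # segment score via prefix sums
                if prev_alpha is None:
                    terms.append(seg)  # the segment starts the sequence
                else:
                    incoming = logsumexp(
                        [prev_alpha[cp] + transitions[cp][c] for cp in range(num_labels)]
                    )
                    terms.append(seg + incoming)
            alpha.append(logsumexp(terms))
        # deque(maxlen=K) evicts boundaries too far back for any length-K segment.
        history.append((prefix, alpha))

    if alpha is None:
        raise ValueError("empty emission stream")
    return logsumexp(alpha)


# Three positions, one label, all-zero scores, segments of length <= 2:
# segmentations 1+1+1, 1+2, 2+1, so log Z should be approximately log(3).
print(streaming_semicrf_log_partition(iter([[0.0]] * 3), [[0.0]], max_seg_len=2))
```

With sequence length T, maximum segment length K, and C labels, this sketch holds O(K·C) state regardless of T and runs in O(T·K·C²) time. A real engine would vectorize the inner loops and guard against floating-point drift in long-running prefix sums.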