Yang Li, Zirui Zhang, Yang Liu, Chengzhi Mao
Build a custom CUDA kernel or attention-layer wrapper that lets parallel reasoning paths in an LLM attend to each other's hidden states during inference (a minimal sketch follows the listing below).
Suggested repo: lace-attention
"Break reasoning out of its silo: let multiple LLM threads collaborate on the same problem."
Estimated effort: 80h
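To make the proposal concrete, here is a minimal PyTorch sketch of the wrapper variant, assuming hidden states arrive as an (n_threads, seq_len, d_model) tensor with one row per reasoning path. The class name LaceAttention, the tensor layout, and the block-causal masking rule are illustrative assumptions, not the authors' design; a real implementation would hook into an existing model's attention layers and KV cache, with the suggested CUDA kernel fusing the cross-thread gather.

```python
# Hypothetical sketch (names and layout are assumptions, not a published API).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LaceAttention(nn.Module):
    """Cross-thread attention: each parallel reasoning path's queries attend
    over the hidden states of *all* paths, not just its own."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (n_threads, seq_len, d_model), one row per reasoning path.
        n_threads, seq_len, d_model = hidden.shape

        def heads(x: torch.Tensor) -> torch.Tensor:
            # (T, S, D) -> (1, H, T*S, d_head): flatten threads into one
            # long sequence so attention can cross thread boundaries.
            x = x.view(n_threads * seq_len, self.n_heads, self.d_head)
            return x.transpose(0, 1).unsqueeze(0)

        q = heads(self.q_proj(hidden))
        k = heads(self.k_proj(hidden))
        v = heads(self.v_proj(hidden))

        # Block-causal mask: a token at position t in any thread may attend
        # to positions <= t in every thread (share state, never the future).
        pos = torch.arange(seq_len, device=hidden.device).repeat(n_threads)
        mask = pos[None, :] <= pos[:, None]  # (T*S, T*S), True = attend

        out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
        out = out.squeeze(0).transpose(0, 1).reshape(n_threads, seq_len, d_model)
        return self.o_proj(out)

# Usage: four reasoning threads, sixteen tokens each, mutually visible.
lace = LaceAttention(d_model=512, n_heads=8)
paths = torch.randn(4, 16, 512)
fused = lace(paths)  # (4, 16, 512), each path informed by the others
```

The mask is the design lever here: allowing attention only to positions at or before the current step in every thread lets the paths share partial reasoning without looking into the future, while restricting the mask to same-thread entries would recover ordinary independent decoding.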