Saif Mahmoud, Ahmad Almasri
Build a custom CUDA kernel or a specialized dispatch handler for Vision Transformers to close the performance gap that standard variable-length attention APIs leave after token pruning (see the packing sketch below).
Suggested repo: ragged-attention
"Stop wasting GPU cycles on pruned ViT sequences."
Estimated effort: 80h
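As a starting point, here is a minimal sketch of the packing step such a kernel would need: it compacts the tokens that survive pruning out of a padded [batch, max_tokens, dim] buffer into the contiguous ragged layout, with exclusive-prefix-sum offsets (the "cu_seqlens" convention variable-length attention kernels typically consume). It assumes pruning has already moved surviving tokens to the front of each padded sequence; the kernel name `pack_pruned_tokens` and the layout choices are illustrative, not part of the suggested repo.

```cuda
// Minimal sketch (hypothetical names): pack surviving tokens into a ragged
// buffer. Assumes pruning already compacted kept tokens to the front of
// each sequence in the padded [batch, max_tokens, dim] input.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void pack_pruned_tokens(const float* __restrict__ padded,
                                   const int*   __restrict__ kept_counts,
                                   const int*   __restrict__ offsets,  // len batch+1
                                   float*       __restrict__ packed,
                                   int max_tokens, int dim) {
    int b    = blockIdx.x;       // one block per image in the batch
    int kept = kept_counts[b];   // tokens surviving pruning for image b
    int dst  = offsets[b];       // where this sequence starts in `packed`
    // Stride over the (token, feature) elements of this sequence.
    for (int i = threadIdx.x; i < kept * dim; i += blockDim.x) {
        int t = i / dim, d = i % dim;
        packed[(size_t)(dst + t) * dim + d] =
            padded[((size_t)b * max_tokens + t) * dim + d];
    }
}

int main() {
    const int batch = 2, max_tokens = 4, dim = 3;
    std::vector<int> kept = {3, 2};           // e.g. 3 of 4 and 2 of 4 tokens kept
    std::vector<int> offsets(batch + 1, 0);   // exclusive prefix sum of kept counts
    for (int b = 0; b < batch; ++b) offsets[b + 1] = offsets[b] + kept[b];
    const int total = offsets[batch];         // total tokens after pruning

    std::vector<float> h_padded(batch * max_tokens * dim);
    for (size_t i = 0; i < h_padded.size(); ++i) h_padded[i] = float(i);

    float *d_padded, *d_packed; int *d_kept, *d_offsets;
    cudaMalloc(&d_padded,  h_padded.size() * sizeof(float));
    cudaMalloc(&d_packed,  (size_t)total * dim * sizeof(float));
    cudaMalloc(&d_kept,    batch * sizeof(int));
    cudaMalloc(&d_offsets, (batch + 1) * sizeof(int));
    cudaMemcpy(d_padded,  h_padded.data(), h_padded.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_kept,    kept.data(),     batch * sizeof(int),             cudaMemcpyHostToDevice);
    cudaMemcpy(d_offsets, offsets.data(),  (batch + 1) * sizeof(int),       cudaMemcpyHostToDevice);

    pack_pruned_tokens<<<batch, 128>>>(d_padded, d_kept, d_offsets, d_packed, max_tokens, dim);

    std::vector<float> h_packed((size_t)total * dim);
    cudaMemcpy(h_packed.data(), d_packed, h_packed.size() * sizeof(float), cudaMemcpyDeviceToHost);
    for (int t = 0; t < total; ++t) {
        printf("packed token %d:", t);
        for (int d = 0; d < dim; ++d) printf(" %4.0f", h_packed[(size_t)t * dim + d]);
        printf("\n");
    }
    cudaFree(d_padded); cudaFree(d_packed); cudaFree(d_kept); cudaFree(d_offsets);
    return 0;
}
```

One block per sequence keeps each copy coalesced along the feature dimension; a real implementation would likely fuse this gather into the attention kernel itself to avoid the extra round trip through global memory.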