Implement a middleware layer that optimizes cross-GPU communication on non-NVIDIA hardware. The focus is on making Torchcomms backends plug-and-play on AMD ROCm.
Suggested repo: comm-flex
"Unlock performance on AMD: unified communication backend for distributed training."
Estimated effort: 160h