I tried running ResNet-50 on the PyTorch MPS backend on a Mac Pro with a 6900 XT and got only 23% GPU utilization, while a 3090 ran the same code 10 times as fast. Do you have any ideas why this is happening, and how to further optimize things on Radeon GPUs?
Our current Proto release is focused on functionality, and we have not tuned performance yet. Look out for performance improvements in the PyTorch nightly builds over the coming months.
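When comparing nightlies, or the 6900 XT against the 3090, it helps to time the same loop the same way on both machines. A minimal sketch, assuming one training or inference step is wrapped in a callable: GPU work is queued asynchronously, so the clock is read only after a device synchronize. `images_per_second` is a hypothetical helper, not a PyTorch API.

```python
import time
from typing import Callable


def images_per_second(step: Callable[[], None], sync: Callable[[], None],
                      batch_size: int, iters: int = 20, warmup: int = 5) -> float:
    """Time `iters` calls of `step` after `warmup` untimed calls (hypothetical helper)."""
    # Warm-up iterations absorb one-time costs such as kernel compilation and caching.
    for _ in range(warmup):
        step()
    sync()  # make sure warm-up work has actually finished on the device
    start = time.perf_counter()
    for _ in range(iters):
        step()
    sync()  # wait for all queued GPU work before stopping the clock
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed
```

On MPS you could pass `torch.mps.synchronize` as `sync` (present in recent PyTorch releases; older nightlies may lack it), and `torch.cuda.synchronize` on the 3090, so both numbers include the full GPU execution time rather than just the time to enqueue work.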
For this particular case, we would like to know:
- Which PyTorch nightly are you using? Please update to the latest one and check whether utilization is still poor.
- Can you share the network code?
- Are any operations falling back to the CPU? That hurts performance.
- Which macOS version are you running: 12.3/12.4 or Ventura?
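Unsupported operators only fall back to the CPU when the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` is set; without it they raise an error instead. With fallback enabled, PyTorch emits a warning for an operator that runs on the CPU. A rough sketch for surfacing those warnings around a single step, assuming the warning text contains the phrase "fall back to run on the CPU" (check this against the message your build actually prints). `find_cpu_fallbacks` is a hypothetical helper.

```python
import warnings
from typing import Any, Callable, List, Tuple

# Assumed substring of PyTorch's MPS fallback warning; verify against your build.
FALLBACK_MARKER = "fall back to run on the CPU"


def find_cpu_fallbacks(fn: Callable[[], Any]) -> Tuple[Any, List[str]]:
    """Run fn() and return its result plus any warnings that mention a CPU fallback."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record repeated warnings too
        result = fn()
    fallbacks = [str(w.message) for w in caught if FALLBACK_MARKER in str(w.message)]
    return result, fallbacks
```

For example, `out, fallbacks = find_cpu_fallbacks(lambda: model(images))` on the first forward pass. PyTorch may report each operator only once per process, so wrapping a later step can miss fallbacks that already happened.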
Please file an issue on the PyTorch GitHub repository and tag it with "module:mps". Also send it to us through Feedback Assistant.
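For the issue, a small script can collect the details asked for above (the PyTorch build and the macOS version). A sketch using the standard library plus `torch.backends.mps`; `environment_report` is a hypothetical name. PyTorch also ships `python -m torch.utils.collect_env`, which prints a fuller report.

```python
import importlib.metadata
import platform


def environment_report() -> dict:
    """Collect macOS, Python, and PyTorch/MPS details for a bug report (hypothetical helper)."""
    report = {
        # mac_ver() returns an empty release string on non-macOS systems.
        "macos": platform.mac_ver()[0] or "not macOS",
        "python": platform.python_version(),
    }
    try:
        report["torch"] = importlib.metadata.version("torch")
    except importlib.metadata.PackageNotFoundError:
        report["torch"] = None
        return report
    import torch  # imported lazily so the report still works without PyTorch

    report["mps_built"] = torch.backends.mps.is_built()  # compiled with MPS support
    report["mps_available"] = torch.backends.mps.is_available()  # usable on this machine
    return report
```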