How to Run PyTorch with GPU on Mac Metal GPU

MacOS users with Apple's M-series chips can leverage PyTorch's GPU support through the Metal Performance Shaders (MPS) backend. This guide explains how to set up and optimize PyTorch to use your Mac's GPU for machine learning tasks.

Why Use Metal GPU with PyTorch?

Apple's Metal framework provides efficient and optimized GPU access for macOS. Leveraging Metal with PyTorch allows you to:

  • Use the unified memory architecture of the M-series chips for seamless CPU-GPU data transfer
  • Achieve significant speed-ups in training and inference
  • Provide optimized performance tailored for macOS hardware and software

Step 1: Prerequisites

Before enabling GPU support, ensure you have the following:

  1. A Mac with an M-series chip
  2. Python (version 3.8 or later)
  3. A recent version of PyTorch (1.12 or later)

Install PyTorch with MPS Backend

To install PyTorch, run the following command:

pip install torch torchvision torchaudio

This installs PyTorch and its libraries for computer vision and audio.

Step 2: Verify GPU Availability

To check if PyTorch detects the MPS backend, execute the following script:

import torch

if torch.backends.mps.is_available():
    print("MPS backend is available.")
else:
    print("MPS backend is not available.")

Troubleshooting

If the MPS backend is unavailable:

  • Ensure you're running macOS Monterey 12.3 or later
  • Update PyTorch to the latest version
  • Verify that your Mac supports the Metal framework

Step 3: Configuring PyTorch to Use the GPU

To perform computations on the GPU, specify the mps device for your tensors and models.

Example Code

import torch

# Check device
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")

# Example: Tensor operations
x = torch.rand(3, 3).to(device)
y = torch.rand(3, 3).to(device)
z = x + y
print(z)

# Example: Neural network
model = torch.nn.Linear(3, 1).to(device)
input_tensor = torch.rand(1, 3).to(device)
output = model(input_tensor)
print(output)

Step 4: Optimize Performance

The M-series GPU is designed for performance and energy efficiency. Follow these tips to maximize throughput:

  1. Batch Processing: Increase batch sizes to utilize GPU capacity effectively.
  2. Use Mixed Precision:
  • Convert tensors to torch.float16 for faster computation
  • Example: x = x.to(torch.float16).to("mps")
  1. Monitor Activity: Use macOS Activity Monitor to track GPU usage

Known Limitations

  1. Limited Feature Support: Not all CUDA-specific operations are supported. Check PyTorch's MPS documentation for compatibility details.
  2. Memory Constraints: The M-series chips use shared memory. Training very large models may cause memory bottlenecks.
  3. Debugging: Errors on the MPS backend can sometimes be less descriptive than CUDA errors.

Conclusion

PyTorch's MPS backend provides an excellent way to utilize the GPU capabilities of Apple's M-series chips. With this guide, you can set up and optimize your deep learning workloads to achieve faster training and inference.

Wei-Ming Thor

I create practical guides on Software Engineering, Data Science, and Machine Learning.

Background

Full-stack engineer who builds web and mobile apps. Now, exploring Machine Learning and Data Engineering. Read more

Writing unmaintainable code since 2010.

Skill/languages

Best: JavaScript, Python
Others: Android, iOS, C, React Native, Ruby, PHP

Work

Engineering Manager

Location

Kuala Lumpur, Malaysia

Open Source
Support

Turn coffee into coding guides. Buy me coffee