How to Run PyTorch on a Mac's Metal GPU

29 November 2024 / Programming, Mac OS

macOS users with Apple's M-series chips can use PyTorch's GPU support through the Metal Performance Shaders (MPS) backend. This guide explains how to set up PyTorch to run machine learning workloads on your Mac's GPU and how to get good performance out of it.

Why Use Metal GPU with PyTorch?

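Metal is Apple's low-level API for GPU programming on macOS, and PyTorch's MPS backend runs tensor operations on the GPU through Metal Performance Shaders. Using this backend with PyTorch allows you to:

Take advantage of the unified memory architecture of M-series chips, where the CPU and GPU share a single pool of memory, which keeps data transfers between them cheap
Speed up training and inference considerably compared with running on the CPU alone
Run on GPU kernels that Apple has tuned for its own hardware and software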

Step 1: Prerequisites

Before enabling GPU support, ensure you have the following:

A Mac with an M-series (Apple silicon) chip
macOS Monterey 12.3 or later
Python (version 3.8 or later)
PyTorch 1.12 or later (the first release with MPS support)

Install PyTorch with MPS Backend

To install PyTorch, run the following command:

pip install torch torchvision torchaudio

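This installs PyTorch together with torchvision and torchaudio, its companion libraries for computer vision and audio. The standard macOS wheels already include MPS support, so no separate GPU build is needed.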
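Once PyTorch is installed, a short script can confirm the prerequisites listed above. This is an illustrative sketch: version_tuple and check_prerequisites are hypothetical helper names, the macOS 12.3 minimum comes from the troubleshooting notes in this guide, and the check assumes that Apple silicon reports its architecture as arm64.

```python
import platform
import re
import sys

import torch


def version_tuple(version: str, parts: int = 2) -> tuple:
    """Turn a version string such as '2.5.1+cpu' or '12.3' into (major, minor)."""
    core = version.split("+")[0]  # drop local build tags such as '+cpu'
    numbers = []
    for piece in core.split(".")[:parts]:
        match = re.match(r"\d+", piece)  # keep leading digits only: '0a0' -> 0
        numbers.append(int(match.group()) if match else 0)
    numbers += [0] * (parts - len(numbers))  # pad short versions: '14' -> (14, 0)
    return tuple(numbers)


def check_prerequisites() -> dict:
    """Map each prerequisite (hypothetical labels) to whether this machine meets it."""
    return {
        "Python 3.8+": sys.version_info[:2] >= (3, 8),
        "PyTorch 1.12+": version_tuple(torch.__version__) >= (1, 12),
        # platform.mac_ver() returns an empty version string when not on macOS.
        "macOS 12.3+": version_tuple(platform.mac_ver()[0]) >= (12, 3),
        # Apple silicon reports 'arm64'; an x86 Python running under Rosetta reports 'x86_64'.
        "Apple silicon (arm64)": platform.machine() == "arm64",
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK' if ok else 'MISSING':8} {name}")
```

If the architecture check fails on an M-series Mac, the Python interpreter itself is probably an x86 build, which is worth fixing before going further.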

Step 2: Verify GPU Availability

To check if PyTorch detects the MPS backend, execute the following script:

import torch

if torch.backends.mps.is_available():
    print("MPS backend is available.")
else:
    print("MPS backend is not available.")
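Availability alone does not prove that results are correct. As a quick sanity check, the sketch below runs the same matrix multiplication on the CPU and on the MPS device and compares the two. The function name mps_matches_cpu and the tolerances are assumptions chosen for illustration; the GPU may sum values in a different order than the CPU, so tiny differences are expected.

```python
import torch


def mps_matches_cpu(size: int = 256):
    """Compare one matrix multiply on the CPU and on MPS.

    Returns None when MPS is unavailable, otherwise True or False.
    """
    if not torch.backends.mps.is_available():
        return None
    a = torch.rand(size, size)
    b = torch.rand(size, size)
    expected = a @ b                                # reference result on the CPU
    actual = (a.to("mps") @ b.to("mps")).cpu()      # same computation on the GPU
    # Loose tolerances: the two kernels may accumulate in a different order.
    return torch.allclose(expected, actual, rtol=1e-3, atol=1e-3)


result = mps_matches_cpu()
if result is None:
    print("MPS backend is not available; nothing to compare.")
else:
    print(f"MPS result matches CPU: {result}")
```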

Troubleshooting

If the MPS backend is unavailable:

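Make sure you're running macOS Monterey 12.3 or later
Update PyTorch to the latest version
Confirm that your Mac has a Metal-capable GPU (every M-series Mac does) and that your PyTorch build includes MPS support: torch.backends.mps.is_built() should return True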
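To narrow down which of these causes applies, PyTorch exposes two separate checks: torch.backends.mps.is_built() reports whether the installed binary was compiled with MPS support, while torch.backends.mps.is_available() also requires a supported macOS version and device. The helper below, explain_mps_status (a hypothetical name), combines them into a single message; treat it as a sketch rather than an exhaustive diagnosis.

```python
import platform

import torch


def explain_mps_status() -> str:
    """Return a one-line explanation of whether MPS can be used, and if not, why."""
    if torch.backends.mps.is_available():
        return "MPS is available."
    if not torch.backends.mps.is_built():
        # The installed PyTorch binary was compiled without MPS support.
        return "This PyTorch build does not include MPS support; reinstall PyTorch."
    # PyTorch has MPS support, but the OS version or hardware cannot provide it.
    mac_version = platform.mac_ver()[0] or "unknown"
    return (
        f"PyTorch was built with MPS, but this machine cannot use it "
        f"(macOS {mac_version}, {platform.machine()}); MPS needs macOS 12.3+ "
        f"and a Metal-capable GPU."
    )


print(explain_mps_status())
```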

Step 3: Configuring PyTorch to Use the GPU

To perform computations on the GPU, specify the mps device for your tensors and models.

Example Code

import torch

# Check device
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")

# Example: Tensor operations
x = torch.rand(3, 3).to(device)
y = torch.rand(3, 3).to(device)
z = x + y
print(z)

# Example: Neural network
model = torch.nn.Linear(3, 1).to(device)
input_tensor = torch.rand(1, 3).to(device)
output = model(input_tensor)
print(output)
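The same pattern extends to training: once the model and data live on the device, the forward pass, backward pass, and optimizer step all run there. The sketch below fits the linear model from the example to synthetic data; the data-generating weights, learning rate, and number of steps are arbitrary assumptions, and it falls back to the CPU when MPS is unavailable.

```python
import torch
from torch import nn

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
torch.manual_seed(0)

# Synthetic regression data (illustrative only): y = 2*x0 - x1 + 0.5*x2 + small noise.
X = torch.rand(256, 3, device=device)
true_w = torch.tensor([[2.0], [-1.0], [0.5]], device=device)
y = X @ true_w + 0.01 * torch.randn(256, 1, device=device)

model = nn.Linear(3, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

losses = []
for step in range(200):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = loss_fn(model(X), y)    # forward pass on the device
    loss.backward()                # backward pass on the device
    optimizer.step()               # parameter update on the device
    losses.append(loss.item())     # .item() copies the scalar back to the CPU

print(f"device={device}, first loss={losses[0]:.4f}, final loss={losses[-1]:.4f}")
```

Calling loss.item() forces the CPU to wait for the GPU, so in longer runs you may prefer to log the loss only every few steps.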

Step 4: Optimize Performance

The M-series GPU is designed for performance and energy efficiency. Follow these tips to maximize throughput:

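Batch Processing: Increase batch sizes so the GPU has enough work to stay busy, keeping in mind that its memory is shared with the rest of the system.
Use Half Precision:

Convert tensors and models to torch.float16 to cut memory use and, for many operations, speed up computation. Casting everything to float16 is plain half precision rather than true mixed precision, so check that your model remains numerically stable.
Example: x = x.to("mps", dtype=torch.float16)

Monitor Activity: Use macOS Activity Monitor (Window > GPU History) to track GPU usage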
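Because work on the MPS device is queued asynchronously, timing code with time.perf_counter() alone can end up measuring only how long it took to enqueue the work. The sketch below calls torch.mps.synchronize() before reading the clock so the measurement covers the actual computation. benchmark_matmul and its default sizes are hypothetical; the float16 comparison and the torch.mps.current_allocated_memory() report only run on MPS.

```python
import time

import torch


def synchronize(device: torch.device) -> None:
    """Block until all queued GPU work has finished (no-op on the CPU)."""
    if device.type == "mps":
        torch.mps.synchronize()


def benchmark_matmul(device: torch.device, size: int = 1024,
                     dtype: torch.dtype = torch.float32, iters: int = 10) -> float:
    """Return the average seconds per (size x size) matrix multiply on `device`."""
    a = torch.rand(size, size, device=device, dtype=dtype)
    b = torch.rand(size, size, device=device, dtype=dtype)
    a @ b                    # warm-up: the first call includes one-time setup costs
    synchronize(device)      # start the clock only after the warm-up has finished
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    synchronize(device)      # wait for every queued multiply before stopping the clock
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
    print(f"float32: {benchmark_matmul(device) * 1e3:.2f} ms per matmul on {device}")
    if device.type == "mps":
        fp16 = benchmark_matmul(device, dtype=torch.float16)
        print(f"float16: {fp16 * 1e3:.2f} ms per matmul on {device}")
        print(f"MPS memory allocated: {torch.mps.current_allocated_memory() / 2**20:.1f} MiB")
```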

Known Limitations

Limited Feature Support: Not every PyTorch operation has an MPS implementation yet, and unsupported operations raise a NotImplementedError. Check PyTorch's MPS documentation for compatibility details.
Memory Constraints: M-series chips share one pool of unified memory between the CPU and GPU, so training very large models can exhaust it and cause memory bottlenecks.
Debugging: Errors raised by the MPS backend are sometimes less descriptive than CUDA errors.
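Some gaps in feature support can be worked around in code. Setting the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 makes PyTorch run operators that MPS does not implement on the CPU instead of raising an error, at the cost of speed. The MPS backend also does not support float64, so double-precision tensors, such as those created from NumPy arrays, need to be converted to float32 before moving them. The to_device helper below is a hypothetical name used for illustration.

```python
import os

# Enable the CPU fallback for operators MPS does not implement. Set it before torch
# is imported so it is in place when PyTorch loads; exporting it in the shell also works.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch  # noqa: E402  (imported after setting the environment variable on purpose)


def to_device(tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Move `tensor` to `device`, downcasting float64 to float32 for MPS.

    The MPS backend does not support float64, so moving a double tensor there fails.
    """
    if device.type == "mps" and tensor.dtype == torch.float64:
        tensor = tensor.to(torch.float32)
    return tensor.to(device)


device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.linspace(0, 1, 5, dtype=torch.float64)  # e.g. data that came from NumPy
x_dev = to_device(x, device)
print(x_dev.device, x_dev.dtype)
```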

Conclusion

PyTorch's MPS backend makes it straightforward to use the GPU in Apple's M-series chips. With this guide, you can set up your deep learning workloads on the GPU and tune them for faster training and inference.

Wei-Ming Thor

I create practical guides on Software Engineering, Machine Learning, and running local LLMs.

