In this guide, we'll go step by step through how to use TensorFlow with a GPU. By running my model training on my GPU, I have found that training speeds up by at least 20x.
This lets you prototype and test your model, as well as run your training, much faster. For example, you can reduce the training time for an epoch from over 40 minutes to under 2 minutes (depending on the GPU). For a model that requires 100 epochs, that cuts the training time from over 2 days to roughly 3 hours. That is a huge difference!
You will need a GPU that is compatible with CUDA. Most NVIDIA GPUs from the last several years should be compatible. You will also need to install some CUDA software and toolkits, but don't worry, we'll go through that installation process too.
This guide is written for Linux Mint 21.2 but it should work for other Ubuntu or Debian-based distributions.
First, we'll install TensorFlow. We'll be using the GPU version of TensorFlow. Previously, you would have to install the separate tensorflow-gpu package, but now GPU support is included in the tensorflow package. You can install it using pip.
pip install tensorflow
Check if TensorFlow is using GPU
To check if TensorFlow is using GPU, you can run the following command in Python.
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
For now, you should see that there are 0 GPUs available. The reason is that you have not installed the CUDA software yet. You will likely see the following error message.
$ python -c "import tensorflow as tf; print('Num GPUs Available:', len(tf.config.list_physical_devices('GPU')))"
tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
Num GPUs Available: 0
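If you want to reuse this check inside a script, here is a minimal sketch. The helper name `count_gpus` is my own (hypothetical), and returning `None` when TensorFlow cannot be imported is a design choice for this example, not something TensorFlow does for you.

```python
def count_gpus():
    """Return how many GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is missing entirely, which is different from "0 GPUs found".
        return None
    return len(tf.config.list_physical_devices("GPU"))


n = count_gpus()
if n is None:
    print("TensorFlow is not installed")
else:
    print("Num GPUs Available:", n)
```

Keeping "not installed" separate from "0 GPUs" makes it easier to tell a missing package apart from a missing CUDA setup.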
For the installation, we can use the Advanced Packaging Tool (APT). Alternatively, you can download the CUDA software from NVIDIA's website and install it manually. However, I would recommend using APT as it is much easier and faster.
To install CUDA toolkit, run the following command.
sudo apt install nvidia-cuda-toolkit
Then, verify that the CUDA toolkit is installed. Running nvcc --version should print the installed release.
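One way to run this check from a script is to look for the nvcc compiler, which is part of the CUDA toolkit, and ask it for its version. This is only a sketch: the helper name `nvcc_version_output` is hypothetical, and it assumes nvcc ends up on your PATH after the APT install.

```python
import shutil
import subprocess


def nvcc_version_output():
    """Return the text printed by `nvcc --version`, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


output = nvcc_version_output()
print(output if output is not None else "nvcc not found on PATH")
```

If this prints "nvcc not found on PATH", the toolkit is either not installed or installed somewhere your shell does not search.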
Next, you'll also need to install the cuDNN library.
sudo apt install nvidia-cudnn
Check if TensorFlow is using GPU
Now, you can check if TensorFlow is using the GPU again. If you are using Jupyter Notebook, you might have to restart the kernel first. You should see that there is 1 GPU available.
$ python -c "import tensorflow as tf; print('Num GPUs Available:', len(tf.config.list_physical_devices('GPU')))"
Num GPUs Available: 1
Go to the next section to troubleshoot if you see any errors.
If you see the following error message:
successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero.
You can run the following command to fix it.
for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a $a/numa_node; done
Then, try to check again if TensorFlow can detect the GPU.
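Before writing to sysfs, it can help to see which devices actually report a negative NUMA node. Below is a minimal read-only sketch, assuming the same /sys/bus/pci/devices layout that the shell loop above uses. The function name `devices_with_negative_numa` and its `base` parameter are hypothetical, added so the function can be pointed at a different directory.

```python
from pathlib import Path


def devices_with_negative_numa(base="/sys/bus/pci/devices"):
    """Return the PCI device names whose numa_node file holds a negative value."""
    flagged = []
    for node_file in sorted(Path(base).glob("*/numa_node")):
        try:
            value = int(node_file.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric entry: skip it
        if value < 0:
            flagged.append(node_file.parent.name)
    return flagged


print(devices_with_negative_numa())
```

An empty list means no device reports a negative NUMA node, so the warning is probably coming from somewhere else.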
You might also see the following error message when running your model training.
Loaded runtime CuDNN library: 8.2.4 but source was compiled with: 8.6.0.
That means that the TensorFlow version you are using was compiled against a newer version of cuDNN than the one that is installed.
Unfortunately, APT might not always have the latest version of cuDNN. Therefore, you will have to downgrade to an older version of TensorFlow. You can do so by running the following command.
pip install tensorflow==2.11.0
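Check for the compatible version of TensorFlow here. In this case, we have installed CUDA 11.5 and cuDNN 8.2.4. Therefore, we have to use TensorFlow 2.11, which was built against CUDA 11.2 and cuDNN 8.1.0. The installed cuDNN 8.2.4 is newer than 8.1.0 within the same major version, so it satisfies TensorFlow 2.11.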
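To make the version reasoning concrete, here is a small sketch that parses the mismatch message and applies the rule I am assuming TensorFlow uses: the loaded cuDNN must have the same major version and an equal or higher minor version than the one TensorFlow was compiled with. Both function names are hypothetical helpers, not part of TensorFlow.

```python
import re


def parse_cudnn_mismatch(message):
    """Extract (loaded, compiled) version tuples from the cuDNN mismatch message."""
    match = re.search(
        r"Loaded runtime CuDNN library: (\d+)\.(\d+)\.(\d+) "
        r"but source was compiled with: (\d+)\.(\d+)\.(\d+)",
        message,
    )
    if match is None:
        return None
    numbers = tuple(int(group) for group in match.groups())
    return numbers[:3], numbers[3:]


def is_incompatible(loaded, compiled):
    """Assumed rule: majors must match and loaded (major, minor) must be >= compiled."""
    return loaded[0] != compiled[0] or loaded[:2] < compiled[:2]


message = "Loaded runtime CuDNN library: 8.2.4 but source was compiled with: 8.6.0."
loaded, compiled = parse_cudnn_mismatch(message)
print(is_incompatible(loaded, compiled))     # installed 8.2.4 vs. compiled 8.6.0
print(is_incompatible((8, 2, 4), (8, 1, 0)))  # installed 8.2.4 vs. TensorFlow 2.11's 8.1.0
```

The first call reports a mismatch because 8.2 is older than 8.6. The second does not, which is why downgrading to TensorFlow 2.11 resolves the error.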
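You should now be able to use TensorFlow with GPU. Run the model training again and you should see that it is much faster. Compare the ETA in the two progress lines below: the first is with the GPU, the second is without it.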
1/147 [>.............................] - ETA: 1:44 - loss: 0.6263 - accuracy: 0.6602
1/147 [..............................] - ETA: 42:48 - loss: 0.5675 - accuracy: 0.6927
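As a rough sanity check on the speed-up, you can convert the two ETAs to seconds and divide. This sketch assumes the 1:44 line is the GPU run and the 42:48 line is the CPU run. The helper `eta_to_seconds` is hypothetical, and an ETA taken at step 1 of 147 is only an early estimate.

```python
def eta_to_seconds(eta):
    """Convert an ETA such as '42:48' or '1:02:03' to a number of seconds."""
    seconds = 0
    for part in eta.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


cpu_seconds = eta_to_seconds("42:48")  # 42 * 60 + 48 = 2568
gpu_seconds = eta_to_seconds("1:44")   # 1 * 60 + 44 = 104
speedup = cpu_seconds / gpu_seconds
print(f"Estimated speed-up: {speedup:.1f}x")  # Estimated speed-up: 24.7x
```

That is in line with the "at least 20x" figure mentioned at the start of this guide.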