GPU for ML only on Ubuntu

To go against the grain, you have to fight.
Author

Lara Thompson

Published

September 11, 2020

I fought with NVIDIA drivers taking over my graphics and xterm on Ubuntu 20.04. It was either have my GPU do everything and lose up to half its memory in the process… or nothing. I had it working once… then updates uninstalled the driver and broke everything again.

That was a few weeks ago. I’m confident now, a few updates later, that I’m using the correct driver and that the settings are all good, so I’ll document what it took. Some steps may be unnecessary, but since I got it working once and haven’t experimented further, I can’t say which.

Now everything should be working. To be sure, running nvidia-smi should show something like:

Fri Sep 11 19:45:45 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:02:00.0 Off |                  N/A |
| 35%   47C    P0    22W / 175W |      0MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
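The two things to look for in that table are the memory usage (0MiB used) and the empty process list: nothing graphical is holding on to the card. As a minimal sketch, the same check can be scripted from Python using nvidia-smi's CSV query mode. The helper names parse_gpu_memory and query_gpus are made up for illustration, and the script assumes nvidia-smi is on the PATH (it returns an empty list otherwise).

```python
import shutil
import subprocess

# CSV query mode of nvidia-smi: one line per GPU, values in MiB, no header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'name, used, total' lines into a list of dicts (hypothetical helper)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the GPU name cannot break parsing.
        name, used, total = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "used_mib": int(used), "total_mib": int(total)})
    return gpus


def query_gpus():
    """Run nvidia-smi and return parsed GPU info, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        free = gpu["total_mib"] - gpu["used_mib"]
        print(f"{gpu['name']}: {gpu['used_mib']} MiB used, {free} MiB free")
```

If the used figure is well above zero before any ML job has started, something else (typically the display server) is probably still running on the GPU.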

And in, e.g., a Jupyter notebook, running tf.config.list_physical_devices('GPU') should output:

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
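For a check that can live at the top of a notebook or script, here is a small sketch built around the same tf.config.list_physical_devices call. The describe_gpus helper is a hypothetical name, not part of TensorFlow, and the import is guarded so the snippet also runs (and simply does nothing) where TensorFlow is not installed.

```python
def describe_gpus(devices):
    """Summarise PhysicalDevice-like objects as 'TYPE: name' strings (hypothetical helper)."""
    return [f"{d.device_type}: {d.name}" for d in devices]


try:
    import tensorflow as tf
except ImportError:
    # Assumption: without TensorFlow there is nothing to check, so skip quietly.
    tf = None

if tf is not None:
    gpus = tf.config.list_physical_devices("GPU")
    print(describe_gpus(gpus) or "No GPU visible to TensorFlow")
```

An empty list here while nvidia-smi still shows the card usually points at the CUDA libraries rather than the driver.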

Eureka! All the memory for ML!