I’ve spent more than a few minutes trying to run nvidia-smi on CentOS 7 virtual machines running on GCP (Google Cloud Platform), checking a few nice instructions on how to run it properly, like [1,2]. Getting irritated, I told myself: “No way! It’s just standard DKMS”, so I only have to follow a few standard steps, as I have done a number of times in my life.

Install CUDA on cloud VM

If running nvidia-smi only returns “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver”, this quick tutorial is definitely something for you.

Install NVIDIA packages

The best option is probably to install them over the network from the NVIDIA repositories [1]. In my case, on CentOS 7, it was:

sudo yum-config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo

sudo yum clean all

sudo yum -y install nvidia-driver-latest-dkms cuda

Run DKMS to build the driver

In many places you’ll find advice that you have to reboot the VM. It can work, since DKMS may rebuild the drivers on reboot, but it’s not necessary, and on a cloud VM you won’t see any error message from the process. Instead of rebooting, just take these two simple steps:

# dkms status

nvidia, 418.87.00: added

to verify that the DKMS module has been added, and run

# dkms autoinstall

Error! echo

Your kernel headers for kernel 3.10.0-957.27.2.el7.x86_64 cannot be found at

/lib/modules/3.10.0-957.27.2.el7.x86_64/build or /lib/modules/3.10.0-957.27.2.el7.x86_64/source.

to build the module. As you probably noticed, it failed in my case, telling me that I don’t have the kernel headers installed. This may be confusing for some of you, since a quick check with rpm -qa | grep kernel-headers will show the opposite. What DKMS actually needs is the kernel build tree, which comes from kernel-devel; kernel-headers only ships the headers for userspace programs. So on CentOS 7, make sure that you have the kernel-devel package installed for the running kernel. Use the following one-liner:

rpm -qa | grep kernel-devel-$(uname -r) || yum -y install kernel-devel-$(uname -r)
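If you prefer to check the exact paths from the error message instead of the package name, here is a small sketch. The helper has_kernel_build_tree is my own hypothetical name, not part of DKMS, and the optional second argument (a root prefix) is only there so the function can be tried out against a scratch directory. I’m assuming that a missing or dangling build/source entry is what makes DKMS complain.

```shell
#!/bin/sh
# Sketch: check whether the kernel build tree that DKMS looks for exists.
# has_kernel_build_tree is a hypothetical helper, not a DKMS command.
#   $1 = kernel release (for example the output of `uname -r`)
#   $2 = optional root prefix, used only for testing (assumption)
has_kernel_build_tree() {
    release="$1"
    root="${2:-}"
    # -e follows symlinks, so a dangling "build" link counts as missing.
    [ -e "${root}/lib/modules/${release}/build" ] ||
        [ -e "${root}/lib/modules/${release}/source" ]
}

if has_kernel_build_tree "$(uname -r)"; then
    echo "kernel build tree found for $(uname -r)"
else
    echo "kernel build tree missing; install kernel-devel-$(uname -r)"
fi
```

The check uses the running kernel on purpose: a kernel-devel package for a different kernel version will not help DKMS build for the kernel you are actually running.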

Now you should be ready to run dkms autoinstall without an issue.
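To confirm that the build went through, you can look at dkms status again: the state should no longer be “added”. Below is a minimal sketch that pulls the state out of that output. The helper dkms_module_state is a hypothetical name of mine, and it assumes the line layout shown in the comments (the “added” line is from my output above, the “installed” layout is what I’d expect from this DKMS version; other versions may print it differently).

```shell
#!/bin/sh
# Sketch: extract the state of a DKMS module from `dkms status` output.
# dkms_module_state is a hypothetical helper; it assumes lines shaped like
#   nvidia, 418.87.00: added
#   nvidia, 418.87.00, 3.10.0-957.27.2.el7.x86_64, x86_64: installed
#   $1 = module name; the `dkms status` text is read from stdin.
dkms_module_state() {
    awk -v mod="$1" -F': ' '
        # Field 1 is "name, version[, kernel, arch]"; the state is the last field.
        index($1, mod ",") == 1 { print $NF; exit }
    '
}

# Usage on a real machine:
#   dkms status | dkms_module_state nvidia
printf 'nvidia, 418.87.00: added\n' | dkms_module_state nvidia   # prints "added"
```

If the function prints nothing, the module is not registered with DKMS at all, which usually means the package installation step did not finish.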

Load the module



# nvidia-modprobe

# echo $?

0
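An exit code of 0 is a good sign, but if you want to see for yourself that the module is loaded, you can look it up in /proc/modules. This is only a sketch; module_listed is a hypothetical helper name, and it relies on the module name being the first column of each line.

```shell
#!/bin/sh
# Sketch: check whether a kernel module appears in /proc/modules-style text.
# module_listed is a hypothetical helper; the first column of each line is
# the module name.
#   $1 = module name; the listing is read from stdin.
module_listed() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }'
}

# /proc/modules only exists on Linux, so guard the real check.
if [ -r /proc/modules ]; then
    if module_listed nvidia < /proc/modules; then
        echo "nvidia module is loaded"
    else
        echo "nvidia module is not loaded"
    fi
fi
```

The match is exact on purpose, so related modules such as nvidia_drm or nvidia_uvm do not count as the main nvidia module.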

You should now be able to call nvidia-smi successfully, as in the picture below.



Let me know if this worked for you!

[1] https://devtalk.nvidia.com/default/topic/1000340/cuda-setup-and-installation/-quot-nvidia-smi-has-failed-because-it-couldn-t-communicate-with-the-nvidia-driver-quot-ubuntu-16-04/

[2] https://towardsdatascience.com/troubleshooting-gcp-cuda-nvidia-docker-and-keeping-it-running-d5c8b34b6a4c