How can I fix “Failed to initialize NVML: Driver/library version mismatch”?

Background

The “Failed to initialize NVML: Driver/library version mismatch” error generally means that the NVIDIA kernel module still loaded on the node belongs to an older driver release than the user-space driver libraries and CUDA toolkit now in use. Rebooting the compute node typically resolves the issue. If you do not wish to reboot the compute node, you must instead remove the existing NVIDIA kernel module and load the new one.
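Before choosing between a reboot and a module reload, it can help to confirm that the loaded module really differs from the one installed on disk. The following is a minimal sketch, not an official procedure: `compare_nvidia_versions` and `check_node` are hypothetical helper names, and the sketch assumes the driver exposes its version in `/sys/module/nvidia/version` and that `modinfo` can resolve the `nvidia` module for the running kernel.

```shell
#!/bin/sh
# Sketch: compare the NVIDIA kernel module version that is loaded with the
# version of the module file installed on disk.
# compare_nvidia_versions and check_node are hypothetical helper names.

compare_nvidia_versions() {
    loaded="$1"      # version of the module currently in the kernel
    installed="$2"   # version of the module file that modprobe would load
    if [ -z "$loaded" ]; then
        echo "no NVIDIA kernel module is loaded"
        return 2
    fi
    if [ "$loaded" = "$installed" ]; then
        echo "match: $loaded"
        return 0
    fi
    echo "mismatch: loaded $loaded, installed $installed"
    return 1
}

check_node() {
    # Assumption: /sys/module/nvidia/version exists while the module is loaded.
    loaded=$(cat /sys/module/nvidia/version 2>/dev/null)
    # Assumption: modinfo finds the nvidia module for the running kernel.
    installed=$(modinfo -F version nvidia 2>/dev/null)
    compare_nvidia_versions "$loaded" "$installed"
}
```

Running `check_node` on the compute node prints one line and returns a non-zero status when the versions differ or no module is loaded. A mismatch here suggests that the node is still running the old module.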

Steps to Replace the NVIDIA Kernel Module

On the compute node:

1. Remove the existing NVIDIA kernel module:

# modprobe -r nvidia_uvm nvidia

2. Reload the systemd units:

# systemctl daemon-reload

3. Build and load the new kernel module:

# systemctl restart cuda-driver
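If the module removal in step 1 fails because a module is still in use, it helps to see which NVIDIA modules are loaded and which processes hold the GPU devices. The sketch below is illustrative only: `list_nvidia_modules` is a hypothetical helper that reads a file in `/proc/modules` format, and the `fuser` call assumes the usual `/dev/nvidia*` device nodes exist on the node.

```shell
#!/bin/sh
# Sketch: list the NVIDIA kernel modules that are currently loaded.
# list_nvidia_modules is a hypothetical helper name.

list_nvidia_modules() {
    # The first column of /proc/modules is the module name.
    # An alternative file can be passed as $1, which is handy for testing.
    awk '$1 ~ /^nvidia/ { print $1 }' "${1:-/proc/modules}"
}

# Typical use on the compute node, as root:
#   list_nvidia_modules        # modules that still have to be removed
#   fuser -v /dev/nvidia*      # processes that keep the GPU devices open
```

Any module that this helper still lists after step 1 was not removed. Stopping the processes reported by `fuser` and repeating step 1 is usually the next thing to try.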

NOTE: If the old NVIDIA kernel module is still being loaded, you may need to delete it from both the software image and the node.

To look for leftover module files in the software image, run the following on the head node:

# find /cm/images/default-image/lib/modules | grep nvidia

Or from the compute node via the command:

# find /lib/modules | grep nvidia
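When several kernel versions are installed, the plain `find` output does not show which file belongs to which driver release. The following sketch prints the version next to each module file. The helper names `find_nvidia_module_files` and `report_nvidia_module_versions` are hypothetical, and the sketch assumes the module files are named `nvidia*.ko`, optionally with a compression suffix, which is narrower than the `grep nvidia` match above.

```shell
#!/bin/sh
# Sketch: show the driver version of every NVIDIA module file under a tree.
# Both function names are hypothetical helpers.

find_nvidia_module_files() {
    # $1 is the root of a modules tree, for example /lib/modules.
    find "$1" -type f -name 'nvidia*.ko*' | sort
}

report_nvidia_module_versions() {
    find_nvidia_module_files "$1" | while IFS= read -r f; do
        # modinfo accepts a path to a module file; the version field is
        # left empty if modinfo cannot read the file.
        printf '%s %s\n' "$(modinfo -F version "$f" 2>/dev/null)" "$f"
    done
}
```

On the head node this could be called as `report_nvidia_module_versions /cm/images/default-image/lib/modules`, and on the compute node as `report_nvidia_module_versions /lib/modules`. Files that report the old driver version are the candidates for removal.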

 

Updated on August 8, 2025
