How To Fix: GPU throttling on NVIDIA cards
Hashcat OpenCL benchmark of WPA cracking, when GPU is still cold.
Running Hashcat several times, I saw something that I wasn’t happy about: some form of GPU throttling. It started at full blast, with GTX 1080Ti @ ~560kH/s, GTX 1060 @ 206kH/s and overall performance of ~1200kH/s, then dropped to about 800-900kH/s after just 10 minutes of workload.
So, what is happening? Thermal throttling? But why?
To answer that question, we have to look at the fans and what they are actually doing. When a GPU approaches the threshold of 80 deg Celsius, the fan starts to track the temperature so that it doesn’t exceed 85 deg C. At the same time, soon after reaching 80 degrees, the GPU clock starts to swing downwards, which slows Hashcat down. What is even funnier: the fans rarely go above 50%.
A quick look at the power meter shows nearly 880W drawn from the outlet. Loads of that power turns into heat. There must be a way to reduce that…
What we can do is twofold:
- Force the fans to run at higher speeds, which lowers GPU temperatures and avoids thermal throttling, at the cost of a noisier environment.
- Reduce the power limit, so the cards don’t heat up as much, which also lowers the total system power draw.
1. Setting up the fan speed
Windows is easier, but Kali is Linux, and unfortunately it’s not a one-button solution. First things first:
To force the fans to run at a designated speed, we have to switch on some options in the NVIDIA driver that are normally turned off. On my Kali 2018.3, that means adding one line to /usr/share/X11/xorg.conf.d/20-nvidia.conf:
Section "Device"
    Identifier "MyGPU"
    Driver "nvidia"
    Option "Interactive" "0"
    Option "Coolbits" "28"
EndSection
The line with “Coolbits” is the one you need to add; it turns on additional features in NVIDIA X Server Settings. After restarting X, you have full access to the fan settings.
As you can see, a fan speed of 90% reduced the GPU temperature to 79 deg Celsius at 100% usage.
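The value 28 is a bitmask, which is why one option unlocks several features. As a small sketch, assuming the bit meanings listed in the NVIDIA driver README (4 for manual fan control, 8 for clock offsets, 16 for overvoltage), it can be decoded like this; decode_coolbits is a hypothetical helper name used only for illustration:

```shell
#!/bin/sh
# Decode a Coolbits value into the features it switches on.
# Bit meanings are taken from the NVIDIA driver README (assumption: they
# apply unchanged to your driver version).
decode_coolbits() {
    [ $(( $1 & 4 )) -ne 0 ] && echo "4: manual fan control"
    [ $(( $1 & 8 )) -ne 0 ] && echo "8: clock offsets on the PowerMizer page"
    [ $(( $1 & 16 )) -ne 0 ] && echo "16: overvoltage"
    return 0
}

decode_coolbits 28
```

If fan control is all you need, a value of 4 should be enough; 28 also unlocks the clock offset and overvoltage settings.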
You can alternatively use the command line:
nvidia-settings -a '[gpu:0]/GPUFanControlState=1'
nvidia-settings -a '[fan:0]/GPUTargetFanSpeed=100'
nvidia-settings -a '[gpu:1]/GPUFanControlState=1'
nvidia-settings -a '[fan:1]/GPUTargetFanSpeed=100'
nvidia-settings -a '[gpu:2]/GPUFanControlState=1'
nvidia-settings -a '[fan:2]/GPUTargetFanSpeed=100'
This sets the fan speed to 100% on all my cards. The quotes stop the shell from treating the square brackets as a filename pattern.
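With more than a couple of cards, the repeated commands can be folded into a loop. This is only a sketch: it assumes that fan N belongs to GPU N (as in the commands above) and that an X session with Coolbits fan control is running. The names set_all_fans, run and DRY_RUN are hypothetical, and by default the script only prints what it would execute.

```shell
#!/bin/sh
# Set every GPU fan to the same fixed speed.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 actually runs them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_all_fans GPU_COUNT SPEED_PERCENT
set_all_fans() {
    gpu_count="$1"
    speed="$2"
    i=0
    while [ "$i" -lt "$gpu_count" ]; do
        # Assumption: fan index i is the fan of GPU index i.
        run nvidia-settings \
            -a "[gpu:$i]/GPUFanControlState=1" \
            -a "[fan:$i]/GPUTargetFanSpeed=$speed"
        i=$((i + 1))
    done
}

set_all_fans 3 90
```

Once the printed commands look right for your setup, run it again with DRY_RUN=0 to apply them.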
So, no more thermal throttling! If you look at the menu, you’ll find that we can raise the GPU clock offset by a healthy +200MHz without any problems, and set the memory clock offset to -1000MHz to take a bit of heat off the memory, without any noticeable side effects.
Just because it says 2100MHz, it doesn’t mean you will get there, but there is some headroom for improvement just in case.
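The same offsets can also be set from the command line. The sketch below is hedged: it assumes your driver exposes the GPUGraphicsClockOffset and GPUMemoryTransferRateOffset attributes once Coolbits is enabled, and that performance level 3 is the top level on your card. That index differs between cards, so check the PowerMizer page first. The names set_clock_offsets, run, PERF_LEVEL and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Apply a core and memory clock offset to one GPU.
# DRY_RUN=1 (default) prints the command; DRY_RUN=0 actually runs it.
DRY_RUN="${DRY_RUN:-1}"
# Assumption: performance level 3 is the highest level on this card.
PERF_LEVEL="${PERF_LEVEL:-3}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_clock_offsets GPU_INDEX CORE_OFFSET_MHZ MEMORY_OFFSET_MHZ
set_clock_offsets() {
    gpu="$1"
    core="$2"
    mem="$3"
    run nvidia-settings \
        -a "[gpu:$gpu]/GPUGraphicsClockOffset[$PERF_LEVEL]=$core" \
        -a "[gpu:$gpu]/GPUMemoryTransferRateOffset[$PERF_LEVEL]=$mem"
}

set_clock_offsets 0 200 -1000
```

Start with smaller offsets than these if you are not sure what your card tolerates.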
But… that 850W from the wall… Let us do something about it too.
2. Power envelope:
On Linux, we can’t really adjust the voltage on NVIDIA cards manually. Luckily, we can change the power envelope these graphics cards are allowed to use: the nvidia-smi command is installed together with the driver, so we have access to it from the command line. It has a few options that we are interested in:
nvidia-smi -i    Target a specific unit: 0, 1, 2 in my case
nvidia-smi -pl   Specifies the maximum power management limit in watts. In my case GTX 1080Ti is 250W and GTX 1060 is 140W by default.
You need administrator (root) privileges to change these options, and they last only until the next system restart. Persistence mode (-pm) keeps the driver loaded while nothing is using the GPU, so the limits are not dropped between runs; it does not carry them across a reboot. I prefer to set the power limits just before I use Hashcat, so when I boot up my system again, everything is back to normal.
nvidia-smi -pm 1 Set persistence mode: 0/DISABLED, 1/ENABLED
You can, of course, experiment with those options, but I spent a bit of time on it, and this is what works for me:
- nvidia-smi -i 0 -pl 200 Power limit set to 200W on my first GTX 1080Ti
- nvidia-smi -i 1 -pl 200 Power limit set to 200W on my second GTX 1080Ti
- nvidia-smi -i 2 -pl 100 Power limit set to 100W on my third card, the GTX 1060
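Since these limits have to be reapplied after every reboot, the three commands can be kept in a small script. This is a sketch under assumptions: the index:watts pairs mirror my cards above, and apply_power_limits, run and DRY_RUN are hypothetical names. By default it only prints the commands; running them for real needs root.

```shell
#!/bin/sh
# Apply a power limit to each listed GPU.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 actually runs them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# apply_power_limits INDEX:WATTS [INDEX:WATTS ...]
apply_power_limits() {
    for pair in "$@"; do
        index="${pair%%:*}"   # text before the colon: GPU index
        watts="${pair##*:}"   # text after the colon: limit in watts
        run nvidia-smi -i "$index" -pl "$watts"
    done
}

apply_power_limits 0:200 1:200 2:100
```

For example, `sudo DRY_RUN=0 sh ./limits.sh` would apply the limits, where limits.sh is whatever name you saved the script under.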
You can open another command line and type
to get constant updates on how your GPUs are doing: fan speed, GPU temperature, load, etc.
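One way to get such updates, offered here as an assumption rather than the exact command I used, is the query mode of nvidia-smi with a refresh interval. The names watch_gpus, run and DRY_RUN are hypothetical; the query fields are standard nvidia-smi ones, but the selection is just a suggestion.

```shell
#!/bin/sh
# Print temperature, fan speed, load, clock and power for every GPU,
# refreshed every INTERVAL seconds.
# DRY_RUN=1 (default) prints the command; DRY_RUN=0 actually runs it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# watch_gpus [INTERVAL_SECONDS]
watch_gpus() {
    interval="${1:-5}"
    run nvidia-smi \
        --query-gpu=index,name,temperature.gpu,fan.speed,utilization.gpu,clocks.sm,power.draw,power.limit \
        --format=csv -l "$interval"
}

watch_gpus 2
```

With DRY_RUN=0 this keeps printing one CSV line per GPU until you stop it with Ctrl+C.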
So, as you can see, we shaved about 140W off the graphics cards’ power limits (50W from each GTX 1080Ti and 40W from the GTX 1060) without losing too much hashing speed. The power meter now shows 667W, fluctuating by +/-5W. To sum up, this is my Hashcat output to show you my gains:
As you can see, the speed on 2x 1080Ti and 1x 1060 no longer drops as before: it runs at full speed with cooler temperatures, barely throttling (maybe +/- 50MHz), and it has been working like that for the past 2 hours. I kinda got used to the fan noise, but I would not want to sleep next to them 😉
Your mileage may vary, so be aware of that. As usual, I do not take any responsibility for any damage to your system. These changes can end up destroying your card, computer, etc., so you make them at your own risk.