Record the process of checking and configuring an old GPU server
in Tutorial with 0 comment

Background

Our company's small server room has a server with a graphics card installed. Running nvidia-smi on it shows the following:

 [Screenshot: nvidia-smi output before the upgrade, 2021-10-21]

Both the nvidia-smi version and the driver version are quite old and, most importantly, CUDA is not installed. The goal is to reinstall everything with versions as new as possible. I have done this before but never wrote it down, so this time I am recording the steps.

Check the graphics card

Two tools can identify the card: lshw and nvidia-detect. I tried both here, and I recommend lshw.

lshw

 yum install -y lshw
 lshw -numeric -C display
 *-display
      description: 3D controller
      product: GK110BGL [Tesla K40c] [10DE:1024]
      vendor: NVIDIA Corporation [10DE]
      physical id: 0
      bus info: pci@0000:03:00.0
      version: a1
      width: 64 bits
      clock: 33MHz
      capabilities: pm msi pciexpress bus_master cap_list
      configuration: driver=nvidia latency=0
      resources: iomemory:21f0-21ef iomemory:21f0-21ef irq:40 memory:ca000000-caffffff memory:21fe0000000-21fefffffff memory:21ff0000000-21ff1ffffff

This tells us the card is a Tesla K40c.
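If you need the same check on several servers, the product line can be pulled out of the lshw output with a small filter. This is only a sketch: the function name gpu_products is my own, and it simply assumes lshw prints each field as "name: value" on its own indented line, as in the output above.

```shell
#!/usr/bin/env bash
# Sketch: print the "product:" field of each display device found in lshw text.
# Reads from stdin, so it works on live output or on a saved copy.
gpu_products() {
  # Matches lines like "      product: GK110BGL [Tesla K40c] [10DE:1024]"
  # and prints only the value after "product:".
  sed -n 's/^[[:space:]]*product:[[:space:]]*//p'
}

# Usage on the server (run as root so lshw reports full details):
#   lshw -numeric -C display | gpu_products
```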

nvidia-detect

 yum install nvidia-detect

The installation is slow here, so be patient. I used /usr/local/proxychains-ng-master/bin/proxychains4 to route the download through a proxy. The download speed went from 2 KB/s to 22 KB/s, about 10 times faster, but still very slow.

Once it is installed, run it with -v. The output is:

 [root@original ~]# nvidia-detect -v
 Probing for supported NVIDIA devices...
 [102b:0532] Matrox Electronics Systems Ltd. MGA G200eW WPCM450
 [10de:1024] NVIDIA Corporation GK110BGL [Tesla K40c]
 This device requires the current 460.84 NVIDIA driver kmod-nvidia
 WARNING: Xorg log file /var/log/Xorg.0.log does not exist
 WARNING: Unable to determine Xorg ABI compatibility
 WARNING: The driver for this device does not support the current Xorg version
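nvidia-detect recommends 460.84 here, while the steps below install 470.57.02 from NVIDIA's site. A rough sketch for scripting that comparison: required_driver pulls the version out of the "requires the current ... NVIDIA driver" line, and version_ge compares two dotted versions. Both function names are hypothetical, and version_ge assumes GNU coreutils' sort -V. Note that it only compares version numbers; it says nothing about whether a given driver actually supports the card.

```shell
#!/usr/bin/env bash
# Sketch: extract the recommended driver version from `nvidia-detect -v` output
# and compare it with the version you plan to install.

# Prints "460.84" for the line
# "This device requires the current 460.84 NVIDIA driver kmod-nvidia".
required_driver() {
  sed -n 's/.*requires the current \([0-9.]*\) NVIDIA driver.*/\1/p'
}

# version_ge A B: succeeds if version A >= version B.
# sort -V orders dotted versions numerically; if B sorts first (or equal), A >= B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   req="$(nvidia-detect -v | required_driver)"
#   version_ge 470.57.02 "$req" && echo "470.57.02 is at least as new as $req"
```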

Update graphics card driver

Uninstall the old driver

 yum remove -y 'nvidia*'   # quote the pattern so the shell does not expand it against local files
 reboot

Find the new driver

Go to https://www.nvidia.com/Download/index.aspx?lang=en-us
and select the card's details. For CUDA Toolkit I kept the default, the latest version, 11.4.

 [Screenshot: driver search options on the NVIDIA download page]

Click Search, then Download:

 [Screenshot: search result for the 470.57.02 driver]

This gives the download link:
https://us.download.nvidia.com/tesla/470.57.02/NVIDIA-Linux-x86_64-470.57.02.run

 wget https://us.download.nvidia.com/tesla/470.57.02/NVIDIA-Linux-x86_64-470.57.02.run

I downloaded it to /root.
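The link above embeds the version number twice. Assuming other Tesla driver releases follow the same pattern (I have only checked 470.57.02), the URL can be built from the version. This is a sketch: driver_url is a hypothetical helper. wget -c resumes an interrupted download, which is useful on a link this slow.

```shell
#!/usr/bin/env bash
# Sketch: build a Tesla driver download URL from a version number.
# Assumption: the pattern of the 470.57.02 link holds for other releases.
driver_url() {
  local ver="$1"
  echo "https://us.download.nvidia.com/tesla/${ver}/NVIDIA-Linux-x86_64-${ver}.run"
}

# Usage (-c resumes a partial download instead of starting over):
#   cd /root && wget -c "$(driver_url 470.57.02)"
```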

Disable nouveau

In /lib/modprobe.d/dist-blacklist.conf, comment out the nvidiafb line:

 #blacklist nvidiafb

Add the following configuration to the file:

 blacklist nouveau
 options nouveau modeset=0
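Instead of editing the distribution's file, you can put the same two lines in a dedicated file, which NVIDIA's CUDA installation guide also does. The guide then rebuilds the initramfs with dracut --force, because otherwise nouveau can still load during early boot. A minimal sketch: the function name and the /etc/modprobe.d/blacklist-nouveau.conf path are my own choices, and any *.conf file in that directory should be read by modprobe.

```shell
#!/usr/bin/env bash
# Sketch: write the nouveau blacklist to its own modprobe config file.
# Assumption: default target /etc/modprobe.d/blacklist-nouveau.conf (hypothetical name).
write_nouveau_blacklist() {
  local target="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
  # Same two directives as above: block the module and turn off its modesetting.
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

# Usage as root:
#   write_nouveau_blacklist
#   dracut --force           # rebuild the initramfs so nouveau is not loaded at early boot
#   reboot
#   lsmod | grep nouveau     # should print nothing after the reboot
```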

Install the new driver

 chmod a+x NVIDIA-Linux-x86_64-470.57.02.run   # add execute permission to the driver installer
 ./NVIDIA-Linux-x86_64-470.57.02.run --no-x-check --no-nouveau-check --no-opengl-files

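Command explanation:

--no-x-check: do not abort the installation if an X server is running.
--no-nouveau-check: do not abort the installation if the nouveau driver is in use.
--no-opengl-files: do not install the OpenGL-related driver files, which this compute-only server does not need.

The installer is interactive. When it asks "Install NVIDIA's 32-bit compatibility libraries?", select No.

When it asks "Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.", select Yes.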

Check

 lspci | grep NVIDIA
 03:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40c] (rev a1)

 nvidia-smi
 Thu Oct 21 18:29:45 2021
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |                               |                      |               MIG M. |
 |===============================+======================+======================|
 |   0  Tesla K40c          Off  | 00000000:03:00.0 Off |                    0 |
 | 23%   40C    P0    67W / 235W |      0MiB / 11441MiB |     98%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+

 +-----------------------------------------------------------------------------+
 | Processes:                                                                  |
 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
 |        ID   ID                                                   Usage      |
 |=============================================================================|
 |  No running processes found                                                 |
 +-----------------------------------------------------------------------------+

The driver is now installed and working, but the server still needs one more reboot:

 reboot
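After the reboot, you can check the driver version with nvidia-smi's CSV query mode instead of reading the table by eye. This is a sketch: check_driver is a hypothetical helper, and it assumes the query prints one comma-separated line per GPU with the driver version in the second field, matching the field order requested.

```shell
#!/usr/bin/env bash
# Sketch: confirm each GPU reports the expected driver version.
# Input: lines from
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
check_driver() {
  local expected="$1" name driver mem
  local status=1   # stays 1 if no GPU lines were read at all
  while IFS=',' read -r name driver mem; do
    driver="${driver# }"   # drop the space that follows the comma
    if [ "$driver" = "$expected" ]; then
      echo "OK: ${name} runs driver ${driver}"
      status=0
    else
      echo "MISMATCH: ${name} runs driver ${driver}, expected ${expected}"
      return 1
    fi
  done
  return "$status"
}

# Usage on the server:
#   nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader \
#     | check_driver 470.57.02
```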

Install CUDA

Download

Go to the official archive page: https://developer.nvidia.com/cuda-toolkit-archive

 [Screenshot: CUDA Toolkit Archive page]

Since nvidia-smi reports CUDA Version: 11.4, I selected 11.4.2 here.

 [Screenshot: CUDA Toolkit 11.4.2 download options]

Download the installer directly. The file is large and the download is slow, so I recommend using a proxy to speed it up.

Install

 wget https://developer.download.nvidia.com/compute/cuda/11.4.2/local_installers/cuda_11.4.2_470.57.02_linux.run
 sudo sh cuda_11.4.2_470.57.02_linux.run
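Because the 470.57.02 driver is already installed, the runfile can be told to install only the toolkit (--silent --toolkit). After that, nvcc and the CUDA libraries still have to be put on the search paths, as the CUDA installation guide describes. A minimal sketch: add_cuda_paths is a hypothetical helper, and it assumes the default install prefix /usr/local/cuda-11.4. To make the setting permanent, put the call in a shell profile.

```shell
#!/usr/bin/env bash
# Sketch: toolkit-only CUDA install plus environment setup.
# Assumption: the driver is already installed, so the bundled 470.57.02 driver is skipped:
#   sudo sh cuda_11.4.2_470.57.02_linux.run --silent --toolkit

# Prepend CUDA's bin/ and lib64/ directories (default prefix /usr/local/cuda-11.4).
add_cuda_paths() {
  local cuda_home="${1:-/usr/local/cuda-11.4}"
  export PATH="${cuda_home}/bin${PATH:+:${PATH}}"
  export LD_LIBRARY_PATH="${cuda_home}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}

# Usage:
#   add_cuda_paths
#   nvcc --version   # should report release 11.4
```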

Graphics card

As shown above, the card is a Tesla K40c, which is fairly old. Here is a comparison of the GTX 1080 and the K40c: https://versus.com/cn/nvidia-geforce-gtx-1080-vs-nvidia-tesla-k40

 [Screenshot: GTX 1080 vs Tesla K40 comparison]

Compared with the 1080, the K40c draws more power, has fewer transistors, and is built on an older 28 nm process. Where is the K40c better than the 1080? It has 12 GB of video memory, 4 GB more than the 1080's 8 GB. Its memory bus is 384 bits wide, 128 bits wider than the 1080's 256 bits. It also has 2880 CUDA cores, 320 more than the 1080's 2560. See the link for more details.
Although the card is several years old (it was released in 2013), it was very expensive at the time and has a large amount of video memory. It now sells on Taobao for just over 1,000 yuan, which makes it very good value.


Closing~ 👊
