PVE + Tesla P4 vGPU All-in-One NAS

I bought a Tesla P4, a card that has recently become popular, and allocated vGPUs to virtual machines on my PVE all-in-one NAS for hardware decoding. After two days of trial and exploration, here is a brief summary of the common problems and usage options.

Background

A few simple terms, briefly explained.

NVIDIA graphics card

NVIDIA's graphics cards can be roughly divided into gaming cards, professional cards, and data center cards.

  • Most gaming cards belong to the GeForce series, such as the GeForce RTX 3070.
  • Most professional cards belong to the Quadro and NVIDIA RTX series, such as the Quadro RTX 4000.
  • Data center naming is messier, spanning the old K2, the Tesla line, and the newer Axx series.

Generally, all three product lines of the same generation share the same GPU architecture.

vGPU

One feature of data center cards is GPU virtualization support: a single GPU is divided into multiple vGPUs for use by virtual machines, so that each VM appears to have a dedicated GPU.

vGPU licensing and verification

NVIDIA sells vGPU as hardware plus separate guest-side licenses, and the guest licenses come in several tiers at different prices.

For example, if you buy a card that supports vGPU and allocate it to four guests, all four guests need to purchase licenses to work properly. License verification happens per guest VM.

Guest licenses come at several price points, corresponding to the A, B, C, and Q vGPU types. The types differ in what they support, such as CUDA, graphics APIs, and maximum resolution.

Among these vGPU types, the Q type covers the widest range of application scenarios; its license is vWS.

The driver installed in the guest must be given a valid license; otherwise it only works for 20 minutes, after which the card's performance and frame rate are forcibly limited, making applications unable to run normally.

vGPU-unlock

From a hardware point of view, data center and consumer GPUs of the same generation share the same architecture; the supported features are restricted at the driver layer.

vGPU unlock bypasses these driver-layer restrictions by patching the data center host driver and changing device IDs, so that ordinary consumer cards can also support virtualization. At the same time, it hooks specific checks before and after vGPU startup and reports fake results, so that the virtual machine's vGPU can start normally.

Hardware decoding

In the video field, hardware decoding generally refers to using a GPU or similar device to decode video streams. Because the GPU has dedicated processing units and optimizations, decoding is very fast.

NVIDIA's hardware encoding/decoding blocks are generally called NVENC/NVDEC. When ffmpeg uses them, it depends on CUDA support.
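For illustration, a minimal ffmpeg invocation that decodes with NVDEC (via CUDA) and encodes with NVENC might look like this (a sketch; the file names and bitrate are placeholders):

 # decode on the GPU, keep frames in GPU memory, then encode with NVENC
 ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -c:v h264_nvenc -b:v 5M output.mp4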

Preface

Since my NAS uses a Xeon D1581 with no integrated graphics, and I wanted to try Jellyfin hardware decoding, I had to provide a GPU to a PVE virtual machine. There are currently three options:

  • Ordinary graphics card passthrough (PVE GPU passthrough)
  • Ordinary graphics card + vGPU unlock
  • Data center graphics card

On Xianyu (a second-hand marketplace) I saw a Tesla P4 (8 GB VRAM, supporting all vGPU features) on sale at a low price, a bit over 400 yuan, and bought one for fun.

Appearance and size of graphics card

After receiving it, I measured the card with a ruler for reference:

Note that this card is designed for data center racks. It has no active cooling fan and relies on the airflow of the server's 10,000 RPM chassis fans for cooling.

Cooling modification

A rack-oriented card like this has no fan of its own and cannot be used directly in an ordinary case.

Some Taobao sellers currently offer an MSI GTX 1050 cooler shroud for about 20 yuan that roughly matches the Tesla P4's size, and many tutorials recommend it.

I bought one of these shrouds, and my conclusion after testing is: not recommended. The reasons:

  • High noise: at 12 V, even with a speed-reduction cable, the noise is unacceptable, and it is still unacceptable at 5 V
  • Low airflow when slowed down or undervolted

For light to moderate use, I recommend two 12 cm/9 cm silent fans on a PCI-E fan bracket, blowing directly at the heatsink. I use two Thermalright 12 cm fans connected to a Molex 4-pin at 5 V (stepped down) for direct blowing; during vGPU ffmpeg decoding the temperature stays below 60 °C.

For heavy use, I recommend a 3D-printed blower duct plus a high-performance blower fan.

Typical installation process

The Tesla P4 is a genuine data center card; in principle, vGPU unlock is not required to use its vGPU features.

Here is a brief description of the typical installation process:

This process does not include vGPU unlock; that is covered further down.

Step 0: Blacklist the open-source driver and install dependencies

Add the following lines to /etc/modules to load the required kernel modules:

 vfio
 vfio_iommu_type1
 vfio_pci
 vfio_virqfd

Blacklist the open-source nouveau driver:

 echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf

Enable IOMMU: details omitted.
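For reference, the usual way to enable IOMMU on an Intel platform such as the D1581 is via the kernel command line (a sketch assuming the GRUB bootloader):

 # /etc/default/grub: add intel_iommu=on (optionally iommu=pt) to the kernel command line
 GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
 # apply the change, then reboot
 update-grub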

Install dependent packages:

 apt install build-essential dkms pve-headers mdevctl

If some packages cannot be found or downloaded, add Tsinghua University's PVE mirror as a repository source.

If the subsequent NVIDIA driver installation cannot find matching pve-headers/linux-headers, install the matching version with:

 apt install pve-headers-$(uname -r)

Restart the machine.

Step 1: Install the host driver

Install the NVIDIA official host driver on the PVE host:

The vGPU driver on NVIDIA's website is not a public download; you need to register an NVIDIA enterprise account to access it. Alternatively, search for the host driver package online (or ask me, for example).

 # Note: the kernel version and the driver version must match
 # Please select the driver version matching your host (azimiao friendly reminder)
 chmod +x NVIDIA-Linux-x86_64-510.108.03-vgpu-kvm.run
 ./NVIDIA-Linux-x86_64-510.108.03-vgpu-kvm.run

You may hit errors during installation; if a package is missing, go back to step 0.

Step 2: Assign the PCI-E device to a virtual machine (guest)

Assign the PCI device to the virtual machine through the PVE web interface. After selecting the graphics card, choose a vGPU profile under MDev Type.

In the MDev type name, the number in the second half is the VRAM size and the trailing letter is the vGPU type. The differences between the A, B, C, and Q types were described above and will not be repeated; just pick a Q type.
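The same assignment can also be done from the shell (a sketch; the VM ID 100, PCI address 01:00.0, and profile name nvidia-63 are placeholders for your own values):

 # list the mediated-device (vGPU) profiles the card exposes
 mdevctl types
 # attach one profile to VM 100 as a hostpci device
 qm set 100 -hostpci0 01:00.0,mdev=nvidia-63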

Step 3: Install the guest (client) driver

NVIDIA ships matching guest drivers inside each vGPU driver package; they can be installed directly.

At this point, your vGPU is running.
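A quick way to confirm from inside the guest (assuming a Linux guest; on Windows, Device Manager serves the same purpose):

 # should list the vGPU (e.g. a GRID P4-xQ device) and the driver version
 nvidia-smi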

Installation process with vGPU unlock

Refer directly to PolloLoco's NVIDIA vGPU Guide.

Note that the Tesla P4 natively supports vGPU, so the host driver does not need patching. His guide marks the steps to skip; read it carefully.

Licensing issues

Well, the driver is installed; can we start using it? Not yet.

As mentioned above, vGPU licenses are verified by the client driver, and without one the card can only be used for 20 minutes.

There are four ways to deal with the licensing problem:

  1. Apply for NVIDIA's 90-day free trial license, and purchase a license when it expires
  2. Use vGPU masquerading to install a Quadro driver in the guest (a feature of vGPU unlock)
  3. Restart the NVIDIA driver on a schedule (vGPU_LicenseBypass)
  4. (New) Run an emulated license server: collinwebdesigns/fastapi-dls

I tried all four methods and their variants; the results follow.

Method 1: Apply for a free license

After I submitted the application, NVIDIA's Chinese staff called to ask for my company name, purpose, and other details. I replied "for a personal startup test"; the staff replied "enterprise users only" and hung up on me.

Method 2: vGPU masquerading

Use vGPU unlock to disguise the guest vGPU's PCI device ID as a same-generation Quadro card, then install the Quadro driver in the guest.

  1. Host
    • Official driver + vGPU unlock; the latter overrides the vGPU model reported to the virtual machine, masquerading it as a Quadro
  2. Guest
    • (For now) only works in Windows virtual machines;
    • Only R510-branch Quadro drivers (512, 513) work;
    • No time limit;
    • Can run 3D games;
    • CUDA does not work, so ffmpeg hardware decoding fails.

Method 3: Restart the NVIDIA driver on a schedule

After installing vGPU_LicenseBypass, the guest automatically restarts the NVIDIA driver at set intervals, so the 20-minute unlicensed limit never takes hold (a sketch of the idea follows the list below).

  1. Host
    • Official driver, no special modification is required.
  2. Guest
    • Full functionality and full performance around the clock; a scheduled task automatically restarts the vGPU driver every 24 hours;
    • Only the 14.1 vGPU driver works; 14.2 and later do not.
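On a Linux guest, the periodic-restart idea can be approximated with a root cron job (purely a sketch of the concept; it assumes that restarting nvidia-gridd, the guest driver's licensing daemon, resets the unlicensed window, and the daily interval is arbitrary):

 # /etc/cron.d/vgpu-restart: restart the guest licensing daemon once a day (assumption: this resets the unlicensed window)
 0 4 * * * root systemctl restart nvidia-gridd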

Method 4: Use an emulated license server

  1. Host
    • Official driver.
  2. Guest
    • Official driver, full functionality, currently no driver-version limit.
  3. License server (can run on the host or in any VM)
    DockerHub: collinwebdesigns/fastapi-dls

An emulated license server is stood up from the Docker image, which implements the license-verification function; the other vGPU virtual machines then contact the license server to obtain credentials.
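Roughly, the setup looks like this (a sketch based on the image's documentation; the hostname licserver.lan, the cert path, and the token path are placeholders, and variable names may differ between image versions):

 # on the host or any VM: start the emulated license server
 docker run -d -p 443:443 -v /opt/fastapi-dls/cert:/app/cert \
   -e DLS_URL=licserver.lan -e DLS_PORT=443 collinwebdesigns/fastapi-dls
 # in each Linux guest: fetch a client token, then restart the licensing daemon
 curl -k -o /etc/nvidia/ClientConfigToken/client_configuration_token.tok https://licserver.lan/-/client-token
 systemctl restart nvidia-gridd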

In my testing, the emulated server supports the latest 15.1 driver, and the scheme is nearly perfect. Of course, it still has one small flaw, which I won't go into.

For specific usage questions, refer to the original author's DockerHub page. You are also welcome to join the group discussion.

Since I badly need CUDA, only methods 3 and 4 are options for me.

Experience

Compared with full GPU passthrough, vGPU is a step up in experience: if you want, you can allocate a 1 GB vGPU to each of eight virtual machines.

But then again, vGPU charges you not only for the hardware but also per-VM for software licenses, and the licenses aren't even sold to individuals, which is very annoying.

Since NVIDIA China does not sell licenses to individual consumers, repeatedly restarting the guest driver via vGPU_LicenseBypass is a helpless last resort.

AMD next door charges only for hardware, with no software fees. That sounds good, but only AMD's old cards are cheap, and those old cards not only draw a lot of power but also lack modern GPU capabilities.

Zimiao haunting blog (azimiao.com). All rights reserved. Please include the source link when reprinting: https://www.azimiao.com/9289.html
Welcome to join the Zimiao haunting blog exchange group: 313732000
