Above 4G Decoding, Resizable BAR, and Draw Calls

A while ago, Mushroom asked in this blog's chat group whether "Above 4G Decoding" needs to be enabled. I searched Baidu for a while without finding a good answer, so I spent some time on Google and tried to sort out how these options relate to each other.

Draw Calls and Resizable BAR

In the firmware setup of a modern motherboard, you may see an Above 4G Decoding option (often described as being for cryptocurrency mining). Near it there is usually a Re-Size BAR Support option.

To understand what these options do, you first need to understand what Resizable BAR is for.

AMD introduced a technology at the Radeon RX 6000 launch event: AMD Smart Access Memory (SAM). Most Chinese media coverage simply said that it "improves graphics card performance for free" without explaining how. In fact, the video AMD released explains in detail what the technology does.

To understand this technology, you first need to understand how the GPU and CPU communicate, and what a draw call is. A modern graphics card has its own power delivery, GPU, and video memory (VRAM from here on), and it talks to the CPU over PCIe lanes routed on the motherboard.

Anyone who works in game development will be familiar with the term draw call. A draw call is the CPU telling the GPU to render something. The CPU and GPU work in parallel: the CPU keeps writing commands into a command buffer, and the GPU takes commands out of it and executes them. A draw call is one of those commands.
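The producer and consumer relationship described above can be sketched as a simple queue. This is only a toy model: the command names (`SET_STATE`, `BIND_VERTEX_BUFFER`, `DRAW`) and the helper functions are made up for illustration and do not correspond to any real graphics API or driver.

```python
from collections import deque

# Toy command buffer: the CPU appends at the tail, the GPU consumes from the head.
command_buffer = deque()

def cpu_submit(command, payload=None):
    """CPU side: queue one command (hypothetical names, for illustration only)."""
    command_buffer.append((command, payload))

def gpu_drain():
    """GPU side: execute queued commands in submission order."""
    executed = []
    while command_buffer:
        command, _payload = command_buffer.popleft()
        executed.append(command)
    return executed

# State and data are set up first; the draw call is just one more command.
cpu_submit("SET_STATE", {"shader": "lit"})
cpu_submit("BIND_VERTEX_BUFFER", {"buffer": 0})
cpu_submit("DRAW", {"vertex_count": 3})

executed = gpu_drain()
print(executed)
```

In a real system both sides run concurrently and the buffer is a ring in memory shared with the device; the sketch only shows the ordering.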

Before a draw call is issued, the CPU must prepare the data and state needed for rendering; otherwise the GPU has nothing to work with. Preparing render data means copying meshes, vertices, normals, textures, and so on from system memory (RAM) to video memory (VRAM). This is a real data copy, not a pointer hand-off (this article considers only discrete graphics cards, not integrated graphics that share a unified address space with the host).

The two ends of this copy (the CPU and the discrete GPU) have different address spaces and different physical storage. In general, VRAM is not visible to the CPU. So how does the CPU copy data into video memory? The usual solution is to map part of VRAM (for example 256 MiB) into the CPU's address space, which gives the CPU read and write access to that part of VRAM.

The CPU writes render data into the shared 256 MiB window of VRAM, then writes render commands into the command buffer. The GPU fetches commands from the command buffer and data from the shared VRAM, then runs the rendering pipeline.

Note the size mentioned above: 256 MiB. That is right: even if your card has 8 GiB of VRAM or more, most of it is reserved for the GPU. At runtime, the GPU moves resources such as textures from the shared part of VRAM into its dedicated VRAM. The GPU also keeps temporary state, rendering results (part of VRAM is set aside for frame buffers), and so on in dedicated VRAM.
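To make the effect of the fixed window concrete, here is a minimal sketch of how an upload larger than the window has to be split into pieces. The 256 MiB figure comes from the text above; `plan_uploads` is a hypothetical helper, and real drivers schedule their staging copies in more sophisticated ways.

```python
WINDOW_SIZE = 256 * 1024**2  # 256 MiB CPU-visible window, as in the text

def plan_uploads(total_bytes, window_size=WINDOW_SIZE):
    """Split an upload into (offset, length) chunks that each fit in the window."""
    chunks = []
    offset = 0
    while offset < total_bytes:
        length = min(window_size, total_bytes - offset)
        chunks.append((offset, length))
        offset += length
    return chunks

# Uploading 1 GiB of assets through a 256 MiB window takes four passes.
chunks = plan_uploads(1024**3)
print(len(chunks))
```

With a window that covers all of VRAM, the same upload would fit in a single pass.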

Smart Access Memory lets the CPU access all of VRAM, not just the 256 MiB shared window. Copying data is expensive, while the GPU's rendering pipeline is very fast. If the data cannot keep up and draw calls cannot be issued, the GPU sits idle and overall system performance suffers. The more data that can be copied at once, the better.

Of course, copy efficiency depends not only on how much VRAM is accessible but also on the bandwidth between the two sides. That is why Smart Access Memory is often mentioned together with PCIe 4.0 (although at present even PCIe 3.0 x16 is not saturated).
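For a rough sense of the numbers, the theoretical link bandwidth can be worked out from the per-lane transfer rate (8 GT/s for PCIe 3.0, 16 GT/s for PCIe 4.0) and the 128b/130b line encoding both generations use. The sketch below is a back-of-the-envelope calculation only: it ignores packet and protocol overhead, so real throughput is lower, and the function name is made up.

```python
def pcie_bandwidth_bytes_per_s(gigatransfers_per_s, lanes, encoding=128 / 130):
    """Theoretical one-direction bandwidth, ignoring protocol overhead."""
    bits_per_s = gigatransfers_per_s * 1e9 * encoding * lanes
    return bits_per_s / 8

gen3_x16 = pcie_bandwidth_bytes_per_s(8, 16)
gen4_x16 = pcie_bandwidth_bytes_per_s(16, 16)

print(f"PCIe 3.0 x16: {gen3_x16 / 1e9:.2f} GB/s")
print(f"PCIe 4.0 x16: {gen4_x16 / 1e9:.2f} GB/s")

# Time to fill one 256 MiB window at the theoretical PCIe 3.0 x16 rate.
window_bytes = 256 * 1024**2
print(f"256 MiB over PCIe 3.0 x16: {window_bytes / gen3_x16 * 1e3:.1f} ms")
```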

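Coming back to the original question: what is Resizable BAR? The CPU-visible window into VRAM is described by a PCI Base Address Register (BAR), and Resizable BAR lets that window be made larger than the traditional 256 MiB. It is essentially the same thing as AMD's SAM; AMD just gave it a nicer name. The latest AMD and NVIDIA GPUs both support Resizable BAR.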

Above 4G Decoding

We now know that with a modern GPU and CPU, the CPU is able to access all of VRAM. But an age-old problem remains: 32-bit versus 64-bit addressing.

Although the CPU can technically access all of VRAM, on a graphics card with more than 4 GiB of video memory, 32-bit addresses can map at most 4 GiB, which is clearly not enough.
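The 4 GiB limit follows directly from the address width, as this short calculation shows. The 8 GiB card is just an example size.

```python
GIB = 1024**3

# A 32-bit address can name 2**32 distinct bytes.
addressable_32bit = 2**32
print(addressable_32bit // GIB)  # 4

def address_bits_needed(size_bytes):
    """Number of address bits needed to reach every byte of a region starting at 0."""
    return (size_bytes - 1).bit_length()

print(address_bits_needed(4 * GIB))  # 32
print(address_bits_needed(8 * GIB))  # 33
```

In practice the 32-bit space is also shared with RAM and other device mappings, so the room actually left for a single device is smaller than 4 GiB.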

With Above 4G Decoding enabled, 64-bit capable PCIe devices can be mapped into address space above the 4 GiB boundary. Resizable BAR depends on this, so this option must be enabled, along with 64-bit addressing support in the chipset.
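On Linux, one way to see where a device's BARs ended up is the `resource` file under `/sys/bus/pci/devices/<address>/`, where each line holds a start address, an end address, and flags in hexadecimal. The sketch below parses text in that format and reports each region's size and whether it sits above 4 GiB. `parse_bars` is a hypothetical helper, and the sample text is made up for illustration rather than captured from a real device.

```python
FOUR_GIB = 2**32

def parse_bars(resource_text):
    """Return (index, start, size, above_4g) for each populated entry."""
    bars = []
    for index, line in enumerate(resource_text.strip().splitlines()):
        start, end, _flags = (int(field, 16) for field in line.split())
        if end == 0:
            continue  # all-zero entries are unused
        size = end - start + 1
        bars.append((index, start, size, start >= FOUR_GIB))
    return bars

# Made-up sample in the sysfs "start end flags" layout (not real device data).
SAMPLE = """\
0x00000000fb000000 0x00000000fbffffff 0x0000000000040200
0x0000004000000000 0x00000041ffffffff 0x000000000014220c
0x0000000000000000 0x0000000000000000 0x0000000000000000
"""

bars = parse_bars(SAMPLE)
for index, start, size, above_4g in bars:
    print(f"entry {index}: {size // 2**20} MiB at {start:#x}, above 4 GiB: {above_4g}")
```

To try it on a real machine, read the file for your graphics card and pass its contents to `parse_bars`; the device address differs from system to system.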

Enabling this option may increase RAM usage. It is easy to imagine why: with the option disabled, the data staged in RAM at any one time is less than 4 GiB; with it enabled, there may be cases where more than 4 GiB of data is staged in RAM to feed VRAM, which increases RAM usage.

For multi-GPU hash computation (mining), generally with more than four cards, this option also needs to be enabled, otherwise some graphics cards may not be recognized. This is why the option is often described as being for cryptocurrency mining.

Some special PCIe devices (such as certain AVerMedia capture cards) also require this option to be enabled.

Zimiao haunting blog (azimiao.com). All rights reserved. When reprinting, please include the link: https://www.azimiao.com/8255.html