L1 cache

Computer science term
Cache memory (Cache), or simply the cache, exists to bridge the speed mismatch between the CPU and main memory. [1] Because cache memory is technically demanding and expensive to manufacture, computer systems generally use a multi-level cache hierarchy. The L1 cache (Level 1 Cache) is located next to the CPU core and is the cache most tightly integrated with the CPU; an access usually takes only a few cycles, and its capacity is typically a few tens of kilobytes.
Chinese name: L1 cache
Foreign name: Level 1 Cache
Alias: L1 cache
Discipline: Computer science
Purpose: Enable fast CPU access to data
Field: Computer systems

Introduction

The Level 1 Cache (L1 Cache) is located next to the CPU core; it is the cache most tightly integrated with the CPU and was also the first CPU cache to appear historically. The L1 cache is the most technically demanding and expensive level to build, so enlarging it drives up design difficulty and cost sharply while bringing only a modest performance gain; since its hit rate is already very high, a larger L1 has a poor cost-performance ratio, and it therefore has the smallest capacity of all cache levels, much smaller than the L2 cache. The L1 cache is generally divided into an L1 data cache (D-Cache) and an L1 instruction cache (I-Cache), which hold data and the instructions, decoded in real time, that operate on that data. On most CPUs the two halves have the same capacity; AMD's Athlon XP, for example, has a 64KB L1 data cache and a 64KB L1 instruction cache, described as a 64KB+64KB L1 cache, and the L1 caches of other CPUs are described in the same way. The theoretical basis of the L1 cache is the principle of locality of programs.
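As a hedged illustration of the I-Cache/D-Cache split, on Linux the L1 geometry of the running machine can be queried through sysconf(). The _SC_LEVEL1_* constants below are a glibc extension, not part of POSIX, and may report 0 or -1 on some platforms:

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long icache = sysconf(_SC_LEVEL1_ICACHE_SIZE);     /* L1 I-Cache size in bytes */
    long dcache = sysconf(_SC_LEVEL1_DCACHE_SIZE);     /* L1 D-Cache size in bytes */
    long line   = sysconf(_SC_LEVEL1_DCACHE_LINESIZE); /* L1 D-Cache line size in bytes */

    printf("L1 I-Cache: %ld bytes\n", icache);
    printf("L1 D-Cache: %ld bytes\n", dcache);
    printf("L1 D-Cache line size: %ld bytes\n", line);
    return 0;
}
```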

CPU cache

In a computer system, the CPU cache is a component that reduces the average time the processor needs to access memory. In the pyramid-shaped storage hierarchy it sits in the second layer from the top, just below the CPU registers. Its capacity is far smaller than main memory, but its speed can approach the processor frequency. When the processor issues a memory access request, it first checks whether the requested data is in the cache: if it is (a hit), the data is returned directly without accessing memory; if it is not (a miss), the corresponding data is first loaded from memory into the cache and then returned to the processor.
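A minimal sketch of this hit/miss flow, assuming a simplified direct-mapped cache over a simulated memory (the sizes, structure, and names are illustrative assumptions, not a real hardware design):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LINE_SIZE 64                  /* bytes per cache line (illustrative) */
#define NUM_LINES 64                  /* number of lines in this toy cache   */
#define MEM_SIZE  (1 << 20)           /* 1 MiB of simulated main memory      */

static uint8_t memory[MEM_SIZE];      /* stand-in for main memory */

struct cache_line {
    bool     valid;
    uint64_t tag;
    uint8_t  data[LINE_SIZE];
};

static struct cache_line cache[NUM_LINES];

/* Read one byte through the cache: check for a hit first; only on a miss
 * load the whole line from memory before serving the byte. */
uint8_t cache_read_byte(uint64_t addr)
{
    uint64_t line_no = addr / LINE_SIZE;     /* which memory line holds addr         */
    uint64_t index   = line_no % NUM_LINES;  /* where that line may live in cache    */
    uint64_t tag     = line_no / NUM_LINES;  /* distinguishes lines sharing an index */
    struct cache_line *l = &cache[index];

    if (!(l->valid && l->tag == tag)) {      /* miss: fill the line from memory */
        memcpy(l->data, &memory[line_no * LINE_SIZE], LINE_SIZE);
        l->valid = true;
        l->tag   = tag;
    }
    return l->data[addr % LINE_SIZE];        /* hit (or just filled): serve from cache */
}

int main(void)
{
    memory[1000] = 0x5A;
    printf("first read:  0x%02X (miss, line loaded)\n", cache_read_byte(1000));
    printf("second read: 0x%02X (hit, same line)\n",    cache_read_byte(1001));
    return 0;
}
```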
The cache is effective mainly because a program's memory accesses exhibit locality, which has both a spatial and a temporal component. By exploiting this locality well, a cache can achieve a very high hit rate.
To software, the cache is a transparent component, so programmers usually cannot manipulate it directly. However, program code can be optimized for the characteristics of the cache so that the cache is used more effectively.
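A common example of such cache-aware optimization is choosing a loop order that matches the memory layout. The sketch below (the array size is an arbitrary assumption) sums a two-dimensional C array in two ways: row-major traversal touches consecutive addresses and exploits spatial locality, while column-major traversal strides through memory and typically misses far more often:

```c
#include <stdio.h>

#define N 1024

static double a[N][N];               /* 8 MiB: far larger than any L1 cache */

/* Row-major traversal: elements of each row are contiguous in memory, so
 * consecutive iterations reuse the cache line that was just loaded. */
double sum_row_major(double m[N][N])
{
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += m[i][j];          /* cache-friendly access pattern */
    return sum;
}

/* Column-major traversal of the same data: each access jumps N * 8 bytes,
 * touching a different cache line almost every time. */
double sum_col_major(double m[N][N])
{
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += m[i][j];          /* cache-unfriendly access pattern */
    return sum;
}

int main(void)
{
    printf("row-major sum: %f\n", sum_row_major(a));
    printf("col-major sum: %f\n", sum_col_major(a));
    return 0;
}
```

Both functions compute the same result, but on typical hardware the row-major version runs noticeably faster once the array no longer fits in the cache.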

Principle of program locality

As early as 1968, P. Denning pointed out that a program exhibits locality as it executes: over a relatively short period, execution is confined to a certain part of the program, and correspondingly the storage it accesses is confined to a certain region. He made the following observations:
(1) Apart from a small number of branch and procedure-call instructions, a program executes sequentially most of the time. This observation was later confirmed by many researchers studying high-level programming languages (such as FORTRAN and PASCAL) and C.
(2) A procedure call moves the program's execution from one region to another, but research shows that the call depth rarely exceeds 5, so for a period of time execution stays within this small set of procedures.
(3) Programs contain many loop structures; although a loop consists of only a few instructions, those instructions are executed many times.
(4) Programs also do a great deal of data-structure processing, such as array operations, which is often confined to a very small range.
Locality shows itself in two forms:
(1) Temporal locality. If an instruction has just been executed, it is likely to be executed again soon; if a piece of data has just been accessed, it is likely to be accessed again soon. The typical cause of temporal locality is the large number of loops in programs.
(2) Spatial locality. Once a program accesses a storage location, nearby locations are likely to be accessed soon as well; that is, the addresses a program touches over a period of time tend to be concentrated in a certain range, the typical case being sequential execution of the program. [2] Both forms are illustrated in the sketch below.
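A small, hedged illustration (the array length and pass count are arbitrary assumptions): the inner loop below walks an array sequentially, exhibiting spatial locality, while the outer loop re-reads the same data and re-executes the same few instructions many times, exhibiting temporal locality:

```c
#include <stdio.h>

#define LEN    4096
#define PASSES 100

static double data[LEN];

int main(void)
{
    double total = 0.0;

    /* Temporal locality: the same few loop instructions and the same
     * array elements are used again and again across PASSES passes. */
    for (int p = 0; p < PASSES; p++) {
        /* Spatial locality: consecutive elements sit next to each other
         * in memory and share cache lines. */
        for (int i = 0; i < LEN; i++)
            total += data[i];
    }
    printf("average: %f\n", total / (PASSES * LEN));
    return 0;
}
```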

Read hit rate

When the CPU finds the data it needs in the cache, this is called a hit; when the required data is not in the cache (a miss), the CPU accesses main memory instead. Theoretically, in a CPU with a two-level cache, the hit rate for reading the L1 cache is about 80%; that is, about 80% of the useful data the CPU looks for is found in the L1 Cache, and the remaining 20% is read from the L2 Cache. Some high-end CPUs (such as Intel's Itanium) also have an L3 Cache, which serves the data that misses in the L2 Cache.
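As a back-of-the-envelope sketch of what these hit rates mean for performance, the snippet below computes an average memory access time; the latencies and the L2 hit rate are illustrative assumptions, not figures from any particular processor:

```c
#include <stdio.h>

/* Illustrative latencies in CPU cycles -- assumptions for the sketch,
 * not measurements of any real CPU. */
#define L1_LATENCY   4.0
#define L2_LATENCY  12.0
#define MEM_LATENCY 200.0

int main(void)
{
    double l1_hit = 0.80;   /* 80% of reads served by L1, per the text */
    double l2_hit = 0.95;   /* assumed L2 hit rate for the remaining 20% */

    /* Average access time = L1 time + miss-rate-weighted penalties. */
    double amat = L1_LATENCY
                + (1.0 - l1_hit) * (L2_LATENCY
                + (1.0 - l2_hit) * MEM_LATENCY);

    printf("average memory access time: %.1f cycles\n", amat);
    return 0;
}
```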
To keep the hit rate high during CPU accesses, the contents of the cache must be replaced according to some algorithm; by periodically clearing its reference counters, the cache can evict data that, after heavy use, is no longer needed, and thereby improve its utilization.

Write operation

Write policies
To keep the data consistent with lower-level storage (such as main memory), updates must be propagated downward in a timely manner. There are generally two policies for this: write-back (Write back) and write-through (Write through).
Write-back means a cache block is written to memory only when it has to be replaced; as long as a write hits in the cache, memory is not updated. To reduce memory writes further, each cache block usually carries a dirty bit recording whether the block has been modified since it was loaded; a block that was never written can then be evicted without any write-back at all. The advantage of write-back is that it saves many write operations, mainly because updates to different words of a block can be propagated with a single write. The memory bandwidth saved in this way also reduces energy consumption, so the policy suits embedded systems well.
Write-through means that every write the cache receives is written straight through to memory; if the address is also present in the cache, the cached copy is updated at the same time. Because this design generates a large number of memory writes, a buffer is added to reduce hardware conflicts; this buffer is called a write buffer and usually holds no more than 4 cache blocks. For the same reason, a write buffer can also be used with write-back caches. Write-through is easier to implement than write-back and makes it easier to keep data consistent.
Write allocate and no-write allocate
When a write miss occurs, the cache can follow one of two policies, called write allocate (Write allocate) and no-write allocate (No write allocate). With write allocate, the missing block is first read into the cache, just as on a read miss, and the write then updates the newly loaded block. With no-write allocate, the data is simply written directly to memory. Any write policy can be combined with either allocation policy, and each combination behaves differently on writes, as sketched below.
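A minimal, self-contained sketch of how these choices interact, using a toy single-line cache over a tiny simulated memory (all sizes and names are assumptions made for illustration; real hardware is organized very differently):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LINE_SIZE 16
#define MEM_SIZE  256

enum write_policy { WRITE_BACK, WRITE_THROUGH };
enum alloc_policy { WRITE_ALLOCATE, NO_WRITE_ALLOCATE };

static uint8_t memory[MEM_SIZE];      /* stand-in for main memory */

struct line {
    bool     valid, dirty;
    uint64_t tag;                     /* which memory line is cached */
    uint8_t  data[LINE_SIZE];
};

static void evict(struct line *l)
{
    if (l->valid && l->dirty)         /* write-back: flush only if modified */
        memcpy(&memory[l->tag * LINE_SIZE], l->data, LINE_SIZE);
    l->valid = l->dirty = false;
}

static void fill(struct line *l, uint64_t tag)
{
    evict(l);
    memcpy(l->data, &memory[tag * LINE_SIZE], LINE_SIZE);
    l->valid = true;
    l->tag = tag;
}

static void cache_write(struct line *l, uint64_t addr, uint8_t byte,
                        enum write_policy wp, enum alloc_policy ap)
{
    uint64_t tag = addr / LINE_SIZE;
    bool hit = l->valid && l->tag == tag;

    if (!hit) {                                   /* write miss */
        if (ap == NO_WRITE_ALLOCATE) {
            memory[addr] = byte;                  /* bypass the cache entirely */
            return;
        }
        fill(l, tag);                             /* write allocate: load the block first */
    }
    l->data[addr % LINE_SIZE] = byte;             /* update the cached copy */

    if (wp == WRITE_THROUGH)
        memory[addr] = byte;                      /* propagate to memory immediately */
    else
        l->dirty = true;                          /* write-back: defer until eviction */
}

int main(void)
{
    struct line l = {0};
    cache_write(&l, 42, 0xAB, WRITE_BACK, WRITE_ALLOCATE);
    printf("memory[42] before eviction: 0x%02X\n", memory[42]);  /* still 0x00 */
    evict(&l);
    printf("memory[42] after eviction:  0x%02X\n", memory[42]);  /* now 0xAB  */
    return 0;
}
```

Running the sketch shows the defining trait of write-back: memory still holds the old value until the dirty line is evicted.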

Replacement strategies

For a set-associative cache, when all the cache blocks of a set are occupied and another miss falls into that set, one block must be chosen to be replaced. There are several strategies for deciding which block to evict.
Ideally, the best block to replace is the one that will next be accessed furthest in the future. This ideal strategy cannot actually be implemented, but it gives other strategies a target to approximate.
The first-in, first-out (FIFO) algorithm replaces the cache block that has been in the set the longest. The least recently used (LRU) algorithm tracks how each cache block is used and replaces the block that has gone unaccessed for the longest time; for associativity greater than two ways, the time cost of this bookkeeping becomes very high.
An approximation of LRU is not-most-recently-used (NMRU). This algorithm records only which cache block was used most recently and, on a replacement, randomly evicts any of the other blocks, hence "not most recently used". Compared with LRU, it only requires one extra use bit per cache block. Alternatively, a purely random replacement policy can be used; tests show that the performance of completely random replacement is close to that of LRU. [3]
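A minimal sketch of LRU bookkeeping for a single set of a set-associative cache (the 4-way associativity and the timestamp-based aging are assumptions made for clarity; hardware usually approximates this with a few status bits, as NMRU and pseudo-LRU schemes do):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define WAYS 4   /* illustrative 4-way set */

struct way {
    bool     valid;
    uint64_t tag;
    uint64_t last_used;   /* logical timestamp of the most recent access */
};

/* Returns the way that ends up holding 'tag': on a hit the age is refreshed,
 * on a miss the least recently used (or an empty) way is replaced.
 * 'now' is a monotonically increasing access counter. */
static int lru_select(struct way set[WAYS], uint64_t tag, uint64_t now)
{
    int victim = 0;

    for (int w = 0; w < WAYS; w++) {
        if (set[w].valid && set[w].tag == tag) {  /* hit: just refresh the age */
            set[w].last_used = now;
            return w;
        }
        if (!set[w].valid)                        /* prefer an empty way */
            victim = w;
        else if (set[victim].valid &&
                 set[w].last_used < set[victim].last_used)
            victim = w;                           /* track the oldest way */
    }

    set[victim].valid = true;                     /* miss: replace the victim */
    set[victim].tag = tag;
    set[victim].last_used = now;
    return victim;
}

int main(void)
{
    struct way set[WAYS] = {0};
    uint64_t now = 0;

    for (uint64_t tag = 1; tag <= 5; tag++)       /* 5 tags into a 4-way set */
        lru_select(set, tag, ++now);
    lru_select(set, 2, ++now);                    /* tag 2 is still resident: a hit */

    for (int w = 0; w < WAYS; w++)
        printf("way %d: tag %llu (last used %llu)\n", w,
               (unsigned long long)set[w].tag,
               (unsigned long long)set[w].last_used);
    return 0;
}
```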