The original meaning of "cache" is a RAM whose access speed is faster than that of ordinary random-access memory (RAM). It is generally built not with the DRAM technology used for a system's main memory but with SRAM, which is more expensive but faster; hence it is also called cache memory.
Cache memory sits between main memory and the CPU. It is composed of high-speed memory chips (SRAM); its capacity is small, but its speed is much higher than main memory's, approaching that of the CPU. In the hierarchical structure of a computer's storage system, it is the high-speed, small-capacity memory between the central processor and main memory; together with main memory it forms the primary-storage level. The scheduling and transfer of information between the cache and main memory is carried out automatically by hardware.
Address translation part: maintains a directory table that converts main-memory addresses into cache addresses.
Replacement part: when the cache is full, replaces data blocks according to some policy and updates the address translation part accordingly.[1]
Working principle
[2]A cache memory is usually composed of a high-speed memory, an associative memory, replacement logic circuits, and the corresponding control lines. In a computer system with a cache, the address with which the central processor accesses main memory is divided into three fields: a row number, a column number, and an in-group address. The main memory is thus logically divided into rows; each row is divided into several storage-unit groups, one per column; each group contains several to a few dozen words. The high-speed memory is likewise divided into storage-unit groups by row and column. The two have the same number of columns and the same group size, but the high-speed memory has far fewer rows than main memory.
The associative memory is used for address association and has the same number of rows and columns of storage units as the high-speed memory. When a storage-unit group from some row of a column in main memory is brought into an empty group in the same column of the high-speed memory, the corresponding location in the associative memory records that group's row number in main memory.
When the central processor accesses main memory, the hardware first automatically decodes the column-number field of the access address and compares all row numbers stored in that column of the associative memory against the row-number field of the address. If one matches, the main-memory unit to be accessed is already in the high-speed memory; this is called a hit, and the hardware maps the access address to a high-speed-memory address and performs the access there. If none matches, the unit is not in the high-speed memory; this is called a miss. The hardware then accesses main memory, automatically transfers the unit group containing that unit into an empty group in the same column of the high-speed memory, and at the same time stores the group's main-memory row number into the corresponding position of the associative memory.
When a miss occurs and there is no empty position in the corresponding column of the high-speed memory, one group in that column must be evicted to make room for the incoming group; this is called replacement. The rule that determines which group to evict is the replacement algorithm; common replacement algorithms include the least recently used algorithm (LRU), first in first out (FIFO), and the random method (RAND). The replacement logic circuit performs this function. In addition, to keep main memory and the high-speed memory consistent, writes to main memory must be handled differently for hits and misses.
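As a rough illustration of the lookup just described, here is a minimal Python sketch. The row and column counts are made up, and replacement of a full column is reduced to a placeholder (a real design would use LRU, FIFO, or RAND as noted above): the column is decoded directly, the recorded row numbers play the role of the associative memory, and a failed compare triggers a load.

```python
# A minimal sketch of the lookup scheme described above (hypothetical sizes):
# the main-memory address is split into a row number, a column number, and an
# in-group offset; the "associative memory" records, for each cache slot in a
# column, which main-memory row currently occupies it.

CACHE_ROWS = 4        # rows in the high-speed memory (far fewer than main memory)
COLUMNS = 8           # both memories have the same number of columns

# associative memory: per column, the main-memory row number per slot (None = empty)
tags = [[None] * CACHE_ROWS for _ in range(COLUMNS)]

def access(mem_row, column):
    """Return 'hit' or 'miss'; on a miss, load the group into an empty slot
    or replace one (here: slot 0, standing in for a real replacement policy)."""
    slots = tags[column]
    if mem_row in slots:              # compare against all row numbers in the column
        return "hit"
    try:
        free = slots.index(None)      # empty slot in the same column?
    except ValueError:
        free = 0                      # column full: replace (placeholder policy)
    slots[free] = mem_row
    return "miss"

print(access(17, 3))  # first touch: miss, group loaded
print(access(17, 3))  # same group again: hit
```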
Storage hierarchy
Main-secondary storage hierarchy: the capacity of a computer's main memory is always too small compared with what programmers need, so programs and data must be transferred from secondary storage into main memory. Originally the programmer arranged this personally: a large program had to be divided into blocks in advance, the location of each block in secondary storage and its load address in main memory had to be determined, and the movement of blocks in and out at run time had to be scheduled ahead of time, creating a storage-space allocation problem. The formation and development of operating systems freed programmers from this address management between main and secondary storage, while "auxiliary hardware" supporting these functions emerged. Through the combination of software and hardware, main memory and secondary memory are unified into a whole, as shown in the figure: primary and secondary storage form a storage hierarchy, i.e., a storage system. As a whole, its speed is close to that of main memory, its capacity is close to that of secondary memory, and its average price is close to that of cheap, slow secondary storage. As this system developed and matured, it gradually grew into the widely used virtual storage system. In such a system, the application programmer can use the address code of machine instructions to address the whole program uniformly, as if the programmer had the entire virtual memory space corresponding to the width of the address code. This space can be much larger than the actual main-memory space, so the entire program can be stored in it. This instruction address code is called the virtual address or logical address, and its corresponding storage capacity is called the virtual storage capacity or virtual storage space. The address of the actual main memory is called the physical address or real (storage) address, and its corresponding storage capacity is called the main-storage capacity, real-storage capacity, or real (main) storage space.
Primary-secondary storage hierarchy
Cache-main memory storage hierarchy
When main memory is accessed with a virtual address, the machine automatically converts it into a main-memory real address via auxiliary software and hardware, then checks whether the contents at this address have already been loaded into main memory. If so, the access proceeds; if not, the auxiliary software and hardware transfer the data from secondary memory into main memory first, and then the access proceeds. None of these operations needs to be arranged by the programmer; they are transparent to the application programmer. The primary-secondary memory level resolves the conflict between the demand for large storage capacity and the demand for low cost. In speed, however, main memory and the CPU remain roughly an order of magnitude apart, and this gap clearly limits how much of the CPU's speed potential can be realized. Bridging it with a single memory built from one technology is not feasible; it must be addressed through research into computer system architecture and organization. Introducing a cache is the key method of solving the access-speed problem. A cache memory placed between the CPU and main memory forms the cache-main memory hierarchy; the cache is required to keep up with the CPU in speed. Address mapping and scheduling between the cache and main memory borrow from the techniques of the primary-secondary storage level that preceded it; the difference is that, because of the higher speed requirement, they are implemented entirely in hardware rather than in a combination of software and hardware, as shown in the figure.
Cache - main memory hierarchy
Address mapping and transformation
Address mapping refers to the correspondence between a datum's address in main memory and its address in the cache. Three address-mapping methods are described below.
1. Fully associative mapping
Address mapping rule: any block of main memory can be mapped to any block of the cache.
(1) Main memory and the cache are divided into data blocks of the same size.
(2) A data block of main memory can be loaded into any location in the cache. If the cache holds Cb blocks and main memory holds Mb blocks, there are Cb × Mb possible mappings.
The directory table is stored in the associative memory; each entry holds three parts: the block address of the data block in main memory, its block address after being stored in the cache, and a valid bit (also called the load bit). Because the mapping is fully associative, the directory table must have as many entries as the cache has blocks.
Advantages: the hit rate is quite high, and cache storage space is used efficiently.
Disadvantages: every access must be compared against the entire contents of the associative memory, so it is slow and costly; this scheme is therefore rarely used.
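The directory-table behavior just described can be sketched as follows. The sizes and field names are illustrative assumptions, and replacement is deliberately left out so that the full-table compare on every lookup stays in focus.

```python
# Hedged sketch of a fully associative directory table: each entry holds a
# main-memory block number, the cache block it occupies, and a valid bit.
# Every lookup must compare against all entries, which is what makes the
# associative search expensive in hardware.

CACHE_BLOCKS = 4
directory = [{"mem_block": None, "cache_block": i, "valid": False}
             for i in range(CACHE_BLOCKS)]

def lookup(mem_block):
    """Return the cache block number on a hit, or None on a miss."""
    for entry in directory:                      # full compare, every time
        if entry["valid"] and entry["mem_block"] == mem_block:
            return entry["cache_block"]
    return None

def fill(mem_block):
    """Load a block into the first invalid entry (replacement not modeled)."""
    for entry in directory:
        if not entry["valid"]:
            entry["mem_block"] = mem_block
            entry["valid"] = True
            return entry["cache_block"]
    raise RuntimeError("cache full: a replacement policy would pick a victim")

fill(123)
print(lookup(123))   # hit: prints the cache block holding main-memory block 123
print(lookup(456))   # miss: prints None
```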
2. Direct mapping
Address mapping rule: a block of main memory can only be mapped to one specific block of the cache.
(1) Main memory and the cache are divided into data blocks of the same size.
(2) The main memory capacity should be an integer multiple of the cache capacity; main memory is divided into areas, and the number of blocks in each area equals the total number of blocks in the cache.
(3) When a block from an area of main memory is stored into the cache, it can only be stored at the cache location whose block number is the same as its own.
Blocks with the same block number in different areas of main memory all map to the same cache address, but at any moment the cache can hold only the block from one of those areas. Since the main-memory and cache block numbers are identical, the directory need only record the area code of the block that was brought in; the block number and the address within the block are the same in main memory and in the cache. The directory table is stored in a high-speed, small-capacity memory; each entry holds two parts: the area code of the data block in main memory and a valid bit. The directory table has as many entries as the cache has blocks.
Advantages: the address-mapping method is simple; an access only needs to check whether the area codes are equal, so access is fast and the hardware is simple.
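A minimal sketch of the direct-mapped check, with a made-up block count: because a main-memory block has exactly one possible cache location, a lookup is a single area-code comparison, and two areas sharing a block number conflict with each other.

```python
# Sketch of the direct-mapped check described above: a main-memory address is
# treated as (area_code, block_number, offset); a block can live only in the
# cache block with the same block_number, so a hit is just one tag compare.

NUM_BLOCKS = 8                       # blocks per area == total cache blocks
table = [{"area": None, "valid": False} for _ in range(NUM_BLOCKS)]

def access(area, block):
    entry = table[block]             # the only slot this block may occupy
    if entry["valid"] and entry["area"] == area:
        return "hit"
    entry["area"], entry["valid"] = area, True   # load / replace in place
    return "miss"

print(access(area=2, block=5))   # miss: block loaded
print(access(area=2, block=5))   # hit: area codes match
print(access(area=7, block=5))   # miss: same block number, different area (conflict)
```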
3. Set-associative mapping
(1) Main memory and the cache are divided into blocks of the same size.
(2) Main memory and the cache are divided into groups of the same size.
(3) The main memory capacity is an integer multiple of the cache capacity; main memory is divided into areas the size of the cache, and the number of groups in each area equals the number of groups in the cache.
(4) When data in main memory is transferred into the cache, the group numbers in main memory and in the cache must be the same; that is, a block in a given area can only be stored in the space with the same group number in the cache, but within the group it may be stored at any block address. In other words, direct mapping is used from main-memory groups to cache groups, while fully associative mapping is used between the two corresponding groups.
The conversion between main-memory addresses and cache addresses has two parts: the group address is accessed directly by address, as in direct mapping, while the block address is accessed by content, as in fully associative mapping. The address-translation component of set-associative mapping is likewise implemented with associative memory.
Advantages: the probability of block conflicts is relatively low, block utilization is greatly improved, and the block miss rate is significantly reduced.
Disadvantages: implementation is more difficult and costly than direct mapping.
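The two-part address conversion can be illustrated by how the fields are carved out of an address. The field widths below are arbitrary assumptions, not values fixed by the text: the group field indexes the set directly, while the tag field is what the associative compare within the group operates on.

```python
# Sketch of the address split used by set-associative mapping (hypothetical
# field widths): group is decoded directly by address; tag is matched by
# content within the group.

def split_address(addr, offset_bits=4, group_bits=3):
    offset = addr & ((1 << offset_bits) - 1)                  # address within the block
    group  = (addr >> offset_bits) & ((1 << group_bits) - 1)  # group (set) index
    tag    = addr >> (offset_bits + group_bits)               # compared associatively
    return tag, group, offset

# 0b_101_011_0110 -> tag=0b101, group=0b011, offset=0b0110
print(split_address(0b1010110110))
```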
Replacement Policy
1. The principle of program locality tells us that, as a program runs, it tends to use recently used instructions and data frequently. This provides the theoretical basis for replacement policies. Weighing hit rate, implementation difficulty, speed, and other factors, replacement policies include the random method, the first-in first-out method, and the least recently used method, among others.
(1) Random method (RAND)
The random method chooses the block to replace at random: a random-number generator determines the replacement block from the number it produces. The method is simple and easy to implement, but its hit rate is relatively low.
(2) First in, first out method (FIFO)
The first-in first-out method replaces the block that was brought in earliest. A block that was called in first may have been hit many times and yet be replaced first, so the method does not follow the locality principle. Its hit rate is better than the random method's but still unsatisfactory. The first-in first-out method is, however, easy to implement.
(3) Least recently used method (LRU)
The LRU method is based on the usage of each block: it always selects the least recently used block for replacement. It reflects the locality of programs well. There are many ways to implement the LRU strategy.
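One simple way to implement the LRU strategy is an ordered structure where each access moves a block to the "most recent" end; the sketch below uses Python's OrderedDict purely as an illustration, not as a model of the hardware.

```python
# A minimal LRU replacement sketch: on each access the block moves to the
# "most recent" end; the victim is whatever sits at the "least recent" end
# when the cache is full.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()          # block number -> data

    def access(self, block, data=None):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # hit: mark as most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        self.blocks[block] = data
        return "miss"

c = LRUCache(2)
c.access(1); c.access(2)
c.access(1)          # hit: block 2 is now least recently used
print(c.access(3))   # miss: evicts block 2, not block 1
print(c.access(1))   # hit: block 1 survived
```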
2. In a multi-module parallel storage system, requests from I/O devices to main memory have higher priority than the CPU's memory accesses, so the CPU may have to wait for I/O accesses to complete, sometimes for several main-memory cycles, which lowers its efficiency. To avoid this contention for main memory between the CPU and I/O devices, a cache such as an L1 cache is added: main memory can send the information the CPU will need into the cache in advance, so that while main memory is exchanging data with an I/O device, the CPU reads what it needs directly from the cache instead of waiting idly.
3. The algorithms proposed so far can be divided into the following three categories (the first category is the key one to master):
(1) Traditional replacement algorithms and their direct descendants, represented by: ① the LRU (Least Recently Used) algorithm, which replaces the least recently used content in the cache; ② the LFU (Least Frequently Used) algorithm, which replaces the least frequently accessed content in the cache; ③ an algorithm that, if all contents in the cache were cached on the same day, replaces the largest document, and otherwise replaces according to LRU; ④ FIFO (First In First Out), which follows the first-in first-out principle: if the cache is full, the content that entered the cache earliest is replaced.
(2) Replacement algorithms based on key characteristics of the cached content, including: ① the Size replacement algorithm, which replaces the largest content in the cache; ② the LRU-MIN replacement algorithm, which tries to minimize the number of documents replaced: let the size of the document to be cached be S; among the cached documents of size at least S, replace according to LRU; if there is no object of size at least S, replace from among the documents of size at least S/2 according to LRU; ③ the LRU-Threshold replacement algorithm, identical to LRU except that documents whose size exceeds a certain threshold are never cached; ④ the Lowest Latency First replacement algorithm, which replaces the document with the smallest access latency.
(3) Cost-based replacement algorithms, which use a cost function to evaluate the objects in the cache and determine the replacement object from the resulting value. Representative algorithms: ① the Hybrid algorithm, which assigns each object in the cache a utility function and replaces the object of least utility; ② the Lowest Relative Value algorithm, which replaces the object with the lowest relative value; ③ the Least Normalized Cost Replacement (LCNR) algorithm, which uses an inference function of document access frequency, transfer time, and size to determine which document to replace; ④ an algorithm by Bolot et al., which uses a weighted inference function based on document transfer cost, size, and time since last access to determine document replacement; ⑤ the Size-Adjusted LRU (SLRU) algorithm, which orders cached objects by their cost-to-size ratio and replaces the object with the smallest ratio.
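The cost-to-size selection at the heart of SLRU-style algorithms can be sketched in a few lines; the object names and the cost and size numbers below are made-up illustrations, not data from any real cache.

```python
# Sketch of the cost-to-size selection used by SLRU-style algorithms: each
# cached object carries a cost (e.g., the expense of refetching it) and a
# size, and the victim is the object with the smallest cost/size ratio.

cache = {
    "a.html": {"cost": 10, "size": 2},   # ratio 5.0
    "b.png":  {"cost": 6,  "size": 6},   # ratio 1.0  <- cheapest to refetch per byte
    "c.css":  {"cost": 9,  "size": 3},   # ratio 3.0
}

def pick_victim(cache):
    return min(cache, key=lambda k: cache[k]["cost"] / cache[k]["size"])

print(pick_victim(cache))   # the object with the smallest ratio
```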
The capacity of a cache is generally only a tiny fraction of main memory's (on the order of one part in several hundred), but its access speed can match the central processor's. According to the principle of program locality, the cells adjacent to a main-memory cell currently in use are very likely to be used soon. Therefore, when the central processor accesses a unit of main memory, the computer hardware automatically transfers the whole group of units containing it into the cache; the main-memory unit the processor accesses next is very likely to be in the group just transferred, so the CPU can access the cache directly. If, over the whole course of processing, most of the central processor's main-memory accesses can be replaced by cache accesses, the processing speed of the computer system can be significantly improved.[3]
Read hit rate
[4]When the CPU finds the data it needs in the cache, this is called a hit; when the needed data is not in the cache (a miss), the CPU accesses memory. In a CPU with an L2 cache, the read hit rate of the L1 cache is, in theory, about 80%: 80% of the data the CPU needs is found in the L1 cache, and the remaining 20% is read from the L2 cache. Because the data to be executed cannot be predicted exactly, the read hit rate of L2 is also about 80% (so 16% of the total data is read from L2). The remainder must still be fetched from memory, but this is a relatively small proportion. Some high-end CPUs also feature an L3 cache, designed for the data that misses in the L2 cache; in CPUs with an L3 cache, only about 5% of the data needs to be fetched from memory, further improving CPU efficiency.
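The arithmetic in the paragraph above is worth making explicit: with an 80% hit rate at each level, L1 serves 80% of accesses, L2 serves 80% of the remaining 20% (16% of the total), and only the rest falls through toward memory.

```python
# Reproducing the hit-rate arithmetic described above.

l1_hit = 0.80
l2_hit = 0.80
from_l1 = l1_hit                      # 80% of all accesses served by L1
from_l2 = (1 - l1_hit) * l2_hit      # 16% of all accesses served by L2
from_memory = 1 - from_l1 - from_l2  # the remaining 4% fall through

print(round(from_l2, 2), round(from_memory, 2))
```

The residual 4% is consistent with the text's claim that, with an additional L3 level, only about 5% of data ends up being fetched from memory.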
To keep the hit rate high during CPU accesses, the contents of the cache must be replaced according to some algorithm. A common one is the "least recently used algorithm" (LRU algorithm), which evicts the rows accessed least in the recent period. It requires a counter for each row: the LRU algorithm clears the counter of the row that was hit and increments the counters of all other rows; when a replacement is required, the row whose counter has the largest value is evicted. This is an efficient and sound algorithm: the counter-clearing process naturally flushes from the cache data that was referenced frequently for a while and is then no longer needed, improving cache utilization.
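The counter bookkeeping just described can be sketched directly; the line count below is a made-up example.

```python
# Sketch of the counter-based LRU bookkeeping described above: on a hit the
# hit row's counter is cleared and every other counter is incremented; the
# replacement victim is the row with the largest count.

counters = {line: 0 for line in range(4)}   # one counter per cache row

def touch(hit_line):
    for line in counters:
        counters[line] = 0 if line == hit_line else counters[line] + 1

def victim():
    return max(counters, key=counters.get)

touch(0); touch(1); touch(0)
print(victim())   # one of the untouched rows 2 or 3 -- never rows 0 or 1
```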
The cache's replacement algorithm affects the hit rate. When a new main-memory block must be brought into the cache and the locations it may occupy are all full, cached data must be replaced, which raises the replacement-policy (algorithm) problem. The principle of program locality tells us that programs tend to use recently used instructions and data frequently as they run, which provides the theoretical basis for replacement policies. The goal of a replacement algorithm is to make the cache hit rate as high as possible. The replacement algorithm also affects the performance of proxy cache systems; a good cache replacement algorithm yields a high hit rate. Common algorithms are as follows:
(1) Random method (RAND). The random replacement algorithm uses a random-number generator to produce the number of the block to replace. It is simple and easy to implement, but it considers neither the past nor the likely future use of a cache block; it exploits no "historical information" and is not based on the locality principle of accesses, so it cannot improve the cache hit rate, which stays low.
(2) First In First Out (FIFO) algorithm. It replaces the information block that entered the cache earliest, determining the eviction order purely by arrival order. It needs no record of how each block is used, is relatively easy to implement, and has low system overhead. Its disadvantage is that program blocks which are still needed frequently (such as loop code) may be evicted simply because they entered the cache first: the information brought in earliest may well be used again later, or often, as in a loop. Although the method is simple and convenient and does use the "historical information" of arrival order, being first to enter does not mean a block will not be used frequently; FIFO fails to reflect the principle of program locality correctly, its hit rate is not high, and anomalies can occur.
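The weakness described above is visible in a minimal FIFO sketch (capacity and block numbers are made up): a recently reused block is still evicted first because only arrival order matters.

```python
# FIFO replacement sketch: the victim is simply the block that entered the
# cache earliest, regardless of how often it has been used since.

from collections import deque

CAPACITY = 3
order = deque()          # arrival order of blocks
resident = set()

def access(block):
    if block in resident:
        return "hit"                     # usage does not affect eviction order
    if len(resident) >= CAPACITY:
        evicted = order.popleft()        # oldest arrival goes out
        resident.discard(evicted)
    order.append(block)
    resident.add(block)
    return "miss"

access(1); access(2); access(3)
access(1)            # hit, but block 1 is still the oldest arrival
print(access(4))     # miss: evicts block 1 despite its recent use
print(access(1))     # miss: block 1 was already evicted
```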
(3) LRU (Least Recently Used) algorithm. This method replaces the cache block that has been used least recently, and it performs better than the first-in first-out algorithm. It cannot, however, guarantee that a block unused in the past will not be needed in the future. The LRU method is based on the usage of each block, always selecting the least recently used block for replacement; although this reflects program locality well, it requires recording the use of every block in the cache at all times in order to determine which block is least recently used. The LRU algorithm is relatively sound, but it is complex to implement and has high system overhead: a counter module, implemented in hardware or software, is usually needed for each block to record its use.