L3 cache

A cache designed to serve data that misses in the L2 cache
The L3 cache is designed to serve data that misses in the L2 cache. In a CPU with an L3 cache, only about 5% of data needs to be fetched from main memory, which further improves CPU efficiency. It works on the general caching principle: a faster storage device keeps a copy of data read from a slower storage device, so that when data must be read from or written to the slower storage, the operation can be completed first on the faster device. This makes the system respond faster.
Chinese name: L3 cache
Foreign name: Cache
Classification: external and internal
Principle: keeps a copy of data read from a slower storage device

Cache Introduction

When the CPU reads data, it first looks in the cache; if the data is found there, it is read immediately and passed to the CPU for processing. If the data is not in the cache, it is read from memory, which takes considerably longer, and then passed to the CPU. The block containing that data is then copied into the cache, so that later reads of the same data can be served directly from the cache instead of going back to memory.
Cache size is also one of the important CPU specifications: the structure and size of the cache strongly affect CPU speed. The CPU cache usually runs at the same frequency as the processor and is far faster than system memory or the hard disk. In practice the CPU often reads the same data blocks repeatedly, so a larger cache raises the hit rate of reads inside the CPU, reducing lookups in memory or on disk and improving system performance. However, because of chip area and cost constraints, the cache is kept small.
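The link between cache capacity and hit rate can be illustrated with a small simulation. This is a minimal sketch that assumes a fully associative cache with least-recently-used (LRU) replacement and a hypothetical workload that loops over 8 blocks; real CPU caches are set-associative and use a variety of replacement policies, so the numbers only show the trend.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Replay block addresses through an LRU cache holding `capacity`
    blocks and return the fraction of accesses that hit."""
    cache = OrderedDict()  # keys are block addresses; order tracks recency
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True            # miss: fetch from memory and keep a copy
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(accesses)


# Hypothetical workload: a loop that repeatedly touches the same 8 blocks.
workload = list(range(8)) * 100

for capacity in (4, 8, 16):
    print(f"capacity {capacity:2d} blocks: hit rate {lru_hit_rate(workload, capacity):.2%}")
```

In this sketch the hit rate jumps once the cache can hold the whole working set and then stops improving, which is one reason a larger cache raises the hit rate but does not raise performance proportionally.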

Cache Classification

The level 1 (L1) cache is built into the CPU and runs at the same speed as the CPU, which effectively improves the CPU's efficiency. The larger the L1 cache, the more efficient the CPU; however, because of the constraints of the CPU's internal structure, the L1 cache is very small.
The level 2 (L2) cache bridges the speed gap between the L1 cache and memory. The CPU accesses the L1 cache first; as processor speeds increased, the L1 cache alone became insufficient, which led to the addition of a second level. The L2 cache is slower than the L1 cache but larger, and it mainly serves as a temporary data exchange area between the L1 cache and memory.
The level 3 (L3) cache is designed to serve data that misses in the L2 cache. In CPUs with an L3 cache, only about 5% of data needs to be fetched from memory, which further improves CPU efficiency. Its operating principle is that a faster storage device keeps a copy of data read from a slower storage device; when data must be read from or written to the slower storage, the read or write can first be completed on the faster device, making the system respond faster.
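How each level filters the misses of the level above can be expressed as an average memory access time (AMAT) calculation. This is a minimal sketch: the latencies and per-level hit rates below are hypothetical values chosen so that about 5% of accesses reach memory with an L3 cache, and the model assumes each level is checked only after a miss in the level above it.

```python
# Hypothetical per-level figures, chosen only for illustration:
# (name, lookup latency in CPU cycles, local hit rate at that level)
LEVELS = [
    ("L1", 4, 0.80),
    ("L2", 12, 0.50),
    ("L3", 40, 0.50),
]
MEMORY_LATENCY = 200  # cycles, also hypothetical


def average_access_time(levels, memory_latency):
    """Return (AMAT in cycles, fraction of accesses that reach memory)."""
    amat = 0.0
    reach = 1.0  # fraction of accesses that get as far as the current level
    for _name, latency, hit_rate in levels:
        amat += reach * latency    # every access reaching this level pays its lookup time
        reach *= 1.0 - hit_rate    # only the misses continue to the next level
    amat += reach * memory_latency
    return amat, reach


with_l3, to_mem_l3 = average_access_time(LEVELS, MEMORY_LATENCY)
without_l3, to_mem = average_access_time(LEVELS[:2], MEMORY_LATENCY)
print(f"with L3:    AMAT {with_l3:.1f} cycles, {to_mem_l3:.1%} of accesses go to memory")
print(f"without L3: AMAT {without_l3:.1f} cycles, {to_mem:.1%} of accesses go to memory")
```

With these assumed figures, adding the L3 level halves the share of accesses that reach memory, and the average access time falls even though the L3 lookup itself adds latency to every L2 miss.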

L3 Cache Classification

AMD FX-8150 CPU with 8 MB of L3 cache
L3 cache comes in two types: early L3 caches were external, while later ones were built into the processor. In practice, an L3 cache can further reduce memory latency and improve processor performance on computations over large amounts of data. Lower memory latency and better large-data computing power are very helpful for games, but the benefit of adding an L3 cache is even more significant in the server field. With a large L3 cache, physical memory is used more efficiently, so the slower disk I/O subsystem can handle more data requests. Processors with larger L3 caches provide more efficient file-system caching behavior and shorter message and processor queue lengths.
In fact, the earliest L3 cache was used with AMD's K6-III processor. At that time, limited by the manufacturing process, the L3 cache was not integrated into the chip but placed on the motherboard. Because it could only run in sync with the system bus frequency, this L3 cache was not much faster than main memory. Later, L3 cache was used in the Itanium processor that Intel launched for the server market, followed by the Pentium 4 Extreme Edition (P4EE) and Xeon MP. Intel also plans to introduce an Itanium 2 processor with 9 MB of L3 cache and a dual-core Itanium 2 processor with 24 MB of L3 cache in the future.
Overall, however, the L3 cache contributes relatively little to processor performance. For example, Xeon MP processors with 1 MB of L3 cache are still no match for the Opteron, which suggests that a faster front-side bus brings a more effective performance improvement than a larger cache.

L2 Cache Price

First, let's do some quick arithmetic on the L2 cache of Intel processors (prices as of February 2009):
The Celeron Dual-Core E1200 with 512 KB of L2 cache costs only 270 yuan, while the Pentium Dual-Core E2140 with 1 MB of L2 cache sells for 370 yuan, so the extra 512 KB of cache costs 100 yuan. The Core 2 Duo E4300 or Pentium Dual-Core E5200 with 2 MB of L2 cache costs more than 550 yuan, meaning you pay roughly another 200 yuan for the additional 1 MB of L2 cache. The Core 2 Duo E7200 with 3 MB of L2 cache sells for 750 yuan, another 200 yuan for the additional 1 MB. Core 2 processors with 4 MB and 6 MB of L2 cache follow the same pattern.
Whether Core 2, Pentium Dual-Core, or Celeron Dual-Core, these processors share exactly the same core architecture, and the clock frequency can be adjusted freely; the only real difference is the amount of L2 cache. It is no exaggeration to say that Intel is selling L2 cache at 200 yuan per MB.
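The "about 200 yuan per MB" figure can be checked directly from the February 2009 prices quoted above. This sketch uses only those numbers; because the 2 MB price is given as "more than 550 yuan", 550 is treated as a lower bound, so that step comes out slightly below 200.

```python
# February 2009 prices quoted in the text: (model, L2 size in KB, price in yuan).
# The 2 MB price is "more than 550 yuan", so 550 is a lower bound.
PRICES = [
    ("Celeron Dual-Core E1200", 512, 270),
    ("Pentium Dual-Core E2140", 1024, 370),
    ("Core 2 Duo E4300 / Pentium Dual-Core E5200", 2048, 550),
    ("Core 2 Duo E7200", 3072, 750),
]


def marginal_cost_per_mb(prices):
    """Yuan paid per extra MB of L2 cache between consecutive models."""
    steps = []
    for (_, kb_a, yuan_a), (name_b, kb_b, yuan_b) in zip(prices, prices[1:]):
        extra_mb = (kb_b - kb_a) / 1024
        steps.append((name_b, (yuan_b - yuan_a) / extra_mb))
    return steps


for name, cost in marginal_cost_per_mb(PRICES):
    print(f"{name}: about {cost:.0f} yuan per extra MB of L2")
```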
In fact, for years Intel has segmented its product line by L2 cache size. At first there were only two tiers, Pentium and Celeron. In the Core 2 era this reached its peak: dual-core products alone came in six L2 sizes (512 KB, 1 MB, 2 MB, 3 MB, 4 MB, and 6 MB), and quad-core products in four (4 MB, 6 MB, 8 MB, and 12 MB), a dazzling range. This fine-grained segmentation offers a product at every price point, but it also creates unprecedented confusion for users: how much L2 cache is actually needed?

L2 Cache Development

Across the development of Intel processors, however the core architecture changed, the most visible trend is the exponential growth of the L2 cache. In the Pentium 4 era, the 0.18 μm Willamette core had 256 KB of L2 cache, the 0.13 μm Northwood core had 512 KB, and the later 0.09 μm Prescott core increased it to 1 MB. In the Core era, alongside a radical change in architecture, the 65 nm process doubled the L2 cache again: even the Allendale core, the low-end Core 2 part at launch, had 2 MB of L2 cache, and high-end Core 2 parts had 4 MB. With the move to 45 nm, L2 capacity grew further: the high-end E8x00 series reached an astonishing 6 MB, and the low-end E7x00 series reached 3 MB. Intel thus offers a seamless range of L2 sizes from 512 KB to 6 MB, and even 12 MB.
No company stays behind forever: with the move to 45 nm and the arrival of the Phenom II, AMD can likewise design CPUs for different markets by combining different core counts and cache sizes.

Impact on performance

The impact of L3 cache on performance varies. In games, a larger L3 cache has a significant effect on performance. A larger L3 cache is of little use for ordinary home computers, but for Internet café machines it can bring a significant performance improvement. Although L3 cache can noticeably improve PC performance, its main role is in servers; on PCs it plays only a supporting role. With all other parameters equal, a larger L3 cache means better performance; when other parameters differ, the effect of the L3 cache is not obvious.
Whatever its exact importance, the L3 cache is one of the parameters that have contributed to the development of computers.

Development history

L3 cache from the perspective of hardware configuration [1]
Looking at AMD's Phenom processors, the core architecture has changed relatively little, and the L3 cache is the most visible change: from 2 MB in the early Phenom to 6 MB in the Phenom II, plus the Athlon X4, which uses the Phenom II architecture but has no L3 cache and was introduced to meet market demand. By combining dual-, triple-, and quad-core designs with different cache configurations, AMD's processor lineup presents a picture of a hundred flowers blooming.
Most AMD users are value-oriented, and this proliferation of AMD processors is both good and bad for them: users have more choices, but they can also be at a loss when choosing a CPU. On closer inspection, AMD's many competing models are simply the result of combining different core counts with different cache sizes.
Compared with the 65 nm Phenom, the biggest change in the new-generation 45 nm Phenom II is the move to a 45 nm SOI process with immersion lithography, which brings higher clock speeds, lower power consumption, and higher integration; in particular, the L3 cache jumps from 2 MB to 6 MB.
Tripling the L3 cache naturally comes at a price. Comparing the die diagrams of the Phenom and Phenom II chips gives some idea:
Barcelona/Agena integrates 468 million transistors on a die of about 285 mm². Shanghai/Deneb has 62% more transistors, 758 million, yet its die area is 9.5% smaller, only 258 mm². The benefit of the new process is clear.
The increase in transistor count comes mainly from the expanded L3 cache, whose share of the whole die has grown from about one sixth to about one third.
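The percentages quoted for Barcelona/Agena and Shanghai/Deneb can be reproduced, and extended to transistor density and approximate L3 area, from the die figures in the text. This is a rough sketch: the L3 area estimate simply multiplies die area by the approximate "one sixth" and "one third" shares stated above, so it is only as precise as those fractions.

```python
# Die figures quoted in the text; l3_share is the approximate fraction of the die.
PHENOM = {"transistors_m": 468, "die_mm2": 285, "l3_share": 1 / 6}     # Barcelona/Agena
PHENOM_II = {"transistors_m": 758, "die_mm2": 258, "l3_share": 1 / 3}  # Shanghai/Deneb


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


print(f"transistors: {pct_change(PHENOM['transistors_m'], PHENOM_II['transistors_m']):+.1f}%")
print(f"die area:    {pct_change(PHENOM['die_mm2'], PHENOM_II['die_mm2']):+.1f}%")
for name, chip in (("Phenom", PHENOM), ("Phenom II", PHENOM_II)):
    density = chip["transistors_m"] / chip["die_mm2"]  # million transistors per mm^2
    l3_area = chip["die_mm2"] * chip["l3_share"]       # rough estimate only
    print(f"{name}: {density:.2f} M transistors/mm^2, roughly {l3_area:.0f} mm^2 of L3")
```

On these figures, transistor density rises by roughly 80% while the estimated L3 area almost doubles, which is consistent with the new process absorbing most of the cost of the larger cache.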

Other

AMD's Attitude towards L3 Cache
First, L3 cache capacity matters more in the server domain, but using different architectures for server and desktop processors would inevitably increase production difficulty and cost, so the L3 cache was brought to the desktop as well;
Second, on the desktop, increasing the L3 cache from 2 MB to 6 MB brings a performance gain of about 5%, which actual tests confirm;
Third, as the figures above show, although the L3 cache tripled, the improved manufacturing process made the die smaller and therefore cheaper.
Anyone familiar with Intel's Nehalem-based Core i7 processor will recognize that Intel uses the same design: a large shared L3 cache of up to 8 MB, which also occupies about one third of the die. The difference is that each Core i7 core has only 64 KB of L1 cache and 256 KB of L2 cache, half of what each Phenom/Phenom II core has.
Interestingly, the Core i7, also built on a 45 nm process, integrates 731 million transistors, slightly fewer than the Phenom II, yet its die is slightly larger at 263 mm².
In terms of cost, the Phenom II X4 die diagram shows that the L3 cache occupies more area than two cores with their L1 and L2 caches combined, so even the Phenom II X3, which has one core disabled, is not cheap to produce. For AMD, which competes on price/performance, the loss of profit is relatively large.
Therefore, after releasing the Phenom II X4 and X3 processors, AMD also actively prepared mainstream and low-end products to replace the long-serving Athlon 64 X2 series. Because of the high cost of L3, AMD removed the L3 cache from the Phenom II X4 design entirely (note: removed, not disabled), and the result was introduced as the Athlon X4.
This makes it easy, through comparative testing, to see how much the 6 MB L3 cache contributes to the performance of Phenom II-architecture processors, and also to find out in advance which is stronger: the Phenom II X3, with the full L3 cache but one fewer core, or the Athlon X4, with four cores but no L3. Many readers will find this question interesting.
We took the released Phenom II 920 (6 MB L3) and Phenom 9850 (2 MB L3), along with an engineering sample of the Athlon X4 without L3, and ran them all at 200 MHz × 14 = 2.8 GHz, so that the performance differences caused by 6 MB, 2 MB, and 0 MB of L3 cache can be compared directly.
In addition, the newly released Phenom II X3 720 has the full 6 MB L3 cache but one fewer core, which shows whether an extra core or the 6 MB of L3 contributes more. The test results show that, in terms of CPU architecture, cache has a large impact on performance; however, the Athlon X4 outperforms the previous-generation Phenom 9850, which has a full L3 cache, especially in compute-heavy workloads, where its memory bandwidth advantage is evident.
Intel six-core processor (16 MB L3 cache)
The Core i7 brand for high-end desktop PCs and Nehalem-EP for the high-efficiency server market were the first to reach the market, expected in the fourth quarter of 2008. New-architecture products were to follow, including Nehalem-EX for the scalable server market, Havendale and Lynnfield for desktops, and Auburndale and Clarksfield for mobile, expected to debut in the second half of 2009.
Processors based on the next-generation Nehalem microarchitecture start at four cores and use Hyper-Threading technology, so they can process eight threads at the same time. Core i7 supports Turbo Mode and power gates, which can completely shut down idle cores when multithreaded operation is not needed, and each core can run at a different voltage/frequency. Turbo Mode, which raises the frequency of a single core, can significantly improve the performance of single-threaded applications.
Intel also released its first six-core processor, the Xeon X7460 "Dunnington" for the multiprocessor server market, with 16 MB of built-in L3 cache, launched in September 2008. It was Intel's last 45 nm Core-microarchitecture processor before the switch to the Nehalem microarchitecture. Servers using this processor broke many world records: the 8-socket, 48-core IBM System x3950 M2 was the first to exceed 1 million tpmC in the TPC Benchmark C database test; the 4-socket HP ProLiant DL580 G5 broke the TPC-C record; the Dell PowerEdge R900 broke the TPC-E record; the Sun Fire X4450 broke the SPECjbb2005 record; and the Fujitsu Siemens PRIMERGY RX600 S4 broke the SPECint_rate2006 record. [1]
AMD 45nm processor without L3 cache
With the move to the new process, AMD's 45 nm desktop processors will form a complete family of five sub-series: models with L3 cache carry the Phenom II brand, while models without L3 cache keep the Athlon brand (it is unclear why it is not Athlon II). The Sempron brand will soon be phased out.
At the top are the quad-core Phenom II X4 900 and 800 series, code-named Deneb, with 4 × 512 KB of L2 cache and 6 MB (900 series) or 4 MB (800 series) of L3 cache. These two series will be released first, in January next year: the first two models, the Phenom II X4 940 and 920 for the AM2+ socket, will launch at CES 2009 on January 8, and all later models will use the AM3 socket.
The triple-core Phenom II X3 700 series, code-named Heka, has 3 × 512 KB of L2 cache and the full 6 MB of L3 cache, and will follow in February next year.
Besides these two series, AMD has also prepared versions without L3 cache. These are designed without L3 cache from the start, rather than being versions with the L3 cache disabled, so no die area is wasted.
The quad-core version will be the Athlon X4 600 series, code-named Propus, with 4 × 512 KB of L2 cache; the triple-core version will be the Athlon X3 400 series, code-named Rana, with 3 × 512 KB of L2 cache. Both will launch in April next year.
Finally, the dual-core Athlon X2 200 series, code-named Regor, has 2 × 1 MB of L2 cache, twice as much per core as the other series. It will be the last to be released, not until June next year, to avoid competing with existing older-architecture models.
As for chipsets, current AMD 7-series motherboards can support the new processors after a BIOS update, depending on the motherboard manufacturer's support. In addition, AMD will launch new 8-series integrated chipsets in the first half of 2009: the RS880 and RS880C paired with the SB750 southbridge, and the RS880D and RS890 paired with the next-generation SB800. [2]
Which matters most: L1, L2, or L3 cache?
The L1 cache is the most important, but current CPUs all have roughly the same amount of L1 cache, so it can be ignored when comparing them.
L2 cache is very important for Intel CPUs: the larger an Intel CPU's L2 cache, the more significant the performance gain. L2 cache size also matters for AMD CPUs, but it has a much smaller effect on their performance.
In fact, the L3 cache plays only a supporting role: outside servers, it is of little use for most home computers, and memory remains very important. However, for large programs or games, the L3 cache is important, and new CPUs now come with one.
So, besides clock frequency, CPU performance is measured by the number of cores and then by cache size. In theory, the larger the L2 cache, the better the processor performs, but doubling the L2 cache does not double performance. In 2006, most data processed by the CPU was 0 to 256 KB in size, a small share was 256 KB to 512 KB, and only a little exceeded 512 KB; by 2009, 1 MB and 2 MB sizes had appeared. So a processor with more than 256 KB of L2 cache can handle ordinary applications, and 512 KB of L2 cache meets the needs of most applications.
Which matters more: clock frequency, L2 cache, or L3 cache?
Cache works as follows: when the CPU needs to read a piece of data, it first looks in the cache. If the data is found, it is read immediately and sent to the CPU for processing; if not, it is read from memory at a comparatively slow speed and sent to the CPU, and at the same time the block containing that data is copied into the cache, so that later reads of any data in that block can be served from the cache without accessing memory.
This read mechanism gives CPU cache reads a very high hit rate (about 90% on most CPUs): about 90% of the data the CPU reads next is already in the cache, and only about 10% must be read from memory. This greatly reduces the time the CPU spends reading memory directly, so the CPU rarely has to wait for data. In general, the CPU reads from the cache first and then from memory.
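Copying the whole block on a miss is what makes sequential reads so cache-friendly. This minimal sketch assumes 64-byte blocks and 4-byte elements (typical values, not taken from the text) and models a single sequential pass over an array; the cache is a plain set with no eviction, which is harmless here because a single pass never revisits a block.

```python
BLOCK_SIZE = 64   # bytes per cache block (line); assumed, typical value
ELEMENT_SIZE = 4  # bytes per element, e.g. a 32-bit integer; assumed


def sequential_scan_hit_rate(num_elements, block_size=BLOCK_SIZE, element_size=ELEMENT_SIZE):
    """Read an array once in order, copying the whole block into the cache
    on every miss, and return the fraction of reads served from the cache."""
    cached_blocks = set()
    hits = 0
    for i in range(num_elements):
        block = (i * element_size) // block_size  # block that holds element i
        if block in cached_blocks:
            hits += 1
        else:
            cached_blocks.add(block)  # miss: read from memory, keep the whole block
    return hits / num_elements


print(sequential_scan_hit_rate(1_000_000))
```

Only the first element of each 16-element block misses, so this pass hits 15 out of every 16 reads. Real programs mix such spatial locality with repeated reuse of the same data, which together produce the high hit rates described above.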
L2 and L3 cache sizes are not the only measure of CPU performance; clock frequency and manufacturing process also matter (for example, a 45 nm cache is better than a 65 nm one). It also depends on the vendor: L2 cache is important for Intel products but less so for AMD, because AMD processors have an L3 cache in addition to the L2 cache.
Whether clock frequency, L2 cache, or L3 cache matters more depends on your goals and the tasks you mainly run. A higher clock frequency means faster computation. The L2 and L3 caches act as buffers between memory and the CPU, easing the speed mismatch between them, which affects how efficiently the CPU executes. Large L2 and L3 caches therefore pay off when the CPU processes large amounts of data over long periods, while a high clock frequency processes small amounts of data quickly. In practice all three are important, and any one that falls short becomes a bottleneck.
Intel Xeon 7100 series CPU (16 MB L3 cache)
Intel officially released its latest dual-core Xeon processors for high-end servers, the Xeon 7100 series, code-named Tulsa. The processor is still based on the previous-generation NetBurst architecture, but brings considerable improvements in performance and power efficiency.
Xeon 7100 Series CPU Configuration
The Xeon 7100 series, code-named Tulsa, is a dual-core design. Each core has 1 MB of L2 (level 2) cache, and the two cores share up to 16 MB of L3 (level 3) cache. The processor also supports Hyper-Threading, Intel Virtualization Technology, and Intel Cache Safe Technology. The TDP of Xeon 7100 series processors above 3.0 GHz is 150 W; that of the remaining models is 95 W.
Xeon 7100 Series CPU Specifications
Model   Clock     FSB      L2 cache  L3 cache  Price (1,000-unit qty)
7140M   3.40 GHz  800 MHz  2×1 MB    16 MB     1980
7140N   3.33 GHz  667 MHz  2×1 MB    16 MB     1980
7130M   3.20 GHz  800 MHz  2×1 MB    8 MB      1391
7130N   3.16 GHz  667 MHz  2×1 MB    8 MB      1391
7120M   3.00 GHz  800 MHz  2×1 MB    4 MB      1177
7120N   3.00 GHz  667 MHz  2×1 MB    4 MB      1177
7110M   2.60 GHz  800 MHz  2×1 MB    4 MB      856
7110N   2.60 GHz  667 MHz  2×1 MB    4 MB      856
Xeon 7100 series processors support 667 MHz and 800 MHz front-side buses and are compatible with the E8501 chipset, so existing server platforms can easily be upgraded to the new processors. In addition, Intel said that its 65 nm process is progressing very smoothly, and 65 nm products now account for a larger share of shipments than 90 nm products.