AMD APU (processor) is a synonym that generally refers to the APU, a chip released by AMD in 2011.
The APU (Accelerated Processing Unit) is AMD's product built on the "Fusion" concept. For the first time, it places a central processor and a discrete-class graphics core on the same chip, combining the performance of a high-performance processor with that of a recent discrete graphics card. It supports DX11 games and the "accelerated computing" of the latest applications, greatly improving the computer's operating efficiency.
In January 2011, AMD launched a revolutionary product, the AMD APU, the first product of AMD Fusion technology.
In June 2011, the Llano APU for the mainstream market was officially released. In October 2012, AMD released the Trinity series of chips. AMD claims that Trinity notebook computers are cheaper than computers with Intel chips while running at equivalent speed. Trinity runs 25% faster than Llano, and its graphics core computes 50% faster.
In June 2013, AMD launched a new generation of APUs: the top-end quad-core Richland, the classic quad-core Kabini and the top-end mobile quad-core Temash, which became the latest flagship products of the desktop and mobile APU lines.
AMD launched the Kaveri series of APUs in 2014. They support HSA heterogeneous computing, in which the CPU and GPU work together, and use a 28nm process and a GCN-architecture GPU, reaching a new level of performance compared with previous APU generations.
AMD also launched the PS4 APU and the Xbox One APU. In terms of performance, the PS4 APU is 1.5 times that of the Xbox One APU and 5 times that of the desktop A10-7850K APU.
The PS4 APU is very powerful, with 1.84 TFLOPS of GPU floating-point throughput and 8GB of GDDR5 shared memory at 176GB/s. In terms of performance, it can be compared with high-end computers.[1]
The APU integrates general-purpose x86 CPU cores with a programmable vector processing engine, combining the precise scalar operations the CPU excels at with the large-scale parallel vector operations traditionally available only on the GPU. By combining the strengths of the CPU and GPU, the AMD APU design gives software developers unprecedented flexibility to develop new applications in whichever way suits them best. The AMD APU connects a programmable x86 CPU and the vector processing architecture of a GPU on a single silicon chip through a high-performance bus, and both sides can read high-speed memory directly. The AMD APU also contains other system components, such as the memory controller, I/O controller, dedicated video decoder, display outputs and bus interfaces. The appeal of AMD APUs is that they contain the full processing power of both scalar and vector hardware.
PS4 APU framework diagram
APU is the abbreviation of "Accelerated Processing Unit", a new type of "Fusion" processor introduced by AMD that integrates x86/x64 CPU cores and GPU cores. For this reason, the term "Fusion accelerated processor" can also be found on the Internet. AMD has two kinds of APU platforms: the E-series entry-level APU, which was already on the market, and the A-series mainstream APU, officially launched in the European and American markets in 2011. The A-series is divided into four lines, A4/A6/A8/A10, and is commonly referred to as the "Llano APU processor".
Therefore, the A-series APU platform is generally called the Llano APU platform. Some people also call the Llano APU platform, with its integrated GPU, the "Lynx" platform.
AMD believes that the integration of CPU and GPU will be carried out in four steps:
X1 APU framework diagram
The first step is physical integration (Physical Integration), which places the CPU and GPU on the same silicon chip, lets them communicate over a high-bandwidth internal bus, and integrates a high-performance memory controller, with the software stack facilitating heterogeneous computing.
The second step is platform optimization (Optimized Platforms). The interconnect between the CPU and GPU is further enhanced, power management becomes unified and bidirectional, and the GPU gains support for high-level programming languages. This step is the most critical.
The third step is architectural integration (Architectural Integration): a unified CPU/GPU address space, GPU access to pageable system memory, schedulable GPU hardware, and coherent CPU/GPU/APU memory. This has been initially achieved in the APU.
The fourth step is architecture and operating system integration (Architectural & OS Integration). Its main features include GPU compute context switching, GPU graphics preemption, collaboration with discrete graphics cards over PCI-E, and task-parallel runtime integration. AMD keeps in communication with industry software giants such as Microsoft and Adobe.
The APU is the result of AMD's years of research on Fusion technology. Most floating-point operations in traditional computing will leave the CPU and move to the GPU, which excels at them. The GPU is no longer just a gaming tool, and hybrid computing will shine. In the near future, the concepts of CPU and GPU will gradually blur, as AMD has advocated: The Future is Fusion.
Architecture
Latest Ryzen APU [2]
The Trinity APU was officially released on October 2, 2012, one year and three months after the Llano APU. Its desktop platform is codenamed "Virgo" and its mobile platform "Comal". The new-generation APU is manufactured on GlobalFoundries' 32nm SOI HKMG process [3] and has 2-4 CPU cores based on an improved Bulldozer architecture, codenamed "Piledriver". This is a relatively large improvement, because the CPU part of the previous-generation Llano still used the older K10 architecture. The integrated GPU has also been significantly improved: the HD 6000 core is replaced by a new graphics core based on the VLIW4 architecture (the same architecture as the Cayman core of the HD 6900). It competes directly with Intel's Ivy Bridge processors, which launch in April. AMD continues to lag behind in processor performance while leading significantly in graphics performance. The new generation of AMD Ryzen APUs [4-5] was officially launched on February 12 [2].
Graphics core
Piledriver APU
The Trinity APU is based on "Piledriver", an enhanced Bulldozer architecture, with at most two modules and four cores. It supports the third-generation dynamic acceleration technology, Turbo Core 3.0, and integrates a Radeon HD 7000 series graphics core based on the VLIW4 architecture.
Performance prediction
HD Graphics 3000
The performance of the Trinity APU can be seen from the mobile platform demonstrated by AMD. AMD ran the new DX11 game "Deus Ex: Human Revolution" on a laptop equipped with a Trinity APU and, for comparison, also ran it on Intel's Sandy Bridge platform (mobile Sandy Bridge integrates HD Graphics 3000). With effects and technologies such as morphological anti-aliasing (MLAA), texture filtering, screen-space ambient occlusion (SSAO), depth of field (DOF), post-processing and tessellation enabled, the Trinity APU platform runs more smoothly, while the Sandy Bridge platform stutters noticeably. Measured with PCMark Vantage and 3DMark Vantage, the processor and graphics performance of the desktop version are up to 30% higher than those of the Llano APU, while the laptop version improves by up to 25% and 50% respectively. The Trinity APU is specially optimized for the Windows 8 operating system and introduces new video processing capabilities, notably the video compression engine "VCE", which targets Intel's QuickSync transcoding engine.
Power consumption and endurance
As for battery life, AMD's internal tests give the following results: 12 hours 28 minutes idling on the Windows desktop, 7 hours 15 minutes playing a standard-definition DVD movie, 4 hours 2 minutes playing a Blu-ray HD movie, and 3 hours 20 minutes running the 3DMark06 test.
Memory controller
The Trinity APU also has an improved DDR memory controller, which supports DDR3-2133 memory. Tests of the Llano APU show that memory performance directly affects graphics performance, which can improve by up to 55% after upgrading from DDR3-1333 to DDR3-1866 memory. Perhaps because of the many changes, the Trinity APU adopts a new FM2 socket, which is incompatible with FM1.
Future Outlook
This generation of Llano APU did not play its due role because of supply shortages. The Fusion APU was officially released on March 1, 2011, and the mainstream Llano APU on June 1, 2011, yet by mid-September the A-series A8-3850 and A6-3650 had still not appeared in stores, at least not in Zhongguancun. With Sandy Bridge already fully distributed and being promoted, it is not yet known how much room is left for the Llano APU to perform. Perhaps the real potential of the APU will be released with the Trinity APU. The new Bulldozer-architecture processing cores and the new VLIW4-architecture graphics core, which places more emphasis on integer performance and general-purpose computing, will make the new-generation Trinity APU more attractive, and the power of the Fusion concept first proposed by AMD will also be released at that time.
Architecture analysis
APU and integration
Unlike Bulldozer, the Llano APU does not use a new core architecture. It is even unlike the Brazos APU platform, whose processor at least uses the new Bobcat architecture. Llano mainly combines a K10 processor with a DX11 graphics core (and the north bridge chip), but it is obviously not as simple as 1+1=2. The problem faced by the Llano APU is not only to avoid 1+1<2, but also to strive to achieve 1+1>2.
The design goals of the Llano APU are mainly the following:
- Balanced CPU and GPU performance: provide the best CPU and GPU performance at the same time.
- Discrete-graphics-class GPU experience: the complete DX11 feature set, plus drag-and-drop transcoding, Aero effects and other Windows 7 experiences.
- Unique dual-graphics technology: provides additional performance when paired with an AMD Radeon discrete graphics card.
- Next-generation video acceleration: the UVD3 engine, innovative display and image quality features, and higher bandwidth.
- Stereoscopic 3D: supports HD3D, including Blu-ray 3D, DisplayPort 1.1 (DP1.2) and HDMI 1.4a.
It can be seen that five and a half of the six goals are related to the GPU, and only half of one is related to the CPU. The focus of the Llano APU is self-evident, which is also consistent with platform names such as AMD VISION.
The Llano APU chip is manufactured on GlobalFoundries' 32nm HKMG process and comes in two versions. The first is the full version, which integrates 1.45 billion transistors with a die area of 228 square millimetres and is also called Big Llano or Llano 1. The second is the compact version, which integrates 758 million transistors; its die area is not yet known, and it is also called Small Llano or Llano 2. Both use the new lidless micro-PGA package for Socket FS1, with 772 pins, a pin pitch of 1.2192 mm and a chip size of 35 × 35 = 1225 mm².
From all indications, the full version of the Llano APU was released first, and the dual-core versions are quad-core dies with cores disabled, so their thermal design power is also high. It is not known when the native dual-core version will appear, but AMD has revealed that it will launch a new low-power model that does not require fan cooling in the near future, which is presumably it.
Similar to the earlier Brazos APU, the Llano APU integrates the following modules on a single silicon chip: x86 processor cores, L2 cache, DDR3 memory controller, graphics SIMD array (i.e. the GPU), display controller, UVD decoding engine and PCI-E controller. From the two figures below, you can see the position and relative size of each module.
With so many functional modules integrated in the Llano APU, ensuring high-speed interconnection between them, so that the whole stays in the best state at all times and potential bottlenecks are avoided, is undoubtedly the most critical point in the APU design process, and is also the basic premise for achieving 1+1>2. AMD has obviously made great efforts here. For example, it designed a new Fusion Compute Link that connects the north bridge module, the GPU and the I/O, allowing the GPU to access coherent cache/memory. At the same time, a Radeon Memory Bus (video memory bus) is built between the GPU and the north bridge, giving the GPU high-bandwidth access to system memory.
In the final analysis, the APU does not simply put the CPU and GPU on one silicon chip; otherwise AMD would not have needed more than three years of repeated design revisions to finally bring it to fruition.
CPU and Turbo Core
The processor part of the Llano APU comes from the Stars architecture, commonly known as K10, and shares its origin with the Phenom II/Athlon II series. More precisely, it is equivalent to the earlier Phenom II Mobile series on the mobile platform, with a 128-bit floating-point unit, L1 cache (64KB+64KB per core) and L2 cache (1MB per core), but no L3 cache.
Of course, not everything is copied unchanged. Besides the manufacturing process moving from 45nm to 32nm, which gives better control of transistor density, die area, frequency and power consumption, the core supports the C6 power state and has been optimized in many details, including a larger L2 cache, improved hardware prefetching, larger window sizes, a hardware divider and support for the second-generation Turbo Core intelligent overclocking technology. As a result, the number of instructions per clock cycle has increased by more than 6%.
Turbo Core's official Chinese name translates as "intelligent overclocking". The technology first appeared on the six-core Phenom II X6 series and has now evolved to its second generation, supported across the whole product range from Bulldozer to the APU. As of 2011, however, there is essentially no software tool that can monitor Turbo Core's dynamic frequency in real time; only the CPUID utility bundled with AIDA64 does a reasonable job.
The actual power consumption of a processor varies greatly under different loads, and there is usually headroom below the maximum thermal design power. In addition, the number of active cores in a multi-core processor differs between application environments, so processor resources cannot be fully utilized and are wasted.
The solution is to use power monitors to measure the power consumption of each processor core in real time. The north bridge aggregates the measurements and reports them to the P-state power state manager, which then lets each core run in an appropriate power state, either slowing down or speeding up. When speeding up, a core can exceed its nominal frequency for a short time, while the overall thermal design power is never exceeded.
The innovation of AMD Turbo Core is its digital Advanced Power Management (APM) module. Compared with the analog temperature and current monitoring used in similar technologies, it provides highly responsive power management with higher accuracy and full repeatability.
More importantly, Turbo Core automatically coordinates the CPU and GPU, so that whichever needs more resources gets higher speed. When the GPU is idle, its frequency is reduced significantly so that the CPU frequency can rise as much as possible.
If a heavy graphics or video task is encountered, the GPU gets higher priority and the CPU takes second place.
If the GPU is running a light task such as DVD video playback, the headroom left for CPU acceleration is the overall thermal design power minus the GPU's share.
In extreme cases, where both the CPU and GPU are busy or need to cooperate on OpenCL/APP accelerated computing, both are accelerated at the same time and may even exceed the design power limit briefly. The CPU frequency and power are then reduced as needed (the GPU is left unchanged) to keep the core temperature from rising too high. This is somewhat similar to the second-generation Turbo Boost on Sandy Bridge.
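The CPU/GPU coordination described above can be pictured as a shared power budget. The following is only a minimal sketch under stated assumptions, not AMD's actual algorithm: the function name allocate_power and the wattage figures are hypothetical, and the model simply gives the GPU what it requests and leaves the remainder of the thermal design power to the CPU.

```python
def allocate_power(tdp_watts, cpu_request_watts, gpu_request_watts):
    """Split a shared TDP budget between CPU and GPU (hypothetical model).

    The GPU request is served first (capped at the TDP); the CPU may use
    whatever remains, so its boost headroom is TDP minus the GPU's share.
    """
    gpu_watts = min(gpu_request_watts, tdp_watts)
    cpu_watts = min(cpu_request_watts, tdp_watts - gpu_watts)
    return cpu_watts, gpu_watts


# Illustrative numbers only: a 35 W budget with a CPU that would like 30 W.
print(allocate_power(35, 30, 2))   # GPU nearly idle: CPU gets its full request
print(allocate_power(35, 30, 10))  # light GPU load: CPU headroom shrinks
print(allocate_power(35, 30, 25))  # heavy GPU load: CPU is throttled, GPU unchanged
```

The three calls correspond to the idle-GPU, light-load and both-busy cases in the text. A real implementation would also allow brief excursions above the limit and react to temperature, which this sketch leaves out.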
Accelerated processor
In terms of memory support, the mobile Llano APU supports dual-channel DDR3 SO-DIMM with one module per channel, so only two modules can be installed in total, for a maximum capacity of 32GB. In terms of frequency and voltage, standard DDR3 runs at up to 1600MHz at 1.5V and low-voltage DDR3L at up to 1333MHz at 1.35V, for a maximum bandwidth of 25.6GB/s.
The desktop Llano APU supports dual-channel DDR3 DIMM with two modules per channel, so four modules can be installed in total, for a maximum capacity of 64GB. It supports 1.35V DDR3-1333 and 1.5V DDR3-1866, and the maximum bandwidth is 29.8GB/s.
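The bandwidth figures quoted above follow from the transfer rate and the bus width. The sketch below assumes the standard 64-bit (8-byte) DDR3 channel; the function name ddr3_peak_bandwidth_gbs is made up for illustration.

```python
def ddr3_peak_bandwidth_gbs(transfer_rate_mts, channels=2, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s.

    transfer_rate_mts: data rate in mega-transfers per second (1600 for DDR3-1600).
    Assumes each channel is 64 bits wide, i.e. 8 bytes per transfer.
    """
    bytes_per_transfer = bus_width_bits // 8
    return transfer_rate_mts * bytes_per_transfer * channels / 1000


print(ddr3_peak_bandwidth_gbs(1600))  # dual-channel DDR3-1600, mobile maximum
print(ddr3_peak_bandwidth_gbs(1866))  # dual-channel DDR3-1866, desktop maximum
```

Dual-channel DDR3-1600 works out to 25.6GB/s, and dual-channel DDR3-1866 to just under 29.9GB/s, which the text quotes as 29.8GB/s.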
Since the CPU and GPU "share the same room", they inevitably compete for resources (in fact, the APU depends heavily on memory bandwidth), so AMD raised the bandwidth between the GPU and the memory controller to four times that of the previous-generation platform, higher even than the bandwidth between the memory controller and the memory.
This part is the focus of the Llano APU. Its development codename is "Sumo". It derives from the Redwood core of the Radeon HD 5600/5500 series in the first-generation DX11 family, with up to 400 stream processors, 20 texture units, 2 render back-ends, 8 ROP units and a 128-bit memory interface. Unfortunately, it has no dedicated GDDR5 video memory, nor on-board dedicated video memory like 880G motherboards, and can only share the system's DDR3 memory.
In addition to inheriting the original TeraScale 2 unified processing architecture, along with full support for DX11, OpenGL 4.1, various anti-aliasing and anisotropic filtering modes (including morphological anti-aliasing, MLAA) and APP parallel computing acceleration, the Sumo core adds the UVD3 video decoding engine and power gating (deep power management and energy saving) from the Radeon HD 6000 family, redesigns the video memory interface to the north bridge, and moves to the latest GlobalFoundries 32nm manufacturing process.
The Sumo core is naturally a VLIW5 (5D) stream processor architecture. Its single-precision floating-point performance is up to 480 GFLOPS and its integer performance up to 480 GINTS, i.e. 480 billion operations per second each.
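The 480 GFLOPS figure can be reproduced from the stream processor count and the GPU clock. The sketch below assumes that each stream processor completes one multiply-add (counted as 2 operations) per clock, the usual convention for peak figures, and uses the 600MHz clock listed for the 400-stream-processor parts in the parameter table; the function name peak_gflops is hypothetical.

```python
def peak_gflops(stream_processors, clock_mhz, ops_per_clock=2):
    """Theoretical peak throughput in GFLOPS.

    ops_per_clock=2 assumes one fused multiply-add per stream processor
    per cycle, which is an assumption, not a figure from the text.
    """
    return stream_processors * ops_per_clock * clock_mhz / 1000


print(peak_gflops(400, 600))  # full Sumo configuration: 400 stream processors at 600MHz
print(peak_gflops(160, 600))  # cut-down 160-stream-processor configuration at 600MHz
```

With 400 stream processors at 600MHz this gives 480 GFLOPS, matching the text; lower-end configurations scale down in proportion.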
As a competitor to the Fusion APU, the HD Graphics 3000/2000 integrated in Intel's Sandy Bridge has made great progress over the previous generation, but its graphics and video technology is still far behind. In particular, OpenCL parallel computing is supported only by the processor, not by the graphics core, so the two cannot accelerate together.
Both the processor and the graphics core of the Llano APU support AMD APP accelerated parallel processing technology, in particular the OpenCL standard, for which AMD will keep updating the APP SDK with better performance and more features. According to the plan, APP SDK version 2.5 will be launched in August, mainly bringing Windows 7/Linux performance optimizations, multi-GPU support (Windows 7), fast Fourier transforms (radix 5), UVD3/MPEG2 decoding, PowerExpress switching between discrete and integrated graphics, a GPU debugger (Windows 7) and more.
It is worth mentioning that the OpenCL specification version officially supported by the Llano APU has been updated to 1.2.
As chip integration increases, the composition of both desktop and mobile platforms is becoming simpler. The three-chip architecture with separate north and south bridges has disappeared, replaced by a two-chip architecture of processor plus interconnect chip. Most functions originally handled by the north bridge, including the graphics core, have moved into the processor, and the so-called chipset is left as a small chip serving south bridge functions.
The Hudson series chipset paired with the Llano APU is also a single-chip design. On the mobile platform there are two models, A70M and A60M, codenamed Hudson-M3 and Hudson-M2 respectively, connected to the processor through the UMI bus (PCI-E 1.0 x4 + DP). They are siblings of the Hudson-M1 A50M previously used on the Brazos APU platform.
The A70M/A60M chipsets are manufactured on a 65nm process in a 605-ball FC-BGA package with a chip size of 23 × 23 = 529 mm² and a typical thermal design power of 2.7-4.7W.
Both chipsets support six SATA 6Gbps storage interfaces with RAID 0/1 array modes, provide four PCI-E 2.0 x1 lanes, and integrate a clock generator, consumer infrared receiver, fan control, voltage sensing, a DAC (supporting VGA) and so on. The main difference is the USB interface: the A70M natively supports four USB 3.0, ten USB 2.0 and two internal USB 1.1 ports, while the A60M has no USB 3.0 but offers 14 USB 2.0 ports.
There is also an optional component on this platform: a Vancouver Radeon HD 6000M series discrete graphics card, connected to the processor through PCI-E x16 lanes. It not only brings discrete-graphics performance to the notebook, but can also work with the graphics core integrated in the Llano APU to form a dual-graphics switching and acceleration system.
Finally, power management and energy-saving technologies include the new 32nm HKMG process, AMD Turbo Core 2.0 dynamic frequency scaling, system management mode (SMM), ACPI compliance, P-states, C-states, S0/S3/S4/S5 sleep states, per-core power gating (CC6), PCI-E core power gating, and power gating for the Radeon stream processor core and the UVD3 video engine.
Power gating is particularly worth mentioning. It was very scarce in AMD's 45nm era and is now finally fully supported. Compared with clock gating, it can not only adjust the operating frequency and voltage but also shut a block off completely when it is not needed, achieving zero power consumption for that block. In other words, each processor core, each PCI-E controller, the stream processor array and the UVD3 engine of the Llano APU can be shut down completely, which also takes Turbo Core technology to a new level.
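The difference between the two gating techniques can be illustrated with a toy power model. This is a simplified sketch under common textbook assumptions, not data from AMD: a block's power is split into a dynamic part (switching activity) and a leakage part, clock gating removes only the dynamic part, and power gating removes both. The function name and wattages are hypothetical.

```python
def block_power_watts(dynamic_watts, leakage_watts, state):
    """Power drawn by one functional block in a given state (toy model)."""
    if state == "active":
        return dynamic_watts + leakage_watts
    if state == "clock_gated":
        # Clock stopped: no switching power, but the block still leaks.
        return leakage_watts
    if state == "power_gated":
        # Supply cut off: neither switching nor leakage power.
        return 0.0
    raise ValueError(f"unknown state: {state}")


# Illustrative numbers only: a core with 4 W dynamic and 1 W leakage power.
for state in ("active", "clock_gated", "power_gated"):
    print(state, block_power_watts(4.0, 1.0, state))
```

In this model, every watt a power-gated block no longer draws becomes headroom that Turbo Core can hand to the blocks still running.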
All of the above fall under AMD AllDay all-day computing technology. According to AMD's data, VISION standby time used to reach up to six and a half hours, and with VISION 2011, which ushers in the APU, it reaches up to ten hours. Compared with competitors, standby battery life is more than one and a half hours longer, and battery life under full load is one hour longer.
Technical parameters
Desktop level
Richland Platform
The dynamic frequency scaling technology on the Richland APU is "Hybrid Boost", which integrates more temperature sensors and adjusts the Turbo acceleration algorithm to make it more intelligent. In the past, the CPU and GPU were both accelerated whenever acceleration was needed, even though it is rare for both to need it. The current algorithm ensures that the parts that need more performance are the ones accelerated.
Virgo Platform
The mobile platform is "Comal". The new-generation APU is manufactured on GlobalFoundries' 32nm SOI HKMG process and has 2-4 CPU cores based on the improved Bulldozer architecture, codenamed "Piledriver". This is a relatively large improvement, because the CPU part of the previous-generation Llano still used the older K10 architecture. The integrated GPU has also been significantly improved: the HD 6000 core is replaced by a new graphics core based on the VLIW4 architecture (the same architecture as the Cayman core of the HD 6900).
The Trinity APU was officially released on May 15, 2012. Its main task is to replace Llano as the new generation of mainstream and high-performance mobile integrated processor. Like the Llano APU, it has at most four physical cores, but the core architecture has been upgraded from K10 to Piledriver (the second generation of Bulldozer). The integrated graphics part has up to 384 DX11 Radeon stream processors (upgraded to the VLIW4 architecture of the HD 6900 series). The companion single-chip chipset still supports SATA 6Gbps, USB 3.0, PCI-E 2.0 and other specifications, and the Hybrid CrossFireX dual-graphics function continues to be supported.
Compared with the previous generation of AMD APUs, the new Piledriver-core Trinity has made a leap in performance. Each of its compute modules consists of two cores and has 2MB of cache. Piledriver provides enhancements such as higher IPC, reduced leakage, reduced CAC and higher frequency, which, being a different design from Llano's, make Trinity considerably more powerful. In the previously announced APU roadmap, the memory controller, the throughput of the core units and information processing capability have always been key areas for improvement. Because of integration, improvements to these individual functions greatly improve the real-world application performance of AMD Trinity.[6]
Note: The GPU configuration is given as number of stream processors : number of texture units : number of raster units.
Model | Cores/Threads | Base frequency | Turbo frequency | L2 cache | GPU model | GPU configuration | GPU frequency | TDP | Memory support
E2-3200 | 2/2 | 2.4GHz | None | 2×512KB | HD 6370D | 160:8:4 | 443MHz | 65W | DDR3-1600 dual channel
A4-3300 | 2/2 | 2.5GHz | None | 2×512KB | HD 6410D | 160:8:4 | 443MHz | 65W | DDR3-1600 dual channel
A4-3400 | 2/2 | 2.7GHz | None | 2×512KB | HD 6410D | 160:8:4 | 600MHz | 65W | DDR3-1600 dual channel
A4-3420 | 2/2 | 2.8GHz | None | 2×512KB | HD 6410D | 160:8:4 | 600MHz | 65W | DDR3-1600 dual channel
A6-3500 | 3/3 | 2.1GHz | 2.4GHz | 3×1MB | HD 6530D | 320:16:8 | 443MHz | 65W | DDR3-1866 dual channel
A6-3600 | 4/4 | 2.1GHz | 2.4GHz | 4×1MB | HD 6530D | 320:16:8 | 443MHz | 65W | DDR3-1866 dual channel
A6-3620 | 4/4 | 2.2GHz | 2.5GHz | 4×1MB | HD 6530D | 320:16:8 | 443MHz | 65W | DDR3-1866 dual channel
A6-3650 | 4/4 | 2.6GHz | None | 4×1MB | HD 6530D | 320:16:8 | 443MHz | 100W | DDR3-1866 dual channel
A6-3670K | 4/4 | 2.7GHz | None | 4×1MB | HD 6530D | 320:16:8 | 443MHz | 100W | DDR3-1866 dual channel
A8-3800 | 4/4 | 2.4GHz | 2.7GHz | 4×1MB | HD 6550D | 400:20:8 | 600MHz | 65W | DDR3-1866 dual channel
A8-3820 | 4/4 | 2.5GHz | 2.8GHz | 4×1MB | HD 6550D | 400:20:8 | 600MHz | 65W | DDR3-1866 dual channel
A8-3850 | 4/4 | 2.9GHz | None | 4×1MB | HD 6550D | 400:20:8 | 600MHz | 100W | DDR3-1866 dual channel
A8-3870K | 4/4 | 3GHz | None | 4×1MB | HD 6550D | 400:20:8 | 600MHz | 100W | DDR3-1866 dual channel
Athlon II X4 631 | 4/4 | 2.6GHz | None | 4×1MB | None | None | None | 100W | DDR3-1866 dual channel
Athlon II X4 651 | 4/4 | 3GHz | None | 4×1MB | None | None | None | 100W | DDR3-1866 dual channel
Mobile version
Comal platform
The new-generation APU is manufactured on GlobalFoundries' 32nm SOI HKMG process and has 2-4 CPU cores based on the improved Bulldozer architecture, codenamed "Piledriver". This is a relatively large improvement, because the CPU part of the previous-generation Llano still used the older K10 architecture, and the integrated GPU part has also been significantly improved