Central processing unit

VLSI
The central processing unit (CPU) is the computing and control core of a computer system and the final execution unit for information processing and program execution. Since its invention, the CPU has advanced greatly in logical structure, operating efficiency, and functional extension. [1]
Chinese name: 中央处理器 (central processor)
English name: Central Processing Unit (CPU)

Development history

The CPU emerged in the era of large-scale integrated circuits. Successive iterations of processor architecture design and continual improvements in integrated circuit process technology have driven its development. From chips dedicated to mathematical calculation to wide use in general-purpose computing, from 4-bit to 8-bit, 16-bit, and 32-bit processors and finally to 64-bit processors, and from incompatible designs across manufacturers to the emergence of distinct instruction set architecture specifications, the CPU has developed rapidly since its birth. [1]
CPU development spans more than 50 years and is usually divided into six stages. [3]
(1) The first stage (1971-1973). This was the era of 4-bit and low-end 8-bit microprocessors. The representative product was the Intel 4004 processor. [3]
In 1971, Intel's 4004 microprocessor integrated the arithmetic unit and the controller on a single chip, marking the birth of the CPU. In 1978, the 8086 processor laid the foundation for the x86 instruction set architecture, and 8086-family processors went on to be widely used in personal computers, high-performance servers, and cloud servers. [1]
(2) The second stage (1974-1977). This was the era of mid-range and high-end 8-bit microprocessors. The representative product was the Intel 8080, by which point the instruction system was already fairly complete. [3]
(3) The third stage (1978-1984). This was the era of 16-bit microprocessors. The representative product was the Intel 8086, which was relatively mature. [3]
(4) The fourth stage (1985-1992). This was the era of 32-bit microprocessors. The representative product was the Intel 80386, which was already capable of multitasking and multi-user operation. [3]
Released in 1989, the 80486 processor implemented a five-stage scalar pipeline, marking the initial maturity of the CPU and the end of the traditional processor development stage. [1]
(5) The fifth stage (1993-2005). This was the era of the Pentium series of microprocessors. [3]
Intel's Pentium processor, released in 1993, was its first x86 processor with a superscalar instruction pipeline and included branch prediction. The Pentium Pro, released in November 1995, added out-of-order execution. These techniques greatly improved processor performance, and the superscalar pipeline structure has been used by modern processors ever since, such as AMD's Ryzen series and Intel's Core series. [1]
(6) The sixth stage (2005 onward). Processors evolved toward more cores and higher parallelism. Typical representatives are Intel's Core series processors and AMD's Ryzen series processors. [3]
To meet the upper-layer needs of operating systems, modern processors have further introduced features such as parallelization, multiple cores, virtualization, and remote management systems, continually promoting the development of upper-layer information systems. [1]

Working principle

The von Neumann architecture is the foundation of modern computers. Under this architecture, programs and data are stored together: instructions and data are accessed from the same storage space and transferred over the same bus, so their transfers cannot overlap. Under the von Neumann architecture, the work of the CPU is divided into five stages: instruction fetch, instruction decode, execute, memory access, and result write-back. [1]
Instruction fetch (IF) is the process of fetching an instruction from main memory into the instruction register. The value in the program counter (PC) indicates the position of the current instruction in main memory. Once an instruction has been fetched, the value in the PC is automatically incremented according to the instruction word length. [1]
Instruction decode (ID): after an instruction is fetched, the instruction decoder splits and interprets it according to the predefined instruction format, identifying the instruction category and the various ways of obtaining operands. Modern CISC processors split instructions into smaller operations to improve parallelism and efficiency. [1]
Execute (EX): this stage carries out the function of the instruction. Different parts of the CPU are connected to perform the required operation.
Memory access (MEM): depending on the needs of the instruction, the CPU obtains the address of the operand in main memory and reads the operand from main memory for use in the operation. Instructions that do not need to access main memory can skip this stage. [1]
Write-back (WB): as the last stage, write-back "writes back" the result data of the execute stage to some form of storage. Result data is generally written to the CPU's internal registers so that subsequent instructions can access it quickly. Many instructions also change the flag bits in the program status word register. These flags identify different operation results and can be used to influence program behavior. [1]
After the instruction has been executed and the result data written back, if no exceptional event (such as a result overflow) has occurred, the CPU takes the address of the next instruction from the program counter and begins a new cycle, fetching the next instruction in sequence. [1] Many complex CPUs can fetch, decode, and execute multiple instructions at the same time.
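The five stages can be illustrated with a toy interpreter. The sketch below is not a model of any real processor: the instruction format, the opcodes (LOAD, ADD, STORE, HALT), and the function name `run` are all hypothetical, chosen only to show where each stage fits in one instruction cycle.

```python
# Toy model of one instruction cycle per loop iteration (IF, ID, EX, MEM, WB).
# The opcodes and the (op, a, b) instruction format are hypothetical.

def run(program, memory):
    regs = [0, 0, 0, 0]  # internal registers
    pc = 0               # program counter
    while True:
        # IF: fetch the instruction the PC points to, then advance the PC
        ir = program[pc]
        pc += 1
        # ID: split the instruction into opcode and operand fields
        op, a, b = ir
        if op == "HALT":
            break
        # EX: compute a result (ADD) or pass on a memory address (LOAD/STORE)
        if op == "ADD":
            result = regs[a] + regs[b]
        else:
            result = b
        # MEM: only LOAD and STORE access main memory; ADD skips this stage
        if op == "LOAD":
            result = memory[result]
        elif op == "STORE":
            memory[result] = regs[a]
        # WB: write the result back to a register (STORE writes nothing back)
        if op in ("ADD", "LOAD"):
            regs[a] = result
    return regs, memory


program = [
    ("LOAD", 0, 0),   # r0 <- memory[0]
    ("LOAD", 1, 1),   # r1 <- memory[1]
    ("ADD", 0, 1),    # r0 <- r0 + r1
    ("STORE", 0, 2),  # memory[2] <- r0
    ("HALT", 0, 0),
]
regs, memory = run(program, [5, 7, 0])
print(regs, memory)
```

A real CPU overlaps these stages across several instructions (pipelining) rather than finishing one instruction before starting the next. The loop here keeps them strictly sequential for clarity.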

Brief introduction

The central processing unit (CPU) is one of the main devices of an electronic computer and its core component. Its function is mainly to interpret computer instructions and process the data in computer software. The CPU is the core component responsible for reading, decoding, and executing instructions. It consists mainly of two parts, the controller and the arithmetic unit, and also includes cache memory and the data and control buses that connect them. The three core components of an electronic computer are the CPU, internal memory, and input/output devices. The functions of the CPU mainly include processing instructions, executing operations, controlling timing, and processing data. [2]
In computer architecture, the CPU is the core hardware unit that controls and allocates all of the computer's hardware resources (such as memory and input/output units) and performs general-purpose operations. The CPU is the computing and control core of the computer. The operations of all software layers in a computer system are ultimately mapped, through the instruction set, to operations of the CPU. [1]

Performance structure


Performance measures

For a CPU, the main indicators that affect performance are clock frequency, CPU bit width, CPU cache, instruction set, number of CPU cores, and IPC (instructions per cycle). The main frequency of the CPU is its clock frequency, which directly affects performance. Overclocking raises the clock frequency to obtain higher performance. The CPU bit width is the number of bits of data the processor can operate on at one time. In general, the wider the CPU, the faster it performs operations. CPUs used in current personal computers are generally 64-bit, because 64-bit processors can handle a larger range of data and natively support a larger memory addressing capacity, improving people's work efficiency. The CPU's instruction set is built into the CPU and mainly refers to the hard-wired programs that guide and optimize the CPU's operation. CPU cache is generally divided into L1, L2, and L3 cache, and cache performance directly affects CPU processing performance. Some special-purpose CPUs may also be equipped with an L4 cache. [4]
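Two of these indicators lend themselves to simple arithmetic. The sketch below is a back-of-the-envelope illustration, not a benchmark: the function names are hypothetical, the addressing estimate assumes every address bit is usable (real processors usually implement fewer physical address bits than their word size), and the throughput figure is an upper bound that ignores cache misses, stalls, and memory bandwidth.

```python
def addressable_bytes(address_bits):
    """Bytes reachable with byte addressing and the given address width."""
    return 2 ** address_bits


def peak_instructions_per_second(clock_hz, ipc, cores=1):
    """Upper bound on throughput: clock frequency x IPC x core count."""
    return clock_hz * ipc * cores


# A 32-bit address reaches 4 GiB; a full 64-bit address would reach 16 EiB.
print(addressable_bytes(32) // 2**30, "GiB")
print(addressable_bytes(64) // 2**60, "EiB")

# Hypothetical 4-core CPU at 3 GHz sustaining 2 instructions per cycle.
print(peak_instructions_per_second(3_000_000_000, 2, cores=4))
```

The product form makes the trade-off visible: doubling the core count, the clock frequency, or the IPC each doubles the theoretical peak, which is why the source lists all three as separate performance indicators.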

CPU structure

Generally speaking, the structure of a CPU can be roughly divided into the arithmetic logic unit, the register components, and the control components. The arithmetic logic unit mainly performs logic operations such as shifts, and can also perform fixed-point or floating-point arithmetic, address calculation, and conversion operations. It is a multifunctional arithmetic unit. The register components temporarily store instructions, data, and addresses. The control unit mainly analyzes instructions and issues the corresponding control signals.
The CPU can be regarded as a very-large-scale integrated circuit whose main task is to process data. Traditional computers had relatively small storage capacity, which made large-scale data processing difficult and inefficient. With the rapid development of information technology in China, computers built around highly configured processors have emerged, and using such a processor as the control center plays an important role in improving the structure and function of the CPU. The core of the CPU is the controller and the arithmetic unit, which play an important role in improving the overall function of the computer. Their functions extend to register control, logic operations, and signal sending and receiving, laying a good foundation for improving computer performance. [2]
Integrated circuits regulate signals in the computer and carry out different tasks according to the user's operating instructions. The CPU is a very-large-scale integrated circuit. It consists of an arithmetic unit, a controller, registers, and so on, as shown in the figure below. Its key operation is processing data. [5]
Figure: CPU architecture [5]
Traditional computers have small storage capacity and operate inefficiently on large data sets. The new generation of computers uses highly configured processors as the control center, and the CPU has gained considerable room for improvement in structure and function. With the arithmetic unit and controller as its main components, the CPU has gradually extended to functions such as logic operations, register control, program encoding, and signal sending and receiving, all of which have accelerated the optimization and upgrading of CPU control performance. [5]

CPU bus

The CPU bus is the fastest bus in a computer system and is also the core of the chipset and motherboard. The local bus connected directly to the CPU is usually called the CPU bus or internal bus, and the local bus connected to the various general-purpose expansion slots is called the system bus or external bus. In CPUs with a simple internal structure, there is often only a single set of data-transfer buses, the CPU internal bus, which connects the registers and the arithmetic logic unit inside the CPU. This type of bus can therefore also be called the ALU bus. An intra-component bus connects the chips within a component using a set of buses, so it can be called the component's internal bus, and it generally includes address lines and data lines. The system bus is the set of basic lines that connect the major components of the system into a whole. The external bus is the basic circuit connecting the computer to other equipment. [4]

Core part


Arithmetic unit

The arithmetic unit is the component of a computer that performs arithmetic and logic operations. The arithmetic logic unit is a core part of the central processor. [2]
(1) Arithmetic logic unit (ALU). The ALU is a combinational logic circuit that implements multiple groups of arithmetic and logic operations and is an important part of the CPU. It mainly performs binary arithmetic such as addition, subtraction, and multiplication. During operation, the ALU executes the arithmetic and logic operations of the computer's instruction set. In general, the ALU can read data in and out directly, specifically to and from the processor controller, memory, and input/output devices, with input and output carried over the bus. An input instruction contains an instruction word, which includes an operation code, a format code, and so on. [2]
(2) Intermediate register (IR). Its length is 128 bits, and its actual length is determined by the operand. The IR plays an important role in the "push and fetch" instruction: during execution of this instruction, the contents of the ACC are sent to the IR, the operand is then fetched into the ACC, and the IR contents are then pushed onto the stack. [2]
(3) Accumulator (ACC). Current registers are generally single accumulators with a length of 128 bits. The ACC can be regarded as a variable-length accumulator. When describing instructions, the ACC length is generally expressed by the value of ACS. The ACS length is directly tied to the ACC length, so doubling or halving the ACS length can be regarded as doubling or halving the ACC length. [2]
(4) Descriptor register (DR). It is mainly used to store and modify descriptors. The DR is 64 bits long. Descriptors play an important role in simplifying the handling of data structures. [2]
(5) B register. It plays an important role in instruction modification. The B register is 32 bits long and holds the address modifier during address modification. A main memory address can only be modified by means of a descriptor. The descriptor points to the first element of an array, so accessing other elements of the array requires a modifier. Array members consist of data elements of the same size, stored contiguously. The usual access method is the vector descriptor. Because the address in a vector descriptor is a byte address, the base address should be added first during conversion. The conversion is mainly performed automatically by hardware. Particular attention should be paid to alignment in this process to avoid exceeding the array bounds. [2]
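The descriptor-plus-modifier addressing described for the B register can be sketched as a small address calculation. This is an illustrative assumption, not the documented behavior of any particular machine: the function name, the parameter names, and the choice to reject misaligned bases and out-of-range modifiers with exceptions are all hypothetical.

```python
def element_address(base, modifier, element_size, length):
    """Byte address of array element number `modifier`.

    base         -- byte address of the first element (held in the descriptor)
    modifier     -- element index (the value a B register would hold)
    element_size -- size of each element in bytes
    length       -- number of elements in the array
    """
    if base % element_size != 0:
        raise ValueError("base address is not aligned to the element size")
    if not 0 <= modifier < length:
        raise IndexError("modifier is outside the array bounds")
    # Scale the element index to a byte offset, then add the base address.
    return base + modifier * element_size


# Fourth element (index 3) of an array of 4-byte elements starting at 0x1000.
print(hex(element_address(0x1000, 3, 4, 8)))
```

In the hardware described above this scaling and checking is done automatically. The sketch only makes the arithmetic explicit.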

Controller

In general, a controller is a device that changes the wiring of a main circuit or control circuit, and the resistance in the circuit, in a predetermined sequence to control the starting, speed regulation, braking, and reversing of a motor. The CPU controller consists of the program status register (PSR), system status register (SSR), program counter (PC), instruction register, and so on. As the "decision-making body", its main task is to issue commands, coordinating and directing the operation of the entire computer system. Controllers fall mainly into two types, combinational logic controllers and microprogrammed controllers, each with its own advantages and disadvantages. The combinational logic controller has a relatively complex structure but is fast. The microprogrammed controller is simple in design, but modifying the function of a machine instruction requires rewriting the entire microprogram. [2]

Brand Introduction


"Godson" series chips

The "Godson" (Loongson) series chips are designed and developed by China Science Technology Co., Ltd. of the Chinese Academy of Sciences. They use the MIPS architecture and have independent intellectual property rights. The products now comprise three series, the Godson-1 small CPUs, the Godson-2 medium CPUs, and the Godson-3 large CPUs, as well as the Godson 7A1000 bridge chip. The Godson-1 series of 32/64-bit processors is designed for the embedded field and is mainly used in cloud terminals, industrial control, data acquisition, handheld terminals, network security, consumer electronics, and other fields, with low power consumption, high integration, and high cost-performance. Among them, the Godson 1A 32-bit processor and the Godson 1C 64-bit processor run stably at 266 to 300 MHz. The Godson 1B is a lightweight 32-bit chip. The Godson 1D is a dedicated chip for ultrasonic heat meters, water meters, and gas meters. In 2015, the new-generation BeiDou navigation satellites were equipped with the Godson 1E and 1F chips independently developed by China, which mainly handle inter-satellite link data processing. [6]
The Godson-2 series comprises 64-bit high-performance, low-power processors for desktop and high-end embedded applications. Godson-2 products include the Godson 2E, 2F, 2H, 2K1000, and others. The Godson 2E was the first to be licensed for external production and sales. The Godson 2F delivers more than 20% higher average performance than the Godson 2E and can be used in personal computers, industrial terminals, industrial control, data acquisition, network security, and other fields. The Godson 2H, officially launched in 2012, is suitable for computers, cloud terminals, network equipment, consumer electronics, and similar applications, and can also serve as a full-function chipset with an HT or PCI-e interface. In 2018, Godson launched the Godson 2K1000, a dual-core processor aimed mainly at network security and mobile intelligence applications, with a clock frequency of up to 1 GHz. It can meet the requirements of autonomous and controllable industrial security systems amid the rapid development of the industrial Internet of Things. [6]
The Godson-3 series comprises multi-core processors for high-performance computers, servers, and high-end desktop applications, featuring high bandwidth, high performance, and low power consumption. The Godson 3A3000/3B3000 processors use an independent microarchitecture design, with clock frequencies of 1.5 GHz and above. The market-oriented Godson 3A4000 is the first quad-core chip of Godson's third-generation products. Based on a 28 nm process, it adopts the newly developed GS464V 64-bit high-performance processor core architecture, implements 256-bit vector instructions, optimizes on-chip interconnect and memory access, and integrates a 64-bit DDR3/4 memory controller and an on-chip security mechanism, with clock frequency and performance again greatly improved. [6]
The Godson 7A1000 bridge chip is Godson's first dedicated bridge chipset product. It is intended to replace the AMD RS780+SB710 bridge chipset and provide north bridge and south bridge functions for Godson processors. It was released in February 2018 and is currently used with the Godson 3A3000 and Ziguang (Unigroup) 4G DDR3 memory in a high-performance network platform. Compared with the 3A3000+780e platform, the overall performance of this solution is greatly improved, with a high localization rate, high performance, and high reliability. [6]

Intel

According to Intel's product line planning, as of 2021 Intel's 11th-generation consumer Core lineup comprises six product categories: i9, i7, i5, i3, Pentium, and Celeron. In addition, there are the server-oriented Xeon Platinum/Gold/Silver/Bronze lines and the Xeon W series for the HEDT platform.

AMD

According to AMD's product line planning, as of 2021 AMD's Ryzen 5000 series processors comprise four consumer product lines: Ryzen 9, Ryzen 7, Ryzen 5, and Ryzen 3. In addition, there are the third-generation EPYC processors for the server market and the Threadripper series for the HEDT platform. [7]

Shanghai Zhaoxin

Shanghai Zhaoxin Integrated Circuit Co., Ltd. is a state-owned holding company established in 2013. Its processors use the x86 architecture, and its products mainly include the KaiXian ZX-A, ZX-C/ZX-C+, ZX-D, KX-5000, and KX-6000, and the KaiSheng ZX-C+, ZX-D, KH-20000, and others. The KaiXian KX-5000 series uses a 28 nm process and is available in four-core and eight-core versions. Its overall performance is up to 140% higher than that of the previous generation, reaching the performance level of international mainstream processors, and it can fully meet the needs of party and government desktop office applications as well as entertainment applications such as 4K ultra-high-definition video playback. The KaiSheng KH-20000 series is Zhaoxin's CPU product line for servers and similar equipment. The KaiXian KX-6000 series reaches clock frequencies of up to 3.0 GHz and is compatible with the full range of Windows operating systems and with domestic autonomous and controllable operating systems such as Zhongke Fangde, NeoKylin (Zhongbiao Kylin), and Puhua. Its performance is comparable to that of Intel's seventh-generation Core i5. [6]

Shanghai Shenwei

The Shenwei (Sunway) processor, "SW processor" for short, derives from DEC's Alpha 21164 and uses the Alpha architecture, with completely independent intellectual property rights. Its products include the single-core SW-1, the dual-core SW-2, the four-core SW-410, and the sixteen-core SW-1600/SW-1610. The Sunway BlueLight supercomputer uses 8,704 SW-1600 processors and runs the Shenwei Ruisi operating system, so that all of its software and hardware is domestically produced. The Sunway TaihuLight supercomputer, based on the SW26010, ranked first on the TOP500 list of the world's supercomputers four consecutive times after its release in June 2016. Two ten-million-core whole-machine applications running on Sunway TaihuLight won the Gordon Bell Prize, the highest award in high-performance computing applications, in 2016 and 2017. [6]

Classification


How the instruction set works

CPUs can also be classified by instruction set into reduced instruction set computers (RISC) and complex instruction set computers (CISC). RISC instructions have a fixed length and a roughly constant execution time, whereas the length and execution time of CISC instructions vary. RISC instructions execute in parallel more readily, and compilers for them are more efficient. CISC instructions are better optimized for particular tasks, at the cost of more complex circuitry and parallelism that is harder to improve. A typical CISC instruction set is x86, and a typical RISC instruction set is ARM. In modern processor architectures, however, the two approaches converge: CISC instructions are converted during decoding and split inside the CPU into RISC-like instructions. [4]
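The splitting of a CISC instruction into RISC-like operations can be sketched as follows. This is a simplified illustration under stated assumptions: the tuple format, the `tmp` register, the bracket notation for memory operands, and the function name are hypothetical, and only a memory destination is handled. Real decoders are far more involved and their internal operations are not public in this form.

```python
def decode_to_micro_ops(instr):
    """Split one (op, dst, src) instruction into RISC-like micro-operations.

    A destination written as "[addr]" is treated as a memory operand, so a
    read-modify-write instruction becomes a load, a register operation, and
    a store. Register-only instructions pass through unchanged.
    """
    op, dst, src = instr
    if not dst.startswith("["):
        return [(op, dst, src)]
    return [
        ("LOAD", "tmp", dst),   # read the memory operand into a temporary
        (op, "tmp", src),       # do the arithmetic on registers only
        ("STORE", dst, "tmp"),  # write the result back to memory
    ]


print(decode_to_micro_ops(("ADD", "[0x10]", "r1")))
print(decode_to_micro_ops(("ADD", "r0", "r1")))
```

Each resulting operation touches memory at most once and has a simple, uniform shape, which is what makes the split form easier to schedule in parallel.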

Embedded system CPU

The traditional embedded field is very broad: it is the main application area for processors outside the server and PC fields. "Embedded" means that in many chips the processor inside goes unnoticed, as if it were embedded within. [8]
In recent years, with the further development of new technologies and fields, the embedded field itself has differentiated into several distinct subfields. [8]
The first is the mobile field. With the development of smartphones and handheld devices, it has gradually become an independent field comparable in scale to the PC field. Because processors in the mobile field need to run Linux operating systems and involve a complex software ecosystem, the mobile field depends on its software ecosystem as heavily as the PC field does. [8]
The second is the real-time embedded field. Dependence on software is comparatively less severe here, so there is no absolute monopoly. However, owing to the commercial success of ARM processor IP, the ARM processor architecture still holds the largest market share, while other processor architectures such as Synopsys ARC also perform well in the market. [8]
The last is the deeply embedded field, which most resembles the traditional embedded field described above. Demand in this field is very large, but the emphasis is usually on low power consumption, low cost, and high energy efficiency. There is no need to run large application operating systems such as Linux, and most software is custom bare-metal code or a simple real-time operating system, so dependence on the software ecosystem is relatively low. [8]

Mainframe CPU

A mainframe, also called a large host computer, uses a dedicated processor instruction set, operating system, and application software. The term "mainframe" originally referred to large computer systems housed in very large framed metal cabinets, distinguishing them from smaller minicomputers and microcomputers. [9]
Reducing mainframe CPU consumption is an important task. Every CPU cycle saved can not only defer hardware upgrades but also reduce software licensing fees.
Mainframe architecture has two main characteristics. First, it is highly virtualized, with all system resources shared: a mainframe can consolidate a large number of workloads and maximize resource utilization. Second, it uses asynchronous I/O: when an I/O operation is performed, the CPU hands the I/O instruction to the I/O subsystem to complete and is itself freed to execute other instructions. The host can therefore carry on with other work while performing heavy I/O tasks. [9]
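The asynchronous I/O idea can be mimicked in software. The sketch below is only an analogy using Python's standard `asyncio` event loop, not a model of a mainframe I/O subsystem. The function names are hypothetical, and `asyncio.sleep` stands in for a slow device transfer.

```python
import asyncio


async def io_subsystem(log):
    """Stand-in for the I/O subsystem completing a slow transfer."""
    log.append("io: started")
    await asyncio.sleep(0.05)  # simulated device latency
    log.append("io: finished")


async def cpu_work(log):
    """Other instructions the CPU executes while the I/O is pending."""
    for i in range(3):
        log.append(f"cpu: step {i}")
        await asyncio.sleep(0)  # yield so pending tasks can make progress


async def main():
    log = []
    io_task = asyncio.create_task(io_subsystem(log))  # hand off the I/O
    await cpu_work(log)                               # keep computing meanwhile
    await io_task                                     # collect the I/O result
    return log


print(asyncio.run(main()))
```

The CPU steps are logged while the simulated transfer is still outstanding, and the transfer's completion arrives only afterwards, which is the overlap the paragraph above describes.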

Control technology form

The CPU's powerful data processing capability effectively improves the computer's work efficiency, and data processing is more than simple arithmetic. The CPU operates according to the instructions given by the computer user: while a task is being carried out, the control commands entered by the user correspond to CPU operations. With the rapid development of information technology in China, computers are widely used in daily life, work, and enterprise office automation. As a master control device, the computer has played an important role in making e-commerce networks more convenient, and CPU control performance has improved greatly along the way. Instruction control, actual control, and operation control are the main ways in which CPU technology is applied. [2]
(1) Selection control. Centralized processing operates according to specific program instructions in order to meet the needs of the computer user. During operation, the CPU can make selections according to the actual situation to meet the user's data-flow requirements. Instruction control technology plays an important role here: the operating mode is set according to the user's needs so that data and instructions are handled in an orderly way. During CPU execution, each instruction of the program must be carried out smoothly, and only by following a definite order can the computer's effectiveness be guaranteed. The CPU mainly performs automated processing of data sets. It is the core of centralized control, and its essence is instruction control. [2]
(2) Insertion control. The CPU generates operation control signals mainly through the function of instructions, and controls components by sending commands to them. The function of one instruction is mainly accomplished by the computer's components executing a sequence of operations. Numerous small control elements are the key to building the centralized processing mode, whose aim is to better carry out CPU data processing. [2]
(3) Time control. Applying timing to the various operations is called time control. An instruction must be completed within a specified time when it is executed. The CPU fetches an instruction from cache or main memory and then decodes it, mainly in the instruction register. Throughout this process, program timing must be strictly controlled. [2]

Comparison with GPU


GPU

A GPU is a graphics processing unit. The workflow and physical structure of the CPU and the GPU are broadly similar, but the GPU's work is more narrowly focused. In most personal computers, the GPU is used only to draw images. If the CPU wants to draw a two-dimensional figure, it only needs to send an instruction to the GPU, and the GPU quickly computes all the pixels of the figure and draws it at the specified position on the display. Because GPUs generate a lot of heat, graphics cards usually have a dedicated cooling device. [3]

Design structure

The CPU has powerful arithmetic units that can complete an arithmetic calculation within a few clock cycles. It also has large caches that can hold a lot of data, and complex logic control units that use branch prediction to reduce latency when a program has many branches. The GPU is designed for high throughput, with many arithmetic units and very little cache. The GPU also supports a large number of threads running at the same time. If they need to access the same data, the cache merges those accesses, which naturally introduces latency. Despite the latency, the large number of arithmetic units allows very high throughput. [3]
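Branch prediction can be illustrated with a two-bit saturating counter, a classic textbook scheme. The sketch below is an assumption chosen for illustration and is not a description of the predictor in any particular CPU. The function name and the loop-branch example are hypothetical.

```python
def predict_accuracy(outcomes, state=2):
    """Fraction of branches a 2-bit saturating counter predicts correctly.

    state ranges from 0 (strongly not taken) to 3 (strongly taken);
    the branch is predicted taken when state is 2 or 3.
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct / len(outcomes)


# A loop branch: taken nine times, then not taken once, repeated ten times.
loop_branch = ([True] * 9 + [False]) * 10
print(predict_accuracy(loop_branch))
```

Because a single loop exit moves the counter only one step, the predictor keeps predicting "taken" when the loop starts again. Only the exit itself is mispredicted.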

Use Scenarios

Because the CPU has large caches and complex control logic, it excels at logic control and serial computation. By comparison, the GPU has a large number of arithmetic units and can perform many calculations at once, so it excels at large-scale concurrent computing: work that is computationally heavy but simple and highly repetitive. The way to speed up a program with a GPU follows directly: use the CPU for complex logic control and the GPU for simple but voluminous arithmetic, which can greatly improve the program's running speed. [3]

Security issues

As CPUs have developed rapidly, security problems have also emerged. In 1994, the Pentium FDIV bug (Pentium floating-point division error) caused errors in floating-point division. In 1997, the F00F invalid instruction on Pentium processors could hang the CPU. In 2011, Intel's Trusted Execution Technology (TXT) was found to have a buffer overflow problem that attackers could exploit for privilege escalation. In 2017, a vulnerability in the Intel Management Engine (ME) could lead to remote unauthorized execution of arbitrary code. In 2018, the Meltdown and Spectre CPU vulnerabilities affected almost every kind of computing device manufactured in the preceding 20 years, putting the private information stored on billions of devices at risk of disclosure. These security problems seriously endanger national network security, critical infrastructure security, and information security in key industries, and have caused or will cause huge losses. [1]

Future development

General-purpose central processing unit (CPU) chips are basic components of the information industry and core devices in weapons and equipment. China lacks CPU technology and a CPU industry with independent intellectual property rights, and as a result national security is difficult to guarantee fully. During the 10th Five-Year Plan period, the national "863 Program" began to support independent research and development of CPUs. During the 11th Five-Year Plan period, the major project on "core electronic devices, high-end general-purpose chips, and basic software products" (the "core-high-base" project) brought the CPU achievements of the "863 Program" into industry. Since the 12th Five-Year Plan, China has piloted the application of independently developed CPUs and has formed an independent technology and industrial system within a certain scope, which can meet the needs of weapons and equipment, informatization, and other fields. However, foreign CPUs have long held a monopoly, and it will take time for China's independently developed CPU products and their market to mature. [10]