How does a computer processor work? Microprocessors: performance and trends

The processor is the main chip in your computer. Typically, it is also one of the most high-tech and expensive PC components. Although the processor is a single device, it contains a large number of components, each responsible for a specific function. What exactly do they do?

The processor: its functions and history

The PC component now commonly referred to as the central processor has a rather interesting history. To understand its specifics, it is useful to look at a few key facts about its evolution. The device known to the modern user as a central processing unit is the result of many years of improvement in the manufacturing of computing microcircuits.

Engineers' vision of the processor's structure has changed over time. In first- and second-generation computers, the corresponding components consisted of many separate blocks with very different tasks. Starting with the third generation, processor functions began to be viewed more narrowly: fetching and interpreting machine instructions, loading them into registers, and controlling the other hardware components of the PC. All of these functions came to be combined in one device.

Microprocessors

With the development of computer technology, devices called "microprocessors" appeared in PCs. One of the first was the Intel 4004, released by the American corporation in 1971. A microprocessor combines, on a single chip, the functions we defined above. Modern devices operate, in principle, on the same concept. Thus, the central processor of a laptop, PC, or tablet contains a logic unit, registers, and a control module, each responsible for specific functions. In practice, however, the components of modern chips form a more complex set. Let's study this in more detail.

Structure of modern processors

The central processor of a modern PC, laptop, or tablet is built around one or more cores (several is now the norm), cache memory at various levels, and controllers for the RAM and the system bus. The performance of such a chip is determined by its key characteristics. What are they?

The most significant characteristics of a modern central processor are these: the manufacturing process (usually indicated in nanometers), clock speed (in gigahertz), the amount of cache memory at each level (in megabytes), power consumption (in watts), and the presence or absence of a graphics module.

Let's study the operation of some key modules of the central processor in more detail, starting with the core.

Processor core

The central processing unit of a modern PC always has a core. It contains the key functional blocks of the chip, through which the processor performs the necessary logical and arithmetic operations. As a rule, they come as a standard set of elements, with blocks responsible for the following tasks:

Fetching and decoding instructions;

Data fetching;

Executing instructions;

Saving calculation results;

Working with interrupts.

Also, the structure of microcircuits of the corresponding type is supplemented by a control unit, a memory device, an instruction counter, and a set of registers. Let's consider the specifics of the work of the corresponding components in more detail.

Processor core: components

Among the key blocks in the core of the central processor is the one responsible for reading instructions stored at the address held in the program counter. As a rule, several operations of this type are performed during one clock cycle. The total number of instructions to be read is determined by the number of decoding blocks. The main principle here is that on each clock cycle these components are kept as busy as possible; auxiliary hardware elements may be present in the processor to meet this criterion.

The decoding block processes the instructions that determine the algorithm the chip follows when solving a given problem. According to many IT professionals, decoding is a difficult task, in part because the length of an instruction is not always fixed. Modern processors usually include 2 or 4 blocks that perform decoding.

As for the components responsible for data fetching, their main task is to fetch from cache memory or RAM the data needed to execute instructions. The cores of modern processors usually contain several blocks of this type.

The control components in the chip also work from decoded instructions. They direct the units that execute instructions, distribute tasks between them, and ensure those tasks complete on time. Control components are among the most important in the structure of microprocessors.

The cores of such chips also contain blocks responsible for the actual execution of instructions: an arithmetic logic unit (ALU) and a component responsible for floating-point calculations (FPU).

Processor cores also contain blocks that handle extensions to the instruction set. These extra instructions, beyond the basic commands, are used to speed up data processing and to implement encryption and decryption of files. Supporting them requires additional registers in the core as well as the instruction sets themselves. Modern processors usually include extensions such as MMX (integer SIMD operations on multimedia data), SSE (parallelizing floating-point computations), 3DNow! (AMD's multimedia extension), AES (data encryption), and many other standards.

In the structure of processor cores, there are usually blocks responsible for storing results in RAM in accordance with the address contained in the instruction.

The core component that handles interrupts is also important. Interrupts allow the processor to keep programs running stably under multitasking.

The work of the central processor also involves registers. These components are analogous to RAM, but access to them is several times faster. Their total capacity is small, usually no more than a kilobyte. Registers come in several types: general-purpose registers, involved in arithmetic and logical calculations, and special-purpose registers, which can hold system data used by the processor during operation.

The processor core also contains various auxiliary components. For example, a sensor that tracks the current temperature of the CPU: if it rises above the established norms, the chip can signal the modules responsible for the fans, and they will spin faster. The core also includes a branch predictor, a component that tries to determine which instructions will be executed after the current cycle of operations completes. Another important component is the program counter. This module holds the address of the instruction that the chip fetches when it begins the next execution cycle.

Such is the structure of the core inside a computer's central processing unit. Let us now study some key characteristics of these chips in more detail, namely: process technology, clock speed, cache memory size, and power consumption.

Processor characteristics: type of process technology

The development of computer technology, and the emergence of new generations of computers, is customarily associated with the improvement of computing technologies. At the same time, apart from performance, one criterion for assigning a computer to a particular generation is its physical size. The very first computers were comparable in size to a multi-storey building. Second-generation computers were comparable to, say, a sofa or a piano. Third-generation machines were already close to those familiar to us now. Modern PCs are fourth-generation computers.

Why does all this matter? Over the course of computer evolution, an unofficial rule formed: the more technologically advanced the device, the smaller its dimensions at the same (or even greater) performance. That rule fully applies to the characteristic under consideration, the manufacturing process of the central processor. What matters here is the size of the individual transistors that form the structure of the chip. The smaller they are, the more elements fit into the same area of silicon, and the more productive the CPU can accordingly be considered. Modern processors are made on 90-14 nm process technology, and this figure continues to decrease.

Clock frequency

CPU clock speed is one of the key performance indicators. It determines how many cycles per second the chip can perform: the more, the more efficient the processor and the computer as a whole. Note that this parameter characterizes, first of all, the core as an independent module of the central processor. If there are several cores on the chip, each of them operates at its own frequency. Some IT professionals consider it acceptable to sum these figures across all cores: by that methodology, a processor with 4 cores at 1 GHz gives a total of 4 GHz. Keep in mind, though, that this is a marketing convention; the frequencies do not literally combine into one faster core.

Frequency components

The clock frequency is formed from two components. First, there is the system bus frequency, usually measured in hundreds of megahertz. Second, there is the multiplier by which that frequency is multiplied. In some cases, processor manufacturers give users the ability to adjust both parameters. By setting sufficiently high values for the system bus and the multiplier, you can significantly increase the performance of the chip. This is how a processor is overclocked. It must, however, be done carefully.
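To make the arithmetic concrete (the bus frequency and multiplier below are invented for illustration, not taken from any specific CPU), the effective clock is simply the product of the two components:

```python
# Effective clock = system bus frequency x multiplier.
# Hypothetical values: a 100 MHz bus and a multiplier of 36.
bus_mhz = 100
multiplier = 36
clock_mhz = bus_mhz * multiplier
print(clock_mhz / 1000)  # 3.6 (GHz)

# Overclocking raises one of the factors, e.g. the bus to 110 MHz:
overclocked_mhz = 110 * multiplier
print(overclocked_mhz / 1000)  # 3.96 (GHz)
```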

The fact is that overclocking can significantly increase the temperature of the central processor. If an appropriate cooling system is not installed on the PC, this can lead to the failure of the microcircuit.

Cache size

Modern processors are equipped with cache memory modules. Their main purpose is the temporary storage of data, as a rule the commands and data that the chip uses most often. What does this give in practice? Above all, the load on the central processor is reduced because the needed instructions are immediately at hand: having received them from the cache, the chip does not waste time fetching them from slower memory. As a result, the computer runs faster.

The main characteristic of cache memory is its size. The larger it is, the more instructions and data it can hold, and the more likely the chip is to find what it needs there and work faster. The cache on modern processors is most often divided into three levels. The first level is built on the fastest and most advanced circuitry, the others are slower. The first-level cache on modern processors is about 128-256 KB, the second 1-8 MB, and the third can exceed 20 MB.
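The effect of this hierarchy can be sketched with the standard average-access-time formula. The hit rates and latencies below are hypothetical, chosen only to illustrate the idea:

```python
# Average memory access time for a two-level cache backed by RAM.
# All latencies are in clock cycles; hit rates are invented.
l1_latency, l1_hit_rate = 4, 0.90     # L1: small but very fast
l2_latency, l2_hit_rate = 12, 0.95    # L2: bigger, slower
ram_latency = 200                     # RAM: far slower than any cache

amat = l1_latency + (1 - l1_hit_rate) * (
    l2_latency + (1 - l2_hit_rate) * ram_latency)
print(round(amat, 1))  # 6.2 cycles on average, despite 200-cycle RAM
```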

Energy consumption

Another significant parameter of the microcircuit is power consumption. Powering the CPU can be expensive. Modern microcircuit models consume about 40-50 watts. In some cases, this parameter has economic significance - for example, when it comes to equipping large enterprises with several hundred or thousands of computers. But power consumption is no less significant in terms of adapting processors to use on mobile devices - laptops, tablets, smartphones. The lower the corresponding indicator, the longer the battery life of the device will be.

You are currently using a computer or mobile device to read this topic. The computer or mobile device uses a microprocessor to perform these actions. The microprocessor is the heart of any device, server or laptop. There are many brands of microprocessors from a wide variety of manufacturers, but they all do about the same thing and in about the same way.
A microprocessor, also known as a processor or central processing unit (CPU), is a computing engine built on a single chip. The first microprocessor was the Intel 4004, which appeared in 1971 and was not very powerful: all it could do was add and subtract, and only 4 bits at a time. Still, it was amazing that everything fit on one chip. Why? Because before that, engineers built processors either from multiple chips or from discrete components (transistors in individual packages).

If you've ever wondered what a microprocessor does in a computer, what it looks like, or how it differs from other types of microprocessors, then read on: all the most interesting details are below.

Microprocessor Progress: Intel

The first microprocessor to become the heart of a simple home computer was the Intel 8080, a complete 8-bit computer on a single chip that appeared in 1974. The first microprocessor to make a real splash in the market was the Intel 8088, released in 1979. If you are familiar with the PC market and its history, you know that it moved from the Intel 8088 to the 80286, then to the 80386 and 80486, and on to the Pentium, Pentium II, Pentium III, and Pentium 4. All of these microprocessors are made by Intel, and all are improvements on the basic design of the 8088. The Pentium 4 can execute any piece of code that ran on the original 8088, but it does so about 5,000 times faster.

In 2004, Intel introduced microprocessors with multiple cores and hundreds of millions of transistors, but even these chips followed the same general rules as earlier ones. Additional information in the table:

  • Date: the year the processor was first introduced. Many processors were re-released at higher clock speeds for years after the original release date.
  • Transistors: the number of transistors on the chip. You can see that this number has risen steadily over the years.
  • Micron: the width, in microns, of the smallest wire on the chip. For comparison, a human hair is about 100 microns thick. As this dimension shrank, the number of transistors grew.
  • Clock speed: the maximum rate the chip can be clocked at. I will say more about clock speed a little later.
  • Data width: the width of the ALU (Arithmetic Logic Unit). An 8-bit ALU can add, subtract, multiply, and so on. In many cases the external data bus is the same width as the ALU, but not always. The Intel 8088 had a 16-bit ALU and an 8-bit bus, while modern Pentium models fetch data 64 bits at a time.
  • MIPS: millions of instructions per second, a rough measure of processor performance. Modern processors do so many different things that MIPS ratings lose much of their meaning, but the column gives a feel for the relative power of the microprocessors of those times.
From this table you can see that, in general, there is a relationship between clock speed and MIPS. The maximum clock speed is a function of the manufacturing process. There is also a relationship between the number of transistors and MIPS. For example, the Intel 8088, clocked at 5 MHz (compared with 2.5-3 GHz today), executed only 0.33 MIPS (about one instruction per 15 clock cycles). Modern processors can often execute two instructions per clock cycle. This improvement is directly related to the number of transistors on the chip, as I will discuss further on.
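The 8088 figure quoted above can be checked with a line of arithmetic (the cycles-per-instruction value is the rough average from the text):

```python
# MIPS (millions of instructions per second) from the clock rate and the
# average number of clock cycles one instruction takes.
clock_hz = 5_000_000            # Intel 8088 at 5 MHz
cycles_per_instruction = 15     # roughly 15 cycles per instruction
mips = clock_hz / cycles_per_instruction / 1_000_000
print(round(mips, 2))  # 0.33, matching the MIPS figure in the text
```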

What is a chip?


A chip is also called an integrated circuit. Usually it is a small, thin piece of silicon onto which the transistors that make up the microprocessor have been etched. A chip can be less than an inch on a side yet contain tens of millions of transistors. Simpler processors may consist of a few thousand transistors etched onto a chip just a few square millimeters in size.

How it works



Intel Pentium 4

To understand how a microprocessor works, it would be helpful to look inside and learn about its internals. In the process, you can also learn about assembly language, the native language of the microprocessor, and a lot of what engineers can do to increase processor speed.

The microprocessor executes a collection of machine instructions that tell the processor what to do. Based on the instructions, the microprocessor does three main things:

  • Using its ALU (Arithmetic Logic Unit), the microprocessor can perform mathematical operations. For example, addition, subtraction, multiplication and division. Modern microprocessors are capable of extremely complex operations
  • Microprocessor can move data from one memory location to another
  • The microprocessor can make decisions and move on to a new set of instructions based on those decisions


Sophisticated microprocessors do very complex things, but the three activities above are the core of it. The following diagram shows a very simple microprocessor capable of doing these three things. This microprocessor has:

  • An address bus (8, 16, or 32 bits) that sends an address to memory
  • A data bus (8, 16, or 32 bits) that sends data to memory or receives data from memory
  • RD (read) and WR (write) lines that tell the memory whether the processor wants to set or get the addressed location
  • A clock line that lets a clock pulse sequence the processor
  • A reset line that resets the command counter to zero and restarts execution

Microprocessor memory

Earlier we talked about the address and data buses, as well as the read and write lines. All of this connects to either RAM (random access memory) or ROM (read-only memory), and usually both. In our example microprocessor, we have an 8-bit address bus and an equally wide, 8-bit data bus. This means the microprocessor can address 2^8 = 256 bytes of memory, and can read or write 8 bits of memory at a time. Let's assume this simple microprocessor has 128 bytes of ROM starting at address 0 and 128 bytes of RAM starting at address 128.
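The 256-byte figure follows directly from the address bus width; widening the bus grows the addressable space exponentially:

```python
# An N-bit address bus can address 2**N distinct byte locations.
for bits in (8, 16, 32):
    print(bits, "-bit bus ->", 2 ** bits, "bytes")
# The 8-bit bus of our toy processor gives 256 bytes; a 16-bit bus
# gives 64 KB; a 32-bit bus gives 4 GB.
```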

ROM stands for read-only memory. A ROM chip is programmed with a permanent, preset collection of bytes. The address bus tells the ROM chip which byte to get and place on the data bus. When the read line changes state, the ROM chip presents the selected byte on the data bus.

RAM stands for random access memory. RAM contains bytes of information that the microprocessor can read or write depending on whether the read or write line is signaled. One problem with RAM chips is that they forget everything once the power is gone. That is why the computer also needs ROM.



RAM chip or read-only memory (ROM) chip

By the way, almost all computers contain some amount of ROM. On a personal computer, the ROM program is called the BIOS (Basic Input/Output System). When the microprocessor starts up, it begins executing the instructions it finds in the BIOS. The BIOS instructions do their job: they check the hardware, and then the BIOS reads the boot sector from the hard disk. The boot sector is one small program, and the BIOS stores it in RAM after reading it from disk. The microprocessor then begins executing the boot sector's instructions from RAM. The boot sector program tells the microprocessor what else to fetch from the hard disk into RAM, the processor executes that, and so on. This is how the microprocessor loads and runs the entire operating system.

Microprocessor instructions

Even the incredibly simple microprocessor I just described will have a fairly large set of instructions it can execute. The collection of instructions is implemented as bit patterns, each of which has a different meaning when loaded into the instruction register. People are not particularly good at remembering bit patterns, so a set of short words is defined to represent each one. This set of short words is called the processor's assembly language. An assembler translates the words into bit patterns very easily, and the assembler's output is placed in memory for the microprocessor to execute.

Here is a set of assembly language instructions:

  • LOADA mem - load register A from memory address
  • LOADB mem - load register B from memory address
  • CONB con - load a constant value into register B
  • SAVEB mem - save register B to memory address
  • SAVEC mem - save register C to memory address
  • ADD - add A and B and store the result in C
  • SUB - subtract B from A and store the result in C
  • MUL - multiply A and B and store the result in C
  • DIV - divide A by B and store the result in C
  • COM - compare A and B and store the result in test
  • JUMP addr - jump to an address
  • JEQ addr - jump to address if equal
  • JNEQ addr - jump to address if not equal
  • JG addr - jump to address if greater
  • JGE addr - jump to address if greater or equal
  • JL addr - jump to address if less
  • JLE addr - jump to address if less or equal
  • STOP - stop execution
Assembly language
The C compiler translates C code into assembly language. Here the program is a short loop that computes the factorial of 5: a = 1; f = 1; while (a <= 5) { f = f * a; a = a + 1; }. Assuming that RAM starts at address 128 in this processor and ROM (which contains the assembly language program) starts at address 0, the assembly for our simple microprocessor might look like this:

// Assume a is at address 128
// Assume F is at address 129
0  CONB 1      // a = 1;
1  SAVEB 128
2  CONB 1      // f = 1;
3  SAVEB 129
4  LOADA 128   // if a > 5 then jump to 17
5  CONB 5
6  COM
7  JG 17
8  LOADA 129   // f = f * a;
9  LOADB 128
10 MUL
11 SAVEC 129
12 LOADA 128   // a = a + 1;
13 CONB 1
14 ADD
15 SAVEC 128
16 JUMP 4      // loop back to if
17 STOP
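To check the listing, here is a minimal interpreter for this toy instruction set (encoding instructions as Python tuples is my own choice; the semantics follow the instruction table above). Running the program leaves 5! = 120 in memory location 129:

```python
# A toy interpreter for the hypothetical instruction set, verifying that
# the assembly listing above really computes 5! = 120.
# Variables: a at RAM address 128, f at RAM address 129.

def run(program, memory):
    a = b = c = 0          # registers A, B, C
    test_gt = False        # "test" flag set by COM (here: A > B)
    pc = 0                 # program counter
    while True:
        op = program[pc]
        if op[0] == "CONB":
            b = op[1]
        elif op[0] == "SAVEB":
            memory[op[1]] = b
        elif op[0] == "SAVEC":
            memory[op[1]] = c
        elif op[0] == "LOADA":
            a = memory[op[1]]
        elif op[0] == "LOADB":
            b = memory[op[1]]
        elif op[0] == "ADD":
            c = a + b
        elif op[0] == "MUL":
            c = a * b
        elif op[0] == "COM":
            test_gt = a > b
        elif op[0] == "JG":
            if test_gt:
                pc = op[1]
                continue
        elif op[0] == "JUMP":
            pc = op[1]
            continue
        elif op[0] == "STOP":
            return memory
        pc += 1

# The factorial program from the listing (one tuple per instruction).
program = [
    ("CONB", 1), ("SAVEB", 128),                         # a = 1
    ("CONB", 1), ("SAVEB", 129),                         # f = 1
    ("LOADA", 128), ("CONB", 5), ("COM",), ("JG", 17),   # if a > 5 goto 17
    ("LOADA", 129), ("LOADB", 128), ("MUL",), ("SAVEC", 129),  # f = f * a
    ("LOADA", 128), ("CONB", 1), ("ADD",), ("SAVEC", 128),     # a = a + 1
    ("JUMP", 4),                                         # loop back to if
    ("STOP",),
]

memory = {}
run(program, memory)
print(memory[129])  # 120, i.e. 5!
```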

Read only memory (ROM)
So the question now is, "How do all of these instructions get into read-only memory?" I'll explain: each assembly language instruction must be represented as a binary number. For simplicity, let's assign each instruction a unique number. For example, it would look like this:

  • LOADA - 1
  • LOADB - 2
  • CONB - 3
  • SAVEB - 4
  • SAVEC mem - 5
  • ADD - 6
  • SUB - 7
  • MUL - 8
  • DIV - 9
  • COM - 10
  • JUMP addr - 11
  • JEQ addr - 12
  • JNEQ addr - 13
  • JG addr - 14
  • JGE addr - 15
  • JL addr - 16
  • JLE addr - 17
  • STOP - 18
These numbers are known as opcodes. In read-only memory, our little program looks like this:

// Assume a is at address 128
// Assume F is at address 129
Addr opcode/value
0  3    // CONB 1
1  1
2  4    // SAVEB 128
3  128
4  3    // CONB 1
5  1
6  4    // SAVEB 129
7  129
8  1    // LOADA 128
9  128
10 3    // CONB 5
11 5
12 10   // COM
13 14   // JG 17
14 31
15 1    // LOADA 129
16 129
17 2    // LOADB 128
18 128
19 8    // MUL
20 5    // SAVEC 129
21 129
22 1    // LOADA 128
23 128
24 3    // CONB 1
25 1
26 6    // ADD
27 5    // SAVEC 128
28 128
29 11   // JUMP 4
30 8
31 18   // STOP

You can see that 7 lines of C code became 18 lines of assembly, which in turn became 32 bytes in read-only memory.
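The mnemonic-to-opcode translation is mechanical enough to sketch in a few lines. This toy assembler is my own sketch, not a real tool; it uses the opcode table above, and the jump operands are given here as byte addresses (31 for STOP, 8 for instruction 4), as in the ROM listing:

```python
# A toy assembler for the hypothetical instruction set: each line becomes
# an opcode byte, optionally followed by an operand byte.
OPCODES = {"LOADA": 1, "LOADB": 2, "CONB": 3, "SAVEB": 4, "SAVEC": 5,
           "ADD": 6, "SUB": 7, "MUL": 8, "DIV": 9, "COM": 10,
           "JUMP": 11, "JEQ": 12, "JNEQ": 13, "JG": 14, "JGE": 15,
           "JL": 16, "JLE": 17, "STOP": 18}

def assemble(lines):
    rom = []
    for line in lines:
        mnemonic, *operand = line.split()
        rom.append(OPCODES[mnemonic])          # opcode byte
        rom.extend(int(x) for x in operand)    # operand byte, if any
    return rom

source = ["CONB 1", "SAVEB 128", "CONB 1", "SAVEB 129",
          "LOADA 128", "CONB 5", "COM", "JG 31",
          "LOADA 129", "LOADB 128", "MUL", "SAVEC 129",
          "LOADA 128", "CONB 1", "ADD", "SAVEC 128",
          "JUMP 8", "STOP"]

rom = assemble(source)
print(len(rom))  # 32 bytes, as the text says
```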

Decoding
The instruction decoder must turn each opcode into a set of signals that drive the various components inside the microprocessor. Let's take the ADD instruction as an example and see what the decoder has to do. So:

  • 1. In the first clock cycle, the instruction itself must be loaded, so the decoder needs to: activate the tri-state buffer for the program counter, activate the RD line, activate the data-in tri-state buffer, and latch the instruction into the instruction register
  • 2. In the second clock cycle, the ADD instruction is decoded. There is very little to do here: set the ALU's operation to addition and latch its output into register C
  • 3. During the third cycle, the program counter is incremented (in theory, this can overlap with the second cycle)
Each instruction can be broken down into a sequenced set of operations like these, which manipulate the microprocessor's components in the correct order. Some instructions, like ADD, may take two or three clock cycles. Others may take five or six.

Let's wrap up


The number of transistors has a huge impact on processor performance. As noted above, a typical instruction on the Intel 8088 took about 15 clock cycles to complete. The more transistors, the higher the performance - it's that simple. A large transistor budget also makes techniques such as pipelining possible.

In a pipelined architecture, instruction execution overlaps. It may still take five cycles to execute one instruction, but five instructions can be at different stages of execution at the same time, so from the outside it looks like one instruction completes every clock cycle.

All of these trends keep pushing the transistor count up, resulting in the multimillion-transistor heavyweights available today. Such processors can perform about a billion operations per second - just imagine. By the way, many manufacturers are now interested in releasing 64-bit mobile processors, and clearly another wave is coming, this time with 64-bit architecture as the king of fashion. Maybe I'll get to that topic in the near future and tell you how it actually works. That's all for today. I hope you found this interesting and learned a lot.

The tool is simpler than a machine. Often, the tool is used by hand, and the machine is driven by steam or an animal.

Charles Babbage

A computer can also be called a machine, only instead of steam power there is electricity. But programming has made the computer as simple as any tool.

The processor is the heart and brain of any computer. Its main job is arithmetic and logical operations, and before plunging into the processor's jungle, we need to understand its main components and how they work.

The two main components of a processor

Control unit

The control unit (CU) helps the processor control and execute instructions. The CU tells the components exactly what to do. In accordance with the instructions, it coordinates work with the other parts of the computer, including the second main component, the arithmetic logic unit (ALU). All instructions go through the control unit first.

There are two types of CU implementation:

  • Hardwired control units. Their behavior is determined by the internal electrical structure - the layout of the circuit board or die. Modifying such a CU without physical intervention is impossible.
  • Microprogrammed control units. These can be programmed for various purposes; the microprogram is stored in the CU's memory.

Hardwired control units are faster, but microprogrammed ones are more flexible.

Arithmetic logic unit

This device, oddly enough, performs all arithmetic and logical operations, such as addition, subtraction, logical OR, etc. ALU consists of logical elements that perform these operations.

Most logic gates have two inputs and one output.

Below is a diagram of a half adder, which has two inputs and two outputs. A and B here are the inputs, S is the sum output, and C is the carry (into the next most significant bit).

Arithmetic half adder circuit
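In code, the half adder from the diagram is a single XOR gate for the sum and an AND gate for the carry:

```python
# Half adder: S (sum) = A XOR B, C (carry) = A AND B.
def half_adder(a, b):
    return a ^ b, a & b  # (sum bit, carry bit)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", half_adder(a, b))
# 1 + 1 yields sum 0 with carry 1, i.e. binary 10 (decimal 2)
```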

Information storage - registers and memory

As mentioned earlier, the processor executes the commands coming to it. Commands in most cases work with data that can be intermediate, input or output. All of this data, along with instructions, is stored in registers and memory.

Registers

A register is the smallest cell of data memory. Registers are made up of latches/flip-flops. Flip-flops, in turn, consist of logic gates and can each store 1 bit of information.

Translator's note: flip-flops can be synchronous or asynchronous. Asynchronous ones can change state at any time, synchronous ones only on the positive/negative edge of the clock input.

By functional purpose, flip-flops are divided into several groups:

  • RS flip-flop: holds its state while both inputs are at zero and changes it when a one is applied to one of the inputs (Reset/Set).
  • JK flip-flop: identical to the RS flip-flop, except that applying ones to both inputs at once toggles its state (counting mode).
  • T flip-flop: toggles its state on each clock pulse at its single input.
  • D flip-flop: stores the state of its input at the moment of the clock edge. Asynchronous D flip-flops make little sense.
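A behavioral sketch of the first of these, the RS flip-flop, shows how one bit is held (this models the truth table only, not the gate-level circuit):

```python
# RS flip-flop behavior: S=1 sets Q, R=1 resets Q, S=R=0 holds the
# stored bit, and S=R=1 is a forbidden input combination.
class RSFlipFlop:
    def __init__(self):
        self.q = 0
    def step(self, s, r):
        if s and r:
            raise ValueError("S=R=1 is forbidden for an RS flip-flop")
        if s:
            self.q = 1
        elif r:
            self.q = 0
        return self.q

ff = RSFlipFlop()
print(ff.step(1, 0))  # 1: set
print(ff.step(0, 0))  # 1: holds - this is the stored bit
print(ff.step(0, 1))  # 0: reset
```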

RAM is not suitable for storing intermediate data, since that would slow the processor down. Intermediate data is sent to registers over the bus. Registers can store commands, output data, and even the addresses of memory cells.

How the RS flip-flop works

Memory (RAM)

RAM (random access memory) is a large group of these same registers connected together. This storage is volatile: its data disappears when the power is turned off. RAM takes the address of the memory cell where data needs to be placed, the data itself, and a read/write flag that drives the flip-flops.

Translator's note: random access memory can be static or dynamic - SRAM and DRAM, respectively. In static memory the cells are flip-flops; in dynamic memory, capacitors. SRAM is faster, DRAM is cheaper.

Commands (instructions)

Commands are the actual actions that the computer should take. They are of several types:

  • Arithmetic: addition, subtraction, multiplication, etc.
  • Logical: AND (conjunction), OR (disjunction), negation, etc.
  • Data movement: move, input, output, load, and store.
  • Jump commands: goto, if ... goto, call, and return.
  • Halt command: halt.

Translator's note: in fact, all arithmetic operations in an ALU can be built from just two: addition and shifting. However, the more basic operations an ALU supports natively, the faster it is.
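As a sketch of that note, here is multiplication synthesized from nothing but addition and shifts (the classic shift-and-add method):

```python
# Multiplication built from only addition and bit shifts, illustrating
# how an ALU could synthesize "missing" operations from simpler ones.
def mul_shift_add(a, b):
    result = 0
    while b:
        if b & 1:        # lowest bit of b is set: add the shifted a
            result += a
        a <<= 1          # a = a * 2
        b >>= 1          # b = b // 2
    return result

print(mul_shift_add(13, 11))  # 143
```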

Instructions are provided to the computer in assembly language or generated by a high-level language compiler.

In the processor, instructions are implemented at the hardware level. In one cycle, a single-core processor can execute one elementary (basic) instruction.

A group of instructions is usually called an instruction set.

CPU clock

The speed of a computer is largely determined by the clock frequency of its processor. Clock frequency is the number of clock cycles (and, correspondingly, executable elementary commands) per second.

The frequency of current processors is measured in GHz (gigahertz). 1 GHz = 10⁹ Hz, that is, a billion cycles per second.

To reduce a program's execution time, you either need to optimize (shorten) the program or increase the clock frequency. Some processors allow the frequency to be raised (overclocking), but this stresses the chip physically and often causes overheating and failure.

Executing instructions

The instructions are stored in RAM in sequential order. For a hypothetical processor, an instruction consists of an opcode and a memory/register address. Inside the control unit there are two instruction registers, which hold the code and the address of the command currently being executed. The processor also has additional registers that store the last 4 bits of the executed instructions.

Below is an example of a command set that sums two numbers:

  1. LOAD_A 8. RAM holds this command as, say, <1100 1000>. The first 4 bits are the opcode, which identifies the instruction. The command is placed in the control unit's instruction registers and decoded as load_A: put the data 1000 (the last 4 bits of the instruction) into register A.
  2. LOAD_B 2. Similar to the previous step: this places the number 2 (0010) into register B.
  3. ADD B A. The command adds the two numbers (more precisely, adds the value of register B to register A). The control unit tells the ALU to perform the addition and put the result back into register A.
  4. STORE_A 23. Save the value of register A to memory cell 23.

These are the operations needed to add two numbers.
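
The four-step program above can be simulated directly. In this sketch the 8-bit format is 4 bits of opcode plus 4 bits of operand; only LOAD_A = 1100 is taken from the example, the other opcode values are assumptions, and the store address is changed from 23 to 12 because 23 does not fit in 4 bits.

```python
# Toy CPU for the 8-bit instruction format above: 4-bit opcode + 4-bit operand.
LOAD_A, LOAD_B, ADD_B_A, STORE_A = 0b1100, 0b1101, 0b1110, 0b1111

def run(program, memory):
    reg_a = reg_b = 0
    for instr in program:
        opcode, operand = instr >> 4, instr & 0b1111  # split the command
        if opcode == LOAD_A:
            reg_a = operand          # put the last 4 bits into register A
        elif opcode == LOAD_B:
            reg_b = operand          # ...or into register B
        elif opcode == ADD_B_A:
            reg_a = reg_a + reg_b    # the ALU adds B into A
        elif opcode == STORE_A:
            memory[operand] = reg_a  # save register A to a memory cell
    return memory

mem = [0] * 16
run([0b1100_1000,            # LOAD_A 8
     0b1101_0010,            # LOAD_B 2
     0b1110_0000,            # ADD B A
     0b1111_1100], mem)      # STORE_A 12
print(mem[12])  # 10
```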

Bus

All data between the processor, registers, memory, and I/O (input-output) devices is transferred over buses. To load newly processed data into memory, the processor places the address on the address bus and the data on the data bus, then asserts a write-enable signal on the control bus.

Cache

The processor has a mechanism for storing instructions in the cache. As we learned earlier, a processor can execute billions of instructions per second. If every instruction had to be fetched from RAM, the fetch would take longer than the execution itself. To speed things up, the processor keeps part of the instructions and data in the cache.

If the contents of the cache and main memory diverge, the modified cache entries are marked with dirty bits.
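
A minimal sketch of the dirty-bit idea, assuming a simple write-back scheme (the class is illustrative, not a real cache design):

```python
# A write-back cache entry with a dirty bit: after the processor writes to
# the cache, the entry no longer matches RAM until it is flushed back.
class CacheLine:
    def __init__(self, address, value):
        self.address, self.value, self.dirty = address, value, False

    def write(self, value):
        self.value = value
        self.dirty = True          # cache and memory now disagree

    def flush(self, ram):
        if self.dirty:
            ram[self.address] = self.value
            self.dirty = False     # cache and memory agree again

ram = {5: 1}
line = CacheLine(5, ram[5])
line.write(99)        # line.dirty is now True; ram[5] is still 1
line.flush(ram)       # ram[5] becomes 99, the dirty bit is cleared
```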

Flow of instructions

Modern processors can process multiple instructions in parallel. While one instruction is in the decoding stage, the processor may have time to receive another instruction.

However, this solution is only suitable for instructions that are independent of each other.

If the processor is multi-core, it means that it actually contains several separate processors with some shared resources, such as cache.

The CPU is the main working component of a computer: it performs arithmetic and logical operations, controls the computing process, and coordinates the work of all computer devices.

The central processor generally contains:

    an arithmetic logic unit;

    data buses and address buses;

    registers;

    a program counter;

    a cache: very fast, small memory;

    a floating-point mathematical coprocessor.

Modern processors are made in the form of microprocessors. Physically, a microprocessor is an integrated circuit: a thin rectangular plate of crystalline silicon with an area of only a few square millimeters, on which the circuits implementing all of the processor's functions are located. The die is usually housed in a flat plastic or ceramic case and connected with gold wires to metal pins so that it can be attached to the computer's motherboard.

Main characteristics of the processor:

    Performance is the main characteristic showing the speed at which a computer performs information processing operations. It, in turn, depends on the following characteristics:

    Clock frequency - determines the number of processor cycles per second

    Bit depth - determines the size of the minimum piece of information called a machine word

    Address space - determined by the width of the address bus; it sets the maximum amount of RAM that can be installed in the computer

8.2.3. The principle of operation of the processor.

The processor is the main element of the computer. It directly or indirectly controls all devices and processes in the computer.

The design of modern processors clearly shows a tendency to constantly increase the clock frequency. This is natural: the more operations the processor performs, the higher its performance. The limiting clock frequency is largely determined by the existing technology for the production of microcircuits (the smallest achievable element sizes, which determine the minimum signal transmission time).

In addition to increasing the clock frequency, an increase in processor performance is achieved by developers using less obvious methods associated with the invention of new architectures and information processing algorithms. We will consider some of them on the example of the Pentium processor (P5) and subsequent models.

Let's list the main features of the Pentium processor:

    conveyor processing of information;

    superscalar architecture;

    the presence of separate caches for commands and data;

    the presence of a prediction block of the branch address;

    dynamic program execution;

    the presence of a floating point computation unit;

    support for multiprocessor operation;

    availability of error detection tool.

The term "superscalar architecture" means that the processor contains more than one computational unit. These computational units are more commonly referred to as pipelines. Note that the first superscalar architecture was implemented in the Soviet Elbrus-1 computer (1978).

The presence of two pipelines in the processor allows it to simultaneously execute (complete) two commands (instructions).

Each pipeline divides the command execution process into several stages (for example, five):

    fetching (reading) commands from RAM or cache memory;

    decoding (decryption) of the command, i.e. determining the code of the operation being performed;

    command execution;

    memory access;

    storing the results in memory.

For each of the listed stages (each operation), a separate hardware unit is used. Thus, each pipeline of the Pentium processor has five stages.

In pipelining, one clock of the synchronizing (clock) frequency is allocated for each stage. In each new cycle, the execution of one command ends and the execution of a new command begins. This execution of commands is called streaming.

Figuratively, it can be compared to a production conveyor (flow), where the same operation is always performed with different products at each site. Moreover, when the finished product leaves the conveyor, a new one comes into it, and the rest of the products at this time are at different stages of readiness. The transition of manufactured products from section to section should occur synchronously, according to special signals (in the processor, these are cycles generated by a clock generator).

The total execution time of one instruction in a five-stage pipeline is five clock cycles, but in each clock cycle the pipeline simultaneously processes five different instructions. As a result, five commands complete in five clock cycles. Thus, pipelining increases processor throughput, but it does not reduce the execution time of an individual instruction. The benefit comes from the fact that several commands are processed at once.

In fact, pipelining even increases the execution time of each individual instruction due to the additional overhead associated with organizing the pipeline. In this case, the clock frequency is limited by the operating speed of the slowest stage of the conveyor.

As an example, consider the process of executing an instruction in which the execution times of the stages are 60, 30, 40, 50 and 20 ns. Let's assume the additional cost of organizing pipelining is 5 ns.

If there was no pipelining, then the execution of one command took

60 + 30 + 40 + 50 + 20 = 200 ns.

If a pipelined organization is used, the cycle time must equal the duration of the slowest processing stage plus the overhead, i.e. 60 + 5 = 65 ns. Thus, the resulting throughput gain from pipelining is 200/65 ≈ 3.1 times.

Note that the execution time of one command in the pipeline is 5 × 65 = 325 ns. This is significantly more than 200 ns, the execution time without pipelining. But the simultaneous execution of five commands at once gives an average completion time of one command of 65 ns.
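
The arithmetic above can be checked directly:

```python
# Reproduces the timing example: stage times in ns, plus 5 ns of pipeline
# overhead added to the cycle time.
stages = [60, 30, 40, 50, 20]
overhead = 5

no_pipeline = sum(stages)          # 200 ns per instruction without a pipeline
cycle = max(stages) + overhead     # 65 ns, set by the slowest stage
latency = len(stages) * cycle      # 325 ns for one instruction to pass through
speedup = no_pipeline / cycle      # ~3.1x throughput gain

print(no_pipeline, cycle, latency, round(speedup, 1))  # 200 65 325 3.1
```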

The Pentium processor has two L1 caches (located inside the processor). As you know, caching increases processor performance by reducing the number of waiting times for information to arrive from slow RAM. The necessary data and instructions are taken by the processor from the fast cache memory (buffer), where they are stored in advance.

The presence of a single cache in previous processor designs led to structural conflicts. The two instructions executed by the pipeline sometimes tried to read information from a single cache at the same time. Performing separate caching (buffering) for commands and data eliminates such conflicts by allowing both commands to run simultaneously.

The development of computer technology is ongoing. Designers are constantly looking for new ways to improve their products. The most valuable resource of processors is their performance. For this reason, various techniques have been invented to improve processor performance.

One such technique saves time by predicting the likely execution path of a branching algorithm. This is done by a branch address prediction unit, whose idea is similar to that of cache memory.

As you know, there are linear, cyclical and branching computational processes. In linear algorithms, commands are executed in the order they are written in RAM: sequentially one after another. For such algorithms, the branch address prediction block introduced into the processor cannot give a gain.

In branching algorithms, the choice of command is determined by the results of checking the branching conditions. If you wait for the end of the computational process at the branch point and then select the required command from RAM, then there will inevitably be a loss of time due to unproductive idle of the processor (reading the command from RAM is slow).

The branch address prediction block works ahead of time and tries to predict the branch address in advance in order to transfer the required instruction from slow RAM to a special fast branch buffer BTB (Branch Target Buffer) in advance.

When the BTB contains a correct prediction, the jump occurs without delay. Like the cache, the BTB also has misses. On a cache miss, operands must be read not from the cache but from slow RAM, which wastes time; a BTB miss is costly in the same way.

The branch address prediction idea is implemented in the processor by two independent prefetch buffers. They work together with the branch prediction buffer: one buffer fetches instructions sequentially, while the second fetches according to the BTB's predictions.
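
A toy sketch of the BTB idea described above (real branch target buffers also keep prediction history bits; this version only remembers the last observed target, and all addresses are invented):

```python
# A Branch Target Buffer as a small table mapping a branch instruction's
# address to its last observed jump target.
class BTB:
    def __init__(self):
        self.table = {}

    def predict(self, branch_addr, fallthrough_addr):
        # On a hit, fetch continues at the predicted target without delay;
        # on a miss, fetch proceeds sequentially (the fall-through path).
        return self.table.get(branch_addr, fallthrough_addr)

    def update(self, branch_addr, actual_target):
        self.table[branch_addr] = actual_target  # remember where it went

btb = BTB()
first = btb.predict(0x40, 0x44)    # miss: sequential fetch at 0x44
btb.update(0x40, 0x100)            # the branch actually went to 0x100
second = btb.predict(0x40, 0x44)   # hit: 0x100 predicted without delay
```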

The Pentium processor has two five-stage fixed-point pipelines. In addition, it has an eight-stage floating-point pipeline, needed for mathematical computations as well as for fast processing of dynamic 3D color images.

The development of processor architecture follows the path of a constant increase in the amount of cache memory of the first and second levels. The exception is the Pentium 4 processor, which has unexpectedly decreased cache size compared to the Pentium III.

To improve performance, new processor designs create two system buses operating at different clock frequencies. The fast bus is used to work with L2 cache, and the slow bus is used for traditional exchange of information with other devices, such as RAM. The presence of two buses eliminates conflicts when exchanging information between the processor and the main memory and cache memory of the second level located outside the processor die.

Subsequent Pentium processors contain more pipeline stages. This reduces the time each stage's operation takes, which in turn allows the processor clock speed to be raised.

The Pentium Pro (P6) processor takes a new approach to the order of execution of instructions sequentially located in RAM.

The new approach is to execute commands in random order as soon as they are ready (regardless of the order in RAM). However, the final result is always formed in accordance with the original order of commands in the program. This order of command execution is called dynamic or anticipatory.

As an example, consider the following fragment of a program written in a fictional machine-oriented language.

r1 ← mem          Command 1

r3 ← r1 + r2      Command 2

r5 ← r5 + 1       Command 3

r6 ← r6 - r7      Command 4

The symbols r1 ... r7 denote general-purpose registers (GPRs), which make up the processor's register block.

The mem denotes a RAM cell.

Let's comment on the recorded program.

Command 1: write to register r1 the contents of the RAM cell whose address is held in register r4.

Command 2: write to register r3 the result of adding the contents of registers r1 and r2.

Command 3: add one to the contents of register r5.

Command 4: decrease the contents of register r6 by the contents of register r7.

Suppose that when command 1 was executed (loading an operand from memory into general-purpose register r1), it turned out that the contents of the mem cell were not in the processor's cache (a miss: the required operand had not been delivered from RAM in advance).

In the traditional approach, the processor will start executing instructions 2, 3, 4 only after the data from the mem cell of the main memory have entered the processor (more precisely, into the r1 register). Since the reading will take place from a slow working memory, this process will take a lot of time (by the standards of the processor). While waiting for this event, the processor will be idle, not performing useful work.

In the above example, the processor cannot execute instruction 2 until instruction 1 completes, since instruction 2 uses the results of instruction 1. At the same time, the processor could have previously executed instructions 3 and 4, which do not depend on the result of executing instructions 1 and 2.

In such cases, the P6 processor works differently.

The P6 processor does not wait for the completion of the execution of instructions 1 and 2, but immediately proceeds to the extraordinary execution of instructions 3 and 4. The results of the advanced execution of instructions 3 and 4 are saved and retrieved later, after the execution of instructions 1 and 2. Thus, the processor P6 executes the instructions in accordance with their readiness for execution, regardless of their original location in the program.
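
The readiness rule can be sketched as follows; the encoding of each instruction as (name, source registers, destination register) is invented for the example above:

```python
# Models the readiness rule from the example: an instruction may issue as
# soon as none of its source registers is produced by a still-pending
# instruction. Register names follow the program fragment above.
instructions = [
    ("cmd1", ["r4"], "r1"),        # r1 <- mem      (stalled on a cache miss)
    ("cmd2", ["r1", "r2"], "r3"),  # r3 <- r1 + r2  (needs cmd1's result)
    ("cmd3", ["r5"], "r5"),        # r5 <- r5 + 1   (independent)
    ("cmd4", ["r6", "r7"], "r6"),  # r6 <- r6 - r7  (independent)
]

pending_results = {"r1"}  # cmd1 has not yet delivered r1
ready = [name for name, sources, _ in instructions[1:]
         if not pending_results & set(sources)]
print(ready)  # ['cmd3', 'cmd4'] can run ahead of cmd2
```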

Speed is undoubtedly an important indicator of a computer. However, it is equally important that fast computations occur with fewer errors.

The processor has a self-test device that automatically checks the functionality of most of the processor elements.

In addition, failures inside the processor are detected using a special data format. A parity bit is added to each operand so that the total number of one bits in every value circulating inside the processor is even. An odd count of ones signals that a failure has occurred, much as a banknote without watermarks reveals a counterfeit.
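
A minimal even-parity check in Python, illustrating the scheme described above (the example value and the flipped bit are arbitrary):

```python
# Even parity: the parity bit is chosen so that the total number of 1 bits
# in each operand is even; an odd count signals a hardware failure.
def parity_bit(value):
    return bin(value).count("1") % 2   # 1 if the data has an odd number of ones

def is_corrupted(value, stored_parity):
    return parity_bit(value) != stored_parity

data = 0b1011_0010            # four one bits -> parity bit 0
p = parity_bit(data)
flipped = data ^ 0b0000_0100  # a single-bit failure
print(is_corrupted(data, p), is_corrupted(flipped, p))  # False True
```

A single parity bit detects any one-bit failure but cannot say which bit failed, and it misses failures that flip an even number of bits.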

The units for measuring the speed of processors (and computers) can be:

    MIPS (Million Instructions Per Second) - one million fixed-point instructions per second;

    MFLOPS (Million Floating-point Operations Per Second) - one million floating-point operations per second;

    GFLOPS (Giga Floating-point Operations Per Second) - one billion floating-point operations per second.

There are reports of the world's fastest computer of the time, ASCI White (IBM), whose speed reaches 12.3 teraflops (trillions of floating-point operations per second).

One of the most important elements of a computer is the processor, which determines the speed of the PC. Years of technical progress have made it possible to combine billions of transistors into a single working whole.

The capabilities of computers are very great, but whatever purpose a computer is used for, all of it is the work of the processor. The processor receives commands from the user and from programs, processes them, and sends the results to the appropriate parts of the PC. The processor can be called the brain of the computer: a control center that constantly processes numbers to complete tasks.

Components

A modern processor contains several kinds of hardware. Execution units perform the calculations. Control units ensure that the execution hardware correctly recognizes commands and processes information.

The registers are designed to store intermediate results; almost all commands use register data. The data bus connects the CPU with the rest of the PC hardware: it is the bus that delivers data to the central processor and carries out the results of calculations.

The processor cache lets the CPU quickly reach frequently used instructions and data. It is high-speed memory located on the CPU die. The CPU also has additional modules required for special calculations.

Frequency

The speed of a PC is directly related to the clock frequency of its central processor, which is measured in megahertz. The pulses for the CPU and buses are produced by a clock generator based on a quartz resonator mounted on the motherboard. The main element of the resonator is a quartz crystal enclosed in a small metal case.

Under voltage, electric oscillations appear in the crystal. Their frequency varies with the shape and size of the crystal. Then the signal is transmitted to the generator, where it is converted into ordered pulses of one or more frequencies, if the buses are of different frequencies.

The clock frequency is designed to synchronize all elements of the PC. This means that the transmitting equipment must work synchronously with the receiving equipment. This is achieved when all equipment operates on one signal, which connects all elements and allows you to get a single whole.

The smallest unit of time for a CPU is the clock cycle. Any action takes at least one cycle. Exchanging data with RAM takes several cycles, which also include idle (wait) cycles.

Different instructions need different numbers of clock cycles, so comparing PCs by frequency alone is not quite correct. With otherwise equal parameters, frequency comparison is possible, but it must be done carefully, since other factors also matter. As a result, a PC with a lower frequency may turn out to run faster than one with a higher frequency.

What else determines CPU performance

In most cases, performance is determined by the bit width of the data processed at once. Three main processor elements are characterized by bit width: the data bus, the internal registers, and the memory address bus.

How much can you raise the frequency?

CPU speed can be raised by increasing the clock frequency. However, remember that the chip can overheat: as frequency rises, the processor's power consumption and heat output grow, and electromagnetic interference can also increase. In other words, raising the frequency cannot increase CPU performance indefinitely.

Data bus

These are the connections intended for exchanging information. The number of signals arriving on the bus at once determines the amount of data that can move along it in a given time. For a better picture, the bus width can be likened to a highway: more lanes means higher throughput.

Bus width

As mentioned above, this parameter can be pictured as a highway. With only one lane, the throughput is poor; to increase it, lanes must be added. A 16-bit bus can be pictured as a two-lane road, since it can carry two bytes of data at a time.

Address bus

This element is the set of lines over which the address of the memory cell being written or read is transferred. As with the data bus, each line carries one bit of the address, corresponding to one binary digit. Adding lines makes more memory cells accessible to the processor.

The address bus can be thought of as a building numbering system. The number of lines in the bus corresponds to the number of digits in a building number. If building numbers are limited to 2 digits, there can be no more than a hundred buildings; adding one more digit grows the number of addresses to 10³ = 1000. A PC uses the binary number system, so an n-bit address bus can address 2ⁿ memory cells.
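
The analogy boils down to this: where building numbers grow as 10 to the number of digits, addressable memory grows as 2 to the width of the address bus, which a quick computation shows:

```python
# Addressable memory grows as 2**n with the width n of the address bus.
for width in (16, 20, 32):
    cells = 2 ** width
    print(f"{width}-bit address bus -> {cells} addressable cells")
# 16 bits -> 65536 cells (64 KB), 20 bits -> 1 MB, 32 bits -> 4 GB
# (assuming one byte per cell)
```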

The address bus and the data bus are independent of each other, so designers choose their widths separately. These widths are among the most important parameters: the data-bus width sets the amount of data the CPU can process in one clock cycle, while the address-bus width sets the amount of memory it can address.

Built-in registers

The amount of data the central processor can handle at one time is set by the size of its internal registers. These are very fast working storage inside the processor, used to hold operands and intermediate results of calculations. For example, the CPU can add the numbers in two registers and move the result to a third.

Why the processor heats up

Each CPU contains a great many tiny transistors. Their number affects the clock speed and power consumption. Laptop processors consume little power; desktop processors can consume an order of magnitude more. As a result, a large amount of heat is generated, which must be removed from the CPU by a dedicated cooling system.

There are several ways to reduce power consumption. Some modules can be switched off, and the frequency and voltage can be lowered as the load on the processor drops. The processor's components can also be shrunk, but thin elements have a significant drawback: leakage currents and crosstalk appear in them, which also generates heat.

In addition, modern materials can be used, and there are processors that operate at reduced voltage. Power consumption depends strongly on voltage: reducing the voltage by 10% cuts power consumption by about 20%.
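
The 10% / 20% figure follows from power scaling roughly with the square of the voltage (an approximation that ignores the accompanying frequency change):

```python
# P ~ V^2: a 10% voltage drop cuts power by roughly 20%.
v_ratio = 0.90                  # voltage reduced by 10%
p_ratio = v_ratio ** 2          # relative power under the quadratic model
print(round(1 - p_ratio, 2))    # 0.19 -> roughly a 20% reduction
```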

How Processor Performance Can Be Increased

Several techniques can be applied to increase the speed of computation. One is speeding up access to RAM and other memory: if the CPU receives data and commands from memory quickly, less time is lost to idling. Thus a high-speed bus raises the speed of the whole computer.

A fast cache is also needed. Processors keep data and recent results in this on-chip memory; the cache runs at the CPU frequency, so it works much faster than RAM.

Most CPUs have three cache levels. L1 is the fastest but the smallest; L2 and L3 are much larger but slower, though still faster than RAM. Data and commands arrive quickly from the cache, keeping the processor as busy as possible instead of idling while it waits for data from RAM.

If the processor lacks the needed data in its cache, it must work with RAM or even the hard drive, which significantly reduces the computer's performance. A large cache is therefore a very important parameter.

Pipelined processing. To raise the speed of executing instructions, processors contain pipelines in which instructions pass in order through the processor's various units. The advantage is that at any moment the pipeline is working not on one command but on several, as many as the pipeline has stages.

The length of the pipeline affects the achievable clock frequency. But a long pipeline is not always an advantage: if a prediction error or an exceptional situation occurs while processing the code, the processor must flush the entire pipeline and refill it, which increases the running time.

In addition, commands and data can be prefetched: while executing one command, the processor tries to predict the following ones. This keeps the pipeline full, since there is no need to wait for previous commands to finish. If the predicted commands turn out to be wrong, the needed commands and data must be fetched again, and the pipeline is completely flushed and reloaded.

Parallel computing. Modern processors can have multiple cores, which appear to the operating system as several processors. If an application supports parallel computation, the cores can work simultaneously. But multi-core processors have a drawback: high power consumption leads to rapid, strong heating, which demands a good cooling system.

Algorithms for working with multimedia content are also important. Most of them follow the SIMD principle: processors with this technology can quickly process data that requires the same instruction to be applied many times. Video playback and graphics processing are typical examples.
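
A scalar sketch of the SIMD idea: one "instruction" applied to a whole lane of data elements at once. The 4-wide lane and the pixel brightness values are invented for illustration; real SIMD does this in a single hardware instruction rather than a loop.

```python
# SIMD ("single instruction, multiple data") applies one operation to many
# elements together. This loop stands in for what one SIMD instruction
# would do to a 4-wide lane of pixel brightness values in a single step.
def simd_add(lane_a, lane_b):
    return [a + b for a, b in zip(lane_a, lane_b)]

pixels = [10, 20, 30, 40]
brighter = simd_add(pixels, [5, 5, 5, 5])  # brighten all four pixels at once
print(brighter)  # [15, 25, 35, 45]
```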

How it actually works

It is worth walking through how the processor works. The description below is simplified: it covers only the functions of the major units, without technical details.

The processor begins working when it receives a command. The fetch unit, knowing the address of the command, first looks for it in the L1 cache. If it is absent, the unit goes to the L2 cache, which is larger than the first; if absent there, to the L3 cache. If the command is not in any cache, the CPU loads it from RAM over the bus, placing it in all of its caches along the way. The data required by the command is loaded in the same way.
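
The cache fall-through just described can be sketched as follows (a toy model: real caches are organized in fixed-size lines and sets, not Python dictionaries):

```python
# Fall-through lookup: try L1, then L2, then L3, then RAM, filling every
# cache level on the way back so the next access hits immediately.
def fetch(address, l1, l2, l3, ram):
    for cache in (l1, l2, l3):
        if address in cache:
            value = cache[address]
            break
    else:
        value = ram[address]          # miss everywhere: go over the bus
    for cache in (l1, l2, l3):
        cache[address] = value        # place it in all of the caches
    return value

l1, l2, l3, ram = {}, {}, {0x10: 7}, {0x20: 9}
v = fetch(0x20, l1, l2, l3, ram)  # missed all caches, loaded from RAM
print(v)  # 9; address 0x20 is now cached in L1, L2, and L3
```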

The command then moves from the fetch unit to the decoder. This node breaks large commands into a sequence of smaller micro-operations, each of which the execution units can perform in one clock cycle. The decoder hands the resulting sequence to the decoded-instruction memory.

Next, the fetch unit needs another command. To know where to get the next command and its data, a prefetch unit is used: by analyzing the sequence of actions, it can determine the next command.

Then the scheduler selects several operations from the decoded-instruction memory and determines their execution order. If the calculations of some commands do not affect the results of others, they can be dispatched to parallel execution units, of which a CPU core has quite a few.

At this stage a prefetch error can be discovered. For instance, if the executed action is a conditional jump, the prefetch unit, unable to know the register value while the command is still executing, may mistakenly assume the jump is taken and hand the fetch unit the wrong address for the next command.

A similar situation occurs with data prefetching. If, by the time a load command executes, the registers holding the data address differ from their values at prefetch time, an error appears, since the cache contains the wrong data.

After that, the pipeline is flushed and the fetch unit is asked again for the command that preceded the error. Flushing and refilling the pipeline increases command processing time; if many prefetch failures occur, processor performance drops significantly. In modern CPUs, however, prefetch works with about 95% accuracy.

If a command leaves the pipeline correctly executed, its result is stored in the cache and then transferred to RAM.

That, in principle, is all that an average user needs to know about processors and how they work.