Bottlenecks in a computer. "How does it work? Simpler, please!"

Technological progress does not advance evenly across all areas; that much is obvious. In this article we will look at which components, and at what point, improved their characteristics more slowly than the rest and became the weak link. So, today's topic is the evolution of weak links: how they emerged, what they affected, and how they were eliminated.

CPU

Since the earliest personal computers, most of the computation has been done by the CPU. Chips were not cheap, so most peripherals used processor time for their needs, and there were very few peripherals to begin with. Soon, as the range of PC applications expanded, this paradigm was revisited, and the time came for expansion cards of every kind to flourish.



In the days of the "twos" and "threes" (that is, the i286 and i386 processors, not the Pentium II and III, as younger readers might assume), the tasks facing these systems were not very demanding: mainly office applications and calculations. Expansion cards already offloaded the processor in part; for example, an MPEG decoder card decoded MPEG-compressed files without involving the CPU. A little later, standards began to appear that put less load on the processor during data exchange. One example was the PCI bus (introduced starting with the i486); others include PIO and (U)DMA.


Processors increased their power at a good pace. A frequency multiplier appeared, since the system bus speed was limited, and the cache was used to mask accesses to RAM, which ran at a lower frequency. The processor was still the weak link, and system speed depended almost entirely on it.



Meanwhile Intel, after the successful Pentium, released a new generation: the Pentium MMX. Intel wanted to change the state of affairs and move multimedia calculations onto the processor. The MMX instruction set (MultiMedia eXtensions), designed to speed up audio and video processing, helped a lot here. With it, MP3 music played back smoothly, and acceptable MPEG-4 playback on the CPU alone became achievable.

The first congestion on the bus

Systems based on the Pentium MMX were already limited mainly by memory bandwidth. The 66 MHz bus was a bottleneck for the new processor, despite the move to the new SDRAM type, which improved performance per megahertz. For this reason bus overclocking became very popular: the bus was set to 83 MHz (or 75 MHz) and the gain was very noticeable. Often even a lower final processor frequency was compensated for by the higher bus frequency; for the first time, more speed was achieved at a lower clock. Another bottleneck was the amount of RAM. For SIMM memory the maximum was 64 MB, but more often it was 32 MB or even 16 MB. This seriously complicated the use of software, since every new version of Windows, as is well known, "likes to eat a lot of tasty RAM" (c). There are also persistent rumors of a conspiracy between memory manufacturers and Microsoft.



Meanwhile, Intel was developing the expensive and therefore not very popular Socket 8 platform, while AMD continued to develop Socket 7. Unfortunately, the latter's processors used a slow FPU (Floating Point Unit, the module for floating-point operations) inherited from the newly acquired NexGen, which meant lagging behind the competitor in multimedia tasks, primarily games. The move to the 100 MHz bus gave these processors the necessary memory bandwidth, and the full-speed 256 KB L2 cache of the AMD K6-III improved the situation so much that system speed was now characterized only by processor frequency, not bus frequency. Although this was partly a consequence of the slow FPU: office applications, which rely on ALU power, ran faster than competing solutions thanks to the fast memory subsystem.

Chipsets

Intel dropped the expensive Pentium Pro, which had an L2 cache die integrated into the processor package, and released the Pentium II. This CPU had a core very similar to the Pentium MMX. The main differences were the L2 cache, which sat on the processor cartridge and ran at half the core frequency, and the new AGTL bus. With the help of new chipsets (in particular the i440BX), the bus frequency was raised to 100 MHz and, accordingly, so was memory bandwidth. In terms of efficiency (the ratio of random read speed to theoretical bandwidth), these chipsets became among the best, and Intel has not beaten that figure to this day. The i440BX chipsets had one weak link: the south bridge, whose functionality no longer met the requirements of the time. The old south bridge from the i430 series, used in Pentium-based systems, was carried over. This circumstance, as well as the fact that the bridges were connected via the PCI bus, prompted manufacturers to release hybrids combining the i440BX north bridge with a VIA south bridge (686A/B).



In the meantime, Intel demonstrated DVD movie playback without any auxiliary cards. The Pentium II, however, did not win wide recognition because of its high cost, and the need for cheap analogues became obvious. The first attempt, the Intel Celeron without L2 cache, was unsuccessful: in speed the Covington chips were far behind the competition and did not justify their price. Intel then made a second attempt, which proved successful: the Mendocino core, beloved by overclockers, with half the cache size (128 KB versus 256 KB for the Pentium II) but running at full core frequency rather than half, as in the Pentium II. Thanks to this, speed in most tasks was no lower, and the lower price attracted buyers.

First 3D and bus again

Immediately after the release of the Pentium MMX, 3D technologies began to gain popularity. At first these were professional applications for modelling and graphics, but the real era was opened by 3D games, or rather by the Voodoo 3D accelerators created by 3dfx. These accelerators were the first mainstream cards for rendering 3D scenes, offloading the processor during rendering. From that moment the evolution of three-dimensional games began. Quite quickly, scene rendering on the central processor began to lose to rendering on the video accelerator in both speed and quality.



With the advent of a new powerful subsystem, graphics, which began to rival the central processor in the volume of data processed, a new bottleneck emerged: the PCI bus. In particular, Voodoo 3 and earlier cards gained speed simply from overclocking the PCI bus to 37.5 or 41.5 MHz. Clearly, video cards needed to be given a sufficiently fast bus. AGP, the Accelerated Graphics Port, became that bus (or rather, port). As the name suggests, it is a dedicated graphics bus, and by specification it could have only one slot. The first AGP revision supported 1x and 2x speeds, corresponding to 1x and 2x PCI 32-bit/66 MHz, that is, 266 and 533 MB/s. The slow mode was added for compatibility, and it was precisely with this mode that problems persisted for quite a long time, on all chipsets except Intel's own. According to rumours, these problems were related to the fact that only Intel held the licence, and to its interest in hindering the competing Socket 7 platform.



AGP made things better and the graphics port was no longer a bottleneck. Video cards switched to it very quickly, but the Socket7 platform suffered from compatibility problems almost to the very end. Only the latest chipsets and drivers were able to improve this situation, but even then there were some nuances.

And the hard drives too!

Then came the time of Coppermine: frequencies rose, performance grew, and new video cards gained pipelines and memory and improved performance. The computer had already become a multimedia centre: people played music and watched films on it. Integrated sound cards with weak specifications lost out to the SB Live!, which became the popular choice. But something still spoiled the idyll. What was it?



That factor was the hard drives, whose capacity growth slowed and stalled at around 40 GB. For collectors of films (then in MPEG-4), this was a problem. It was soon resolved, and fairly quickly: disks grew to 80 GB and beyond and stopped worrying most users.


AMD released a very good platform, Socket A, and a K7-architecture processor that marketing named Athlon (core codename Argon), as well as the budget Duron. The Athlon's strengths were its bus and powerful FPU, which made it an excellent processor for serious calculations and games, leaving its competitor, the Pentium 4, the role of an office machine, where powerful systems were never really needed anyway. Early Durons had a very small cache and a lower bus frequency, which made it hard for them to compete with the Intel Celeron (Tualatin). But thanks to better scalability (owing to the faster bus), they responded better to frequency increases, so the higher-end models already quietly outperformed the Intel solutions.

Between two bridges


During this period two bottlenecks appeared at once. The first was the bus between the bridges. Traditionally, PCI had been used for this purpose. It is worth remembering that PCI, in the version used in desktop computers, has a theoretical bandwidth of 133 MB/s. In practice the speed depends on the chipset and the application and ranges from 90 to 120 MB/s. On top of that, the bandwidth is shared among all devices connected to the bus. If two IDE channels with a theoretical bandwidth of 100 MB/s each (ATA-100) hang off a bus with a theoretical bandwidth of 133 MB/s, the problem is obvious. LPC, PS/2, SMBus and AC'97 have low bandwidth requirements, but Ethernet, ATA-100/133, PCI devices and USB 1.1/2.0 already operate at speeds comparable to the inter-bridge link. For a long time there was no problem: USB was barely used, Ethernet was needed infrequently and mostly at 100 Mbit/s (12.5 MB/s), and hard drives could not even come close to the interface's maximum speed. But time passed and the situation changed; it was decided to create a dedicated inter-hub (bridge-to-bridge) bus.
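As a rough back-of-the-envelope sketch (the device list and round numbers are illustrative, not measurements), the oversubscription of shared PCI is easy to put into figures:

```python
# Rough illustration of shared-PCI oversubscription (assumed round numbers).
PCI_THEORETICAL_MB_S = 133        # 32-bit / 33 MHz PCI
PCI_PRACTICAL_MB_S = (90, 120)    # typical real-world range quoted above

consumers_mb_s = {
    "IDE channel 1 (ATA-100)": 100,
    "IDE channel 2 (ATA-100)": 100,
    "Fast Ethernet (100 Mbit/s)": 12.5,
    "USB 2.0 (480 Mbit/s)": 60,
}

total = sum(consumers_mb_s.values())
print(f"Theoretical demand: {total:.1f} MB/s vs. {PCI_THEORETICAL_MB_S} MB/s available")
print(f"Oversubscription: {total / PCI_THEORETICAL_MB_S:.1f}x")
```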


VIA, SiS and Intel each released their own variants of such a bus. They differed, first of all, in bandwidth. They started at the level of PCI 32-bit/66 MHz, about 266 MB/s, but the main thing had been done: the PCI bus was now left to its own devices, and data for other buses no longer had to pass through it. This improved the speed of working with peripherals (relative to the old bridge architecture).


The bandwidth of the graphics port also grew. Fast Writes mode was introduced, allowing data to be written to video memory directly, bypassing system memory, along with Side Band Addressing, which used an additional 8-bit part of the bus, normally meant for service data, for transfers. The gain from Fast Writes appeared only under high processor load; in other cases it was meagre. Thus the difference between 8x and 4x modes stayed within the margin of error.

Processor dependence

Another bottleneck that arose, and remains relevant to this day, is CPU dependency. This phenomenon came from the rapid development of video cards and means that the "processor - chipset - memory" chain is not powerful enough for the video card. After all, the frame rate in a game is determined not only by the video card but also by this chain, since it is the chain that feeds the card the commands and data to be processed. If the chain cannot keep up, the video subsystem hits a ceiling determined mainly by the chain itself. That ceiling depends on the card's power and the settings used, but there are also cards that hit such a ceiling at any settings in a particular game, or at the same settings in most modern games with almost any processor. For example, the GeForce 3 was heavily limited by the performance of the Pentium III and of the Pentium 4 on the Willamette core. The next model up, the GeForce 4 Ti, was already starved by Athlon XP 2100+ to 2400+ processors, and the gain from improving the rest of the platform was quite noticeable.



How was performance improved? At first AMD, taking advantage of its efficient architecture, simply raised processor frequencies and refined the manufacturing process, while chipset makers raised memory bandwidth. Intel continued its policy of increasing clock speed, which is exactly what the NetBurst architecture was designed for. Intel processors on the Willamette and Northwood cores with a 400 MHz QPB (quad-pumped bus) lost to competing solutions on a 266 MHz bus. After the introduction of the 533 MHz QPB, the processors drew level. Then Intel, instead of the 667 MHz bus introduced in server products, decided to move desktop processors straight to an 800 MHz bus, to build headroom against the Barton core and the new flagship Athlon XP 3200+. Intel processors were heavily limited by bus frequency, and even the 533 MHz QPB was not enough to supply a sufficient data flow. That is why the new 3.0 GHz CPU on the 800 MHz bus outperformed the 3.06 GHz processor on the 533 MHz bus in all but a handful of applications.


Support for new memory frequencies was also introduced, and dual-channel mode appeared. This was done to equalise the bandwidth of the processor bus and of the memory: dual-channel DDR at a given base frequency exactly matches the quad-pumped bus at the same frequency.
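A quick worked calculation (a sketch with nominal 64-bit bus widths; real throughput is lower) shows why the two match:

```python
# Nominal peak bandwidths, 64-bit (8-byte) data paths assumed.
bus_width_bytes = 8

# Pentium 4 "800 MHz" QDR bus: 200 MHz clock, 4 transfers per clock.
fsb_qdr = 200e6 * 4 * bus_width_bytes            # = 6.4 GB/s

# Dual-channel DDR400: 200 MHz clock, 2 transfers per clock, 2 channels.
ddr_dual = 200e6 * 2 * bus_width_bytes * 2       # = 6.4 GB/s

print(fsb_qdr / 1e9, ddr_dual / 1e9)             # both print 6.4
```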


For AMD, dual-channel mode was a formality and gave a barely noticeable gain. The new Prescott core brought no unambiguous speed increase and in places lost to the old Northwood. Its main purpose was the move to a new manufacturing process and the possibility of further frequency growth. Heat dissipation rose sharply because of leakage currents, which put an end to the planned 4.0 GHz model.

Through the ceiling to a new memory

The Radeon 9700/9800 and GeForce 5 generation caused no CPU-dependency problems for the processors of the time. But the GeForce 6 generation brought most systems to their knees: the performance jump was very noticeable, and CPU dependency rose with it. Top processors on the Barton core (Athlon XP 2500+ to 3200+) and on Northwood/Prescott (3.0 to 3.4 GHz, 800 MHz FSB) ran into a new limit: the memory and bus frequency ceiling. AMD suffered from this especially; the 400 MHz bus was not enough to realise the power of its good FPU. The Pentium 4 was in a better position and showed good results with minimal timings. But JEDEC was reluctant to certify higher-frequency, lower-latency memory modules. So there were two options: either a complex quad-channel mode, or a switch to DDR2. The latter happened, and the LGA775 (Socket T) platform was introduced. The bus stayed the same, but memory frequencies were no longer capped at 400 MHz; they only started there.



AMD solved the problem better in terms of scalability. The K8 generation, codenamed Hammer, in addition to raising the number of instructions per clock (partly thanks to a shorter pipeline), had two innovations with a view to the future: the built-in memory controller (more precisely, a north bridge with most of its functionality) and the fast universal HyperTransport bus, used to connect the processor to the chipset, or processors to each other in a multiprocessor system. The built-in memory controller made it possible to avoid the weak link, the processor-to-chipset connection. The FSB as such ceased to exist; there remained only the memory bus and the HT bus.


This allowed the Athlon 64 to easily overtake the existing Intel solutions on the NetBurst architecture and exposed the flaws of the long-pipeline ideology. Tejas had many problems and never shipped. These processors easily realised the potential of the GeForce 6 cards, as did the higher-end Pentium 4s.


But then came an innovation that made processors the weak link for a long time. Its name is multi-GPU. It was decided to revive the ideas of 3dfx SLI in the form of NVIDIA SLI; ATI responded symmetrically with CrossFire. These were technologies for rendering a scene with two cards. The doubled theoretical power of the video subsystem, plus the CPU-side overhead of splitting the frame into parts, skewed the system. A top Athlon 64 could load such a pair only at high resolutions. The release of the GeForce 7 and the ATI Radeon X1000 increased the imbalance further.


Along the way, the new PCI Express bus was developed: a bidirectional serial bus for peripherals with very high speed. It came to replace AGP and PCI, although it did not displace the latter entirely. Thanks to its versatility, speed and low cost of implementation it quickly supplanted AGP, even though at the time it brought no performance gain; there was simply no difference between them. From the point of view of unification, however, it was a very good step. Boards with PCI-E 2.0 support are already shipping, with double the bandwidth (500 MB/s in each direction per lane against the previous 250 MB/s). This, too, gave no boost to current video cards. A difference between PCI-E modes appears only when video memory is insufficient, which already means the card itself is unbalanced. One such card is the GeForce 8800 GTS 320 MB; it reacts very sensitively to changes in PCI-E mode. But taking an unbalanced card just to estimate the gain from PCI-E 2.0 is hardly a reasonable decision. Cards with TurboCache and HyperMemory support, technologies for using system RAM as video memory, are another matter: there, the roughly twofold increase in bus bandwidth has a positive effect on performance.


Whether a video card has enough memory can be seen in any review of devices with different VRAM sizes: wherever there is a sharp drop in frames per second, video memory has run out. It happens, however, that the difference becomes really noticeable only in unplayable modes, say 2560x1600 with maximum AA/AF. Then the difference between 4 and 8 frames per second is twofold, but both modes are obviously unusable in practice and should not be taken into account.

New answer to video chips

The release of the new Core 2 architecture (codename Conroe) improved the CPU-dependency situation and loaded GeForce 7 SLI configurations without any trouble. But Quad SLI and the GeForce 8 arrived in time to take revenge and restore the imbalance, and so it remains to this day. The situation only worsened with the release of 3-way SLI and the upcoming Quad SLI on the GeForce 8800, and of CrossFire X in 3-way and 4-way form. Wolfdale raised clock speeds slightly, but even overclocking this processor is not enough to feed such video subsystems properly. 64-bit games are very rare, and a gain in that mode is seen only in isolated cases. Games that benefit from four cores can be counted on the fingers of one hand. As usual, Microsoft comes to everyone's rescue: its new OS loads both memory and processor generously. It has been quietly announced that 3-way SLI and CrossFire X will work exclusively under Vista. Given that system's appetite, gamers may well be forced to buy quad-core processors, the justification being more even loading of the cores than under Windows XP. If the OS is going to eat a fair share of CPU time anyway, let it at least consume the cores the game does not use. However, I doubt the new operating system will be content with the cores handed over to it.



The Intel platform is becoming obsolete. Four cores are already held back by memory bandwidth and by the latency of bus arbitration: the bus is shared, and a core needs time to take control of it. With two cores this is tolerable, but with four the time losses become more noticeable. The system bus has also long lagged behind memory bandwidth. The effect of this was softened by improvements in the efficiency of asynchronous mode, which Intel implemented well. Workstations suffer even more because of an unsuccessful chipset whose memory controller delivers only 33% of the theoretical memory bandwidth. An example is the loss of the Intel Skulltrail platform in most gaming applications (the 3DMark06 CPU test is not a gaming application :)) even with identical video cards. That is why Intel announced the new Nehalem generation, which implements an infrastructure very similar to AMD's: an integrated memory controller and the QPI peripheral bus (codename CSI). This will improve the platform's scalability and pay off in dual-processor and multi-core configurations.


AMD now has several bottlenecks. The first is tied to the caching mechanism: because of it there is a certain memory-bandwidth ceiling, dependent on processor frequency, that cannot be exceeded even with faster memory modes. For example, on an average processor the difference between DDR2-667 and DDR2-800 is of the order of 1-3%, and in a real task it is generally negligible. It is therefore best to pick the optimal frequency and lower the timings, to which the controller responds very well. For the same reason there is little point in moving to DDR3: its higher timings only hurt, and there may be no gain at all. AMD's other problem now is the slow (despite SSE128) processing of SIMD instructions; it is for this reason that the Core 2 outruns the K8/K10 by so much. The ALU, always Intel's strong point, has become even stronger and in some cases can be many times faster than its counterpart in Phenom. In other words, the main problem of AMD processors is weak "mathematics".


Generally speaking, weak links are very task-specific; here only the "epoch-making" ones have been considered. In some tasks the speed may be limited by the amount of RAM or by the speed of the disk subsystem; then more memory is added (the required amount is determined with performance counters) and RAID arrays are installed. Game speed can be raised by disabling the integrated sound card and buying a proper discrete one, such as a Creative Audigy 2 or X-Fi, which load the processor less by processing effects on their own chip. This applies mostly to AC'97 codecs and to a lesser extent to HD Audio (Intel Azalia), since the latter has fixed the problem of high CPU load.


Remember: the system should always be tailored to specific tasks. Often, while a balanced video card can be chosen (and the choice within a price bracket will depend on prices, which vary greatly from place to place), with, say, the disk subsystem such a choice is not always available. Very few people need RAID 5, but for a server it is indispensable. The same goes for a dual-processor or multi-core configuration: useless in office applications, but a must-have for a designer working in 3ds Max.

Good day!

The day was going well and nothing foreshadowed trouble. Then a problem arrived: the speed of some application became unacceptably low, even though a week, a month or a day ago everything was fine. It has to be solved quickly, with as little time spent as possible. The problematic server is running Windows Server 2003 or later.

I hope the notes below will be short and clear, and useful both to novice administrators and to more seasoned colleagues, since there is always something new to pick up. Do not rush straight into investigating the application's behaviour. First of all, check whether the server currently has enough performance headroom: are there bottlenecks limiting it?

Perfmon, a powerful tool that ships with Windows, will help us here. Let us start by defining a bottleneck: a resource that has reached the limit of its use. Bottlenecks usually arise from poor resource planning, hardware problems or misbehaving applications.

If we open perfmon, we see dozens and hundreds of counters of every kind, and their sheer number does not help a quick investigation. So, to begin with, let us single out five main candidate bottlenecks and narrow down the list of counters to examine.

These are the processor, RAM, the storage system (HDD/SSD), the network, and processes. Below we go through each of them, the counters we need and the threshold values for them.

CPU

An overloaded processor clearly does not help applications run fast. To examine its resources we will single out only four counters:

Processor\% Processor Time

Measures the percentage of time the processor spends executing non-idle threads; in effect, the CPU load. Microsoft recommends moving to a faster processor if the value stays above 85%. But this depends on many factors; you need to know your own workload and hardware, since the acceptable value can vary.

Processor\% User Time

Shows how much time the processor spends in user mode. If the value is high, applications are taking up a lot of processor time; it is worth looking at them, because they will soon need optimising.

Processor\% Interrupt Time

Measures the time the processor spends servicing hardware interrupts. This counter can point to hardware problems: Microsoft recommends starting to worry when it exceeds 15%, which means some device is responding very slowly to requests and should be checked.

System\Processor Queue Length

Shows the number of threads queued and waiting for processor time. Microsoft recommends thinking about a processor with more cores if this value exceeds the number of cores multiplied by two.
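The same checks can be scripted outside perfmon. Below is a minimal sketch in Python using the psutil library (assumed to be installed); the 85% and "two threads per core" guidelines come from the text above, everything else is an illustrative choice:

```python
import psutil

def check_cpu(samples: int = 5, interval: float = 1.0) -> None:
    """Sample overall CPU load a few times and flag values above the ~85% guideline."""
    cores = psutil.cpu_count(logical=True)
    for _ in range(samples):
        load = psutil.cpu_percent(interval=interval)   # averaged over `interval` seconds
        flag = "  <-- above 85% guideline" if load > 85 else ""
        print(f"CPU load: {load:5.1f}%{flag}")
    # psutil has no direct Processor Queue Length counter; getloadavg() (emulated on
    # recent Windows builds of psutil) is a rough stand-in for runnable-thread pressure.
    try:
        one_min = psutil.getloadavg()[0]
        if one_min > cores * 2:
            print(f"Run queue ~{one_min:.1f} > 2 x {cores} cores: CPU may be the bottleneck")
    except (AttributeError, OSError):
        pass  # not available on this platform/psutil version

if __name__ == "__main__":
    check_cpu()
```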

RAM

A lack of RAM can severely affect overall system performance, forcing the system to lean on a slow HDD for swapping. And even if plenty of RAM seems to be installed in the server, memory can "leak". A memory leak is an uncontrolled decrease in the amount of free memory caused by bugs in programs. It is also worth remembering that in Windows the amount of virtual memory is the sum of RAM and the paging file.

Memory\% Committed Bytes in Use

Shows virtual memory usage: the ratio of committed memory to the commit limit (RAM plus paging file). If the value exceeds 80%, you should think about adding RAM.

Memory\Available MBytes

Shows RAM usage, namely the number of megabytes available. If this falls below roughly 5% of total RAM, you should again think about adding memory.

Memory\Free System Page Table Entries

The number of free entries in the system page table. The table is limited in size; moreover, pages of 2 MB and larger are gaining popularity these days in place of the classic 4 KB ones, which does not help keep the count high. A value below 5000 may indicate a memory leak.

Memory\Pool Nonpaged Bytes

The size of the non-paged pool: a region of kernel memory holding important data that cannot be swapped out. If the value exceeds 175 MB, a memory leak is most likely, usually accompanied by event ID 2019 entries in the system log.

Memory\Pool Paged Bytes

Similar to the previous counter, but this pool can be paged out to disk when unused. Values above 250 MB are considered critical and are usually accompanied by event ID 2020 entries in the system log; this also points to a memory leak.

Memory\Pages/sec

The number of page reads and writes to the paging file per second caused by the required data not being in RAM. Again, a sustained value above 1000 hints at a memory leak.
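A quick scripted look at the same signals might look like this (a psutil sketch; the 80% commit and 5% available-RAM thresholds are the guidelines above, and the commit calculation is only an approximation of the perfmon counter):

```python
import psutil

def check_memory() -> None:
    vm = psutil.virtual_memory()
    sw = psutil.swap_memory()

    # Rough analogue of Memory\Available MBytes and the "< 5% of RAM" rule of thumb.
    avail_pct = vm.available / vm.total * 100
    print(f"Available RAM: {vm.available // 2**20} MB ({avail_pct:.1f}%)")
    if avail_pct < 5:
        print("  -> less than 5% of RAM free: consider adding memory")

    # Rough analogue of Memory\% Committed Bytes in Use (RAM plus paging file).
    committed_pct = (vm.total - vm.available + sw.used) / (vm.total + sw.total) * 100
    print(f"Approx. commit charge in use: {committed_pct:.1f}%")
    if committed_pct > 80:
        print("  -> above the 80% guideline: virtual memory is running out")

if __name__ == "__main__":
    check_memory()
```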

HDD

An important enough element that can make a significant contribution to system performance.

LogicalDisk\% Free Space

The percentage of free space. Only the partitions holding system files are of interest: the OS, the paging file(s) and so on. Microsoft recommends looking at expanding disk space once free space drops below 15%, because under critical load it can run out abruptly (temp files, Windows updates, or that same paging file). But, as they say, "it depends": you need to look at the space actually available, since the paging file may be fixed in size, temp directories may have quotas that stop them from growing, and updates arrive in small portions and rarely, or not at all.

PhysicalDisk\% Idle Time

Shows how much of the time the disk is idle. It is recommended to replace the disk with a faster one if this counter drops below 20%.

PhysicalDisk\Avg. Disk sec/Read

The average time the disk takes to service a read. Above 25 ms is already bad; for SQL Server and Exchange, 10 ms or less is recommended. The remedy is the same as for the previous counter: a faster disk.

PhysicalDisk\Avg. Disk sec/Write

The same as Avg. Disk sec/Read, but for writes. The critical threshold is likewise 25 ms.

PhysicalDisk\Avg. Disk Queue Length

Shows the average number of I/O operations waiting for the disk to become available. Start worrying if this number exceeds twice the number of spindles in the system (without RAID arrays, the number of spindles equals the number of disks). The advice is the same: a faster disk subsystem.

Memory\Cache Bytes

The amount of memory used for the system cache, a large part of which is the file cache. A value above 300 MB may indicate a problem with disk performance or the presence of an application that uses the cache heavily.
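The read/write latency counters can be approximated from psutil's cumulative I/O statistics (a sketch; the 25 ms guideline is from the text, the sampling interval is arbitrary):

```python
import time
import psutil

def disk_latency(interval: float = 5.0) -> None:
    """Approximate Avg. Disk sec/Read and sec/Write from cumulative I/O counters."""
    before = psutil.disk_io_counters()
    time.sleep(interval)
    after = psutil.disk_io_counters()

    reads = after.read_count - before.read_count
    writes = after.write_count - before.write_count
    # read_time / write_time are cumulative milliseconds spent on I/O.
    avg_read_ms = (after.read_time - before.read_time) / reads if reads else 0.0
    avg_write_ms = (after.write_time - before.write_time) / writes if writes else 0.0

    for name, value in (("read", avg_read_ms), ("write", avg_write_ms)):
        verdict = "above the 25 ms guideline" if value > 25 else "ok"
        print(f"Avg. disk {name} latency: {value:.2f} ms ({verdict})")

if __name__ == "__main__":
    disk_latency()
```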

Network

In today's world there is no getting by without it: huge volumes of data travel over the network.

Network Interface\Bytes Total/sec

The amount of data passing through the network adapter, sent and received. A value above 70% of the interface bandwidth indicates a possible problem: either replace the card with a faster one or add another to take load off the first.

Network Interface\Output Queue Length

Shows the number of packets queued for sending. If the value exceeds 2, think about replacing the card with a faster one.
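The comparison of Bytes Total/sec against interface bandwidth can be approximated the same way (a sketch; the 70% threshold is from the text, and the link speed is whatever psutil reports for the adapter):

```python
import time
import psutil

def check_network(interval: float = 5.0) -> None:
    """Compare per-NIC throughput over `interval` seconds with 70% of the nominal link speed."""
    speeds = {n: s.speed for n, s in psutil.net_if_stats().items() if s.isup and s.speed}
    before = psutil.net_io_counters(pernic=True)
    time.sleep(interval)
    after = psutil.net_io_counters(pernic=True)

    for nic, mbit in speeds.items():
        delta = (after[nic].bytes_sent - before[nic].bytes_sent +
                 after[nic].bytes_recv - before[nic].bytes_recv)
        used_mbit = delta * 8 / interval / 1e6
        pct = used_mbit / mbit * 100
        flag = "  <-- above 70% of link speed" if pct > 70 else ""
        print(f"{nic}: {used_mbit:.1f} of {mbit} Mbit/s ({pct:.1f}%){flag}")

if __name__ == "__main__":
    check_network()
```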

Processes

Server performance can degrade catastrophically if there is an unoptimized application or the application starts to behave "incorrectly".

Process\Handle Count

The number of handles held by the process; these can be files, registry keys and so on. More than 10,000 of them may indicate that the application is not working properly.

Process\Thread Count

The number of threads within the process. It is worth taking a closer look at the behavior of the application if the difference between the minimum and maximum number of them exceeds 500.

Process\Private Bytes

Shows the amount of memory allocated to a process that cannot be shared with other processes. If this indicator fluctuates by more than 250 MB between its minimum and maximum, it points to a possible memory leak.
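To watch a suspect process for handle, thread and private-memory growth, a small poller is enough (a sketch; the 10,000-handle, 500-thread and 250 MB figures are the guidelines above, and the process name is just an example):

```python
import time
import psutil

def watch_process(name: str = "w3wp.exe", samples: int = 10, interval: float = 30.0) -> None:
    """Poll one process and report growth of handles, threads and private memory."""
    matches = [p for p in psutil.process_iter(["name"]) if p.info["name"] == name]
    if not matches:
        print(f"{name} not found")
        return
    proc = matches[0]
    history = []
    for _ in range(samples):
        with proc.oneshot():
            handles = proc.num_handles() if hasattr(proc, "num_handles") else 0  # Windows-only
            threads = proc.num_threads()
            mem = proc.memory_info()
            # On Windows psutil exposes a "private" field matching Process\Private Bytes.
            private_mb = getattr(mem, "private", mem.vms) / 2**20
        history.append((handles, threads, private_mb))
        print(f"handles={handles} threads={threads} private={private_mb:.0f} MB")
        time.sleep(interval)

    handle_hist, thread_hist, mem_hist = zip(*history)
    if max(handle_hist) > 10_000:
        print("Handle count above 10,000: possible handle leak")
    if max(thread_hist) - min(thread_hist) > 500:
        print("Thread count fluctuates by more than 500: worth a closer look")
    if max(mem_hist) - min(mem_hist) > 250:
        print("Private bytes grew by more than 250 MB: possible memory leak")

if __name__ == "__main__":
    watch_process()
```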

Most of the counters above have no hard threshold that unambiguously signals a bottleneck. The values given are averages and can vary widely from system to system. To use these counters properly, we need to know at least how the system behaves during normal operation. This is called a performance baseline: a perfmon log taken from a running, freshly installed system with no problems (freshly installed is optional; it is never too late to capture such a log, or to track baseline changes over the long term). This is an important point that many overlook, yet it can seriously reduce potential downtime and noticeably speed up the analysis of data gathered from these counters.
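Capturing such a baseline does not require anything special; even a periodic CSV log of a few key metrics will do (a sketch; the choice of metrics and the interval are illustrative, and a perfmon Data Collector Set does the same job natively):

```python
import csv
import time
import psutil

def log_baseline(path: str = "baseline.csv", interval: float = 60.0, samples: int = 60) -> None:
    """Append periodic snapshots of a few key metrics to a CSV file."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "cpu_pct", "avail_mb",
                         "disk_read_ms_total", "disk_write_ms_total", "net_bytes_total"])
        for _ in range(samples):
            vm = psutil.virtual_memory()
            disk = psutil.disk_io_counters()
            net = psutil.net_io_counters()
            writer.writerow([
                int(time.time()),
                psutil.cpu_percent(interval=1),
                vm.available // 2**20,
                disk.read_time,          # cumulative ms spent reading
                disk.write_time,         # cumulative ms spent writing
                net.bytes_sent + net.bytes_recv,
            ])
            f.flush()
            time.sleep(interval)

if __name__ == "__main__":
    log_baseline()
```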

Taken from https://ru.intel.com/business/community/?automodule=blog&blogid=57161&sh...


Recent versions of Windows include a built-in rating of the performance of the various PC components, which gives an overview of performance and of the system's bottlenecks. But it offers no details about the components' speed parameters, and it cannot stress-test the hardware, which is useful for understanding peak loads when running modern games. Third-party benchmarks of the 3DMark family likewise give only an approximate rating in arbitrary points. It is no secret, moreover, that many hardware manufacturers tune their video cards and other components specifically to score the maximum number of points in 3DMark. The program even lets you compare your hardware's performance with similar configurations from its database, but you still will not get specific values.

Therefore, PC testing should be carried out separately, taking into account not only the performance assessment by the benchmark, but also the real technical characteristics recorded as a result of the hardware test. We have selected for you a set of utilities (both paid and free) that allow you to get specific results and identify weak links.

Image processing speed and 3D

Testing video cards is one of the most important steps in assessing the power of a PC. Manufacturers of modern video adapters equip them with special software and drivers that allow using the GPU not only for image processing, but also for other calculations, such as video encoding. Therefore, the only reliable way to find out how efficiently computer graphics are processed is to resort to a special application that measures the device's performance.

Checking the stability of the video card

Program: FurMark 1.9.1 Website: www.ozone3d.net FurMark is one of the fastest and simplest tools for testing a video adapter. The utility measures the performance of the video card using OpenGL. Its rendering algorithm uses multi-pass rendering, with each layer based on GLSL (the OpenGL Shading Language).

To load the graphics card processor, this benchmark renders an abstract 3D image with a torus covered in fur. The need to process a large amount of hair leads to the maximum possible load on the device. FurMark checks the stability of the video card, and also shows the changes in the temperature of the device with increasing load.

In the FurMark settings you can specify the resolution at which the hardware will be tested, and on completion the program presents a short report on the PC configuration with a final score in arbitrary points. This value is useful for broad comparison of several video cards. You can also test at the standard resolutions of 1080p and 720p.

Virtual Stereo Walk

Program: Unigine Heaven DX11 Benchmark Website: www.unigine.com One of the surest ways to test what a new computer can do is to run games on it. Modern games make full use of hardware resources: graphics card, memory and processor. However, not everyone has the time or the inclination for such entertainment. An alternative is the Unigine Heaven DX11 Benchmark. The test is based on the Unigine game engine (games such as Oil Rush, Dilogus: The Winds of War and Syndicates of Arkon are built on it), which supports the DirectX 9, 10 and 11 and OpenGL graphics APIs. After launch, the program renders a demo scene of a virtual environment in real time: the user sees a short clip with a virtual walk through a fantasy world, created entirely by the graphics card. Besides 3D objects, the engine simulates complex lighting with a global illumination system and multiple reflections of light rays off scene elements.

Testing can also be performed in stereo mode, and in the benchmark settings you can choose the stereoscopic output standard: anaglyph 3D, separate frames for the right and left eye, and so on.

Despite the fact that the name of the program mentions the eleventh version of DirectX, this does not mean that Unigine Heaven is intended only for modern video cards. In the settings of this test, you can select one of the earlier versions of DirectX, as well as set an acceptable level of picture detail and specify the rendering quality of shaders.

Weak link detection

When the urge strikes to make a computer faster, the question arises: which component is the weakest? What will speed things up more, replacing the video card, the processor, or installing a large amount of RAM? To answer this, you need to test the individual components and identify the "weak link" in the current configuration. A versatile multi-test utility will help you find it.

Load simulator

Program: PassMark PerformanceTest Website: www.passmark.com PassMark PerformanceTest analyzes virtually any device in your PC configuration, from motherboard and memory to optical drives.

A distinctive feature of PassMark PerformanceTest is that it uses a large number of different tasks, scrupulously measuring the computer's performance in different situations. At some point it may even seem that someone else has taken control of the system: windows open by themselves, their contents scroll, images appear on the screen. All of this is the benchmark simulating the most common tasks typically performed in Windows. Along the way it checks data compression speed, times encryption, applies filters to photographs, measures the speed of vector graphics rendering, plays short 3D demo clips, and so on.

At the end of the test, PassMark PerformanceTest gives a total score in points and offers to compare this result with the data obtained on a PC with different configurations. For each of the tested parameters, the application creates a diagram on which weak computer components are very clearly visible.

Disk system check

Disk system bandwidth can be the bottleneck in PC performance. Therefore, it is extremely important to know the real characteristics of these components. Testing the hard drive will not only determine its read and write speeds, but also show how reliable the device is. We recommend trying two small utilities to test your drive.

Exams for HDD

Programs: CrystalDiskInfo and CrystalDiskMark Website: http://crystalmark.info/software/index-e.html These programs are created by one developer and complement each other perfectly. Both of them are free and can work without installation on a computer, directly from a USB flash drive.

Most hard drives support SMART self-diagnostics, which makes it possible to predict impending drive failures. With CrystalDiskInfo you can assess the real state of your HDD in terms of reliability: it reads the SMART data and reports the number of problem sectors, the number of read-head positioning errors, the spin-up time and the current temperature of the device. If the temperature is too high, the drive will not last long. The program also shows the firmware version and how long the disk has been in use.

CrystalDiskMark is a small application that measures read and write speeds. What sets it apart from similar utilities is that it lets you vary the test conditions, for example measuring speeds with blocks of different sizes. You can also set the number of test runs and the amount of data used for them.
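The idea of measuring sequential throughput with different block sizes is easy to reproduce in a few lines (a rough sketch: it writes a temporary file and, unlike CrystalDiskMark, does not bypass the OS cache, so the figures are optimistic):

```python
import os
import tempfile
import time

def sequential_write_speed(size_mb: int = 256, block_kb: int = 1024) -> float:
    """Write `size_mb` of data with a given block size and return MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = size_mb * 1024 // block_kb
    fd, path = tempfile.mkstemp()
    try:
        t0 = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())        # force data to disk so the timing means something
        return size_mb / (time.perf_counter() - t0)
    finally:
        os.remove(path)

if __name__ == "__main__":
    for kb in (4, 512, 1024):
        print(f"{kb:>5} KB blocks: {sequential_write_speed(block_kb=kb):.1f} MB/s")
```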

Speedometer for web surfing

The real speed of a network connection usually differs from what is shown in its settings or promised by the provider, as a rule downwards. Many factors affect transfer speed: electromagnetic interference in the room, the number of users working on the network at the same time, cable quality, and so on.

Estimating network speed

Program: SpeedTest Website: www.raccoonworks.com If you want to know the actual data transfer rate on your local network, SpeedTest will help you. It allows you to determine whether the provider adheres to the declared parameters. The utility measures the speed of data transfer between two users' work machines, as well as between a remote server and a personal computer.

The program consists of two parts, a server and a client. To measure the speed of transfer from one computer to another, the first user starts the server part and points it at an arbitrary (preferably large) file to be used for the test. The second participant starts the client component and enters the server's parameters: address and port. The two applications establish a connection and begin exchanging data. During the transfer, SpeedTest plots a graph and gathers statistics on how long it took to copy the data over the network. If you test several remote PCs, the program keeps adding new curves to the graph.
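The client/server scheme it uses can be approximated with a plain TCP socket pair (a minimal sketch; host, port and transfer size are arbitrary, and it measures raw socket throughput rather than reproducing SpeedTest's protocol):

```python
import socket
import sys
import time

PORT = 5201
CHUNK = 64 * 1024
TOTAL_MB = 100

def server() -> None:
    """Send TOTAL_MB of dummy data to the first client that connects."""
    payload = b"\0" * CHUNK
    with socket.create_server(("", PORT)) as srv:
        conn, _addr = srv.accept()
        with conn:
            for _ in range(TOTAL_MB * 1024 * 1024 // CHUNK):
                conn.sendall(payload)

def client(host: str) -> None:
    """Receive the stream and report the effective transfer rate."""
    received = 0
    t0 = time.perf_counter()
    with socket.create_connection((host, PORT)) as sock:
        while True:
            data = sock.recv(CHUNK)
            if not data:
                break
            received += len(data)
    elapsed = time.perf_counter() - t0
    print(f"{received / 2**20:.0f} MB in {elapsed:.1f} s = "
          f"{received * 8 / elapsed / 1e6:.1f} Mbit/s")

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "serve":
        server()
    else:
        client(sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1")
```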

In addition, SpeedTest will check the speed of the Internet: in the "Web Page" mode, the program tests the connection to any site. This parameter can also be estimated by going to the specialized resource http://internet.yandex.ru.

Malfunctions of RAM may not appear immediately, but under certain loads. To be sure that the selected modules will not let you down in any situation, it is better to test them thoroughly and choose the fastest ones.

Memory Olympiad

Program: MaxxMEM2 - PreView Website: www.maxxpi.net This program is designed to test memory speed. In a very short time it runs several tests: it measures how long it takes to copy data in RAM, determines read and write speeds, and reports the memory latency. In the utility's settings you can set the test priority, and the result can be compared with values obtained by other users. From the program's menu you can jump straight to the online statistics on the MaxxMEM2 website and see which memory is the fastest.
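The copy-bandwidth part of such a test can be crudely approximated with NumPy (a sketch; it measures what Python and NumPy achieve on the machine, not the module-level figures MaxxMEM2 reports):

```python
import time
import numpy as np

def memory_copy_bandwidth(size_mb: int = 512, repeats: int = 5) -> float:
    """Time copying a large buffer in RAM and return the best observed GB/s."""
    src = np.frombuffer(np.random.bytes(size_mb * 2**20), dtype=np.uint8)
    dst = np.empty_like(src)
    best = 0.0
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.copyto(dst, src)                       # bulk memory copy
        elapsed = time.perf_counter() - t0
        best = max(best, size_mb / 1024 / elapsed)
    return best

if __name__ == "__main__":
    print(f"Best copy bandwidth: {memory_copy_bandwidth():.2f} GB/s")
```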

Speed is not important for sound

When testing most devices, processing speed is usually important. But with regard to a sound card, this is not the main indicator. It is much more important for the user to check the characteristics of the analog and digital audio paths - to find out how much the sound is distorted during playback and recording, to measure the noise level, etc.

Comparison with the reference

Program: RightMark Audio Analyzer 6.2.3 Website: http://audio.rightmark.org The creators of this utility offer several ways to check audio performance. The first option is sound card self-diagnosis. The device reproduces the test signal through the audio path and immediately records it. The waveform of the received signal should ideally match the original. Deviations indicate that the sound is distorted by the audio card installed in your PC.

The second and third test methods are more accurate - using a reference sound signal generator or using an additional sound card. In both cases, the quality of the signal source is taken as a standard, although additional devices also introduce a certain error. When using a second audio card, the signal distortion factor at the output should be minimal - the device should have better characteristics than the sound card under test. At the end of the test, you can also determine such parameters as the frequency response of the audio card, its noise level, the generated harmonic distortion, etc.

In addition to the basic functions available in the free edition, the more powerful version of RightMark Audio Analyzer 6.2.3 PRO also contains support for the professional ASIO interface, four times more detailed spectrum resolution and the ability to use Kernel Streaming direct data transmission.

It is important that no one interferes

When running any performance test, bear in mind that many factors affect the final results, above all background services and applications. For the most accurate assessment of the PC, first disable the antivirus scanner and close all running applications, right down to the mail client. And, of course, to avoid skewing the measurements, refrain from using the machine until the program has finished testing the hardware.

FX vs. Core i7 | Looking for bottlenecks with an Eyefinity configuration

We've seen processor performance double every three to four years. Yet the most demanding game engines we've tested are as old as Core 2 Duo processors. Naturally, CPU bottlenecks should be a thing of the past, right? As it turned out, the GPU speed grows even faster than the CPU performance. Thus, the debate about buying a faster CPU or more graphics power continues.

But there always comes a point when it's pointless to argue. For us, it came when games started running smoothly on the largest monitor with a native resolution of 2560x1600. And if the faster component can provide an average of 200, and not 120 frames per second, the difference will still not be noticeable.

In response to the lack of higher resolutions for faster graphics cards, AMD introduced Eyefinity technology and Nvidia introduced Surround. Both technologies allow you to play on more than one monitor, and for high-end GPUs, working at 5760x1080 resolution has become an objective reality. Basically, three displays with a resolution of 1920x1080 will cost less and impress you more than one screen at 2560x1600. Hence there was a reason to spend additional money on more powerful graphics solutions.

But is it really necessary to have a powerful processor to play smoothly at 5760x1080? The question turned out to be interesting.

AMD recently introduced a new architecture, and we bought a boxed FX-8350. In the article "AMD FX-8350 Review and Test: Will Piledriver Fix Bulldozer's Weaknesses?" we found a lot to like about the new processor.

From an economic point of view, in this comparison, Intel will have to prove that it is not only faster than the AMD chip in games, but also justifies the high price difference.


Both motherboards belong to the Asus Sabertooth family, but the company asks a higher price for the model with the LGA 1155 socket, which further complicates Intel's budgetary position. We have specifically selected these platforms to make the performance comparisons as fair as possible without considering cost.

FX vs. Core i7 | Configuration and tests

While we were waiting for the FX-8350 to arrive in the test lab, we ran tests on the boxed samples. Given that the AMD processor reaches 4.4 GHz without problems, we began testing the Intel chip at the same frequency. It later turned out that we had underestimated our samples, as both CPUs reached 4.5 GHz at the chosen voltage.

We didn't want to postpone the publication due to retesting at higher frequencies, so we decided to leave the test results at 4.4 GHz.

Test configuration
Intel CPU: Intel Core i7-3770K (Ivy Bridge): 3.5 GHz, 8 MB shared L3 cache, LGA 1155, overclocked to 4.4 GHz at 1.25 V
Intel motherboard: Asus Sabertooth Z77, BIOS 1504 (08/03/2012)
Intel CPU cooler: Thermalright MUX-120 with Zalman ZM-STG1 paste
AMD CPU: AMD FX-8350 (Vishera): 4.0 GHz, 8 MB shared L3 cache, Socket AM3+, overclocked to 4.4 GHz at 1.35 V
AMD motherboard: Asus Sabertooth 990FX, BIOS 1604 (10/24/2012)
AMD CPU cooler: Sunbeamtech Core-Contact Freezer with Zalman ZM-STG1 paste
Network: integrated Gigabit LAN controller
Memory: G.Skill F3-17600CL9Q-16GBXLD (16 GB) DDR3-2200, CAS 9-11-9-36, 1.65 V
Video cards: 2 x MSI R7970-2PMD3GD5/OC: GPU at 1010 MHz, GDDR5-5500
Storage: Mushkin Chronos Deluxe DX 240 GB SATA 6Gb/s SSD
Power supply: Seasonic X760 SS-760KM: ATX12V v2.3, EPS12V, 80 PLUS Gold
Software and drivers
Operating system: Microsoft Windows 8 Professional RTM x64
Graphics driver: AMD Catalyst 12.10

Thanks to their high efficiency and quick installation, we have been using Thermalright MUX-120 and Sunbeamtech Core Contact Freezer coolers for several years. However, the mounting brackets that come with these models are not interchangeable.


The G.Skill F3-17600CL9Q-16GBXLD memory modules are DDR3-2200 CAS 9 and use Intel XMP profiles for semi-automatic configuration. The Sabertooth 990FX applies the XMP values via Asus DOCP.

The Seasonic X760 PSU provides the high efficiency needed to assess platform differences.

StarCraft II does not support AMD Eyefinity technology, so we decided to use the older games: Aliens vs. Predator and Metro 2033.

Test configuration (3D games)
Aliens vs. Predator: AvP Tool v.1.03, SSAO/tessellation/shadows enabled
Test setting 1: High texture quality, no AA, 4x AF
Test setting 2: Very High texture quality, 4x AA, 16x AF
Battlefield 3: campaign mode, "Going Hunting", 90-second Fraps run
Test setting 1: Medium quality (no AA, 4x AF)
Test setting 2: Ultra quality (4x AA, 16x AF)
F1 2012: Steam version, built-in benchmark
Test setting 1: High quality, no AA
Test setting 2: Ultra quality, 8x AA
Elder Scrolls V: Skyrim: update 1.7, Celedon Aethirborn level 6, 25-second Fraps run
Test setting 1: DX11, High detail level, no AA, 8x AF, FXAA on
Test setting 2: DX11, Ultra detail level, 8x AA, 16x AF, FXAA on
Metro 2033: full version, built-in benchmark, "Frontline" scene
Test setting 1: DX11, High, AAA, 4x AF, no PhysX, no DoF
Test setting 2: DX11, Very High, 4x AA, 16x AF, no PhysX, DoF on

FX vs. Core i7 | Test results

Battlefield 3, F1 2012 and Skyrim

But first, let's take a look at power consumption and efficiency.

The power consumption of the non-overclocked FX-8350 is not so terrible compared to the Intel chip, although it is in fact higher. The chart, however, does not show the whole picture. We never saw the chip run at 4 GHz under sustained load at stock settings: instead, while processing eight threads in Prime95, it lowered its multiplier and voltage to stay within its rated thermal envelope. Throttling artificially restrains the CPU's power consumption. Setting a fixed multiplier and voltage noticeably raises this figure for the Vishera processor when overclocked.

At the same time, not every game can use the FX-8350's ability to process eight threads simultaneously, so most of them never push the chip to the point of throttling.

As already noted, throttling is not triggered in games on the non-overclocked FX-8350, because most games cannot fully load the processor. In fact, games benefit from Turbo Core technology, which raises the processor frequency to 4.2 GHz. On the average-performance chart the AMD chip fared worst of all, with Intel coming out ahead.

For the efficiency chart we take the average power consumption and average performance of all four configurations as the baseline. It shows that the performance per watt of the AMD FX-8350 is about two-thirds of Intel's result.

FX vs. Core i7 | Can AMD FX catch up with the Radeon HD 7970?

When we talk about good and affordable hardware, we like to use phrases like "80% performance for 60% cost". These metrics are always very honest as we are already in the habit of measuring performance, power consumption and efficiency. However, they only take into account the cost of one component, and components, as a rule, cannot work alone.

Adding up the components used in today's review, an Intel-based system comes to $1900 and the AMD platform to $1724, not counting the case, peripherals and operating system. If we consider "ready-made" builds, add roughly another $80 for a case, giving $1984 for Intel and $1804 for AMD. The saving on the finished configuration with an AMD processor is $180, which is not much as a percentage of the total system cost. In other words, the rest of a high-end personal computer dilutes the processor's price advantage.

As a result, we are left with two completely biased ways of comparing price and performance. We have openly confessed, so we hope that we will not be condemned for the results presented.

AMD comes off better if we count only the cost of the motherboard and CPU, which magnifies its advantage. The result is the following chart:

As a third alternative, consider the motherboard and processor as an upgrade, assuming that the case, power supply, memory and drives are left over from the previous system. Most likely a pair of video cards Radeon HD 7970 was not used in the old configuration, so it is most reasonable to take into account processors, motherboards, and graphics adapters. So we add two Tahiti GPUs to the list for $ 800.

The AMD FX-8350 looks better than Intel (especially in games, at the settings we chose) in only one case: when the rest of the system is "free". Since the rest of the components cannot be free, the FX-8350 cannot become a profitable purchase for gaming either.

Intel and AMD graphics

Our test results have long shown that ATI's graphics chips are more processor-dependent than Nvidia's. As a result, when testing high-end GPUs, we equip our test benches with Intel processors, bypassing platform flaws that can interfere with graphics performance isolation and adversely affect results.

We had hoped that the release of AMD Piledriver would change the picture, but even several impressive improvements were not enough for the CPU team to catch up with the performance of AMD's graphics team. Well, let us wait for AMD chips based on the Steamroller architecture, which promises to be 15% faster than Piledriver.