A beginner's guide to graphics cards, part 2: graphics technologies. Microsoft DirectX and Shader Model versions

In the first part of our beginner's guide to graphics cards, we covered the key components: interfaces, outputs, cooling, GPU, and video memory. In the second part, we will talk about the features and technologies of video cards.

Basic components of the video card:

  • outputs;
  • interfaces;
  • cooling system;
  • graphics processor;
  • video memory.

Part 2 (this article): graphics technologies:

  • glossary;
  • GPU architecture: features
    vertex/pixel units, shaders, fill rate, texture/raster units, pipelines;
  • GPU architecture: technology
    process technology, GPU frequency, local video memory (size, bus, type, frequency), multi-GPU solutions;
  • visual features
    DirectX, high dynamic range (HDR), full-screen anti-aliasing, texture filtering, high-resolution textures.

Glossary of basic graphics terms

Refresh Rate

Just like in a movie theater or TV, your computer simulates motion on a monitor by displaying a sequence of frames. The refresh rate of the monitor indicates how many times per second the picture will be refreshed on the screen. For example, 75 Hz corresponds to 75 updates per second.

If the computer processes frames faster than the monitor can display them, problems may appear in games. For example, if the computer renders 100 frames per second while the monitor refreshes at 75 Hz, the monitor may end up showing parts of two different frames within a single refresh period. As a result, visible artifacts (tearing) appear.

As a solution, you can enable V-Sync (vertical sync). It limits the number of frames emitted by the computer to the refresh rate of the monitor, preventing artifacts. If you enable V-Sync, then the number of frames rendered in the game will never exceed the refresh rate. That is, at 75 Hz, the computer will output no more than 75 frames per second.
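To make the idea concrete, here is a minimal Python sketch of a frame limiter that caps output at the refresh rate. It is purely illustrative: real V-Sync is implemented by the driver, which waits for the monitor's vertical blanking interval rather than a timer, and the names here (run_capped, render_frame, REFRESH_HZ) are invented for the example.

```python
import time

REFRESH_HZ = 75                 # hypothetical monitor refresh rate
FRAME_TIME = 1.0 / REFRESH_HZ   # each frame must occupy at least this long

def run_capped(render_frame, frames=300):
    """Render frames, but never faster than the refresh rate."""
    for _ in range(frames):
        start = time.monotonic()
        render_frame()
        # If rendering finished early, wait out the rest of the interval.
        elapsed = time.monotonic() - start
        if elapsed < FRAME_TIME:
            time.sleep(FRAME_TIME - elapsed)
```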

The word "Pixel" stands for " picture element "is an element of an image. It is a tiny dot on the display that can glow in a certain color (in most cases, the tint is displayed by a combination of three basic colors: red, green and blue). If the screen resolution is 1024x768, then you can see a matrix of 1024 pixels in width and 768 pixels in height.Taken together, the pixels make up the image.The picture on the screen is refreshed from 60 to 120 times per second, depending on the type of display and the data issued by the output of the video card. CRT monitors update the display line by line, and flat panel LCD monitors can update each pixel individually.

Vertex

All objects in a 3D scene are composed of vertices. A vertex is a point in three-dimensional space with coordinates X, Y and Z. Several vertices can be grouped into a polygon: most often a triangle, but more complex shapes are also possible. A texture is then applied to the polygon, which makes the object look realistic. A simple 3D cube, for example, has eight vertices. More complex objects have curved surfaces, which actually consist of a very large number of vertices.

Texture

A texture is simply a 2D image of any size that is mapped onto a 3D object to simulate its surface. For example, our 3D cube has eight vertices; before texture mapping, it looks like a plain box, but when we apply the texture, the box becomes colored.


Shader

Pixel shader programs allow the graphics card to produce impressive effects, such as those in The Elder Scrolls IV: Oblivion.

Today there are two types of shaders: vertex and pixel. Vertex shaders can modify or transform 3D objects. Pixel shaders change the colors of individual pixels based on the data passed to them. Imagine a light source in a 3D scene that makes the illuminated objects glow brighter while casting shadows on other objects at the same time. All of this is realized by changing the color information of the pixels.

Pixel shaders are used to create complex effects in your favorite games. For example, shader code can make the pixels surrounding a 3D sword glow brighter. Another shader can process all the vertices of a complex 3D object and simulate an explosion. More and more game developers are using sophisticated shaders to create realistic graphics, and almost every modern game with rich visuals uses them.
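For illustration only, here is the kind of per-pixel computation a shader performs, sketched in Python. Real pixel shaders run on the GPU in a shading language such as HLSL or GLSL, and the function and parameter names here are invented for the example.

```python
def pixel_shader(base_color, pixel_pos, light_pos, light_radius):
    """Brighten a pixel based on its distance to a light source."""
    dx = pixel_pos[0] - light_pos[0]
    dy = pixel_pos[1] - light_pos[1]
    distance = (dx * dx + dy * dy) ** 0.5
    # Light contribution falls off linearly, reaching zero at the radius.
    brightness = max(0.0, 1.0 - distance / light_radius)
    r, g, b = base_color
    return (min(1.0, r + brightness),
            min(1.0, g + brightness),
            min(1.0, b + brightness))

# A dark pixel right next to the light gets a strong boost:
print(pixel_shader((0.2, 0.2, 0.2), (10, 10), (12, 10), 50))
```

The GPU runs such a function for every pixel on screen, every frame, which is why the number and speed of pixel shader units matter so much.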

With the release of the next Microsoft DirectX 10 Application Programming Interface (API), a third type of shader will be introduced: geometry shaders. With their help, it will be possible to break up, modify and even destroy objects, depending on the desired result. The third type of shader can be programmed in the same way as the first two, but its role will be different.

Fill Rate

Very often, on the box of a video card you can find the fill rate value. Basically, the fill rate indicates how fast the GPU can output pixels. On older video cards you could find the triangle fill rate; today there are two types: pixel fill rate and texture fill rate. As mentioned, the pixel fill rate corresponds to the pixel output rate and is calculated as the number of raster operation units (ROPs) multiplied by the clock frequency.

ATi and nVidia calculate texture fill rate differently. nVidia obtains the rate by multiplying the number of pixel pipelines by the clock speed, while ATi multiplies the number of texture units by the clock speed. In principle, both methods are correct, since nVidia pairs one texture unit with each pixel shader unit (that is, one per pixel pipeline).
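The arithmetic is straightforward. A small Python sketch, using made-up numbers for a hypothetical card, shows both calculations:

```python
# Fill-rate arithmetic from the definitions above, for a hypothetical card.
clock_mhz = 500          # GPU clock
rops = 8                 # raster operation units (pixel fill rate)
pixel_pipelines = 8      # nVidia's basis for texture fill rate
texture_units = 8        # ATi's basis for texture fill rate

pixel_fill_rate = rops * clock_mhz * 1e6             # pixels per second
texture_fill_rate = texture_units * clock_mhz * 1e6  # texels per second

print(f"Pixel fill rate:   {pixel_fill_rate / 1e9:.1f} Gpixels/s")
print(f"Texture fill rate: {texture_fill_rate / 1e9:.1f} Gtexels/s")
```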

With these definitions in mind, let me move on and discuss the most important functions of a GPU, what they do, and why they are so important.

GPU architecture: features

The realism of 3D graphics depends heavily on the performance of the video card. The more pixel shader units the processor contains and the higher their frequency, the more effects can be applied to a 3D scene to improve its visual quality.

The GPU contains many different functional blocks. By the number of some components, you can estimate how powerful the GPU is. Before moving on, let me review the most important functional blocks.

Vertex processors (vertex shader units)

Like pixel shader units, vertex processors execute shader code that operates on vertices. Since a larger vertex budget allows more complex 3D objects, vertex processor performance is very important in 3D scenes with complex objects or a large number of them. However, vertex shader units still do not affect performance as visibly as pixel processors do.

Pixel Processors (Pixel Shader Units)

A pixel processor is a component of the graphics chip dedicated to processing pixel shader programs. These processors perform pixel-only calculations. Because pixels contain color information, pixel shaders can achieve impressive graphical effects. For example, most of the water effects you've seen in games are created using pixel shaders. Typically, the number of pixel processors is used to compare the pixel performance of video cards. If one card is equipped with eight pixel shader units, and the other with 16 units, then it is quite logical to assume that a video card with 16 units will process complex pixel programs faster. You should also consider the clock speed, but today doubling the number of pixel processors is more energy efficient than doubling the frequency of the graphics chip.

Unified shaders

Unified (universal) shaders have not yet arrived in the PC world, but the upcoming DirectX 10 standard relies on such an architecture. That is, the code structure of vertex, geometry and pixel programs will be the same, even though the shaders perform different work. An early look at the new specification is offered by the Xbox 360, whose GPU was custom-designed by ATi for Microsoft. It will be quite interesting to see what potential the new DirectX 10 holds.

Texture Mapping Units (TMU)

Textures must be fetched and filtered. This work is done by the texture mapping units, which operate in conjunction with the pixel and vertex shader units. The TMU's job is to apply texture operations to pixels. The number of texture units in a GPU is often used to compare the texture performance of video cards, and it is quite reasonable to assume that a card with more TMUs will deliver higher texture performance.

Raster Operation Units (ROPs)

ROPs are responsible for writing pixel data into memory. The rate at which this operation is performed is the fill rate. In the early days of 3D accelerators, ROPs and fill rate were very important characteristics of graphics cards. Today ROP performance still matters, but video card performance is no longer limited by these units the way it used to be. Therefore, the performance (and number) of ROPs is now rarely used to estimate the speed of a video card.

Pipelines

Pipelines are used to describe the architecture of video cards and provide a very visual representation of the performance of the GPU.

"Pipeline" is not a strict technical term. The GPU uses different pipelines to perform different functions. Historically, a pipeline meant a pixel processor connected to its own texture mapping unit (TMU). For example, the Radeon 9700 uses eight pixel processors, each connected to its own TMU, so the card is considered to have eight pipelines.

But modern GPUs are very difficult to describe by the number of pipelines. Compared to previous designs, new processors use a modular, fragmented structure. ATi can be considered the innovator here: with the X1000 line of video cards it switched to a modular structure, which allowed performance gains through internal optimization. Some GPU blocks are used more than others, and to improve performance ATi tried to balance the number of blocks needed against die area (which cannot be made too large). In this architecture the term "pixel pipeline" has lost its meaning, since the pixel processors are no longer connected to their own TMUs. For example, the ATi Radeon X1600 GPU has 12 pixel shader units and only four TMUs, so one cannot say that this architecture has 12 pixel pipelines, nor that it has only four. However, by tradition, pixel pipelines are still mentioned.

Taking these assumptions into account, the number of pixel pipelines in a GPU is often used to compare video cards (with the exception of the ATi X1x00 line). For example, if we take video cards with 24 and 16 pipelines, then it is quite reasonable to assume that a card with 24 pipelines will be faster.

GPU architecture: technology

Process technology

This term refers to the size of one element (transistor) of the chip and the precision of the manufacturing process. Improvements in process technology yield smaller elements: for example, the 0.18-micron process produces larger elements than the 0.13-micron process, so it is not as efficient. Smaller transistors operate at lower voltages, and lower voltage means lower power draw, which in turn reduces the amount of heat generated. Improving the process technology also shortens the distances between the functional blocks of the chip, so data transfer takes less time. Shorter distances, lower voltages and other improvements allow higher clock speeds to be achieved.

Understanding is somewhat complicated by the fact that both micrometers (μm) and nanometers (nm) are used today to denote the process technology. In fact, it is very simple: 1 nanometer equals 0.001 micrometer, so the 0.09-micron and 90-nm processes are one and the same. As noted above, a smaller process technology allows higher clock speeds. For example, comparing video cards with 0.18-micron and 0.09-micron (90 nm) chips, it is quite reasonable to expect a higher frequency from the 90 nm card.

GPU clock speed

GPU clock speeds are measured in megahertz (MHz), which is millions of clock cycles per second.

The clock speed directly affects the performance of the GPU: the higher it is, the more work can be done in a second. For the first example, take the nVidia GeForce 6600 and 6600 GT graphics cards: the 6600 GT GPU runs at 500 MHz, while the regular 6600 runs at 400 MHz. Since the processors are technically identical, the 25% increase in the 6600 GT's clock speed translates into better performance.

But clock speed is not everything; architecture greatly affects performance. For the second example, take the GeForce 6600 GT and GeForce 6800 GT. The 6600 GT GPU runs at 500 MHz, but the 6800 GT runs at only 350 MHz. Now take into account that the 6800 GT uses 16 pixel pipelines, while the 6600 GT uses only eight. Therefore, a 6800 GT with 16 pipelines at 350 MHz will give about the same performance as a processor with eight pipelines at double the clock speed (700 MHz). With that caveat in mind, clock speed can be used to compare performance.
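A quick back-of-the-envelope check of this example in Python. This is an idealized model that simply multiplies pipelines by clock and ignores all other architectural differences:

```python
# Rough pixel-throughput estimate: pipelines x clock, in Mpixels/s.
cards = {
    "GeForce 6600 GT": {"pipelines": 8,  "clock_mhz": 500},
    "GeForce 6800 GT": {"pipelines": 16, "clock_mhz": 350},
}
for name, spec in cards.items():
    throughput = spec["pipelines"] * spec["clock_mhz"]
    print(f"{name}: {throughput} Mpixels/s (theoretical)")
# 6600 GT: 4000 vs 6800 GT: 5600 -- sixteen slower pipelines still win.
```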

Local video memory

Video card memory has a huge impact on performance, but different memory parameters affect it in different ways.

Video memory size

The amount of video memory can probably be called the most overrated parameter of a video card. Inexperienced consumers often use the amount of video memory to compare different cards with each other, but in reality the amount has little effect on performance compared to such parameters as the memory bus frequency and interface (bus width).

In most cases, a card with 128 MB of video memory will perform almost the same as a card with 256 MB. Of course, there are situations where more memory leads to increased performance, but remember that more memory will not automatically lead to an increase in speed in games.

Where capacity does help is in games with high-resolution textures. Game developers ship several texture sets with a game, and the more memory the video card has, the higher the resolution of the textures it can load. High-resolution textures give greater clarity and detail in the game, so it makes sense to take a card with more memory if all other criteria are equal. To repeat: bus width and memory frequency have a much stronger effect on performance than the amount of physical memory on the card.

Memory bus width

Memory bus width is one of the most important aspects of memory performance. Modern buses are 64 to 256 bits wide, and in some cases even 512 bits. The wider the memory bus, the more information it can transfer per clock cycle, and this directly affects performance. For example, taking two buses of equal frequency, a 128-bit bus will theoretically transfer twice as much data per clock as a 64-bit bus, and a 256-bit bus doubles that again.

Higher bus bandwidth (expressed in bits or bytes per second, 1 byte = 8 bits) means higher memory performance. That is why the bus width matters much more than the memory size: at equal frequencies, a 64-bit memory bus delivers only 25% of the bandwidth of a 256-bit one!

Let's take the following example. A video card with 128 MB of video memory, but with a 256-bit bus, gives a much higher memory performance than a 512 MB model with a 64-bit bus. It is important to note that for some ATi X1x00 cards the manufacturers indicate the specifications for the internal memory bus, but we are interested in the parameters of the external bus. For example, the X1600's internal ring bus is 256 bits wide, but the external one is only 128 bits wide. And in reality, the memory bus operates at 128-bit performance.
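A short Python sketch makes the arithmetic concrete, assuming both cards run their memory at the same effective frequency (500 MHz is an arbitrary illustrative value):

```python
def bandwidth_gb_s(bus_width_bits, effective_mhz):
    """Peak bandwidth = bytes per transfer x transfers per second."""
    return (bus_width_bits / 8) * effective_mhz * 1e6 / 1e9

# The example above: 256-bit vs 64-bit bus at the same frequency.
print(bandwidth_gb_s(256, 500))  # 16.0 GB/s
print(bandwidth_gb_s(64, 500))   #  4.0 GB/s, i.e. 25% of the 256-bit bus
```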

Memory types

Memory can be divided into two main categories: SDR (single data rate) and DDR (double data rate), in which data is transferred twice per clock. Today, single-rate SDR technology is obsolete. Since DDR memory transfers data twice as fast as SDR, it is important to remember that video cards with DDR memory usually quote the doubled (effective) frequency rather than the physical one. For example, if DDR memory is specified at 1000 MHz, that is the effective frequency at which regular SDR memory would have to run to provide the same bandwidth; the physical frequency is actually 500 MHz.

For this reason, many are surprised when the memory of their video card is specified at 1200 MHz DDR while utilities report 600 MHz. You just have to get used to it. DDR2 and GDDR3/GDDR4 memory works the same way, that is, with double data transfer. The difference between DDR, DDR2, GDDR3 and GDDR4 lies in the manufacturing technology and some details: DDR2 can run at higher frequencies than DDR, and GDDR3 higher still than DDR2.
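The relationship in one trivial sketch:

```python
# DDR transfers data on both clock edges, so the advertised "effective"
# frequency is twice the physical clock that monitoring utilities report.
physical_mhz = 500
effective_mhz = physical_mhz * 2
print(f"physical {physical_mhz} MHz -> effective {effective_mhz} MHz DDR")
```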

Memory bus frequency

Like a processor, memory (or, more precisely, the memory bus) operates at certain clock frequencies measured in megahertz. Here, raising the clock speed directly improves memory performance, and memory bus frequency is one of the parameters used to compare video cards. For example, if all other characteristics (bus width, etc.) are the same, it is quite logical to say that a video card with 700 MHz memory is faster than one with 500 MHz memory.

Again, clock speed isn't everything. A 700 MHz memory with a 64-bit bus will be slower than a 400 MHz memory with a 128-bit bus. The performance of 400 MHz memory on a 128-bit bus is roughly equivalent to 800 MHz memory on a 64-bit bus. It should also be remembered that the frequencies of the GPU and memory are completely different parameters, and they usually differ.
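Checking the numbers from this paragraph in Python (peak theoretical bandwidth only; real-world figures will differ):

```python
def mb_per_s(bus_width_bits, mhz):
    return bus_width_bits / 8 * mhz   # megabytes per second (peak)

print(mb_per_s(64, 700))    # 5600 MB/s
print(mb_per_s(128, 400))   # 6400 MB/s: the wider, slower bus wins
print(mb_per_s(64, 800))    # 6400 MB/s: matches the 128-bit 400 MHz card
```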

Graphics card interface

All data transferred between the video card and the processor goes through the video card interface. Today, three types of interfaces are used: PCI, AGP and PCI Express. They differ in bandwidth and other characteristics. Clearly, the higher the bandwidth, the higher the exchange rate; however, only the most modern cards can use high bandwidth, and even then only partially. At some point the interface speed ceased to be a bottleneck; today it is simply sufficient.

The slowest bus for which video cards were produced is PCI (Peripheral Component Interconnect), if you do not go too far back into history. PCI really hurt video card performance, so cards switched to the AGP (Accelerated Graphics Port) interface. But even the AGP 1.0 and 2x specifications limited performance. When the standard reached AGP 4x, we began to approach the practical limit of the bandwidth video cards can use. The AGP 8x specification doubled the bandwidth once again compared to AGP 4x, to 2.16 GB/s, but it did not bring a tangible increase in graphics performance.

The newest and fastest bus is PCI Express. New graphics cards typically use PCI Express x16, which combines 16 PCI Express lanes for a total bandwidth of 4 GB/s in each direction. This is twice the bandwidth of AGP 8x, and the PCI Express bus provides it in both directions simultaneously (data transfer to and from the video card). But the speed of AGP 8x was already sufficient, so we have not yet encountered a situation where moving to PCI Express gave a performance increase over AGP 8x (with other hardware parameters equal). For example, the AGP version of the GeForce 6800 Ultra performs identically to the PCI Express 6800 Ultra.
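For reference, the 4 GB/s figure follows from first-generation PCI Express carrying 250 MB/s per lane in each direction; a quick sketch:

```python
LANE_MB_S = 250   # first-generation PCI Express, per lane, per direction
lanes = 16
print(f"PCIe x16: {lanes * LANE_MB_S / 1000} GB/s per direction")  # 4.0
print("AGP 8x:   2.16 GB/s total")  # figure from the text, for comparison
```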

Today it is best to buy a card with a PCI Express interface; it will remain on the market for several more years. The most powerful cards are no longer released with the AGP 8x interface, and PCI Express solutions are generally easier to find than their AGP analogs, and they cost less.

Multi-GPU solutions

Using multiple video cards to increase graphics performance is not a new idea. In the early days of 3D graphics, pioneer 3dfx entered the market with two graphics cards running in parallel. But with the demise of 3dfx, the technology of teaming several consumer video cards was consigned to oblivion, although ATi had been producing similar systems for professional simulators since the release of the Radeon 9700. A couple of years ago the technology returned to the market with the advent of nVidia SLI and, a little later, ATi CrossFire.

Sharing multiple graphics cards provides enough performance to run the game at high quality settings in high definition. But choosing one solution or another is not so easy.

To begin with, multi-card solutions draw a lot of power, so the power supply must be sufficiently capable. All that heat has to be removed from the video cards, so pay attention to the PC case and cooling so that the system does not overheat.

Also remember that SLI/CrossFire requires an appropriate motherboard (for one technology or the other), which usually costs more than standard models. An nVidia SLI configuration will only work on certain nForce4 boards, and ATi CrossFire cards will only work on motherboards with a CrossFire chipset or on certain Intel models. To complicate matters further, some CrossFire configurations require one of the cards to be a special CrossFire Edition model. After the launch of CrossFire, ATi allowed the technology to be enabled over the PCI Express bus for some video card models, and with new driver releases the number of possible combinations keeps growing. Still, hardware CrossFire with a corresponding CrossFire Edition card gives better performance, though CrossFire Edition cards also cost more than regular models. Currently, software CrossFire mode (without a CrossFire Edition card) can be enabled on Radeon X1300, X1600 and X1800 GTO video cards.

There are other factors to consider. While two graphics cards working together give a performance boost, it is far from double, yet you pay twice the money. Most often the gain is 20-60%, and in some cases, due to the extra overhead of synchronizing the cards, there is no gain at all. For this reason, multi-card configurations rarely pay off with cheaper models, since a more expensive single video card will usually outperform a pair of cheaper ones. In general, an SLI/CrossFire setup makes little sense for most consumers. But if you want to enable all the quality-enhancing options or play at extreme resolutions such as 2560×1600, where more than 4 million pixels must be rendered per frame, then two or four paired video cards are indispensable.

Visual features

In addition to purely hardware specifications, different generations and models of GPUs can differ in feature set. For example, it is often said that cards of the ATi Radeon X800 XT generation are compatible with Shader Model (SM) 2.0b, while the nVidia GeForce 6800 Ultra is compatible with SM 3.0, even though their hardware specifications are close (16 pipelines). Therefore, many consumers choose one solution or another without even knowing what this difference means. So, let us talk about visual features and what they mean to the end user.

Microsoft DirectX and Shader Model versions

These names come up most often in debates, but few people know what they really mean. To understand them, let's start with the history of graphics APIs. DirectX and OpenGL are graphics APIs, that is, Application Programming Interfaces: open standards available to everyone.

Before the advent of graphics APIs, each GPU manufacturer used its own mechanism for communicating with games. Developers had to write separate code for each GPU they wanted to support. A very expensive and ineffective approach. To solve this problem, APIs for 3D graphics were developed so that developers write code for a specific API, and not for a particular video card. After that, compatibility problems fell on the shoulders of video card manufacturers, who had to ensure that the drivers would be API compatible.

The only complication is that today two different APIs are in use: Microsoft DirectX and OpenGL, where GL stands for Graphics Library. Since the DirectX API is more popular in games today and has influenced game development more strongly, we will focus on it.

DirectX is Microsoft's creation. In fact, DirectX comprises several APIs, only one of which is used for 3D graphics; the others cover sound, music, input devices and more. The Direct3D API is responsible for 3D graphics in DirectX. When people talk about video cards, this is the part they mean, so in this context the terms DirectX and Direct3D are interchangeable.

DirectX is updated periodically as graphics technology advances and game developers introduce new ways to program games. As the popularity of DirectX soared, GPU manufacturers began to tweak new product releases to match DirectX capabilities. For this reason, video cards are often tied to hardware support for one DirectX generation or another (DirectX 8, 9.0, or 9.0c).

To complicate matters, parts of the Direct3D API can change over time without changing DirectX generations. For example, the DirectX 9.0 specification specifies Pixel Shader 2.0 support. But the DirectX 9.0c update includes Pixel Shader 3.0. Thus, although the cards are classified as DirectX 9, they can support different sets of functions. For example, the Radeon 9700 supports Shader Model 2.0, and the Radeon X1800 supports Shader Model 3.0, although both cards can be attributed to the DirectX 9 generation.

Remember that when creating new games, developers take into account owners of older machines and video cards; ignoring this segment of users would lower sales. For this reason, several code paths are built into games. A DirectX 9-class game probably has a DirectX 8 path, and even a DirectX 7 path, for compatibility. Usually, if an old path is chosen, some of the visual effects available on new video cards disappear from the game, but at least you can play even on old hardware.

Many new games require the latest version of DirectX to be installed, even if the video card is from a previous generation. That is, a new game that runs along the DirectX 8 path on a DirectX 8-class video card still requires the latest DirectX 9 runtime to be installed.

What are the differences between the Direct3D API versions in different DirectX releases? The early versions of DirectX (3, 5, 6 and 7) were relatively simple in terms of Direct3D capabilities: developers could choose visual effects from a list and then test how they performed in the game. The next important step in graphics programming was DirectX 8. It introduced the ability to program the video card using shaders, so for the first time developers were free to program effects the way they wanted. DirectX 8 supported Pixel Shader versions 1.0 to 1.3 and Vertex Shader 1.0. DirectX 8.1, an updated version of DirectX 8, added Pixel Shader 1.4 and Vertex Shader 1.1.

DirectX 9 allows even more complex shader programs: it supports Pixel Shader 2.0 and Vertex Shader 2.0. DirectX 9.0c, an updated version of DirectX 9, added the Pixel Shader 3.0 specification.

DirectX 10, the upcoming version of the API, will accompany the new version of Windows, Windows Vista. DirectX 10 cannot be installed on Windows XP.

High Dynamic Range (HDR)

HDR stands for High Dynamic Range. Playing with HDR lighting can produce a much more realistic picture than playing without it, but not all video cards support it.

Before the advent of DirectX 9 video cards, GPUs were seriously limited by the precision of their lighting calculations: lighting could only be computed with 256 (8-bit) internal levels.

When DirectX 9 graphics cards were introduced, they were able to produce high-fidelity lighting: a full 24 bits, or 16.7 million levels.

With 16.7 million levels, and with the next step in DirectX 9/Shader Model 2.0 graphics performance, HDR lighting became possible on computers. It is a rather complex technology, best appreciated in motion. In simple terms, HDR lighting increases contrast (dark shades appear darker, light shades brighter) while increasing the amount of detail in both dark and light areas. A game with HDR lighting feels livelier and more realistic than one without it.

GPUs that comply with the newer Pixel Shader 3.0 specification allow higher, 32-bit precision lighting calculations and floating-point blending. Thus, SM 3.0-class video cards can support a special OpenEXR HDR lighting method originally designed for the film industry.
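The jump in precision is easy to quantify; a quick sketch (note that SM 3.0's 32-bit mode is floating point, which extends range rather than simply adding integer levels):

```python
# Number of distinct lighting levels for the integer precisions named above.
for bits in (8, 24):
    print(f"{bits}-bit lighting: {2 ** bits:,} levels")
# 8-bit:  256 levels (pre-DirectX 9 hardware)
# 24-bit: 16,777,216 levels (DirectX 9 / Shader Model 2.0)
```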

Some games that support HDR lighting only through OpenEXR will not offer HDR on Shader Model 2.0 video cards, while games that do not rely on the OpenEXR method will run HDR on any DirectX 9 video card. For example, Oblivion uses the OpenEXR HDR method and only allows HDR lighting on the newest video cards that support Shader Model 3.0, such as the nVidia GeForce 6800 or ATi Radeon X1800. Games built on Half-Life 2's engine, such as Counter-Strike: Source and the upcoming Half-Life 2: Aftermath, allow HDR rendering on older DirectX 9 video cards that only support Pixel Shader 2.0, for example the GeForce 5 series or ATi Radeon 9500.

Finally, keep in mind that all forms of HDR rendering require serious computing power and can bring even the most powerful GPUs to their knees. If you want to play the latest games with HDR lighting, then high-performance graphics are essential.

Full-Screen Anti-Aliasing

Full-screen anti-aliasing (abbreviated AA) eliminates the characteristic "staircase" artifacts at polygon boundaries. Keep in mind, however, that it consumes a lot of computing resources, which lowers the frame rate.

Anti-aliasing depends heavily on video memory performance, so a fast video card with fast memory can render full-screen anti-aliasing with less damage to performance than an inexpensive one. Anti-aliasing can be enabled in various modes: for example, 4x anti-aliasing gives a better picture than 2x, but at a bigger performance cost. Whereas 2x anti-aliasing doubles the horizontal and vertical resolution, 4x mode quadruples it.
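A simplified cost model in Python, assuming pure supersampling where every extra sample is fully shaded (multisampling variants reduce this cost):

```python
width, height = 1024, 768
for aa in (1, 2, 4):
    print(f"{aa}x AA: {width * height * aa:,} samples per frame")
# 1x AA:   786,432 samples
# 4x AA: 3,145,728 samples, four times the shading work per frame
```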

Texture Filtering

Textures are applied to all 3D objects in a game, and the steeper the angle at which a surface is displayed, the more distorted its texture looks. To eliminate this effect, GPUs use texture filtering.

The first filtering method was called bilinear, and it produced characteristic banding that was not very pleasant to the eye. The situation improved with the introduction of trilinear filtering. Both options work on modern video cards with practically no performance loss.

Anisotropic filtering (AF) is the best texture filtering method available today. Like full-screen anti-aliasing, it can be enabled at different levels: 8x AF, for example, gives higher-quality filtering than 4x AF. And like anti-aliasing, anisotropic filtering requires a certain amount of processing power, which grows as the AF level rises.

High-Resolution Textures

All 3D games are built with specific requirements in mind, and one of them determines the texture memory the game will need. All the necessary textures must fit into video memory during play, otherwise performance drops dramatically, since fetching a texture from system RAM incurs considerable delay, to say nothing of the paging file on the hard disk. Therefore, if a game developer counts on 128 MB of video memory as the minimum requirement, the set of active textures must never exceed 128 MB.

Modern games ship several texture sets, so the game will run without problems on older video cards with less video memory as well as on newer cards with more. For example, a game may contain three texture sets: for 128 MB, 256 MB and 512 MB. Very few games support 512 MB of video memory today, but such sets remain the most objective reason to buy a video card with that much memory. While the extra memory has little or no effect on performance, you get better visual quality if the game supports the corresponding texture set.
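As a rough illustration of why texture sets are tiered by memory size, here is a sketch of the footprint of one uncompressed 32-bit texture with a full mip chain (real games use texture compression, so actual numbers are lower):

```python
def texture_mb(side):
    base = side * side * 4      # 4 bytes per texel (uncompressed RGBA8)
    full = base * 4 // 3        # a complete mip chain adds about one third
    return full / 2**20

for side in (512, 1024, 2048):
    print(f"{side}x{side}: {texture_mb(side):.1f} MB")
# About two dozen uncompressed 1024x1024 textures already fill a 128 MB card,
# which is why games ship lower-resolution texture sets for smaller cards.
```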

Shader (compute) processors

Perhaps these units are now the main parts of the video chip. They run special programs known as shaders. Whereas earlier pixel shader programs were executed by dedicated pixel units and vertex programs by vertex units, for some time now graphics architectures have been unified, and these universal compute units handle all kinds of calculations: vertex, pixel, geometry and even general-purpose computing.

The unified architecture was first applied in the video chip of Microsoft's Xbox 360 console; that GPU was developed by ATI (later acquired by AMD). In video chips for personal computers, unified shader units first appeared in the NVIDIA GeForce 8800. Since then, all new video chips have been based on a unified architecture with a universal instruction set for the different shader programs (vertex, pixel, geometry, etc.), and the unified processors can execute any of them.

By the number of compute units and their frequency, one can compare the mathematical performance of different video cards. Most games are now limited by pixel shader performance, so the number of these units is very important. For example, if one video card model is based on a GPU with 384 compute processors and another from the same line has a GPU with 192, then at equal frequency the second will process any type of shader twice as slowly, and the first will in general be that much more productive.

However, it is impossible to draw unambiguous conclusions from the number of compute units alone; the clock frequency and the differing unit architectures of different generations and chip manufacturers must also be taken into account. These figures are only valid for comparing chips within the same line from one manufacturer, AMD or NVIDIA; in other cases, look at performance tests in the games or applications you care about.

Texture mapping units (TMU)

These GPU units work in conjunction with the compute processors; they fetch and filter textures and other data needed for building the scene and for general-purpose computation. The number of texture units in a video chip determines its texture performance, that is, the speed at which texels are fetched from textures.

Although more emphasis has recently been placed on mathematical calculations, and some textures are being replaced by procedural ones, the load on TMUs is still quite high: besides the main textures, fetches must also be made from normal and displacement maps, as well as from off-screen render target buffers.

Given that many games also lean heavily on texturing performance, the number of TMUs and the correspondingly high texture throughput are among the most important parameters of a video chip. This parameter especially affects rendering speed when anisotropic filtering, which requires additional texture fetches, is used, as well as with complex soft-shadow algorithms and newer techniques such as Screen Space Ambient Occlusion.

Raster Operation Units (ROPs)

Rasterization units write the pixels computed by the video card into buffers and blend them. As noted above, the performance of ROP units affects the fill rate, which has always been one of the main characteristics of video cards. And although its importance has recently decreased somewhat, there are still cases where application performance depends on the speed and number of ROPs: most often with heavy use of post-processing filters and with anti-aliasing enabled at high game settings.

Basic components of the video card:

  • outputs;
  • interfaces;
  • cooling system;
  • graphics processor;
  • video memory.

Graphic technologies:

  • dictionary;
  • gPU architecture: functions
    vertex / pixel units, shaders, fill rate, texture / raster units, pipelines;
  • gPU architecture: technology
    technical process, GPU frequency, local video memory (size, bus, type, frequency), solutions with several video cards;
  • visual functions
    DirectX, high dynamic range (HDR), full screen anti-aliasing, texture filtering, high definition textures.

Glossary of basic graphic terms

Refresh Rate

Just like in a movie theater or TV, your computer simulates motion on a monitor by displaying a sequence of frames. The refresh rate of the monitor indicates how many times per second the picture will be refreshed on the screen. For example, 75 Hz corresponds to 75 updates per second.

If the computer is processing frames faster than the monitor can display, then problems may appear in games. For example, if the computer renders 100 frames per second, and the monitor refresh rate is 75 Hz, then due to overlays, the monitor can display only part of the picture during its refresh period. As a result, visual artifacts appear.

As a solution, you can enable V-Sync (vertical sync). It limits the number of frames emitted by the computer to the refresh rate of the monitor, preventing artifacts. If you enable V-Sync, then the number of frames rendered in the game will never exceed the refresh rate. That is, at 75 Hz, the computer will output no more than 75 frames per second.

Pixel

The word "Pixel" stands for " picture element ”is an image element. It is a tiny dot on the display that can glow in a specific color (in most cases, a hue is derived from a combination of three basic colors: red, green, and blue). If the screen resolution is 1024 × 768, then you can see a matrix of 1024 pixels in width and 768 pixels in height. Together, the pixels make up the image. The picture on the screen is updated from 60 to 120 times per second, depending on the type of display and the data produced by the output of the video card. CRT monitors update the display line by line, while flat panel LCD monitors can update each pixel individually.

Vertex

All objects in the 3D scene are composed of vertices. A vertex is a point in three-dimensional space with coordinates X, Y and Z. Several vertices can be grouped into a polygon: most often it is a triangle, but more complex shapes are also possible. Then a texture is applied to the polygon, which makes the object look realistic. The 3D cube shown in the illustration above has eight vertices. More complex objects have curved surfaces, which actually consist of a very large number of vertices.

Texture

A texture is simply a 2D image of any size that is superimposed on a 3D object to simulate its surface. For example, our 3D cube has eight vertices. Before texture mapping, it looks like a simple box. But when we apply the texture, the box becomes colored.

Shader

The pixel shader software allows the graphics card to produce impressive effects like that in Elder Scrolls: Oblivion.

Today there are two types of shaders: vertex and pixel. Vertex shaders can modify or transform 3D objects. Pixel shaders allow you to change the colors of pixels based on data. Imagine a light source in a 3D scene that makes the illuminated objects glow brighter, while casting shadows on other objects at the same time. All this is realized by changing the color information of the pixels.

Pixel shaders are used to create complex effects in your favorite games. For example, shader code can make the pixels surrounding the 3D sword glow brighter. Another shader can process all the vertices of a complex 3D object and simulate an explosion. More and more game developers are using sophisticated shaders to create realistic graphics. Almost every modern game with rich graphics uses shaders.

With the release of the next Microsoft DirectX 10 Application Programming Interface (API), a third type of shader called geometry shaders will be released. With their help, it will be possible to break objects, modify and even destroy them depending on the desired result. The third type of shader can be programmed in the same way as the first two, but its role will be different.

Fill Rate

Very often on the box with a video card, you can find the fill rate value. Basically, the fill rate indicates how fast the GPU can deliver pixels. In older video cards, you could find the triangle fill rate. But today there are two types of fill rate: pixel fill rate and texture fill rate. As mentioned, the pixel fill rate corresponds to the pixel output rate. It is calculated as the number of raster operations (ROP) multiplied by the clock frequency.

ATi and nVidia calculate texture fill rates differently. nVidia thinks that speed is obtained by multiplying the number of pixel pipelines by the clock speed. ATi multiplies the number of texture units by the clock speed. In principle, both methods are correct, since nVidia uses one texture unit per pixel shader unit (that is, one per pixel pipeline).

With these definitions in mind, let me move on and discuss the most important functions of a GPU, what they do, and why they are so important.

GPU architecture: features

The realism of 3D graphics is highly dependent on the performance of the video card. The more blocks of pixel shaders the processor contains and the higher the frequency, the more effects can be applied to a 3D scene to improve its visual perception.

The GPU contains many different functional blocks. By the number of some components, you can estimate how powerful the GPU is. Before moving on, let me review the most important functional blocks.

Vertex processors (vertex shader units)

Like pixel shader units, vertex processors execute shader code that touches vertices. Since a higher vertex budget allows for more complex 3D objects, vertex processor performance is very important in 3D scenes with complex objects or a large number of objects. However, vertex shader units are still not as obviously affecting performance as pixel processors.

Pixel Processors (Pixel Shader Units)

A pixel processor is a component of the graphics chip dedicated to processing pixel shader programs. These processors perform pixel-only calculations. Because pixels contain color information, pixel shaders can achieve impressive graphical effects. For example, most of the water effects you've seen in games are created using pixel shaders. Typically the number of pixel processors is used to compare the pixel performance of video cards. If one card is equipped with eight pixel shader units and the other with 16 units, then it is quite logical to assume that a video card with 16 units will process complex pixel programs faster. You should also consider the clock speed, but today doubling the number of pixel processors is more energy efficient than doubling the frequency of the graphics chip.

Unified shaders

Unified (uniform) shaders have not yet arrived in the PC world, but the upcoming DirectX 10 standard relies on a similar architecture. That is, the structure of the code of vertex, geometric and pixel programs will be the same, although the shaders will perform different work. The new spec can be viewed on the Xbox 360, where the GPU was specially designed by ATi for Microsoft. It will be quite interesting to see what potential the new DirectX 10 has.

Texture Mapping Units (TMU)

Textures should be selected and filtered. This work is done by the texture mapping units, which work in conjunction with the pixel and vertex shaders. The TMU's job is to apply texture operations to the pixels. The number of texture units in a GPU is often used to compare the texture performance of video cards. It is quite reasonable to assume that a video card with a higher number of TMUs will give higher texture performance.

Raster Operator Units (ROPs)

RIPs are responsible for writing pixel data into memory. The rate at which this operation is performed is the fill rate. In the early days of 3D accelerators, ROPs and fill rates were very important characteristics of graphics cards. Today, ROP performance is still important, but the performance of a video card is no longer limited by these blocks, as it used to be. Therefore, the performance (and number) of ROPs is already rarely used to estimate the speed of a video card.

Conveyors

Pipelines are used to describe the architecture of video cards and provide a very visual representation of the performance of the GPU.

Conveyor is not a strict technical term. The GPU uses different pipelines to perform different functions. Historically, a pipeline was understood as a pixel processor that was connected to its own texture mapping unit (TMU). For example, the Radeon 9700 video card uses eight pixel processors, each of which is connected to its own TMU, so the card is considered to have eight pipelines.

But it is very difficult to describe modern processors by the number of pipelines. Compared to previous designs, the new processors use a modular, fragmented structure. ATi can be considered an innovator in this area, which, with the X1000 line of video cards, switched to a modular structure, which allowed achieving performance gains through internal optimization. Some CPU blocks are used more than others, and to improve GPU performance, ATi has tried to balance the number of blocks needed and die area (this cannot be oversized). In this architecture, the term "pixel pipeline" has already lost its meaning, since pixel processors are no longer connected to their own TMUs. For example, the ATi Radeon X1600 GPU has 12 Pixel Shaders and just four TMUs. Therefore, it cannot be said that the architecture of this processor has 12 pixel pipelines, just like saying that there are only four of them. However, by tradition, pixel pipelines are still mentioned.

Taking these assumptions into account, the number of pixel pipelines in a GPU is often used to compare video cards (with the exception of the ATi X1x00 line). For example, if we take video cards with 24 and 16 pipelines, then it is quite reasonable to assume that a card with 24 pipelines will be faster.

GPU architecture: technology

Technical process

This term refers to the size of one element (transistor) of the chip and the precision of the manufacturing process. Improving technical processes allows you to get smaller elements. For example, the 0.18 micron process produces larger elements than the 0.13 micron process, so it is not as efficient. Smaller transistors operate on lower voltages. In turn, a decrease in voltage leads to a decrease in thermal resistance, which gives a decrease in the amount of heat generated. Improving the technical process allows to reduce the distance between the functional blocks of the chip, and data transfer takes less time. Shorter distances, lower voltages, and other improvements allow higher clock speeds to be achieved.

The understanding is somewhat complicated by the fact that today both micrometers (μm) and nanometers (nm) are used to denote the technical process. In fact, everything is very simple: 1 nanometer is equal to 0.001 micrometer, so 0.09-micron and 90-nm technical processes are one and the same. As noted above, a smaller process technology allows you to get higher clock speeds. For example, if we compare video cards with 0.18 micron and 0.09 micron (90 nm) chips, then it is quite reasonable to expect a higher frequency from a 90 nm card.

GPU clock speed

GPU clock speeds are measured in megahertz (MHz), which is millions of clock cycles per second.

The clock speed directly affects the performance of the GPU. The higher it is, the more work can be done in a second. For the first example, let's take nVidia GeForce 6600 and 6600 GT graphics cards: the 6600 GT GPU runs at 500 MHz, while the regular 6600 card runs at 400 MHz. Since the processors are technically identical, a 20% increase in the 6600 GT's clock speed translates into better performance.

But clock speed is not everything. It should be borne in mind that architecture greatly affects performance. For the second example, let's take the GeForce 6600 GT and GeForce 6800 GT video cards. The 6600 GT has a GPU frequency of 500 MHz, but the 6800 GT runs at only 350 MHz. Now let's take into account that the 6800 GT uses 16 pixel pipelines, while the 6600 GT uses only eight. Therefore, a 6800 GT with 16 pipelines at 350 MHz will give about the same performance as a processor with eight pipelines and twice the clock speed (700 MHz). With that said, the clock speed can be used to compare performance.

Local video memory

Video card memory has a huge impact on performance. But different memory parameters affect differently.

Video memory size

The amount of video memory can probably be called the most overrated parameter of a video card. Inexperienced consumers often use the amount of video memory to compare different cards with each other, but in reality the amount has little effect on performance compared to such parameters as the memory bus frequency and interface (bus width).

In most cases, a card with 128 MB of video memory will perform almost the same as a card with 256 MB. Of course, there are situations where more memory leads to increased performance, but remember that more memory will not automatically lead to an increase in speed in games.

Where volume is useful is in games with high resolution textures. Game developers provide several texture sets for the game. And the more memory there will be on the video card, the higher resolution the loaded textures can have. High resolution textures give higher clarity and detail in the game. Therefore, it is quite reasonable to take a card with a large amount of memory if all other criteria are the same. Let us remind you again that the memory bus width and its frequency have a much stronger effect on performance than the amount of physical memory on the card.

Memory bus width

Memory bus width is one of the most important aspects of memory performance. Modern buses are 64 to 256 bits wide, and in some cases even 512 bits. The wider the memory bus, the more information it can transmit per clock cycle. And this directly affects performance. For example, if we take two buses with equal frequencies, then theoretically a 128-bit bus will transfer twice as much data per clock as a 64-bit one. And the 256-bit bus is twice as large.

Higher bus bandwidth (expressed in bits or bytes per second, 1 byte \u003d 8 bits) results in higher memory performance. That is why the memory bus is much more important than its size. At equal frequencies, the 64-bit memory bus operates at a speed of only 25% of the 256-bit one!

Let's take the following example. A video card with 128 MB of video memory, but with a 256-bit bus, gives a much higher memory performance than a 512 MB model with a 64-bit bus. It is important to note that for some ATi X1x00 cards the manufacturers indicate the specifications of the internal memory bus, but we are interested in the parameters of the external bus. For example, the X1600's internal ring bus is 256 bits wide, but the external one is only 128 bits wide. And in reality, the memory bus operates at 128-bit performance.

Memory types

Memory can be divided into two main categories: SDR (single data transfer) and DDR (double data transfer), in which data is transferred twice as fast per clock. Today, SDR single transmission technology is obsolete. Since DDR memory transfers data twice as fast as SDR memory, it is important to remember that video cards with DDR memory are usually indicated at twice the frequency, and not the physical one. For example, if DDR memory is listed as 1000 MHz, then this is the effective frequency that regular SDR memory must operate at to give the same bandwidth. In fact, the physical frequency is 500 MHz.

For this reason, many are surprised when the frequency of 1200 MHz DDR is indicated for the memory of their video card, and the utilities report 600 MHz. So you have to get used to it. DDR2 and GDDR3 / GDDR4 memory works in the same way, that is, with twice the data transfer. The difference between DDR, DDR2, GDDR3 and GDDR4 lies in the manufacturing technology and some details. DDR2 can run at higher frequencies than DDR memory, and DDR3 can run even higher than DDR2.

Memory bus frequency

Like a processor, memory (or, more accurately, a memory bus) operates at specific clock speeds, measured in megahertz. Here, increasing clock speeds directly affects memory performance. And the memory bus frequency is one of the parameters used to compare the performance of video cards. For example, if all other characteristics (memory bus width, etc.) are the same, then it is quite logical to say that a video card with 700 MHz memory is faster than a 500 MHz one.

Again, clock speed isn't everything. A 700 MHz memory with a 64-bit bus will be slower than a 400 MHz memory with a 128-bit bus. The performance of 400 MHz memory on a 128-bit bus is roughly equivalent to 800 MHz memory on a 64-bit bus. It should also be remembered that the frequencies of the GPU and memory are completely different parameters, and they usually differ.

Graphics card interface

All data transferred between the video card and the processor goes through the video card interface. Today, three types of interfaces are used for video cards: PCI, AGP and PCI Express. They differ in bandwidth and other characteristics. It is clear that the higher the bandwidth, the higher the exchange rate. However, only the most modern cards can use high bandwidth, and even then only partially. At some point, the interface speed has ceased to be a "bottleneck", today it is simply enough.

The slowest bus for which video cards were produced is PCI (Peripheral Components Interconnect). If you do not go into history, of course. PCI really hurt the performance of video cards, so they switched to the AGP (Accelerated Graphics Port) interface. But even the AGP 1.0 and 2x specifications limited performance. When the standard increased the speed to AGP 4x, we began to approach the practical limit of the bandwidth that video cards can use. The AGP 8x specification doubled the bandwidth once again compared to AGP 4x (2.16 GB / s), but we did not get a tangible increase in graphics performance.

The newest and fastest bus is PCI Express. Newer graphics cards typically use PCI Express x16, which combines 16 PCI Express lanes for a total bandwidth of 4 GB / s (one way). This is twice the bandwidth of AGP 8x. The PCI Express bus provides the mentioned bandwidth for both directions (data transfer to and from the video card). But the speed of the AGP 8x standard was already sufficient, so we have not yet encountered a situation where the transition to PCI Express gave a performance increase as compared to AGP 8x (if other hardware parameters are the same). For example, the AGP version of the GeForce 6800 Ultra will work identically to the 6800 Ultra for PCI Express.

Today it is best to buy a card with a PCI Express interface, it will hold out on the market for several more years. The most productive cards are no longer available with the AGP 8x interface, and PCI Express solutions, as a rule, are easier to find than AGP analogs, and they cost less.

Multi-GPU solutions

Using multiple graphics cards to boost graphics performance is not a new idea. In the early days of 3D graphics, the 3dfx digger entered the market with two graphics cards running in parallel. But with the disappearance of 3dfx, the technology of collaboration of several consumer video cards was consigned to oblivion, although ATi produced similar systems for professional simulators since the release of the Radeon 9700. A couple of years ago the technology returned to the market: with the advent of nVidia SLI solutions and, a little later, ATi Crossfire.

Sharing multiple graphics cards provides enough performance to run the game at high quality settings in high definition. But choosing one solution or another is not so easy.

To begin with, solutions based on multiple video cards require a lot of energy, so the power supply must be powerful enough. All this heat will have to be removed from the video card, so you need to pay attention to the PC case and cooling so that the system does not overheat.

Also, remember that SLI / CrossFire requires an appropriate motherboard (either for one technology or another), which usually costs more than standard models. The nVidia SLI configuration will only work on certain nForce4 boards, and ATi CrossFire cards will only work on motherboards with the CrossFire chipset or on certain Intel models. To complicate matters further, some CrossFire configurations require one of the cards to be special: the CrossFire Edition. After the release of CrossFire for some models of video cards, ATi allowed enabling the technology of collaboration via the PCI Express bus, and with the release of new driver versions, the number of possible combinations increases. Still, hardware CrossFire with a corresponding CrossFire Edition card gives better performance. But CrossFire Edition cards are also more expensive than regular models. For now, you can enable CrossFire software mode (no CrossFire Edition card) on Radeon X1300, X1600 and X1800 GTO graphics cards.

There are other factors to consider. While two graphics cards working together give a performance boost, it is far from double, yet you pay twice as much. Most often, the gain is 20-60%, and in some cases the overhead of synchronizing the cards eats the gain entirely. For this reason, multi-card configurations rarely pay off with cheaper models, since a single more expensive video card will usually outperform a pair of cheaper ones. In general, an SLI/CrossFire setup makes little sense for most consumers. But if you want to enable all the quality enhancement options or play at extreme resolutions, for example 2560×1600, where more than 4 million pixels must be rendered per frame, then two or four paired video cards are hard to avoid.
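
A quick sketch of the value calculation implied above; the prices and frame rates below are invented purely for illustration, and only the 20-60% scaling range comes from the text.

```python
# Hypothetical comparison: two cheap cards in SLI/CrossFire vs. one fast card.
# All fps and price figures are made up; the scaling range is the one quoted above.

def paired_fps(single_card_fps, scaling):
    """Two cards never double performance; `scaling` is the fraction actually gained."""
    return single_card_fps * (1 + scaling)

cheap_fps, cheap_price = 40, 200   # hypothetical budget card
fast_fps, fast_price = 70, 400     # hypothetical high-end card at the same total cost

for scaling in (0.2, 0.4, 0.6):
    fps = paired_fps(cheap_fps, scaling)
    print(f"2x cheap cards at +{scaling:.0%}: {fps:.0f} fps for ${2 * cheap_price}")
print(f"1x fast card:            {fast_fps} fps for ${fast_price}")
```

Even at the optimistic 60% scaling, the pair of budget cards (64 fps) loses to the single fast card (70 fps) at the same total price, which is the point made above.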

Visual functions

In addition to purely hardware specifications, different generations and models of GPUs can differ in feature set. For example, cards of the ATi Radeon X800 XT generation are often said to be compatible with Shader Model (SM) 2.0b, while the nVidia GeForce 6800 Ultra supports SM 3.0, even though their hardware specifications are close (both have 16 pipelines). Therefore, many consumers choose one solution or another without even knowing what this difference means.

Microsoft DirectX and Shader Model versions

These names come up most often in arguments, but few people know what they really mean. To understand them, let's start with a history of graphics APIs. DirectX and OpenGL are graphics APIs, that is, Application Programming Interfaces: openly documented standards available to every developer.

Before graphics APIs appeared, each GPU manufacturer used its own mechanism for communicating with games. Developers had to write separate code for each GPU they wanted to support, a very expensive and inefficient approach. To solve this problem, 3D graphics APIs were developed so that developers write code for a specific API rather than for a particular video card. Compatibility problems then fell on the shoulders of the video card manufacturers, who must ensure that their drivers are API-compatible.

The only complication is that today two different APIs are in use: Microsoft DirectX and OpenGL, where GL stands for Graphics Library. Since the DirectX API is more popular in games today and has influenced game development more strongly, we will focus on it.

DirectX is Microsoft's creation. In fact, DirectX comprises several APIs, only one of which is used for 3D graphics; the others handle sound, music, input devices, and more. The Direct3D API is responsible for 3D graphics in DirectX. When people talk about video cards, they mean Direct3D, so in this context the terms DirectX and Direct3D are interchangeable.

DirectX is updated periodically as graphics technology advances and game developers introduce new programming techniques. As the popularity of DirectX soared, GPU manufacturers began tailoring new product releases to DirectX capabilities. For this reason, video cards are often classified by the DirectX generation they support in hardware (DirectX 8, 9.0, or 9.0c).

To complicate matters, parts of the Direct3D API can change without a new DirectX generation. For example, the DirectX 9.0 specification calls for Pixel Shader 2.0 support, but the DirectX 9.0c update includes Pixel Shader 3.0. So although both are classified as DirectX 9 cards, they can support different feature sets: the Radeon 9700 supports Shader Model 2.0 while the Radeon X1800 supports Shader Model 3.0, yet both belong to the DirectX 9 generation.

Remember that when creating new games, developers take into account owners of older machines and video cards, since ignoring this segment of users means lower sales. For this reason, several code paths are built into games. A DirectX 9 class game probably has a DirectX 8 path, and perhaps even a DirectX 7 path, for compatibility. Usually, when an older path is chosen, some visual effects available on new video cards disappear from the game, but at least it remains playable on old hardware.
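
As a hedged sketch of how such path selection might work: a game can query the highest shader model the hardware reports and fall back accordingly. The structure and names below are invented for illustration and do not belong to any real engine.

```python
# Hypothetical render-path selection, ordered from best path to worst fallback.
RENDER_PATHS = [
    (3.0, "DirectX 9.0c path: Shader Model 3.0 effects"),
    (2.0, "DirectX 9.0 path: Shader Model 2.0 effects"),
    (1.1, "DirectX 8 path: basic vertex/pixel shaders"),
    (0.0, "DirectX 7 path: fixed-function fallback, shader effects disabled"),
]

def pick_render_path(supported_shader_model):
    """Return the best path whose requirement the hardware meets."""
    for required, description in RENDER_PATHS:
        if supported_shader_model >= required:
            return description
    return RENDER_PATHS[-1][1]

print(pick_render_path(2.0))  # a Radeon 9700-class card gets the SM 2.0 path
print(pick_render_path(3.0))  # a GeForce 6800-class card gets the SM 3.0 path
print(pick_render_path(1.1))  # a DirectX 8 card loses the newer effects but still plays
```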

Many new games require the latest version of DirectX to be installed, even if the video card is from a previous generation. That is, a game that will run a DirectX 8 class card along its DirectX 8 path may still require the latest DirectX 9 runtime to be installed.

What are the differences between the versions of the Direct3D API in DirectX? Early versions of DirectX, namely 3, 5, 6 and 7, were relatively simple in terms of Direct3D: developers picked visual effects from a fixed list and then tested how they performed in the game. The next major step in graphics programming was DirectX 8. It introduced the ability to program the video card using shaders, so developers for the first time gained the freedom to program effects the way they want. DirectX 8 supported Pixel Shader versions 1.0 to 1.3 and Vertex Shader 1.0. DirectX 8.1, an updated version, added Pixel Shader 1.4 and Vertex Shader 1.1.

DirectX 9 allows even more complex shader programs, supporting Pixel Shader 2.0 and Vertex Shader 2.0. DirectX 9.0c, an updated version of DirectX 9, added the Pixel Shader 3.0 specification.

DirectX 10, the upcoming version of the API, will ship with the new Windows Vista; you will not be able to install DirectX 10 on Windows XP.

HDR lighting and OpenEXR HDR

HDR stands for High Dynamic Range. A game with HDR lighting can produce a much more realistic picture than one without it, but not all graphics cards support HDR lighting.

Before DirectX 9 class graphics cards appeared, GPUs were seriously limited by the precision of their lighting calculations: lighting could only be computed with 256 (8-bit) internal levels.

DirectX 9 graphics cards brought high-precision lighting: a full 24 bits, or 16.7 million levels.

With 16.7 million levels and the performance leap of DirectX 9 / Shader Model 2.0 hardware, HDR lighting became possible on computers. It is a rather complex technology, best appreciated in motion. In simple terms, HDR lighting increases contrast (dark shades appear darker, light shades brighter) while at the same time preserving more lighting detail in both dark and bright areas. A game with HDR lighting looks livelier and more realistic than one without it.
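
To make the contrast idea concrete, here is a minimal sketch of tone mapping, the step that compresses an HDR frame back into a displayable range. The Reinhard operator below is a standard textbook choice, not necessarily what any particular game uses.

```python
# Minimal tone-mapping sketch: map unbounded HDR luminance into [0, 1).
# Bright values are compressed while dark values stay nearly linear,
# so detail survives at both ends of the range.

def reinhard_tonemap(luminance_hdr):
    """Classic Reinhard operator: L / (1 + L)."""
    return luminance_hdr / (1.0 + luminance_hdr)

for lum in (0.05, 0.5, 2.0, 16.0, 100.0):
    print(f"HDR luminance {lum:7.2f} -> display value {reinhard_tonemap(lum):.3f}")
```

With only 256 fixed levels, everything above the brightest representable value would simply clip to white; with floating-point HDR data, a bright window and a dim room can both keep detail before this final compression step.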

GPUs that comply with the newer Pixel Shader 3.0 specification allow higher 32-bit precision lighting calculations and floating-point blending. Thus, SM 3.0 class video cards can support the special OpenEXR HDR lighting method, originally designed for the film industry.

Games that implement HDR lighting only through OpenEXR will not offer HDR on Shader Model 2.0 graphics cards, while games that do not rely on the OpenEXR method will run on any DirectX 9 card. For example, Oblivion uses the OpenEXR HDR method and only enables HDR lighting on graphics cards that support Shader Model 3.0, such as the nVidia GeForce 6800 or ATi Radeon X1800. Games built on the Half-Life 2 engine, such as Counter-Strike: Source and the upcoming Half-Life 2: Aftermath, allow HDR rendering on older DirectX 9 video cards that only support Pixel Shader 2.0, such as the GeForce FX series or the ATi Radeon 9500.

Finally, keep in mind that all forms of HDR rendering require significant processing power and can bring even the most powerful GPUs to their knees. If you want to play the latest games with HDR lighting, then high-performance graphics are essential.

Full screen anti-aliasing

Full-screen anti-aliasing (abbreviated AA) eliminates the characteristic "staircase" artifacts at polygon edges. Keep in mind, however, that it consumes a lot of computing resources, which lowers the frame rate.

Anti-aliasing depends heavily on video memory performance, so a fast video card with fast memory can render full-screen anti-aliasing with less performance damage than an inexpensive one. Anti-aliasing can be enabled in various modes: for example, 4x anti-aliasing gives better picture quality than 2x, but at a larger performance cost. In supersampling terms, 2x anti-aliasing renders twice as many samples per output pixel, and 4x mode renders four times as many.
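
A rough sketch of why this is memory-hungry, in the supersampling terms used above; real multisampling implementations are cleverer, but the scaling is the point.

```python
# Supersampling stores `samples` color values per final pixel,
# so both the work and the buffer size grow with the AA level.

def ssaa_cost(width, height, samples, bytes_per_pixel=4):
    total = width * height * samples
    return total, total * bytes_per_pixel / 2**20  # MB for the color buffer alone

for samples in (1, 2, 4):
    total, mb = ssaa_cost(1280, 1024, samples)
    label = "no AA" if samples == 1 else f"{samples}x AA"
    print(f"{label:6}: {total:>10,} samples rendered, ~{mb:.0f} MB color buffer")
```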

Texture filtering

Textures cover all 3D objects in a game, and the steeper the angle at which a surface is displayed, the more distorted its texture will look. To eliminate this effect, GPUs use texture filtering.

The first filtering method, bilinear, produced characteristic banding that was not pleasant to the eye. The situation improved with the introduction of trilinear filtering. Both run on modern graphics cards with virtually no performance penalty.

Anisotropic filtering (AF) is the best way to filter textures today. Like full-screen anti-aliasing, anisotropic filtering can be enabled at different levels. For example, 8x AF provides a better filtering quality than 4x AF. Like full-screen anti-aliasing, anisotropic filtering requires a certain amount of processing power, which increases as the AF level rises.

High resolution textures

All 3D games are built with specific requirements in mind, and one of them determines how much texture memory the game needs. All active textures must fit in the video card's memory during play, otherwise performance drops dramatically: fetching a texture from system RAM causes a considerable delay, to say nothing of the paging file on the hard disk. Therefore, if a game developer sets 128 MB of video memory as the minimum requirement, the set of active textures must never exceed 128 MB at any moment.

Modern games ship with several texture sets, so they run smoothly both on older video cards with less video memory and on newer cards with more. For example, a game may contain three texture sets: 128 MB, 256 MB and 512 MB. Very few games support 512 MB of video memory today, but such games are the most objective reason to buy a card with that much memory. While the extra memory usually has little or no effect on frame rate, you get better visual quality if the game provides a matching texture set.
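
A hedged sketch of this selection logic: the set sizes follow the example above, while the function itself is invented for illustration.

```python
# Hypothetical texture-set selection: pick the largest set that fits entirely
# in video memory, since spilling textures to system RAM causes a dramatic slowdown.

TEXTURE_SETS_MB = [512, 256, 128]  # best quality first, sizes from the example above

def pick_texture_set(video_memory_mb):
    for set_size in TEXTURE_SETS_MB:
        if set_size <= video_memory_mb:
            return set_size
    return TEXTURE_SETS_MB[-1]  # below minimum spec: use the smallest set anyway

for vram in (128, 256, 512, 1024):
    print(f"{vram:>4} MB card -> {pick_texture_set(vram)} MB texture set")
```

Note how the 1024 MB card still gets the 512 MB set: beyond the largest set the game offers, extra memory buys nothing, which is exactly the argument above.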

Graphics processor specifications

Modern graphics processors contain many functional units, and their number and characteristics largely determine the final rendering speed, which affects gaming comfort. By comparing the number of these units in different video chips, you can roughly estimate how fast a given GPU is. Video chips have many characteristics; in this section we will consider only the most important ones.

Video chip clock frequency

The operating frequency of a GPU is usually measured in megahertz, i.e. millions of clock cycles per second. This characteristic directly affects the video chip's performance: the higher it is, the more work the GPU can do per unit of time, processing more vertices and pixels. A real-life example: the chip on the Radeon HD 6670 runs at 840 MHz, while the very same chip in the Radeon HD 6570 runs at 650 MHz, so all the main performance figures differ accordingly. But the operating frequency alone does not determine performance; speed is strongly influenced by the architecture itself: the design and number of execution units, their characteristics, and so on.

In some cases, the clock speed of individual GPU blocks differs from that of the rest of the chip: different parts of the GPU run at different frequencies to increase efficiency, because some units can sustain higher clocks while others cannot. Most such GPUs are found in NVIDIA's GeForce cards. A recent example is the video chip in the GTX 580, most of which operates at 772 MHz while its universal computing units run at double that frequency, 1544 MHz.

Filling rate (fill rate)

The fill rate shows how fast the GPU can render pixels. There are two types: pixel fill rate and texel fill rate. Pixel fill rate is the speed at which finished pixels are drawn to the screen, and it depends on the operating frequency and the number of ROPs (rasterization and blending units); texel fill rate is the speed of fetching texture data, which depends on the operating frequency and the number of texture units.

For example, the peak pixel fill rate of the GeForce GTX 560 Ti is 822 (chip frequency in MHz) × 32 (number of ROPs) = 26,304 megapixels per second, and its texel rate is 822 × 64 (number of texture units) = 52,608 megatexels/s. Put simply, the larger the first number, the faster the card can output finished pixels, and the larger the second, the faster it can sample texture data.
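
Reproducing that arithmetic in a few lines, using the GTX 560 Ti figures from the text:

```python
# Peak fill rates from clock and unit counts, as described above.

def pixel_fillrate_mpix(core_mhz, rops):
    """Peak pixel fill rate: core clock (MHz) x number of ROPs."""
    return core_mhz * rops

def texel_fillrate_mtex(core_mhz, tmus):
    """Peak texel fill rate: core clock (MHz) x number of texture units."""
    return core_mhz * tmus

core_mhz, rops, tmus = 822, 32, 64  # GeForce GTX 560 Ti figures from the text
print(f"Pixel fill rate: {pixel_fillrate_mpix(core_mhz, rops):,} Mpix/s")  # 26,304
print(f"Texel fill rate: {texel_fillrate_mtex(core_mhz, tmus):,} Mtex/s") # 52,608
```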

Although the importance of "pure" fill rate has noticeably declined lately, giving way to raw computation speed, these parameters remain very important, especially for games with simple geometry and relatively simple pixel and vertex calculations. Both numbers matter for modern games, but they need to be balanced; that is why the number of ROPs in modern video chips is usually smaller than the number of texture units.

Number of computing (shader) units or processors

These units are arguably now the main parts of the video chip. They run special programs known as shaders. Whereas earlier pixel shaders were executed by dedicated pixel shader units and vertex shaders by dedicated vertex units, graphics architectures have since been unified, and these universal computing units now handle all kinds of work: vertex, pixel, geometry, and even general-purpose calculations.

The unified architecture was first used in the video chip of the Microsoft Xbox 360 game console, developed by ATI (later acquired by AMD). In video chips for personal computers, unified shader units first appeared in the NVIDIA GeForce 8800. Since then, all new video chips have been based on a unified architecture with a universal instruction set for the different shader programs (vertex, pixel, geometry, etc.), and the unified processors can execute any of them.

The number of computing units and their frequency let you compare the mathematical performance of different video cards. Most games are now limited by pixel shader performance, so the number of these units is very important. For example, if one video card model is based on a GPU with 384 computational processors and another from the same line has 192, then at equal frequency the second will process any type of shader at half the speed, and the first will be roughly that much more productive overall.

Still, you cannot draw definitive conclusions from the number of computing units alone; you must also account for clock frequency and the differing unit architectures across generations and chip manufacturers. These raw figures are only valid for comparing chips within a single line from one manufacturer, AMD or NVIDIA. In other cases, look at performance tests in the games or applications you care about.

Texture mapping units (TMU)

These GPU units work in tandem with the computational processors; they fetch and filter texture and other data needed for building a scene and for general-purpose computations. The number of texture units in a video chip determines texturing performance, that is, the speed of fetching texels from textures.

Although recently more emphasis has been placed on mathematical calculations, and some textures are being replaced with procedural ones, the load on the TMUs is still quite high: in addition to the main textures, fetches must also be made from normal and displacement maps, as well as from off-screen render target buffers.

Given that many games also lean heavily on texturing performance, the number of TMUs and the correspondingly high texture throughput are among the most important parameters of a video chip. This parameter especially affects rendering speed when anisotropic filtering is used, which requires additional texture fetches, as well as with complex soft-shadow algorithms and newer techniques such as Screen Space Ambient Occlusion.

Rasterization Operations Blocks (ROPs)

Rasterization units write the pixels calculated by the video card into buffers and perform their blending. As noted above, ROP performance affects the fill rate, and this has always been one of the main characteristics of video cards. Although its importance has also declined somewhat lately, there are still cases where application performance strongly depends on the speed and number of ROPs, most often due to heavy use of post-processing filters and anti-aliasing enabled at high game settings.

Once again, modern video chips cannot be judged solely by the number of their various units and their frequencies. Each GPU series uses a new architecture in which the execution units differ greatly from the old ones, and the ratio between different unit types may change. Thus, AMD's ROPs in some solutions can do more work per clock than NVIDIA's, and vice versa. The same applies to TMU capabilities: they differ across GPU generations and manufacturers, and this should be taken into account when comparing.

Geometric Blocks

Until recently, the number of geometry processing units was not particularly important: one such unit per GPU was enough for most tasks, since geometry in games was fairly simple and the main performance focus was mathematical computation. The importance of parallel geometry processing and the number of corresponding units grew dramatically with the arrival of geometry tessellation in DirectX 11. NVIDIA was the first to parallelize geometry processing, introducing several such units in its GF1xx chips; AMD followed with a similar solution (at first only in the top solutions of the Radeon HD 6900 line based on Cayman chips).

We will not go into details here; they can be found in the in-depth materials on our site devoted to DirectX 11-compatible graphics processors. What matters here is that the number of geometry processing units greatly affects overall performance in the newest games that use tessellation, such as Metro 2033, HAWX 2 and Crysis 2 (with the latest patches). When choosing a modern gaming video card, it is very important to pay attention to geometric performance.

Video memory size

Video chips use their own memory to store the necessary data: textures, vertices, buffers and so on. It would seem that the more of it, the better. But it is not that simple, and judging the power of a video card by the amount of video memory is the most common mistake! Inexperienced users often overestimate the value of video memory and still use it to compare different video card models. This is understandable: the parameter is listed among the first in the specifications of ready-made systems, and it is printed in large type on video card boxes. So an inexperienced buyer assumes that with twice as much memory, the card should be twice as fast. The reality differs from this myth: memory comes in different types with different characteristics, and performance grows only up to a certain amount, beyond which it simply stops.

Every game, at given settings and in a given scene, needs a certain amount of video memory that is sufficient for all its data. Even if you put 4 GB of video memory on the card, rendering will get no faster: the speed will be limited by the execution units discussed above, while the extra memory simply sits idle. That is why, in many cases, a video card with 1.5 GB of video memory runs at the same speed as a card with 3 GB (all other things being equal).

There are situations where more memory leads to a visible performance gain: very demanding games, especially at ultra-high resolutions and maximum quality settings. But such cases are not the rule, so keep the amount of memory in mind without forgetting that performance simply stops growing beyond a certain size. Memory chips also have more important parameters, such as the memory bus width and operating frequency. This topic is broad enough that we will dwell on choosing the amount of video memory in more detail in the sixth part of our material.

Memory bus width

The memory bus width is the most important characteristic affecting memory bandwidth. A wider bus transfers more data between video memory and the GPU per unit of time, which in most cases has a positive effect on performance. In theory, a 256-bit bus can transfer twice as much data per clock as a 128-bit one. In practice, the difference in rendering speed, while short of twofold, comes very close to it in many bandwidth-bound cases.

Modern gaming video cards use various bus widths, from 64 to 384 bits (earlier there were chips with a 512-bit bus), depending on the price range and release date of the specific GPU model. The cheapest low-end video cards most often get 64-bit and, less often, 128-bit buses; mid-range cards use 128 to 256 bits; and cards in the upper price range use buses 256 to 384 bits wide. Bus width can no longer grow due to purely physical constraints: the GPU die is not large enough to route more than a 512-bit bus, and it would be too expensive. Therefore, memory bandwidth is now increased by using new memory types instead (see below).

Video memory frequency

Another parameter affecting memory bandwidth is its clock frequency, and increasing memory bandwidth often directly improves a video card's performance in 3D applications. The memory bus frequency on modern video cards ranges from 533 MHz (1066 MHz effective, with doubling) to 1375 MHz (5500 MHz effective, with quadrupling), that is, it can differ by more than five times! And since bandwidth depends on both the memory frequency and the bus width, memory on a 256-bit bus running at 800 (3200) MHz will have higher bandwidth than memory running at 1000 (4000) MHz on a 128-bit bus.
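
That trade-off is easy to verify: bandwidth is simply the effective frequency multiplied by the bus width in bytes. A quick check of the example above:

```python
# Memory bandwidth = effective memory frequency x bus width in bytes.

def memory_bandwidth_gb_s(effective_mhz, bus_bits):
    return effective_mhz * 1e6 * (bus_bits / 8) / 1e9

print(f"256-bit @ 3200 MHz effective: {memory_bandwidth_gb_s(3200, 256):.1f} GB/s")  # 102.4
print(f"128-bit @ 4000 MHz effective: {memory_bandwidth_gb_s(4000, 128):.1f} GB/s")  # 64.0
```

Despite its lower clock, the 256-bit configuration delivers 102.4 GB/s against 64 GB/s for the faster-clocked 128-bit one, confirming the comparison in the text.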

Pay particular attention to the memory bus width, type and operating frequency when buying relatively inexpensive video cards, many of which come with only 128-bit or even 64-bit interfaces, which hurts their performance badly. For a gaming PC, we do not recommend buying a video card with a 64-bit memory bus at all; it is advisable to choose at least a mid-range card with a 128- or 192-bit bus.

Memory types

Modern video cards come with several different types of memory. Old single-data-rate SDR memory is no longer found anywhere, and the current DDR and GDDR memory types have significantly different characteristics. The various DDR and GDDR types transfer two or four times as much data per clock, so their operating frequency is often quoted doubled or quadrupled. Thus, if DDR memory is listed at 1400 MHz, it actually operates at a physical frequency of 700 MHz; the quoted figure is the so-called "effective" frequency, that is, the frequency SDR memory would need to deliver the same bandwidth. The same applies to GDDR5, except the frequency is quadrupled.
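
The physical-to-effective relationship in two lines, using the multipliers just described:

```python
# Effective frequency = physical frequency x transfers per clock.
TRANSFERS_PER_CLOCK = {"SDR": 1, "DDR/GDDR3": 2, "GDDR5": 4}

def effective_mhz(physical_mhz, memory_type):
    return physical_mhz * TRANSFERS_PER_CLOCK[memory_type]

print(effective_mhz(700, "DDR/GDDR3"))  # 1400 MHz, the DDR example above
print(effective_mhz(1375, "GDDR5"))     # 5500 MHz, the top GDDR5 figure cited above
```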

The main advantage of the newer memory types is the ability to run at higher clock speeds and hence deliver greater bandwidth than previous technologies. This comes at the cost of increased latency, which, however, is not so important for video cards. The first board to use DDR2 memory was the NVIDIA GeForce FX 5800 Ultra. Since then, graphics memory technologies have advanced significantly: the GDDR3 standard was developed, close to the DDR2 specifications but with changes made specifically for video cards.

GDDR3 is memory designed specifically for video cards, built on the same technologies as DDR2 but with improved power consumption and heat dissipation characteristics, allowing chips that run at higher clock frequencies. Although the standard was developed by ATI, the first video card to use it was the second revision of the NVIDIA GeForce FX 5700 Ultra, followed by the GeForce 6800 Ultra.

GDDR4 is a further development of "graphics" memory, up to twice as fast as GDDR3. The differences from GDDR3 that matter to users are, once again, higher operating frequencies and reduced power consumption: module consumption can be about a third lower, achieved through a lower rated voltage. Technically, GDDR4 does not differ much from GDDR3; it develops the same ideas further. The first video cards with GDDR4 chips on board were the ATI Radeon X1950 XTX, while NVIDIA never released products with this memory type at all.

However, GDDR4 saw little use even in AMD's own solutions. Starting with GPUs of the RV7x0 family, video card memory controllers support the new GDDR5 memory type, which operates at an effective (quadrupled) frequency of up to 5.5 GHz and beyond (frequencies up to 7 GHz are possible in theory), giving a bandwidth of up to 176 GB/s on a 256-bit interface. Whereas raising the bandwidth of GDDR3/GDDR4 memory required a 512-bit bus, switching to GDDR5 made it possible to double performance with a smaller die size and lower power consumption.

The most common modern types of video memory are GDDR3 and GDDR5; they differ from plain DDR in some details and likewise transfer data at double or quadruple rate. These memory types use special techniques to raise the operating frequency: GDDR2 usually runs at higher frequencies than DDR, GDDR3 higher still, and GDDR5 currently provides the highest frequency and bandwidth. But inexpensive models are still fitted with "non-graphics" DDR3 memory at much lower frequencies, so choose your video card carefully.