Graphics processing unit

A graphics processing unit (GPU) is a specialized electronic circuit designed for digital image processing and to accelerate computer graphics, being present either as a discrete video card or embedded on motherboards, mobile phones, personal computers, workstations, and game consoles. GPUs were later found to be useful for non-graphic calculations involving embarrassingly parallel problems due to their parallel structure. The ability of GPUs to rapidly perform vast numbers of calculations has led to their adoption in diverse fields including artificial intelligence (AI) where they excel at handling data-intensive and computationally demanding tasks. Other non-graphical uses include the training of neural networks and cryptocurrency mining.

Remove ads

History

Summarize

Perspective

1970s

Arcade system boards have used specialized graphics circuits since the 1970s. In early video game hardware, RAM for frame buffers was expensive, so video chips composited data together as the display was being scanned out on the monitor.^[1]

A specialized barrel shifter circuit helped the CPU animate the framebuffer graphics for various 1970s arcade video games from Midway and Taito, such as Gun Fight (1975), Sea Wolf (1976), and Space Invaders (1978).^[2] The Namco Galaxian arcade system in 1979 used specialized graphics hardware that supported RGB color, multi-colored sprites, and tilemap backgrounds.^[3] The Galaxian hardware was widely used during the golden age of arcade video games, by game companies such as Namco, Centuri, Gremlin, Irem, Konami, Midway, Nichibutsu, Sega, and Taito.^[4]

The Atari 2600 in 1977 used a video shifter called the Television Interface Adaptor.^[5] Atari 8-bit computers (1979) had ANTIC, a video processor which interpreted instructions describing a "display list"—the way the scan lines map to specific bitmapped or character modes and where the memory is stored (so there did not need to be a contiguous frame buffer).^{[clarification needed]}^[6] 6502 machine code subroutines could be triggered on scan lines by setting a bit on a display list instruction.^{[clarification needed]}^[7] ANTIC also supported smooth vertical and horizontal scrolling independent of the CPU.^[8]

1980s

The NEC μPD7220 was the first implementation of a personal computer graphics display processor as a single large-scale integration (LSI) integrated circuit chip. This enabled the design of low-cost, high-performance video graphics cards such as those from Number Nine Visual Technology. It became the best-known GPU until the mid-1980s.^[9] It was the first fully integrated VLSI (very large-scale integration) metal–oxide–semiconductor (NMOS) graphics display processor for PCs, supported up to 1024×1024 resolution, and laid the foundations for the PC graphics market. It was used in a number of graphics cards and was licensed for clones such as the Intel 82720, the first of Intel's graphics processing units.^[10] The Williams Electronics arcade games Robotron 2084, Joust, Sinistar, and Bubbles, all released in 1982, contain custom blitter chips for operating on 16-color bitmaps.^[11]^[12]

In 1984, Hitachi released the ARTC HD63484, the first major CMOS graphics processor for personal computers. The ARTC could display up to 4K resolution when in monochrome mode. It was used in a number of graphics cards and terminals during the late 1980s.^[13] In 1985, the Amiga was released with a custom graphics chip including a blitter for bitmap manipulation, line drawing, and area fill. It also included a coprocessor with its own simple instruction set, that was capable of manipulating graphics hardware registers in sync with the video beam (e.g. for per-scanline palette switches, sprite multiplexing, and hardware windowing), or driving the blitter. In 1986, Texas Instruments released the TMS34010, the first fully programmable graphics processor.^[14] It could run general-purpose code but also had a graphics-oriented instruction set. During 1990–1992, this chip became the basis of the Texas Instruments Graphics Architecture ("TIGA") Windows accelerator cards.

In 1987, the IBM 8514 graphics system was released. It was one of the first video cards for IBM PC compatibles that implemented fixed-function 2D primitives in electronic hardware. Sharp's X68000, released in 1987, used a custom graphics chipset^[15] with a 65,536 color palette and hardware support for sprites, scrolling, and multiple playfields.^[16] It served as a development machine for Capcom's CP System arcade board. Fujitsu's FM Towns computer, released in 1989, had support for a 16,777,216 color palette.^[17] In 1988, the first dedicated polygonal 3D graphics boards were introduced in arcades with the Namco System 21^[18] and Taito Air System.^[19]

IBM introduced its proprietary Video Graphics Array (VGA) display standard in 1987, with a maximum resolution of 640×480 pixels. In November 1988, NEC Home Electronics announced its creation of the Video Electronics Standards Association (VESA) to develop and promote a Super VGA (SVGA) computer display standard as a successor to VGA. Super VGA enabled graphics display resolutions up to 800×600 pixels, a 56% increase.^[20]

1990s

In 1991, S3 Graphics introduced the S3 86C911, which its designers named after the Porsche 911 as an indication of the performance increase it promised.^[21] The 86C911 spawned a variety of imitators: by 1995, all major PC graphics chip makers had added 2D acceleration support to their chips.^[22] Fixed-function Windows accelerators surpassed expensive general-purpose graphics coprocessors in Windows performance, and such coprocessors faded from the PC market.

In the early- and mid-1990s, real-time 3D graphics became increasingly common in arcade, computer, and console games, which led to increasing public demand for hardware-accelerated 3D graphics. Early examples of mass-market 3D graphics hardware can be found in arcade system boards such as the Sega Model 1, Namco System 22, and Sega Model 2, and the fifth-generation video game consoles such as the Saturn, PlayStation, and Nintendo 64. Arcade systems such as the Sega Model 2 and SGI Onyx-based Namco Magic Edge Hornet Simulator in 1993 were capable of hardware T&L (transform, clipping, and lighting) years before appearing in consumer graphics cards.^[23]^[24] Another early example is the Super FX chip, a RISC-based on-cartridge graphics chip used in some SNES games, notably Doom and Star Fox. Some systems used DSPs to accelerate transformations. Fujitsu, which worked on the Sega Model 2 arcade system,^[25] began working on integrating T&L into a single LSI solution for use in home computers in 1995;^[26] the Fujitsu Pinolite, the first 3D geometry processor for personal computers, released in 1997.^[27] The first hardware T&L GPU on home video game consoles was the Nintendo 64's Reality Coprocessor, released in 1996.^[28] In 1997, Mitsubishi released the 3Dpro/2MP, a GPU capable of transformation and lighting, for workstations and Windows NT desktops;^[29] ATi used it for its FireGL 4000 graphics card, released in 1997.^[30]

The term "GPU" was coined by Sony in reference to the 32-bit Sony GPU (designed by Toshiba) in the PlayStation video game console, released in 1994.^[31]

2000s

In October 2002, with the introduction of the ATI Radeon 9700 (also known as R300), the world's first Direct3D 9.0 accelerator, pixel and vertex shaders could implement looping and lengthy floating point math, and were quickly becoming as flexible as CPUs, yet orders of magnitude faster for image-array operations. Pixel shading is often used for bump mapping, which adds texture to make an object look shiny, dull, rough, or even round or extruded.^[32]

With the introduction of the Nvidia GeForce 8 series and new generic stream processing units, GPUs became more generalized computing devices. Parallel GPUs are making computational inroads against the CPU, and a subfield of research, dubbed GPU computing or GPGPU for general purpose computing on GPU, has found applications in fields as diverse as machine learning,^[33] oil exploration, scientific image processing, linear algebra,^[34] statistics,^[35] 3D reconstruction, and stock options pricing. GPGPU was the precursor to what is now called a compute shader (e.g. CUDA, OpenCL, DirectCompute) and actually abused the hardware to a degree by treating the data passed to algorithms as texture maps and executing algorithms by drawing a triangle or quad with an appropriate pixel shader.^{[clarification needed]} This entails some overheads since units like the scan converter are involved where they are not needed (nor are triangle manipulations even a concern—except to invoke the pixel shader).^{[clarification needed]}

Nvidia's CUDA platform, first introduced in 2007,^[36] was the earliest widely adopted programming model for GPU computing. OpenCL is an open standard defined by the Khronos Group that allows for the development of code for both GPUs and CPUs with an emphasis on portability.^[37] OpenCL solutions are supported by Intel, AMD, Nvidia, and ARM, and according to a report in 2011 by Evans Data, OpenCL had become the second most popular HPC tool.^[38]

2010s

In 2010, Nvidia partnered with Audi to power their cars' dashboards, using the Tegra GPU to provide increased functionality to cars' navigation and entertainment systems.^[39] Advances in GPU technology in cars helped advance self-driving technology.^[40] AMD's Radeon HD 6000 series cards were released in 2010, and in 2011 AMD released its 6000M Series discrete GPUs for mobile devices.^[41] The Kepler line of graphics cards by Nvidia were released in 2012 and were used in the Nvidia's 600 and 700 series cards. A feature in this GPU microarchitecture included GPU boost, a technology that adjusts the clock-speed of a video card to increase or decrease it according to its power draw.^[42] The Kepler microarchitecture was manufactured.

The PS4 and Xbox One were released in 2013; they both use GPUs based on AMD's Radeon HD 7850 and 7790.^[43] Nvidia's Kepler line of GPUs was followed by the Maxwell line, manufactured on the same process. Nvidia's 28 nm chips were manufactured by TSMC in Taiwan using the 28 nm process. Compared to the 40 nm technology from the past, this manufacturing process allowed a 20 percent boost in performance while drawing less power.^[44]^[45] Virtual reality headsets have high system requirements; manufacturers recommended the GTX 970 and the R9 290X or better at the time of their release.^[46]^[47] Cards based on the Pascal microarchitecture were released in 2016. The GeForce 10 series of cards are of this generation of graphics cards. They are made using the 16 nm manufacturing process which improves upon previous microarchitectures.^[48]

In 2018, Nvidia launched the RTX 20 series GPUs that added ray-tracing cores to GPUs, improving their performance on lighting effects.^[49] Polaris 11 and Polaris 10 GPUs from AMD are fabricated by a 14 nm process. Their release resulted in a substantial increase in the performance per watt of AMD video cards.^[50] AMD also released the Vega GPU series for the high end market as a competitor to Nvidia's high end Pascal cards, also featuring HBM2 like the Titan V.

In 2019, AMD released the successor to their Graphics Core Next (GCN) microarchitecture/instruction set. Dubbed RDNA, the first product featuring it was the Radeon RX 5000 series of video cards.^[51] The company announced that the successor to the RDNA microarchitecture would be incremental (a "refresh"). AMD unveiled the Radeon RX 6000 series, its RDNA 2 graphics cards with support for hardware-accelerated ray tracing.^[52] The product series, launched in late 2020, consisted of the RX 6800, RX 6800 XT, and RX 6900 XT.^[53]^[54] The RX 6700 XT, which is based on Navi 22, was launched in early 2021.^[55]

The PlayStation 5 and Xbox Series X and Series S were released in 2020; they both use GPUs based on the RDNA 2 microarchitecture with incremental improvements and different GPU configurations in each system's implementation.^[56]^[57]^[58]

2020s

In the 2020s, GPUs have been increasingly used for calculations involving embarrassingly parallel problems, such as training of neural networks on enormous datasets that are needed for large language models. Specialized processing cores on some modern workstation's GPUs are dedicated for deep learning since they have significant FLOPS performance increases, using 4×4 matrix multiplication and division, resulting in hardware performance up to 128 TFLOPS in some applications.^[59] These tensor cores are expected to appear in consumer cards, as well.^{[needs update]}^[60]

Remove ads

GPU companies

Many companies have produced GPUs under a number of brand names. In 2009,^{[needs update]} Intel, Nvidia, and AMD/ATI were the market share leaders, with 49.4%, 27.8%, and 20.6% market share respectively. In addition, Matrox^[61] produces GPUs. Chinese companies such as Jingjia Micro have also produced GPUs for the domestic market although in terms of worldwide sales, they still lag behind market leaders.^[62]

Remove ads

Computational functions

Several factors of GPU construction affect the performance of the card for real-time rendering, such as the size of the connector pathways in the semiconductor device fabrication, the clock signal frequency, and the number and size of various on-chip memory caches. Performance is also affected by the number of streaming multiprocessors (SM) for NVidia GPUs, or compute units (CU) for AMD GPUs, or Xe cores for Intel discrete GPUs, which describe the number of on-silicon processor core units within the GPU chip that perform the core calculations, typically working in parallel with other SM/CUs on the GPU. GPU performance is typically measured in floating point operations per second (FLOPS); GPUs in the 2010s and 2020s typically deliver performance measured in teraflops (TFLOPS). This is an estimated performance measure, as other factors can affect the actual display rate.^[63]

2D graphics APIs

An earlier GPU may support one or more 2D graphics API for 2D acceleration, such as GDI and DirectDraw.^[64]

GPU forms

Summarize

Perspective

Terminology

In the 1970s, the term "GPU" originally stood for graphics processor unit and described a programmable processing unit working independently from the CPU that was responsible for graphics manipulation and output.^[65]^[66] In 1994, Sony used the term (now standing for graphics processing unit) in reference to the PlayStation console's Toshiba-designed Sony GPU.^[31] The term was popularized by Nvidia in 1999, who marketed the GeForce 256 as "the world's first GPU".^[67] It was presented as a "single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines".^[68] Rival ATI Technologies coined the term "visual processing unit" or VPU with the release of the Radeon 9700 in 2002.^[69] The AMD Alveo MA35D features dual VPU’s, each using the 5 nm process in 2023.^[70]

In personal computers, there are two main forms of GPUs. Each has many synonyms:^[71]

Dedicated graphics also called discrete graphics.
Integrated graphics also called shared graphics solutions, integrated graphics processors (IGP), or unified memory architecture (UMA).

Dedicated graphics processing unit

Dedicated graphics processing units uses RAM that is dedicated to the GPU rather than relying on the computer’s main system memory. This RAM is usually specially selected for the expected serial workload of the graphics card (see GDDR). Sometimes systems with dedicated discrete GPUs were called "DIS" systems as opposed to "UMA" systems (see next section).^[72]

Technologies such as Scan-Line Interleave by 3dfx, SLI and NVLink by Nvidia and CrossFire by AMD allow multiple GPUs to draw images simultaneously for a single screen, increasing the processing power available for graphics. These technologies, however, are increasingly uncommon; most games do not fully use multiple GPUs, as most users cannot afford them.^[73]^[74]^[75] Multiple GPUs are still used on supercomputers (like in Summit), on workstations to accelerate video (processing multiple videos at once)^[76]^[77]^[78] and 3D rendering,^[79] for VFX,^[80] GPGPU workloads and for simulations,^[81] and in AI to expedite training, as is the case with Nvidia's lineup of DGX workstations and servers, Tesla GPUs, and Intel's Ponte Vecchio GPUs.

Integrated graphics processing unit

Integrated graphics processing units (IGPU), integrated graphics, shared graphics solutions, integrated graphics processors (IGP), or unified memory architectures (UMA) use a portion of a computer's system RAM rather than dedicated graphics memory. IGPs can be integrated onto a motherboard as part of its northbridge chipset,^[82] or on the same die (integrated circuit) with the CPU (like AMD APU or Intel HD Graphics). On certain motherboards,^[83] AMD's IGPs can use dedicated sideport memory: a separate fixed block of high performance memory that is dedicated for use by the GPU. As of early 2007^[update] computers with integrated graphics account for about 90% of all PC shipments.^[84]^{[needs update]} They are less costly to implement than dedicated graphics processing, but tend to be less capable. Historically, integrated processing was considered unfit for 3D games or graphically intensive programs but could run less intensive programs such as Adobe Flash. Examples of such IGPs would be offerings from SiS and VIA circa 2004.^[85] However, modern integrated graphics processors such as AMD Accelerated Processing Unit and Intel Graphics Technology (HD, UHD, Iris, Iris Pro, Iris Plus, and Xe-LP) can handle 2D graphics or low-stress 3D graphics.

Since GPU computations are memory-intensive, integrated processing may compete with the CPU for relatively slow system RAM, as it has minimal or no dedicated video memory. IGPs use system memory with bandwidth up to a current maximum of 128 GB/s, whereas a discrete graphics card may have a bandwidth of more than 1000 GB/s between its VRAM and GPU core. This memory bus bandwidth can limit the performance of the GPU, though multi-channel memory can mitigate this deficiency.^[86] Older integrated graphics chipsets lacked hardware transform and lighting, but newer ones include it.^[87]^[88]

On systems with "Unified Memory Architecture" (UMA), including modern AMD processors with integrated graphics,^[89] modern Intel processors with integrated graphics,^[90] Apple processors, the PS5 and Xbox Series (among others), the CPU cores and the GPU block share the same pool of RAM and memory address space.

Stream processing and general purpose GPUs (GPGPU)

It is common to use a general purpose graphics processing unit (GPGPU) as a modified form of stream processor (or a vector processor), running compute kernels. This turns the massive computational power of a modern graphics accelerator's shader pipeline into general-purpose computing power. In certain applications requiring massive vector operations, this can yield several orders of magnitude higher performance than a conventional CPU. The two largest discrete (see "Dedicated graphics processing unit" above) GPU designers, AMD and Nvidia, are pursuing this approach with an array of applications. Both Nvidia and AMD teamed with Stanford University to create a GPU-based client for the Folding@home distributed computing project for protein folding calculations. In certain circumstances, the GPU calculates forty times faster than the CPUs traditionally used by such applications.^[91]^[92]

GPU-based high performance computers play a significant role in large-scale modelling. Three of the ten most powerful supercomputers in the world take advantage of GPU acceleration.^[93]

Since 2005 there has been interest in using the performance offered by GPUs for evolutionary computation in general, and for accelerating the fitness evaluation in genetic programming in particular. Most approaches compile linear or tree programs on the host PC and transfer the executable to the GPU to be run. Typically a performance advantage is only obtained by running the single active program simultaneously on many example problems in parallel, using the GPU's SIMD architecture.^[94] However, substantial acceleration can also be obtained by not compiling the programs, and instead transferring them to the GPU, to be interpreted there.^[95]

External GPU (eGPU)

Therefore, it is desirable to attach a GPU to some external bus of a notebook. PCI Express is the only bus used for this purpose. The port may be, for example, an ExpressCard or mPCIe port (PCIe ×1, up to 5 or 2.5 Gbit/s respectively), a Thunderbolt 1, 2, or 3 port (PCIe ×4, up to 10, 20, or 40 Gbit/s respectively), a USB4 port with Thunderbolt compatibility, or an OCuLink port. Those ports are only available on certain notebook systems.^[96] eGPU enclosures include their own power supply (PSU), because powerful GPUs can consume hundreds of watts.^[97]

Remove ads

Energy efficiency

Summarize

Perspective

Graphics processing units (GPU) have continued to increase in energy usage, while CPUs designers have recently^[when?] focused on improving performance per watt. High performance GPUs may draw large amount of power, therefore intelligent techniques are required to manage GPU power consumption. Measures like 3DMark2006 score per watt can help identify more efficient GPUs.^[98] However that may not adequately incorporate efficiency in typical use, where much time is spent doing less demanding tasks.^[99]

With modern GPUs, energy usage is an important constraint on the maximum computational capabilities that can be achieved. GPU designs are usually highly scalable, allowing the manufacturer to put multiple chips on the same video card, or to use multiple video cards that work in parallel. Peak performance of any system is essentially limited by the amount of power it can draw and the amount of heat it can dissipate. Consequently, performance per watt of a GPU design translates directly into peak performance of a system that uses that design.

Since GPUs may also be used for some general purpose computation, sometimes their performance is measured in terms also applied to CPUs, such as FLOPS per watt.

Remove ads

Sales

In 2013, 438.3 million GPUs were shipped globally and the forecast for 2014 was 414.2 million. However, by the third quarter of 2022, shipments of PC GPUs totaled around 75.5 million units, down 19% year-over-year.^[100]^{[needs update]}^[101]

References

Loading content...

Sources

Loading content...

External links

Loading content...

Loading related searches...

Wikiwand - on

Seamless Wikipedia browsing. On steroids.

Remove ads

Graphics processing unit

History

1970s

1980s

1990s

2000s

2010s

2020s

GPU companies

Computational functions

2D graphics APIs

GPU forms

Terminology

Dedicated graphics processing unit

Integrated graphics processing unit

Stream processing and general purpose GPUs (GPGPU)

External GPU (eGPU)

Energy efficiency

Sales

See also

Hardware

APIs

Applications

References

Sources

External links

Wikiwand - on