Direct3D 11
Windows 7 introduces the next-generation Direct3D 11 API, which is a strict superset of Direct3D 10. The Direct3D 11 API enables Windows 7 to take advantage of DirectX 11 hardware. Direct3D 11 adds features to the existing DirectX 10 (and 10.1) pipeline to improve 3D performance and support data-parallel computing on the GPU.
The Direct3D 11 API significantly advances graphics technology in two important areas:
- Compute Shader for data-parallel computing
- 3D graphics
In addition, it includes numerous other improvements that are based on extensive feedback from third-party hardware and software vendors. Let's take a look at the details of the advances in Direct3D 11.
Compute Shader
As the processing power of the GPU has increased, it has become a viable processor for not only games but also for general computing applications. DirectX 11 introduces a new Compute Shader API that enables the GPU to be used for data-parallel computing.
Data-parallel programming is a way to target parallel processors with code that scales to any number of processor cores. Direct3D is based on this programming model, but until Direct3D 11, various restrictions limited developer options and flexibility. The Compute Shader in DirectX 11 enables a broader set of algorithms to make use of the graphics processor's parallel computing power.
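To make the data-parallel model concrete, here is a minimal CPU-side sketch in C++. It is not Direct3D code: run_data_parallel and scale_and_add are hypothetical names invented for this illustration, and the chunks run one after another here, whereas a real runtime would run them concurrently. The point is only that the kernel is written once per element, so the result does not depend on how many "cores" the work is split across.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Direct3D call): splits the index range [0, n)
// into `cores` contiguous chunks and invokes kernel(i) once per index.
// In this sketch the chunks are executed sequentially; a real data-parallel
// runtime would hand each chunk to a different processor core.
template <typename Kernel>
void run_data_parallel(std::size_t n, std::size_t cores, Kernel kernel) {
    assert(cores > 0);
    const std::size_t chunk = (n + cores - 1) / cores;  // ceil(n / cores)
    for (std::size_t c = 0; c < cores; ++c) {
        const std::size_t begin = std::min(n, c * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        for (std::size_t i = begin; i < end; ++i) {
            kernel(i);
        }
    }
}

// Example kernel: out[i] = a * x[i] + y[i]. Every index is independent of
// every other index, which is what lets the work scale with the core count.
std::vector<float> scale_and_add(float a,
                                 const std::vector<float>& x,
                                 const std::vector<float>& y,
                                 std::size_t cores) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    run_data_parallel(x.size(), cores,
                      [&](std::size_t i) { out[i] = a * x[i] + y[i]; });
    return out;
}
```

Calling scale_and_add with 1, 3, or 8 cores gives the same output vector; only the partitioning of the work changes.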
A high-end GPU today can deliver about 4 teraflops of processing power for graphics applications, and one of those teraflops provides full 32-bit single-precision floating-point math that is generally useful. In addition, graphics processors often include a dedicated memory subsystem that has roughly 10 times the bandwidth of the CPU’s interface to main memory. Many applications beyond graphics have already achieved substantial performance improvements by using the higher processing power and the greater memory bandwidth of the GPU.
The field of general-purpose computation on GPUs (GPGPU) has shown that algorithms for fast Fourier transforms (FFT), sorting, and linear algebra, to name a few, can all attain higher performance levels by using Direct3D than by running on the CPU. Additional graphics-related applications that are not 3D, such as image and media processing and composition, are common uses for Direct3D 9 and Direct3D 10.
The DirectX 11 Compute Shader provides a general mechanism for applications to access the computing power and bandwidth of SIMD cores such as those used in graphics processors. By using this flexible technology, many more applications can use the data-parallel processor capability of the GPU.
The Compute Shader was designed to meet the following requirements:
- A simpler API than Direct3D. The compute shader does not require an application to set up all the parameters and state that are required for 3-D rendering. To send work to the GPU in previous DirectX versions, algorithms that are totally unrelated to graphics had to draw rectangles. With the compute shader, an application can launch a thread by explicit request.
- Explicit separation between data-parallel code and serial code. Data-parallel code is usually run on SIMD processors and is typically used for substantially different algorithms than serial code. The application developer can completely control the programming model that each code segment uses.
- A single consistent programming model that spans hardware implementations.
- Automatic conversion of data types for read and write operations. Media applications often use data types that are smaller than 32-bit floating-point values. Because the conversion is handled on read and write, applications can avoid wasting bandwidth on full 32-bit values.
- Interactive display of computation results, ideally updated at monitor refresh rate. Therefore, the computational work must be tightly integrated with the graphics tasks so that the total time constraint of 10-20 ms is achieved (resulting in 50-100 frames per second). The DirectX compute shader is tightly integrated with Direct3D for this reason. This integration includes both the ability for compute shaders to read and write the array and surface objects typically used by DirectX, and the ability for the graphics pipeline to use scattered writes to update the more general data structures that compute-oriented algorithms rely on.
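The data type conversion requirement in the list above can be pictured with a small CPU-side sketch in C++. It is an illustration only, not Direct3D code: the function names are hypothetical, the 8-bit normalized format is chosen as an example of a "smaller than 32-bit" media type, and round-to-nearest on store is an assumption of this sketch. The arithmetic runs in 32-bit float while the stored data stays at one byte per value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read path: expand an 8-bit normalized value to a float in [0, 1].
inline float unorm8_to_float(std::uint8_t v) {
    return static_cast<float>(v) / 255.0f;
}

// Write path: clamp to [0, 1] and pack back into 8 bits.
// Rounding to nearest (add 0.5, then truncate) is an assumption here.
inline std::uint8_t float_to_unorm8(float f) {
    const float clamped = std::min(1.0f, std::max(0.0f, f));
    return static_cast<std::uint8_t>(clamped * 255.0f + 0.5f);
}

// Hypothetical media kernel: scale pixel brightness by `gain`.
// The math is done in float, but memory traffic is 1 byte per pixel
// instead of the 4 bytes a float buffer would need.
std::vector<std::uint8_t> apply_gain(const std::vector<std::uint8_t>& pixels,
                                     float gain) {
    std::vector<std::uint8_t> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        out[i] = float_to_unorm8(unorm8_to_float(pixels[i]) * gain);
    }
    return out;
}
```

In this sketch the conversion is written out by hand; the requirement described in the list is that the read and write operations perform this kind of conversion for the application.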
The compute shader supports runtime compilation and runtime data binding to further increase flexibility, polymorphism, and opportunities for optimization by specialization.
Some algorithms are difficult to implement because earlier versions of Direct3D are purely data-parallel, with no shared memory constructs or atomic operations. Because the compute shader is more general and contains the core data-parallel constructs for shared memory accesses with atomic operations, Direct3D 11 can execute a broader set of algorithms than previous versions, and some of these might be faster than the pure data-parallel versions.
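A histogram is a common example of the difference. The C++ sketch below is a CPU-side emulation, not Direct3D code, and all names in it are hypothetical. The "scatter" version has one logical thread per input element incrementing a shared bin atomically; the "gather" version is how the same result can be produced in a pure data-parallel style, with one logical thread per output bin re-reading every input. The loops stand in for GPU threads and run sequentially here.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBins = 4;  // Arbitrary bin count for the sketch.

// Scatter style: one logical thread per input element. Many threads may hit
// the same bin at once on real hardware, so the increment must be atomic.
std::array<unsigned, kBins> histogram_scatter(
        const std::vector<std::uint8_t>& data) {
    std::array<std::atomic<unsigned>, kBins> shared;
    for (auto& bin : shared) bin.store(0);
    for (std::size_t i = 0; i < data.size(); ++i) {  // "thread" i
        shared[data[i] % kBins].fetch_add(1, std::memory_order_relaxed);
    }
    std::array<unsigned, kBins> result{};
    for (std::size_t b = 0; b < kBins; ++b) result[b] = shared[b].load();
    return result;
}

// Gather style: one logical thread per output bin, no shared writes and no
// atomics, but every thread reads the whole input (n * kBins reads in total).
std::array<unsigned, kBins> histogram_gather(
        const std::vector<std::uint8_t>& data) {
    std::array<unsigned, kBins> result{};
    for (std::size_t b = 0; b < kBins; ++b) {  // "thread" b
        unsigned count = 0;
        for (std::uint8_t v : data) {
            if (v % kBins == b) ++count;
        }
        result[b] = count;
    }
    return result;
}
```

Both functions return the same counts. The scatter version touches each input once, which is the kind of saving the paragraph above refers to when it says some algorithms might be faster than their pure data-parallel versions.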