Inside The Larrabee
Larrabee is based on the short, in-order pipeline of the Intel Pentium. This is a simpler pipeline design, but Larrabee masks its higher latency by using many cores. In addition, each core is multi-threaded and can handle up to four execution threads. Each core also comes with a wide vector processing unit, and the processor as a whole has a few fixed-function units designed to handle graphics and other applications.
The processing cores share a large L2 cache, which is partitioned between the cores. This allows cached data to be replicated or shared amongst the processing cores. The cores communicate with each other and with other components in the processor over a bi-directional ring network that is 1024 bits wide (512 bits in each direction). The ring topology is especially well-suited to handling communications between the many cores and units in the processor.
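To make the ring figures concrete, here is a minimal Python sketch of a bi-directional ring. It is an illustration only: the function name `ring_hops`, the 16-stop ring used in the example, and the idea that a message always travels in the shorter direction are assumptions made for this sketch, not details stated above.

```python
RING_BITS_PER_DIRECTION = 512                   # from the article
RING_TOTAL_BITS = 2 * RING_BITS_PER_DIRECTION   # 1024 bits across both directions


def ring_hops(src, dst, stops):
    """Fewest hops between two stops on a bi-directional ring.

    Assumption for this sketch: a message may travel in either
    direction and takes whichever path is shorter.
    """
    one_way = (dst - src) % stops      # hops going in one direction
    other_way = (src - dst) % stops    # hops going in the opposite direction
    return min(one_way, other_way)


# Hypothetical ring with 16 stops (the stop count is not from the article).
hops_near = ring_hops(0, 3, 16)
hops_wrap = ring_hops(0, 13, 16)   # shorter to go the other way around
hops_far = ring_hops(0, 8, 16)     # worst case: halfway around the ring
```

On a ring that can be traversed both ways, the worst-case distance is half the number of stops, whereas a one-way ring would need almost a full lap in the worst case.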
The Basic Core Design
Each core will have a 256 KB L2 cache for its own use, as well as an L1 instruction and data cache of unspecified size. This is what its x86 core block diagram (left) and the block diagram of the new 16-lane wide vector processing unit (right) look like.
The vector unit is capable of processing sixteen 32-bit operations per clock cycle. Here are the key features of the vector unit:
- Scatter/gather for vector load/store.
- Mask registers select lanes to write, which allows parallel flow control of the data. This allows the vector unit to map a separate execution kernel to each VPU lane.
- Fast read from L1 cache.
- Numeric type conversion and data replication while reading from memory.
- Rearranging the lanes on register read.
- Fused multiply-add (three arguments).
- Int32, Float32 and Float64 data.
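Several of the features listed above (scatter/gather, mask registers and the three-argument multiply-add) can be illustrated with a small software model. The Python sketch below is not Larrabee code and uses no real instruction names; the helpers `gather`, `scatter` and `masked_fma`, and the toy memory contents, are hypothetical and exist only to show what each lane does.

```python
LANES = 16  # 16 lanes x 32 bits = 512 bits per vector


def gather(memory, indices):
    """Vector load: each lane reads memory at its own index."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values, mask):
    """Vector store: each enabled lane writes its value to its own index."""
    for i, v, m in zip(indices, values, mask):
        if m:
            memory[i] = v


def masked_fma(dest, a, b, c, mask):
    """Per-lane a * b + c; lanes whose mask bit is off keep the old dest value.

    Note: a hardware fused multiply-add rounds once. This model uses
    integers, so rounding does not come into play here.
    """
    return [x * y + z if m else d
            for d, x, y, z, m in zip(dest, a, b, c, mask)]


memory = list(range(100, 132))                   # 32 words of toy memory
indices = [2 * lane for lane in range(LANES)]    # every other word
a = gather(memory, indices)                      # 100, 102, ..., 130
b = [2] * LANES
c = [1] * LANES
mask = [lane % 2 == 0 for lane in range(LANES)]  # only even lanes enabled
dest = [0] * LANES

result = masked_fma(dest, a, b, c, mask)
scatter(memory, indices, result, mask)           # write back enabled lanes only
```

The mask is what gives each lane its own flow control: all sixteen lanes step through the same instruction, but only the enabled lanes commit a result, which is how a separate execution kernel can be mapped to each lane.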