 

 05 August 2008
 Intel
  http://www.intel....
 Technology Report
 Dr. Adrian Wong
 1.0
 
   
The Intel Larrabee Processor Tech Report

Inside The Larrabee

Larrabee is based on the short, in-order pipeline of the Intel Pentium. This is a simpler pipeline design. However, an in-order core cannot reorder instructions around a stall, so it is more exposed to latency. Larrabee offsets this in two ways. It uses many cores, and each core is multi-threaded, handling up to four execution threads so that it can switch to another thread when one stalls. Each core also has a wide vector processing unit, and the processor as a whole includes a few fixed-function units designed to handle graphics and other applications.
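A toy model shows how running several threads on one in-order core helps. This is a minimal sketch under stated assumptions, not a description of Larrabee's actual thread scheduler. Each hardware thread issues a fixed number of instructions and then stalls for a fixed number of cycles, for example on a cache miss. The core keeps issuing from one thread until that thread stalls, then switches to the next ready thread. The function `core_utilization` and its parameters are hypothetical.

```python
def core_utilization(threads, work, stall, cycles=10_000):
    """Fraction of cycles in which a single in-order core issues an instruction.

    Hypothetical model (not Intel's scheduler): each hardware thread issues
    `work` instructions back to back, then waits `stall` cycles before it can
    issue again. The core stays on the current thread until it stalls, then
    scans round-robin for the next ready thread.
    """
    ready_at = [0] * threads      # cycle at which each thread may issue again
    remaining = [work] * threads  # instructions left before the thread stalls
    current = 0                   # thread that issued most recently
    busy = 0
    for cycle in range(cycles):
        for offset in range(threads):
            t = (current + offset) % threads
            if ready_at[t] <= cycle:
                busy += 1
                remaining[t] -= 1
                if remaining[t] == 0:
                    # Thread stalls: it can issue again after `stall` idle cycles.
                    remaining[t] = work
                    ready_at[t] = cycle + 1 + stall
                current = t
                break
        # If no thread was ready, this cycle is lost to latency.
    return busy / cycles
```

Take 10 cycles of work between 30-cycle stalls. In this model, the core is busy 25% of the time with one thread, 50% with two and 100% with four. Real workloads stall irregularly, so the figures only illustrate the trend.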

The cores share a large L2 cache, which is partitioned so that each core has its own local subset. This allows cached data to be either replicated across several subsets or shared among the processing cores. The cores communicate with each other, and with the other components of the processor, over a bidirectional ring network that is 1024 bits wide (512 bits in each direction). The ring topology is especially well-suited to handling communication between the many cores and units in the processor.
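A small model of a bidirectional ring shows why having two directions matters. It rests on three assumptions:

* every core or unit is a numbered stop on the ring;
* a message always travels in whichever direction reaches its destination in fewer hops;
* the stop count used below is arbitrary.

Intel has not described its routing policy here, and the function names are hypothetical.

```python
def ring_route(src, dst, stops):
    """Pick the shorter direction around a bidirectional ring.

    Assumption: stops are numbered 0..stops-1 in clockwise order, and a
    message takes whichever direction needs fewer hops (ties go clockwise).
    Returns (direction, hops).
    """
    clockwise = (dst - src) % stops
    counter = (src - dst) % stops
    if clockwise <= counter:
        return ("clockwise", clockwise)
    return ("counter-clockwise", counter)


def average_hops(stops, bidirectional=True):
    """Average hop count over all ordered pairs of distinct stops."""
    total = 0
    pairs = 0
    for src in range(stops):
        for dst in range(stops):
            if src == dst:
                continue
            if bidirectional:
                hops = ring_route(src, dst, stops)[1]
            else:
                hops = (dst - src) % stops  # one-way ring: clockwise only
            total += hops
            pairs += 1
    return total / pairs
```

Take a hypothetical 8-stop ring. Letting each message go the shorter way cuts the average distance from 4 hops to 16/7, or about 2.3 hops.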

 

The Basic Core Design

Each core will have its own 256 KB subset of the L2 cache, as well as L1 instruction and data caches of unspecified size. Below are the block diagrams of its x86 core (left) and its new 16-lane-wide vector processing unit (right).
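A toy model captures the idea of an L2 cache split into per-core subsets, where data can be replicated for reading yet stays consistent after a write. This is a simplified illustration, not Intel's coherence protocol. It assumes subsets have unlimited capacity with no eviction, that a plain dict stands in for main memory, and that a write simply drops the copies held by other cores. The class and method names are hypothetical.

```python
L2_SUBSET_KB = 256  # per-core L2 subset size given in the article


class PartitionedL2:
    """Toy L2 cache made of one local subset per core.

    Simplifying assumptions (not Intel's design): unlimited subset capacity,
    no eviction, no write-back to memory, and a dict standing in for DRAM.
    """

    def __init__(self, cores, memory):
        self.subsets = [dict() for _ in range(cores)]
        self.memory = memory
        self.remote_fetches = 0  # reads served from another core's subset

    def total_kb(self):
        # Total L2 capacity grows with the number of cores.
        return len(self.subsets) * L2_SUBSET_KB

    def read(self, core, addr):
        local = self.subsets[core]
        if addr in local:
            return local[addr]  # hit in the core's own subset
        for other, subset in enumerate(self.subsets):
            if other != core and addr in subset:
                self.remote_fetches += 1
                local[addr] = subset[addr]  # replicate into the reader's subset
                return local[addr]
        local[addr] = self.memory[addr]  # miss in every subset: fill from memory
        return local[addr]

    def write(self, core, addr, value):
        for other, subset in enumerate(self.subsets):
            if other != core:
                subset.pop(addr, None)  # drop stale copies held by other cores
        self.subsets[core][addr] = value
```

Consider a hypothetical 4-core configuration. Core 0 reads a line from memory, and core 1 then reads the same line, which replicates it. When core 2 writes the line, the other two copies are dropped. Core 0's next read then fetches the new value from core 2's subset.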


Larrabee x86 Core Block Diagram


Larrabee Vector Unit Block Diagram

The vector unit can process sixteen 32-bit operations per clock. Here are its key features:

  • Scatter/gather for vector load/store.
  • Mask registers select which lanes are written, which enables data-parallel flow control. This lets the vector unit map a separate instance of an execution kernel to each VPU lane.
  • Fast read from L1 cache.
  • Numeric type conversion and data replication while reading from memory.
  • Rearranging the lanes on register read.
  • Fused multiply-add (three arguments).
  • Int32, Float32 and Float64 data.
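A scalar Python model of one 16-lane vector register can illustrate the masked writes, fused multiply-add and scatter/gather features. This is a sketch only, under these assumptions:

* the function names are hypothetical and are not Larrabee instruction mnemonics;
* Python lists stand in for vector registers;
* the low 16 bits of an integer stand in for a mask register;
* the multiply-add is computed as `a * b + c`, which for floats would round twice rather than once as a true fused operation does.

```python
LANES = 16                    # the article's vector unit: sixteen 32-bit lanes
FULL_MASK = (1 << LANES) - 1  # all sixteen mask bits set


def lane_on(mask, i):
    """True if bit i of the lane mask is set."""
    return ((mask >> i) & 1) == 1


def vmadd(a, b, c, mask, dest):
    """Masked multiply-add: lane i becomes a[i] * b[i] + c[i] if its mask bit is set.

    Lanes whose mask bit is clear keep their old value from dest.
    (Models lane/mask behaviour only; Python does not fuse the rounding.)
    """
    return [a[i] * b[i] + c[i] if lane_on(mask, i) else dest[i] for i in range(LANES)]


def vcmpgt(a, b):
    """Build a lane mask with bit i set where a[i] > b[i]."""
    mask = 0
    for i in range(LANES):
        if a[i] > b[i]:
            mask |= 1 << i
    return mask


def vgather(memory, indices, mask, dest):
    """Masked gather: load memory[indices[i]] into each active lane."""
    return [memory[indices[i]] if lane_on(mask, i) else dest[i] for i in range(LANES)]


def vscatter(memory, indices, values, mask):
    """Masked scatter: store values[i] to memory[indices[i]] for each active lane."""
    for i in range(LANES):
        if lane_on(mask, i):
            memory[indices[i]] = values[i]


def per_lane_if_else(x):
    """Per lane: y = 2*x + 1 where x > 0, else y = -x, with no branch on lane data."""
    zeros = [0] * LANES
    positive = vcmpgt(x, zeros)                                  # the "if" condition
    y = vmadd(x, [2] * LANES, [1] * LANES, positive, zeros)      # "then" lanes
    y = vmadd(x, [-1] * LANES, zeros, FULL_MASK ^ positive, y)   # "else" lanes
    return y
```

The if/else computes both branches across all sixteen lanes, and the mask decides which result each lane keeps. This is how mask registers allow each lane to follow its own control flow. For inputs -8 to 7, `per_lane_if_else(list(range(-8, 8)))` yields 8 down to 0 for the non-positive lanes, then 3, 5, 7, 9, 11, 13 and 15.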

 


Page 1: A Paradigm Shift? | Larrabee's Key Features
Page 2: A Convergence Of CPU + GPU? | Why Many Cores?
Page 3: Inside The Larrabee | The Basic Core Design
Page 4: The Larrabee Texture Sampler | How Does It Differ From A GPU?
Page 5: How Larrabee Renders Graphics | Performance Scalability
Page 6: The Larrabee Binning Renderer
Page 7: Examples Of 3D Features Supported By Larrabee
Page 8: Conclusion



 
   

 


Copyright © Tech ARP.com. All rights reserved.