Teraflops Research Chip

Intel Teraflops Research Chip (codenamed Polaris) is a research manycore processor containing 80 cores, using a network-on-chip architecture, developed by Intel's Tera-Scale Computing Research Program.^[1] It was manufactured using a 65 nm CMOS process with eight layers of copper interconnect and contains 100 million transistors on a 275 mm² die.^[2]^[3]^[4] Its design goal was to demonstrate a modular architecture capable of a sustained performance of 1.0 TFLOPS while dissipating less than 100 W.^[3] Research from the project was later incorporated into Xeon Phi. The technical lead of the project was Sriram R. Vangal.^[4]

Quick facts General information, Launched ...

Teraflops Research Chip
General information
Launched	2006
Designed by	Intel Tera-Scale Computing Research Program
Performance
Max. CPU clock rate	5.67 GHz
Data width	38-bit
Architecture and classification
Instruction set	96-bit VLIW
Physical specifications
Transistors	100,000,000
Cores	80
Socket	custom 1248-pin LGA (343 signal pins)
History
Successor	Xeon Phi

The processor was initially presented at the Intel Developer Forum on September 26, 2006^[5] and officially announced on February 11, 2007.^[6] A working chip was presented at the 2007 IEEE International Solid-State Circuits Conference, alongside technical specifications.^[2]

The chip consists of a 10x8 2D mesh network of cores and nominally operates at 4 GHz.^{[nb 1]} Each core, called a tile (3 mm²), contains a processing engine and a 5-port wormhole-switched router (0.34 mm²) with mesochronous interfaces, with a bandwidth of 80 GB/s and latency of 1.25 ns at 4 GHz.^[2] The processing engine in each tile contains two independent, 9-stage pipeline, single-precision floating-point multiplyaccumulator (FPMAC) units, 3 KB of single-cycle instruction memory and 2 KB of data memory.^[3] Each FPMAC unit is capable of performing 2 single-precision floating-point operations per cycle. Each tile has thus an estimated peak performance of 16 GFLOPS at the standard configuration of 4 GHz. A 96-bit very long instruction word (VLIW) encodes up to eight operations per cycle.^[3] The custom instruction set includes instructions to send and receive packets into/from the chip's network and well as instructions for sleeping and waking a particular tile.^[4] Underneath each tile, a 256 KB SRAM module (codenamed Freya) was 3D stacked, thus bringing memory nearer to the processor to increase overall memory bandwidth to 1 TB/s, at the expense of higher cost, thermal stress and latency, and a small total capacity of 20 MB.^[7] The network of Polaris was shown to have a bisection bandwidth of 1.6 Tbit/s at 3.16 GHz and 2.92 Tbit/s at 5.67 GHz.^[8]

Other prominent features of the Teraflops Research chip include its fine-grained power management with 21 independent sleep regions on a tile and dynamic tile sleep, and very high energy efficiency with 27 GFLOPS/W theoretical peak at 0.6 V and 19.4 GFLOPS/W actual for stencil at 0.75 V.^[4]^[9]

More information Instruction type, Latency (cycles) ...

Instruction types and their latency^[4]
Instruction type	Latency (cycles)
FPMAC	9
LOAD/STORE	2
SEND/RECEIVE	2
JUMP/BRANCH	1
STALL/WFD	?
SLEEP/WAKE	6

More information

...

Application performance of Teraflops Research Chip^{[nb 2]}^[4]
Application	$FLOP$ count	${\text{TFLOPS}}_{avg}$	$\%{\text{TFLOPS}}_{peak}$	Active tiles
Stencil	358K	1.00	73.3%	80
SGEMM: Matrix multiplication	2.63M	0.51	37.5%	80
Spreadsheet	64.2K	0.45	33.2%	80
2D FFT	196K	0.02	2.73%	64

More information

...

Experimental results of the Teraflops Research Chip^{[nb 3]}
$V_{CC}$	$f_{max}$ ^{[nb 4]}	${\text{TFLOPS}}_{peak}$ ^{[nb 5]}	Power^{[nb 6]}	$T$	Source
0.60 V	1.0 GHz	0.32 TFLOPS	11 W	110 °C	^[2]
0.675 V	1.0 GHz	0.32 TFLOPS	15.6 W	80 °C	^[4]
0.70 V	1.5 GHz	0.48 TFLOPS	25 W	110 °C	^[2]
0.70 V	1.35 GHz	0.43 TFLOPS	18 W	80 °C	^[4]
0.75 V	1.6 GHz	0.51 TFLOPS	21 W	80 °C	^[4]
0.80 V	2.1 GHz	0.67 TFLOPS	42 W	110 °C	^[2]
0.80 V	2.0 GHz	0.64 TFLOPS	26 W	80 °C	^[4]
0.85 V	2.4 GHz	0.77 TFLOPS	32 W	80 °C	^[4]
0.90 V	2.6 GHz	0.83 TFLOPS	70 W	110 °C	^[2]
0.90 V	2.85 GHz	0.91 TFLOPS	45 W	80 °C	^[4]
0.95 V	3.16 GHz	1.0 TFLOPS	62 W	80 °C	^[4]
1.00 V	3.13 GHz	1.0 TFLOPS	98 W	110 °C	^[2]
1.00 V	3.8 GHz	1.22 TFLOPS	78 W	80 °C	^[4]
1.05 V	4.2 GHz	1.34 TFLOPS	82 W	80 °C	^[4]
1.10 V	3.5 GHz	1.12 TFLOPS	135 W	110 °C	^[2]
1.10 V	4.5 GHz	1.44 TFLOPS	105 W	80 °C	^[4]
1.15 V	4.8 GHz	1.54 TFLOPS	128 W	80 °C	^[4]
1.20 V	4.0 GHz	1.28 TFLOPS	181 W	110 °C	^[2]
1.20 V	5.1 GHz	1.63 TFLOPS	152 W	80 °C	^[4]
1.25 V	5.3 GHz	1.70 TFLOPS	165 W	80 °C	^[4]
1.30 V	4.4 GHz	1.39 TFLOPS	?	110 °C	^[2]
1.30 V	5.5 GHz	1.76 TFLOPS	210 W	80 °C	^[4]
1.35 V	5.67 GHz	1.81 TFLOPS	230 W	80 °C	^[4]
1.40 V	4.8 GHz	1.52 TFLOPS	?	110 °C	^[2]

Teraflops Research Chip

Architecture

Issues

See also

Notes

References

Wikiwand - on