Teraflops Research Chip

From Wikipedia, the free encyclopedia
Teraflops Research Chip
General information
Launched: 2006
Designed by: Intel Tera-Scale Computing Research Program
Performance
Max. CPU clock rate: 5.67 GHz
Data width: 38-bit
Architecture and classification
Instruction set: 96-bit VLIW
Physical specifications
Transistors: 100,000,000
Cores: 80
Socket: custom 1248-pin LGA (343 signal pins)
History
Successor: Xeon Phi

The Intel Teraflops Research Chip (codenamed Polaris) is a research manycore processor containing 80 cores, using a network-on-chip architecture, developed by Intel's Tera-Scale Computing Research Program.[1] It was manufactured using a 65 nm CMOS process with eight layers of copper interconnect and contains 100 million transistors on a 275 mm² die.[2][3][4] Its design goal was to demonstrate a modular architecture capable of a sustained performance of 1.0 TFLOPS while dissipating less than 100 W.[3] Research from the project was later incorporated into Xeon Phi. The technical lead of the project was Sriram R. Vangal.[4]

The processor was initially presented at the Intel Developer Forum on September 26, 2006[5] and officially announced on February 11, 2007.[6] A working chip was presented at the 2007 IEEE International Solid-State Circuits Conference, alongside technical specifications.[2]

Architecture

The chip consists of a 10×8 2D mesh network of cores and nominally operates at 4 GHz.[nb 1] Each core, called a tile (3 mm²), contains a processing engine and a 5-port wormhole-switched router (0.34 mm²) with mesochronous interfaces, with a bandwidth of 80 GB/s and latency of 1.25 ns at 4 GHz.[2] The processing engine in each tile contains two independent, 9-stage pipelined, single-precision floating-point multiply-accumulator (FPMAC) units, 3 KB of single-cycle instruction memory and 2 KB of data memory.[3] Each FPMAC unit is capable of performing 2 single-precision floating-point operations per cycle. Each tile thus has an estimated peak performance of 16 GFLOPS at the standard configuration of 4 GHz. A 96-bit very long instruction word (VLIW) encodes up to eight operations per cycle.[3] The custom instruction set includes instructions to send and receive packets into and from the chip's network, as well as instructions for sleeping and waking a particular tile.[4] Underneath each tile, a 256 KB SRAM module (codenamed Freya) was 3D stacked, bringing memory nearer to the processor to increase overall memory bandwidth to 1 TB/s, at the expense of higher cost, thermal stress and latency, and a small total capacity of 20 MB.[7] The network of Polaris was shown to have a bisection bandwidth of 1.6 Tbit/s at 3.16 GHz and 2.92 Tbit/s at 5.67 GHz.[8]
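The headline figures in this paragraph follow from simple arithmetic on the per-tile parameters. The Python sketch below reproduces them as a sanity check. All function and constant names are illustrative and do not come from Intel's documentation, and the per-port router figure assumes that the quoted 80 GB/s is an aggregate split evenly across the five ports.

```python
# Sanity check of the architecture figures quoted above.
# All names here are illustrative; none come from Intel's documentation.

TILES = 80                  # 10 x 8 mesh of tiles
FPMACS_PER_TILE = 2         # two FPMAC units per processing engine
FLOPS_PER_FPMAC_CYCLE = 2   # a multiply-accumulate counts as two operations
ROUTER_PORTS = 5            # 5-port router in each tile
SRAM_PER_TILE_KB = 256      # stacked SRAM module under each tile


def tile_peak_gflops(clock_ghz):
    """Peak GFLOPS of a single tile at the given clock rate."""
    return FPMACS_PER_TILE * FLOPS_PER_FPMAC_CYCLE * clock_ghz


def chip_peak_tflops(clock_ghz):
    """Peak TFLOPS of the whole chip with all tiles active."""
    return TILES * tile_peak_gflops(clock_ghz) / 1000.0


def router_bytes_per_port_cycle(bandwidth_gb_s, clock_ghz):
    """Bytes per port per cycle implied by the router bandwidth.

    Assumption: the bandwidth is an aggregate shared evenly by all ports.
    """
    return bandwidth_gb_s / (ROUTER_PORTS * clock_ghz)


def router_latency_cycles(latency_ns, clock_ghz):
    """Router latency expressed in clock cycles."""
    return latency_ns * clock_ghz


def stacked_sram_total_mb():
    """Total stacked SRAM capacity across all tiles, in MB."""
    return TILES * SRAM_PER_TILE_KB / 1024.0


if __name__ == "__main__":
    print("tile peak (GFLOPS):", tile_peak_gflops(4.0))
    print("chip peak (TFLOPS):", chip_peak_tflops(4.0))
    print("router bytes/port/cycle:", router_bytes_per_port_cycle(80.0, 4.0))
    print("router latency (cycles):", router_latency_cycles(1.25, 4.0))
    print("stacked SRAM (MB):", stacked_sram_total_mb())
```

At 4 GHz this gives 16 GFLOPS per tile and 1.28 TFLOPS for the chip, a router latency of five cycles, and 20 MB of stacked SRAM, in line with the values stated in the text.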

Teraflops Research Chip's tile diagram.

Other prominent features of the Teraflops Research Chip include fine-grained power management, with 21 independent sleep regions per tile and dynamic tile sleep, and very high energy efficiency: a theoretical peak of 27 GFLOPS/W at 0.6 V and a measured 19.4 GFLOPS/W for the stencil application at 0.75 V.[4][9]

Instruction types and their latency[4]
Instruction type | Latency (cycles)
FPMAC | 9
LOAD/STORE | 2
SEND/RECEIVE | 2
JUMP/BRANCH | 1
STALL/WFD | ?
SLEEP/WAKE | 6
Application performance of Teraflops Research Chip[nb 2][4]
Application | FLOP count | Performance (TFLOPS) | Efficiency (% of peak) | Active tiles
Stencil | 358K | 1.00 | 73.3% | 80
SGEMM: matrix multiplication | 2.63M | 0.51 | 37.5% | 80
Spreadsheet | 64.2K | 0.45 | 33.2% | 80
2D FFT | 196K | 0.02 | 2.73% | 64
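The percentage in each row can be read as the achieved performance divided by the chip's theoretical peak at the measurement frequency of 4.27 GHz (note 2). The sketch below works through that ratio. It is an interpretation of the table rather than a method stated by the source, and the function names are hypothetical. It assumes 80 tiles at 4 floating-point operations per tile per cycle, as described in the Architecture section.

```python
# Rough check: achieved TFLOPS as a percentage of theoretical peak.
# Assumes 80 tiles, each with 2 FPMAC units doing 2 FLOP per cycle.

def peak_tflops(clock_ghz, tiles=80, flops_per_tile_cycle=4):
    """Theoretical peak in TFLOPS at the given clock rate."""
    return tiles * flops_per_tile_cycle * clock_ghz / 1000.0


def percent_of_peak(achieved_tflops, clock_ghz):
    """Achieved performance as a percentage of the theoretical peak."""
    return 100.0 * achieved_tflops / peak_tflops(clock_ghz)


if __name__ == "__main__":
    clock_ghz = 4.27  # measurement frequency given in note 2
    # (application, achieved TFLOPS) for the rows that use all 80 tiles
    for name, tflops in [("Stencil", 1.00), ("SGEMM", 0.51), ("Spreadsheet", 0.45)]:
        print(f"{name}: {percent_of_peak(tflops, clock_ghz):.1f}% of peak")
```

The results land within a few tenths of a percentage point of the listed values for the stencil, SGEMM and spreadsheet rows; the TFLOPS figures are rounded to two decimals, so an exact match is not expected. The 2D FFT row is left out because its performance figure is rounded too coarsely for a meaningful comparison.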
Experimental results of the Teraflops Research Chip[nb 3]
Voltage | Frequency[nb 4] | Performance[nb 5] | Power[nb 6] | Temperature | Source
0.60 V | 1.0 GHz | 0.32 TFLOPS | 11 W | 110 °C | [2]
0.675 V | 1.0 GHz | 0.32 TFLOPS | 15.6 W | 80 °C | [4]
0.70 V | 1.5 GHz | 0.48 TFLOPS | 25 W | 110 °C | [2]
0.70 V | 1.35 GHz | 0.43 TFLOPS | 18 W | 80 °C | [4]
0.75 V | 1.6 GHz | 0.51 TFLOPS | 21 W | 80 °C | [4]
0.80 V | 2.1 GHz | 0.67 TFLOPS | 42 W | 110 °C | [2]
0.80 V | 2.0 GHz | 0.64 TFLOPS | 26 W | 80 °C | [4]
0.85 V | 2.4 GHz | 0.77 TFLOPS | 32 W | 80 °C | [4]
0.90 V | 2.6 GHz | 0.83 TFLOPS | 70 W | 110 °C | [2]
0.90 V | 2.85 GHz | 0.91 TFLOPS | 45 W | 80 °C | [4]
0.95 V | 3.16 GHz | 1.0 TFLOPS | 62 W | 80 °C | [4]
1.00 V | 3.13 GHz | 1.0 TFLOPS | 98 W | 110 °C | [2]
1.00 V | 3.8 GHz | 1.22 TFLOPS | 78 W | 80 °C | [4]
1.05 V | 4.2 GHz | 1.34 TFLOPS | 82 W | 80 °C | [4]
1.10 V | 3.5 GHz | 1.12 TFLOPS | 135 W | 110 °C | [2]
1.10 V | 4.5 GHz | 1.44 TFLOPS | 105 W | 80 °C | [4]
1.15 V | 4.8 GHz | 1.54 TFLOPS | 128 W | 80 °C | [4]
1.20 V | 4.0 GHz | 1.28 TFLOPS | 181 W | 110 °C | [2]
1.20 V | 5.1 GHz | 1.63 TFLOPS | 152 W | 80 °C | [4]
1.25 V | 5.3 GHz | 1.70 TFLOPS | 165 W | 80 °C | [4]
1.30 V | 4.4 GHz | 1.39 TFLOPS | ? | 110 °C | [2]
1.30 V | 5.5 GHz | 1.76 TFLOPS | 210 W | 80 °C | [4]
1.35 V | 5.67 GHz | 1.81 TFLOPS | 230 W | 80 °C | [4]
1.40 V | 4.8 GHz | 1.52 TFLOPS | ? | 110 °C | [2]
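Most performance values in this table track the theoretical peak of 80 tiles × 4 FLOP per cycle × frequency, and dividing performance by power gives the energy efficiency at each operating point. The sketch below applies both calculations to three rows taken from the 80 °C measurements [4]. The helper names are hypothetical, and the peak formula is inferred from the Architecture section rather than quoted from the sources.

```python
# Peak performance and energy efficiency for selected operating points.
# Rows are (voltage V, frequency GHz, listed TFLOPS, power W), copied from
# the 80 degC measurements in the table above.

ROWS = [
    (0.675, 1.0, 0.32, 15.6),
    (0.95, 3.16, 1.0, 62.0),
    (1.35, 5.67, 1.81, 230.0),
]


def peak_tflops(clock_ghz, tiles=80, flops_per_tile_cycle=4):
    """Theoretical peak in TFLOPS with all tiles active."""
    return tiles * flops_per_tile_cycle * clock_ghz / 1000.0


def gflops_per_watt(tflops, watts):
    """Energy efficiency in GFLOPS per watt."""
    return tflops * 1000.0 / watts


if __name__ == "__main__":
    for volts, ghz, listed_tflops, watts in ROWS:
        print(
            f"{volts:.3f} V, {ghz:.2f} GHz: "
            f"computed peak {peak_tflops(ghz):.2f} TFLOPS "
            f"(listed {listed_tflops:.2f}), "
            f"{gflops_per_watt(listed_tflops, watts):.1f} GFLOPS/W"
        )
```

Efficiency falls steadily as voltage and frequency rise: roughly 20.5 GFLOPS/W at 0.675 V against about 7.9 GFLOPS/W at 1.35 V for these rows.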

Issues

Intel aimed to help software development for the new exotic architecture by creating a new programming model, called Ct, specifically for the chip. The model never gained the following Intel hoped for and was eventually incorporated into Intel Array Building Blocks, a now-defunct C++ library.

Notes

  1. The chip was later shown by Intel to run as high as 5.67 GHz.
  2. At 1.07 V and 4.27 GHz.
  3. All measurements present performance with all 80 cores active.
  4. Substantially higher frequencies at the same voltages (compared to the initial ISSCC report) were attained in 2008 with use of a custom cooling solution.
  5. Values in italic were extrapolated from the maximal frequency, which was manually extracted from plots, and are thus only approximate.
  6. Values in italic were manually extracted from plots and are thus only approximate.

References

  1. Intel Corporation. "Teraflops Research Chip". Archived from the original on July 22, 2010.
  2. Vangal, Sriram; Howard, Jason; Ruhl, Gregory; Dighe, Saurabh; Wilson, Howard; Tschanz, James; Finan, David; Iyer, Priya; Singh, Arvind; Jacob, Tiju; Jain, Shailendra (2007). "An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS". 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. pp. 98–589. doi:10.1109/ISSCC.2007.373606. ISBN 978-1-4244-0852-8. S2CID 20065641.
  3. Peh, Li-Shiuan; Keckler, Stephen W.; Vangal, Sriram (2009), Keckler, Stephen W.; Olukotun, Kunle; Hofstee, H. Peter (eds.), "On-Chip Networks for Multicore Systems", Multicore Processors and Systems, Springer US, pp. 35–71, Bibcode:2009mps..book...35P, doi:10.1007/978-1-4419-0263-4_2, ISBN 978-1-4419-0262-7, retrieved 2020-05-14.
  4. Vangal, S.R.; Howard, J.; Ruhl, G.; Dighe, S.; Wilson, H.; Tschanz, J.; Finan, D.; Singh, A.; Jacob, T.; Jain, S.; Erraguntla, V. (2008). "An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS". IEEE Journal of Solid-State Circuits. 43 (1): 29–41. Bibcode:2008IJSSC..43...29V. doi:10.1109/JSSC.2007.910957. ISSN 0018-9200. S2CID 15672087.
  5. "Intel Develops Tera-Scale Research Chips". Intel News Release. 2006.
  6. Intel Corporation (February 11, 2007). "Intel Research Advances 'Era Of Tera'". Intel Press Room. Archived from the original on April 13, 2009.
  7. Bautista, Jerry (2008). Tera-scale computing and interconnect challenges - 3D stacking considerations. 2008 IEEE Hot Chips 20 Symposium (HCS). Stanford, CA, USA: IEEE. pp. 1–34. doi:10.1109/HOTCHIPS.2008.7476514. ISBN 978-1-4673-8871-9. S2CID 26400101.
  8. Intel's Teraflops Research Chip (PDF). Intel Corporation. 2007. Archived from the original (PDF) on February 18, 2020.
  9. Fossum, Tryggve (2007). High End MPSOC - The Personal Super Computer (PDF). MPSoC Conference 2007. p. 6.