Teraflops Research Chip
General information | |
---|---|
Launched | 2006 |
Designed by | Intel Tera-Scale Computing Research Program |
Performance | |
Max. CPU clock rate | 5.67 GHz |
Data width | 38-bit |
Architecture and classification | |
Instruction set | 96-bit VLIW |
Physical specifications | |
Transistors | 100 million |
Cores | 80 |
Socket | |
History | |
Successor | Xeon Phi |
Intel Teraflops Research Chip (codenamed Polaris) is a research manycore processor containing 80 cores, using a network-on-chip architecture, developed by Intel's Tera-Scale Computing Research Program.[1] It was manufactured using a 65 nm CMOS process with eight layers of copper interconnect and contains 100 million transistors on a 275 mm² die.[2][3][4] Its design goal was to demonstrate a modular architecture capable of a sustained performance of 1.0 TFLOPS while dissipating less than 100 W.[3] Research from the project was later incorporated into Xeon Phi. The technical lead of the project was Sriram R. Vangal.[4]
The processor was initially presented at the Intel Developer Forum on September 26, 2006[5] and officially announced on February 11, 2007.[6] A working chip was presented at the 2007 IEEE International Solid-State Circuits Conference, alongside technical specifications.[2]
Architecture
The chip consists of a 10×8 2D mesh network of cores and nominally operates at 4 GHz.[nb 1] Each core, called a tile (3 mm²), contains a processing engine and a 5-port wormhole-switched router (0.34 mm²) with mesochronous interfaces, with a bandwidth of 80 GB/s and latency of 1.25 ns at 4 GHz.[2] The processing engine in each tile contains two independent, 9-stage pipelined single-precision floating-point multiply-accumulator (FPMAC) units, 3 KB of single-cycle instruction memory and 2 KB of data memory.[3] Each FPMAC unit can perform 2 single-precision floating-point operations per cycle. Each tile thus has an estimated peak performance of 16 GFLOPS at the standard configuration of 4 GHz. A 96-bit very long instruction word (VLIW) encodes up to eight operations per cycle.[3] The custom instruction set includes instructions to send and receive packets to and from the chip's network, as well as instructions for putting a particular tile to sleep and waking it.[4] Underneath each tile, a 256 KB SRAM module (codenamed Freya) was 3D stacked, bringing memory nearer to the processor to increase overall memory bandwidth to 1 TB/s, at the expense of higher cost, thermal stress and latency, and a small total capacity of 20 MB.[7] The network of Polaris was shown to have a bisection bandwidth of 1.6 Tbit/s at 3.16 GHz and 2.92 Tbit/s at 5.67 GHz.[8]
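The peak-performance and stacked-memory figures quoted above follow from simple multiplication. The Python sketch below reproduces that arithmetic; the function and constant names are illustrative and do not come from Intel's documentation.

```python
# Figures taken from the architecture description above.
TILES = 80
FPMACS_PER_TILE = 2
FLOPS_PER_FPMAC_PER_CYCLE = 2   # one multiply plus one accumulate per cycle
SRAM_PER_TILE_KB = 256          # 3D-stacked SRAM under each tile


def tile_peak_gflops(freq_ghz: float) -> float:
    """Peak single-precision GFLOPS of one tile at the given clock."""
    return FPMACS_PER_TILE * FLOPS_PER_FPMAC_PER_CYCLE * freq_ghz


def chip_peak_tflops(freq_ghz: float, active_tiles: int = TILES) -> float:
    """Peak TFLOPS of the chip with the given number of active tiles."""
    return active_tiles * tile_peak_gflops(freq_ghz) / 1000.0


def stacked_sram_mb(tiles: int = TILES) -> float:
    """Total capacity of the stacked SRAM in MB."""
    return tiles * SRAM_PER_TILE_KB / 1024


print(tile_peak_gflops(4.0))   # 16.0 GFLOPS per tile at 4 GHz
print(chip_peak_tflops(4.0))   # 1.28 TFLOPS for all 80 tiles
print(stacked_sram_mb())       # 20.0 MB
```

The same formula gives roughly 1.01 TFLOPS at 3.16 GHz, the frequency at which the chip crosses the 1.0 TFLOPS design goal.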
Other prominent features of the Teraflops Research Chip include fine-grained power management, with 21 independent sleep regions on each tile and dynamic tile sleep, and very high energy efficiency: a theoretical peak of 27 GFLOPS/W at 0.6 V and a measured 19.4 GFLOPS/W for the stencil application at 0.75 V.[4][9]
Instruction type | Latency (cycles) |
---|---|
FPMAC | 9 |
LOAD/STORE | 2 |
SEND/RECEIVE | 2 |
JUMP/BRANCH | 1 |
STALL/WFD | ? |
SLEEP/WAKE | 6 |
Application | FLOP count | Performance (TFLOPS) | % of peak | Active tiles |
---|---|---|---|---|
Stencil | 358K | 1.00 | 73.3% | 80 |
SGEMM | 2.63M | 0.51 | 37.5% | 80 |
Spreadsheet | 64.2K | 0.45 | 33.2% | 80 |
2D FFT | 196K | 0.02 | 2.73% | 64 |
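The sketch below shows how a percent-of-peak figure relates to sustained performance. It reads the third column of the table above as sustained TFLOPS and the fourth as the fraction of peak. It also assumes the benchmarks ran at the 1.07 V, 4.27 GHz operating point given in the notes, and that each tile delivers 4 FLOPs per cycle (two FPMAC units at two operations each). These readings are assumptions, so the output is a consistency check, not a reproduction of the published numbers.

```python
FLOPS_PER_TILE_PER_CYCLE = 4   # two FPMAC units, two FLOPs each per cycle
FREQ_GHZ = 4.27                # assumed operating point for this table

# (application, sustained TFLOPS, active tiles) as listed in the table
ROWS = [
    ("Stencil", 1.00, 80),
    ("SGEMM", 0.51, 80),
    ("Spreadsheet", 0.45, 80),
    ("2D FFT", 0.02, 64),
]


def percent_of_peak(sustained_tflops: float, active_tiles: int,
                    freq_ghz: float = FREQ_GHZ) -> float:
    """Sustained performance as a percentage of the active tiles' peak."""
    peak_tflops = active_tiles * FLOPS_PER_TILE_PER_CYCLE * freq_ghz / 1000.0
    return 100.0 * sustained_tflops / peak_tflops


for name, tflops, tiles in ROWS:
    print(f"{name}: {percent_of_peak(tflops, tiles):.1f}% of peak")
```

The computed values land close to the listed percentages for the first three rows. Small differences are expected because the sustained figures are rounded to two decimals, and the 2D FFT row is especially sensitive to that rounding.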
Voltage | Frequency[nb 4] | Performance[nb 5] | Power[nb 6] | Temperature | Source |
---|---|---|---|---|---|
0.60 V | 1.0 GHz | 0.32 TFLOPS | 11 W | 110 °C | [2] |
0.675 V | 1.0 GHz | 0.32 TFLOPS | 15.6 W | 80 °C | [4] |
0.70 V | 1.5 GHz | 0.48 TFLOPS | 25 W | 110 °C | [2] |
0.70 V | 1.35 GHz | 0.43 TFLOPS | 18 W | 80 °C | [4] |
0.75 V | 1.6 GHz | 0.51 TFLOPS | 21 W | 80 °C | [4] |
0.80 V | 2.1 GHz | 0.67 TFLOPS | 42 W | 110 °C | [2] |
0.80 V | 2.0 GHz | 0.64 TFLOPS | 26 W | 80 °C | [4] |
0.85 V | 2.4 GHz | 0.77 TFLOPS | 32 W | 80 °C | [4] |
0.90 V | 2.6 GHz | 0.83 TFLOPS | 70 W | 110 °C | [2] |
0.90 V | 2.85 GHz | 0.91 TFLOPS | 45 W | 80 °C | [4] |
0.95 V | 3.16 GHz | 1.0 TFLOPS | 62 W | 80 °C | [4] |
1.00 V | 3.13 GHz | 1.0 TFLOPS | 98 W | 110 °C | [2] |
1.00 V | 3.8 GHz | 1.22 TFLOPS | 78 W | 80 °C | [4] |
1.05 V | 4.2 GHz | 1.34 TFLOPS | 82 W | 80 °C | [4] |
1.10 V | 3.5 GHz | 1.12 TFLOPS | 135 W | 110 °C | [2] |
1.10 V | 4.5 GHz | 1.44 TFLOPS | 105 W | 80 °C | [4] |
1.15 V | 4.8 GHz | 1.54 TFLOPS | 128 W | 80 °C | [4] |
1.20 V | 4.0 GHz | 1.28 TFLOPS | 181 W | 110 °C | [2] |
1.20 V | 5.1 GHz | 1.63 TFLOPS | 152 W | 80 °C | [4] |
1.25 V | 5.3 GHz | 1.70 TFLOPS | 165 W | 80 °C | [4] |
1.30 V | 4.4 GHz | 1.39 TFLOPS | ? | 110 °C | [2] |
1.30 V | 5.5 GHz | 1.76 TFLOPS | 210 W | 80 °C | [4] |
1.35 V | 5.67 GHz | 1.81 TFLOPS | 230 W | 80 °C | [4] |
1.40 V | 4.8 GHz | 1.52 TFLOPS | ? | 110 °C | [2] |
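The energy-efficiency trend implied by the table above can be made explicit by dividing peak performance by power. The Python sketch below does this for a few of the 80 °C rows, assuming peak performance is 80 tiles × 4 FLOPs per cycle × frequency. The helper names are illustrative, and because some table entries are approximate readings from plots, the results are indicative only.

```python
# (voltage in V, frequency in GHz, power in W), copied from 80 °C rows above
ROWS_80C = [
    (0.675, 1.0, 15.6),
    (0.95, 3.16, 62.0),
    (1.20, 5.1, 152.0),
    (1.35, 5.67, 230.0),
]


def peak_gflops(freq_ghz: float, tiles: int = 80,
                flops_per_tile_per_cycle: int = 4) -> float:
    """Peak GFLOPS with all tiles active at the given clock."""
    return tiles * flops_per_tile_per_cycle * freq_ghz


def gflops_per_watt(freq_ghz: float, power_w: float) -> float:
    """Peak energy efficiency at one operating point."""
    return peak_gflops(freq_ghz) / power_w


for volts, freq, power in ROWS_80C:
    print(f"{volts:.3f} V: {peak_gflops(freq) / 1000:.2f} TFLOPS, "
          f"{gflops_per_watt(freq, power):.1f} GFLOPS/W")
```

Under these assumptions, efficiency falls steadily as voltage and frequency rise: raising the clock from 1.0 GHz to 5.67 GHz increases peak performance less than sixfold while power grows almost fifteenfold.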
Issues
Intel aimed to help software development for the new, exotic architecture by creating a new programming model, called Ct, specifically for the chip. The model never gained the following Intel had hoped for and was eventually incorporated into Intel Array Building Blocks, a now-defunct C++ library.
Notes
1. The chip was later shown by Intel to run as high as 5.67 GHz.
2. At 1.07 V and 4.27 GHz.
3. All measurements present performance with all 80 cores active.
4. Substantially higher frequencies at the same voltages (compared to the initial ISSCC report) were attained in 2008 with the use of a custom cooling solution.
5. Values in italic were extrapolated from the maximal frequency, which was manually extracted from plots, and are thus only approximate.
6. Values in italic were manually extracted from plots and are thus only approximate.
References
1. Intel Corporation. "Teraflops Research Chip". Archived from the original on July 22, 2010.
2. Vangal, Sriram; Howard, Jason; Ruhl, Gregory; Dighe, Saurabh; Wilson, Howard; Tschanz, James; Finan, David; Iyer, Priya; Singh, Arvind; Jacob, Tiju; Jain, Shailendra (2007). "An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS". 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. pp. 98–589. doi:10.1109/ISSCC.2007.373606. ISBN 978-1-4244-0852-8. S2CID 20065641.
3. Peh, Li-Shiuan; Keckler, Stephen W.; Vangal, Sriram (2009), Keckler, Stephen W.; Olukotun, Kunle; Hofstee, H. Peter (eds.), "On-Chip Networks for Multicore Systems", Multicore Processors and Systems, Springer US, pp. 35–71, Bibcode:2009mps..book...35P, doi:10.1007/978-1-4419-0263-4_2, ISBN 978-1-4419-0262-7, retrieved 2020-05-14.
4. Vangal, S.R.; Howard, J.; Ruhl, G.; Dighe, S.; Wilson, H.; Tschanz, J.; Finan, D.; Singh, A.; Jacob, T.; Jain, S.; Erraguntla, V. (2008). "An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS". IEEE Journal of Solid-State Circuits. 43 (1): 29–41. Bibcode:2008IJSSC..43...29V. doi:10.1109/JSSC.2007.910957. ISSN 0018-9200. S2CID 15672087.
5. "Intel Develops Tera-Scale Research Chips". Intel News Release. 2006.
6. Intel Corporation (February 11, 2007). "Intel Research Advances 'Era Of Tera'". Intel Press Room. Archived from the original on April 13, 2009.
7. Bautista, Jerry (2008). Tera-scale computing and interconnect challenges - 3D stacking considerations. 2008 IEEE Hot Chips 20 Symposium (HCS). Stanford, CA, USA: IEEE. pp. 1–34. doi:10.1109/HOTCHIPS.2008.7476514. ISBN 978-1-4673-8871-9. S2CID 26400101.
8. Intel's Teraflops Research Chip (PDF). Intel Corporation. 2007. Archived from the original (PDF) on February 18, 2020.
9. Fossum, Tryggve (2007). High End MPSOC - The Personal Super Computer (PDF). MPSoC Conference 2007. p. 6.