Tensor Processing Unit

Tensor Processing Unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google for neural network machine learning, using Google's own TensorFlow software.[2] Google began using TPUs internally in 2015, and in 2018 made them available for third-party use, both as part of its cloud infrastructure and by offering a smaller version of the chip for sale.

Tensor Processing Unit
Designer: Google
Introduced: 2015[1]
Type: Neural network, machine learning

Comparison to CPUs and GPUs

Compared to a graphics processing unit, TPUs are designed for a high volume of low-precision computation (e.g. as little as 8-bit precision)[3] with more input/output operations per joule, without hardware for rasterisation/texture mapping.[4] The TPU ASICs are mounted in a heatsink assembly, which can fit in a hard drive slot within a data center rack, according to Norman Jouppi.[5]

Different types of processors are suited for different types of machine learning models. TPUs are well suited for CNNs, while GPUs have benefits for some fully-connected neural networks, and CPUs can have advantages for RNNs.[6]

History

The tensor processing unit was announced in May 2016 at Google I/O, when the company said that the TPU had already been used inside their data centers for over a year.[5][4] Google's 2017 paper describing its creation cites previous systolic matrix multipliers of similar architecture built in the 1990s.[7] The chip has been specifically designed for Google's TensorFlow framework, a symbolic math library which is used for machine learning applications such as neural networks.[8] However, as of 2017 Google still used CPUs and GPUs for other types of machine learning.[5] AI accelerator designs from other vendors are also appearing, aimed at the embedded and robotics markets.

Google's TPUs are proprietary. Some models are commercially available, and on February 12, 2018, The New York Times reported that Google "would allow other companies to buy access to those chips through its cloud-computing service."[9] Google has said that they were used in the AlphaGo versus Lee Sedol series of human-versus-machine Go games,[4] as well as in the AlphaZero system, which produced chess, shogi and Go playing programs from the game rules alone and went on to beat the leading programs in those games.[10] Google has also used TPUs for Google Street View text processing and was able to find all the text in the Street View database in less than five days. In Google Photos, an individual TPU can process over 100 million photos a day.[5] TPUs are also used in RankBrain, which Google uses to provide search results.[11]

Google provides third parties access to TPUs through its Cloud TPU service as part of the Google Cloud Platform[12] and through its notebook-based services Kaggle and Colaboratory.[13][14]

Products

Tensor Processing Unit products[15][16][17]
| | TPUv1 | TPUv2 | TPUv3 | TPUv4[16][18] | TPUv5e[19] | TPUv5p[20][21] | Trillium[22] |
| Date introduced | 2015 | 2017 | 2018 | 2021 | 2023 | 2023 | 2024 |
| Process node | 28 nm | 16 nm | 16 nm | 7 nm | Unstated | Unstated | |
| Die size (mm²) | 331 | < 625 | < 700 | < 400 | 300-350 | Unstated | |
| On-chip memory (MiB) | 28 | 32 | 32 | 32 | 48 | 112 | |
| Clock speed (MHz) | 700 | 700 | 940 | 1050 | Unstated | 1750 | |
| Memory | 8 GiB DDR3 | 16 GiB HBM | 32 GiB HBM | 32 GiB HBM | 16 GB HBM | 95 GB HBM | 32 GB? |
| Memory bandwidth | 34 GB/s | 600 GB/s | 900 GB/s | 1200 GB/s | 819 GB/s | 2765 GB/s | ~1.6 TB/s? |
| TDP (W) | 75 | 280 | 220 | 170 | Not listed | Not listed | |
| TOPS (tera operations per second) | 23 | 45 | 123 | 275 | 197 (bf16), 393 (int8) | 459 (bf16), 918 (int8) | |
| TOPS/W | 0.31 | 0.16 | 0.56 | 1.62 | Not listed | Not listed | |
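
The TOPS/W figures above are consistent with dividing the listed peak TOPS by the listed TDP for the four generations where both values are given. A minimal sketch of that arithmetic follows; the dictionary layout and the function name are illustrative assumptions, not part of any Google tooling.

```python
# Peak TOPS and TDP (W) as listed in the table for TPUv1 through TPUv4.
# The data layout and names below are illustrative assumptions.
SPECS = {
    "TPUv1": {"tops": 23, "tdp_w": 75},
    "TPUv2": {"tops": 45, "tdp_w": 280},
    "TPUv3": {"tops": 123, "tdp_w": 220},
    "TPUv4": {"tops": 275, "tdp_w": 170},
}


def tops_per_watt(tops: float, tdp_w: float) -> float:
    """Return peak tera-operations per second per watt of TDP."""
    return tops / tdp_w


# Round to two decimals, matching the precision used in the table.
EFFICIENCY = {
    name: round(tops_per_watt(**spec), 2) for name, spec in SPECS.items()
}

if __name__ == "__main__":
    for name, value in EFFICIENCY.items():
        print(f"{name}: {value:.2f} TOPS/W")
```

Because both inputs are peak or nameplate values, the ratio is best read as an upper bound on efficiency rather than a measured figure.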

First generation TPU

The first-generation TPU is an 8-bit matrix multiplication engine, driven with CISC instructions by the host processor across a PCIe 3.0 bus. It is manufactured on a 28 nm process with a die size ≤ 331 mm². The clock speed is 700 MHz and it has a thermal design power of 28–40 W. It has 28 MiB of on-chip memory, and 4 MiB of 32-bit accumulators taking the results of a 256×256 systolic array of 8-bit multipliers.[7] Within the TPU package is 8 GiB of dual-channel 2133 MHz DDR3 SDRAM offering 34 GB/s of bandwidth.[17] Instructions transfer data to or from the host, perform matrix multiplications or convolutions, and apply activation functions.[7]
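
The pairing of 8-bit multipliers with 32-bit accumulators can be illustrated with a small sketch. It models only the arithmetic, not the systolic data flow or timing of the hardware, and it assumes signed 8-bit operands; the function name `matmul_int8` is hypothetical.

```python
# Illustrative sketch only: models 8-bit operands feeding wide accumulators.
# Signed operands are an assumption made for this example.
INT8_MIN, INT8_MAX = -128, 127
INT32_MAX = 2**31 - 1


def matmul_int8(a, b):
    """Multiply two matrices of signed 8-bit integers with exact accumulation."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    for matrix in (a, b):
        for row in matrix:
            for value in row:
                if not INT8_MIN <= value <= INT8_MAX:
                    raise ValueError(f"{value} does not fit in 8 bits")
    cols = len(b[0])
    result = [[0] * cols for _ in range(len(a))]
    for i, row in enumerate(a):
        for j in range(cols):
            acc = 0  # stands in for one 32-bit accumulator
            for k in range(inner):
                acc += row[k] * b[k][j]
            result[i][j] = acc
    return result


# Largest magnitude a 256-term dot product of signed 8-bit values can reach.
WORST_CASE = 256 * INT8_MIN * INT8_MIN

if __name__ == "__main__":
    print(matmul_int8([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    print(WORST_CASE, WORST_CASE <= INT32_MAX)
```

The worst-case sum for a 256-deep dot product is 2^22, far below the 32-bit limit, which suggests why 32-bit accumulators leave ample headroom for a 256×256 array.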

Second generation TPU

The second-generation TPU was announced in May 2017.[23] Google stated the first-generation TPU design was limited by memory bandwidth and using 16 GB of High Bandwidth Memory in the second-generation design increased bandwidth to 600 GB/s and performance to 45 teraFLOPS.[17] The TPUs are then arranged into four-chip modules with a performance of 180 teraFLOPS.[23] Then 64 of these modules are assembled into 256-chip pods with 11.5 petaFLOPS of performance.[23] Notably, while the first-generation TPUs were limited to integers, the second-generation TPUs can also calculate in floating point, introducing the bfloat16 format invented by Google Brain. This makes the second-generation TPUs useful for both training and inference of machine learning models. Google has stated these second-generation TPUs will be available on the Google Compute Engine for use in TensorFlow applications.[24]
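
The bfloat16 format keeps the sign bit and all 8 exponent bits of a 32-bit float and shortens the mantissa to 7 bits, so its bit pattern is the upper half of the float32 encoding. The sketch below converts by simple truncation, which is an assumption chosen for brevity (hardware may round instead); the function names are hypothetical.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision encoding of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Keep the top 16 bits of the float32 encoding (truncation, no rounding)."""
    return float32_bits(x) >> 16


def from_bfloat16_bits(bits: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, -2.0, 3.14159):
        bits = to_bfloat16_bits(value)
        print(f"{value!r} -> 0x{bits:04X} -> {from_bfloat16_bits(bits)!r}")
```

Truncating 3.14159 gives 3.140625, which shows the trade-off: the range of float32 is preserved while only two to three significant decimal digits remain.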

Third generation TPU

The third-generation TPU was announced on May 8, 2018.[25] Google announced that the processors themselves are twice as powerful as the second-generation TPUs, and would be deployed in pods with four times as many chips as the preceding generation.[26][27] This results in an 8-fold increase in performance per pod (with up to 1,024 chips per pod) compared to the second-generation TPU deployment.
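
The pod-level figures quoted for the second and third generations follow from simple multiplication. The sketch below reproduces that arithmetic using only the numbers stated in the text; the constant names are illustrative.

```python
# Second-generation figures as stated in the text.
V2_CHIP_TFLOPS = 45
V2_CHIPS_PER_MODULE = 4
V2_MODULES_PER_POD = 64

v2_module_tflops = V2_CHIP_TFLOPS * V2_CHIPS_PER_MODULE        # per four-chip module
v2_chips_per_pod = V2_CHIPS_PER_MODULE * V2_MODULES_PER_POD    # chips in one pod
v2_pod_pflops = v2_module_tflops * V2_MODULES_PER_POD / 1000   # teraFLOPS -> petaFLOPS

# Third-generation ratios relative to the second generation, as stated.
V3_CHIP_SPEEDUP = 2
V3_POD_SIZE_RATIO = 4

v3_chips_per_pod = v2_chips_per_pod * V3_POD_SIZE_RATIO
v3_pod_speedup = V3_CHIP_SPEEDUP * V3_POD_SIZE_RATIO

if __name__ == "__main__":
    print(v2_module_tflops, v2_chips_per_pod, v2_pod_pflops)
    print(v3_chips_per_pod, v3_pod_speedup)
```

The second-generation pod works out to 11.52 petaFLOPS, which the cited source rounds to 11.5, and the third-generation ratios give 1,024 chips and an 8-fold gain per pod.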

Fourth generation TPU

On May 18, 2021, Google CEO Sundar Pichai spoke about TPU v4 Tensor Processing Units during his keynote at the Google I/O virtual conference. TPU v4 improved performance by more than 2x over TPU v3 chips. Pichai said "A single v4 pod contains 4,096 v4 chips, and each pod has 10x the interconnect bandwidth per chip at scale, compared to any other networking technology."[28] An April 2023 paper by Google claims TPU v4 is 5–87% faster than an Nvidia A100 at machine learning benchmarks.[29]

There is also an "inference" version, called v4i,[30] that does not require liquid cooling.[31]

Fifth generation TPU

In 2021, Google revealed that the physical layout of TPU v5 was being designed with the assistance of a novel application of deep reinforcement learning.[32] Google claims TPU v5 is nearly twice as fast as TPU v4,[33] and based on that and the relative performance of TPU v4 over the A100, some speculate that TPU v5 is as fast as or faster than an H100.[34]

Similar to the v4i being a lighter-weight version of the v4, the fifth generation has a "cost-efficient"[35] version called v5e.[19] In December 2023, Google announced TPU v5p, which is claimed to be competitive with the H100.[36]

Sixth generation TPU

In May 2024, at the Google I/O conference, Google announced TPU v6 (Trillium), which will be available later in 2024. Google claimed a 4.7 times performance increase relative to TPU v5e,[37] via larger matrix multiplication units and an increased clock speed. High bandwidth memory (HBM) capacity and bandwidth have also doubled. A pod can contain up to 256 Trillium units.[38]

Edge TPU

In July 2018, Google announced the Edge TPU. The Edge TPU is Google's purpose-built ASIC chip designed to run machine learning (ML) models for edge computing, meaning it is much smaller and consumes far less power compared to the TPUs hosted in Google datacenters (also known as Cloud TPUs[39]). In January 2019, Google made the Edge TPU available to developers with a line of products under the Coral brand. The Edge TPU is capable of 4 trillion operations per second with 2 W of electrical power.[40]

The product offerings include a single-board computer (SBC), a system on module (SoM), a USB accessory, a mini PCI-e card, and an M.2 card. The SBC Coral Dev Board and Coral SoM both run Mendel Linux OS, a derivative of Debian.[41][42] The USB, PCI-e, and M.2 products function as add-ons to existing computer systems, and support Debian-based Linux systems on x86-64 and ARM64 hosts (including Raspberry Pi).

The machine learning runtime used to execute models on the Edge TPU is based on TensorFlow Lite.[43] The Edge TPU is only capable of accelerating forward-pass operations, which means it is primarily useful for performing inferences (although it is possible to perform lightweight transfer learning on the Edge TPU[44]). The Edge TPU also only supports 8-bit math, meaning that for a network to be compatible with the Edge TPU, it needs either to be trained using the TensorFlow quantization-aware training technique or, since late 2019, to be converted using post-training quantization.
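
To make the 8-bit requirement concrete, the sketch below shows a generic affine (scale and zero-point) quantization of floating-point values to signed 8-bit integers. It is a textbook-style illustration under stated assumptions, not the TensorFlow Lite implementation; all function names are hypothetical.

```python
def quantization_params(values, qmin=-128, qmax=127):
    """Derive a scale and zero point mapping the value range onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # include 0.0 so that zero is represented exactly
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to clamped 8-bit integers."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Map 8-bit integers back to approximate floats."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.0, 1.0, 2.55]
    scale, zero_point = quantization_params(weights)
    q = quantize(weights, scale, zero_point)
    print(q, dequantize(q, scale, zero_point))
```

Post-training quantization derives such parameters from an already trained model, while quantization-aware training simulates the rounding during training so the network can adapt to it.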

On November 12, 2019, Asus announced a pair of single-board computers (SBCs) featuring the Edge TPU: the Asus Tinker Edge T and Tinker Edge R, designed for IoT and edge AI. The SBCs officially support Android and Debian operating systems.[45][46] ASUS has also demonstrated a mini PC called Asus PN60T featuring the Edge TPU.[47]

On January 2, 2020, Google announced the Coral Accelerator Module and Coral Dev Board Mini, to be demonstrated at CES 2020 later the same month. The Coral Accelerator Module is a multi-chip module featuring the Edge TPU, PCIe and USB interfaces for easier integration. The Coral Dev Board Mini is a smaller SBC featuring the Coral Accelerator Module and MediaTek 8167s SoC.[48][49]

Pixel Neural Core

On October 15, 2019, Google announced the Pixel 4 smartphone, which contains an Edge TPU called the Pixel Neural Core. Google describes it as "customized to meet the requirements of key camera features in Pixel 4", using a neural network search that sacrifices some accuracy in favor of minimizing latency and power use.[50]

Google Tensor

Google followed the Pixel Neural Core by integrating an Edge TPU into a custom system-on-chip named Google Tensor, which was released in 2021 with the Pixel 6 line of smartphones.[51] The Google Tensor SoC demonstrated "extremely large performance advantages over the competition" in machine learning-focused benchmarks; although instantaneous power consumption also was relatively high, the improved performance meant less energy was consumed due to shorter periods requiring peak performance.[52]

Lawsuit

In 2019, Singular Computing, founded in 2009 by Joseph Bates, a visiting professor at MIT,[53] filed suit against Google alleging patent infringement in TPU chips.[54] By 2020, Google had successfully lowered the number of claims the court would consider to just two: claim 53 of US 8407273 filed in 2012 and claim 7 of US 9218156 filed in 2013, both of which claim a dynamic range of 10⁻⁶ to 10⁶ for floating point numbers, which the standard float16 cannot do (without resorting to subnormal numbers) as it only has five bits for the exponent. In a 2023 court filing, Singular Computing specifically called out Google's use of bfloat16, as that exceeds the dynamic range of float16.[55] Singular claims non-standard floating point formats were non-obvious in 2009, but Google retorts that the VFLOAT[56] format, with configurable number of exponent bits, existed as prior art in 2002.[57] By January 2024, subsequent lawsuits by Singular had brought the number of patents being litigated up to eight. Towards the end of the trial later that month, Google agreed to a settlement with undisclosed terms.[58][59]
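
The dynamic-range argument can be checked numerically. The sketch below computes the smallest and largest positive normal values of an IEEE-style binary format from its exponent and mantissa widths, using the standard layouts of float16 (5 exponent bits, 10 mantissa bits) and bfloat16 (8 exponent bits, 7 mantissa bits). The function names are illustrative, and the comparison says nothing about the legal merits of the claims.

```python
def normal_range(exponent_bits: int, mantissa_bits: int):
    """Smallest and largest positive normal values of an IEEE-style format."""
    bias = 2 ** (exponent_bits - 1) - 1
    min_normal = 2.0 ** (1 - bias)
    max_finite = (2.0 - 2.0 ** -mantissa_bits) * 2.0 ** bias
    return min_normal, max_finite


FLOAT16 = normal_range(exponent_bits=5, mantissa_bits=10)
BFLOAT16 = normal_range(exponent_bits=8, mantissa_bits=7)


def covers(fmt, low=1e-6, high=1e6):
    """True if normal numbers of the format span at least [low, high]."""
    min_normal, max_finite = fmt
    return min_normal <= low and high <= max_finite


if __name__ == "__main__":
    print("float16 :", FLOAT16, covers(FLOAT16))
    print("bfloat16:", BFLOAT16, covers(BFLOAT16))
```

With five exponent bits, float16 normal numbers run from about 6.1 × 10⁻⁵ to 65,504, so neither end of the 10⁻⁶ to 10⁶ range is reached, whereas the eight exponent bits of bfloat16 cover it comfortably.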

References

  1. Jouppi et al., 2017, "In-Datacenter Performance Analysis of a Tensor Processing Unit", https://arxiv.org/abs/1704.04760
  2. "Cloud Tensor Processing Units (TPUs)". Google Cloud. Retrieved 20 July 2020.
  3. Armasu, Lucian (2016-05-19). "Google's Big Chip Unveil For Machine Learning: Tensor Processing Unit With 10x Better Efficiency (Updated)". Tom's Hardware. Retrieved 2016-06-26.
  4. Jouppi, Norm (May 18, 2016). "Google supercharges machine learning tasks with TPU custom chip". Google Cloud Platform Blog. Retrieved 2017-01-22.
  5. "Google's Tensor Processing Unit explained: this is what the future of computing looks like". TechRadar. Retrieved 2017-01-19.
  6. Wang, Yu Emma; Wei, Gu-Yeon; Brooks, David (2019-07-01). "Benchmarking TPU, GPU, and CPU Platforms for Deep Learning". arXiv:1907.10701 [cs.LG].
  7. Jouppi, Norman P.; Young, Cliff; Patil, Nishant; Patterson, David; Agrawal, Gaurav; Bajwa, Raminder; Bates, Sarah; Bhatia, Suresh; Boden, Nan; Borchers, Al; Boyle, Rick; Cantin, Pierre-luc; Chao, Clifford; Clark, Chris; Coriell, Jeremy; Daley, Mike; Dau, Matt; Dean, Jeffrey; Gelb, Ben; Ghaemmaghami, Tara Vazir; Gottipati, Rajendra; Gulland, William; Hagmann, Robert; Ho, C. Richard; Hogberg, Doug; Hu, John; Hundt, Robert; Hurt, Dan; Ibarz, Julian; Jaffey, Aaron; Jaworski, Alek; Kaplan, Alexander; Khaitan, Harshit; Koch, Andy; Kumar, Naveen; Lacy, Steve; Laudon, James; Law, James; Le, Diemthu; Leary, Chris; Liu, Zhuyuan; Lucke, Kyle; Lundin, Alan; MacKean, Gordon; Maggiore, Adriana; Mahony, Maire; Miller, Kieran; Nagarajan, Rahul; Narayanaswami, Ravi; Ni, Ray; Nix, Kathy; Norrie, Thomas; Omernick, Mark; Penukonda, Narayana; Phelps, Andy; Ross, Jonathan; Ross, Matt; Salek, Amir; Samadiani, Emad; Severn, Chris; Sizikov, Gregory; Snelham, Matthew; Souter, Jed; Steinberg, Dan; Swing, Andy; Tan, Mercedes; Thorson, Gregory; Tian, Bo; Toma, Horia; Tuttle, Erick; Vasudevan, Vijay; Walter, Richard; Wang, Walter; Wilcox, Eric; Yoon, Doe Hyun (June 26, 2017). In-Datacenter Performance Analysis of a Tensor Processing Unit™. Toronto, Canada. arXiv:1704.04760.
  8. "TensorFlow: Open source machine learning". "It is machine learning software being used for various kinds of perceptual and language understanding tasks", Jeffrey Dean, minute 0:47 / 2:17 from YouTube clip.
  9. Metz, Cade (12 February 2018). "Google Makes Its Special A.I. Chips Available to Others". The New York Times. Retrieved 2018-02-12.
  10. McGourty, Colin (6 December 2017). "DeepMind's AlphaZero crushes chess". chess24.com.
  11. "Google's Tensor Processing Unit could advance Moore's Law 7 years into the future". PCWorld. Retrieved 2017-01-19.
  12. "Frequently Asked Questions | Cloud TPU". Google Cloud. Retrieved 2021-01-14.
  13. "Google Colaboratory". colab.research.google.com. Retrieved 2021-05-15.
  14. "Use TPUs | TensorFlow Core". TensorFlow. Retrieved 2021-05-15.
  15. Jouppi, Norman P.; Yoon, Doe Hyun; Ashcraft, Matthew; Gottscho, Mark (June 14, 2021). Ten lessons from three generations that shaped Google's TPUv4i (PDF). International Symposium on Computer Architecture. Valencia, Spain. doi:10.1109/ISCA52012.2021.00010. ISBN 978-1-4503-9086-6.
  16. "System Architecture | Cloud TPU". Google Cloud. Retrieved 2022-12-11.
  17. Kennedy, Patrick (22 August 2017). "Case Study on the Google TPU and GDDR5 from Hot Chips 29". Serve The Home. Retrieved 23 August 2017.
  18. Stay tuned, more information on TPU v4 is coming soon. Retrieved 2020-08-06.
  19. Cloud TPU v5e Inference Public Preview. Retrieved 2023-11-06.
  20. Cloud TPU v5p. Google Cloud. Retrieved 2024-04-09.
  21. Cloud TPU v5p Training. Retrieved 2024-04-09.
  22. "Introducing Trillium, sixth-generation TPUs". Google Cloud Blog. Retrieved 2024-05-29.
  23. Bright, Peter (17 May 2017). "Google brings 45 teraflops tensor flow processors to its compute cloud". Ars Technica. Retrieved 30 May 2017.
  24. Kennedy, Patrick (17 May 2017). "Google Cloud TPU Details Revealed". Serve The Home. Retrieved 30 May 2017.
  25. Frumusanu, Andre (8 May 2018). "Google I/O Opening Keynote Live-Blog". Retrieved 9 May 2018.
  26. Feldman, Michael (11 May 2018). "Google Offers Glimpse of Third-Generation TPU Processor". Top 500. Retrieved 14 May 2018.
  27. Teich, Paul (10 May 2018). "Tearing Apart Google's TPU 3.0 AI Coprocessor". The Next Platform. Retrieved 14 May 2018.
  28. "Google Launches TPU v4 AI Chips". www.hpcwire.com. 20 May 2021. Retrieved June 7, 2021.
  29. Jouppi, Norman (2023-04-20). "TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings". arXiv:2304.01433 [cs.AR].
  30. Kennedy, Patrick (2023-08-29). "Google Details TPUv4 and its Crazy Optically Reconfigurable AI Network". servethehome.com. Retrieved 2023-12-16.
  31. "Why did Google develop its own TPU chip? In-depth disclosure of team members". censtry.com. 2021-10-20. Retrieved 2023-12-16.
  32. Mirhoseini, Azalia; Goldie, Anna (2021-06-01). "A graph placement methodology for fast chip design" (PDF). Nature. 594 (7962): 207–212. doi:10.1038/s41586-022-04657-6. PMID 35361999. S2CID 247855593. Retrieved 2023-06-04.
  33. Vahdat, Amin (2023-12-06). "Enabling next-generation AI workloads: Announcing TPU v5p and AI Hypercomputer". Retrieved 2024-04-08.
  34. Afifi-Sabet, Keumars (2023-12-23). "Google is rapidly turning into a formidable opponent to BFF Nvidia - the TPU v5p AI chip powering its hypercomputer is faster and has more memory and bandwidth than ever before, beating even the mighty H100". TechRadar. Retrieved 2024-04-08.
  35. "Expanding our AI-optimized infrastructure portfolio: Introducing Cloud TPU v5e and announcing A3 GA". 2023-08-29. Retrieved 2023-12-16.
  36. "Enabling next-generation AI workloads: Announcing TPU v5p and AI Hypercomputer". 2023-12-06. Retrieved 2024-04-09.
  37. Velasco, Alan (2024-05-15). "Google Cloud Unveils Trillium, Its 6th-Gen TPU With A 4.7X AI Performance Leap". HotHardware. Retrieved 2024-05-15.
  38. "Introducing Trillium, sixth-generation TPUs". Google Cloud Blog. Retrieved 2024-05-17.
  39. "Cloud TPU". Google Cloud. Retrieved 2021-05-21.
  40. "Edge TPU performance benchmarks". Coral. Retrieved 2020-01-04.
  41. "Dev Board". Coral. Retrieved 2021-05-21.
  42. "System-on-Module (SoM)". Coral. Retrieved 2021-05-21.
  43. "Bringing intelligence to the edge with Cloud IoT". Google Blog. 2018-07-25. Retrieved 2018-07-25.
  44. "Retrain an image classification model on-device". Coral. Retrieved 2019-05-03.
  45. "Announcement of participation in the embedded and IoT technology exhibition 'ET & IoT Technology 2019'". Asus.com (in Japanese). Retrieved 2019-11-13.
  46. Shilov, Anton. "ASUS & Google Team Up for 'Tinker Board' AI-Focused Credit-Card Sized Computers". Anandtech.com. Retrieved 2019-11-13.
  47. Aufranc, Jean-Luc (2019-05-29). "ASUS Tinker Edge T & CR1S-CM-A SBC to Feature Google Coral Edge TPU & NXP i.MX 8M Processor". CNX Software - Embedded Systems News. Retrieved 2019-11-14.
  48. "New Coral products for 2020". Google Developers Blog. Retrieved 2020-01-04.
  49. "Accelerator Module". Coral. Retrieved 2020-01-04.
  50. "Introducing the Next Generation of On-Device Vision Models: MobileNetV3 and MobileNetEdgeTPU". Google AI Blog. Retrieved 2020-04-16.
  51. Gupta, Suyog; White, Marie (November 8, 2021). "Improved On-Device ML on Pixel 6, with Neural Architecture Search". Google AI Blog. Retrieved 16 December 2022.
  52. Frumusanu, Andrei (November 2, 2021). "Google's Tensor inside of Pixel 6, Pixel 6 Pro: A Look into Performance & Efficiency | Google's IP: Tensor TPU/NPU". AnandTech. Retrieved 16 December 2022.
  53. Hardesty, Larry (2011-01-03). "The surprising usefulness of sloppy arithmetic". MIT. Retrieved 2024-01-10.
  54. Bray, Hiawatha (2024-01-10). "Local inventor challenges Google in billion-dollar patent fight". Boston Globe. Boston. Archived from the original on 2024-01-10. Retrieved 2024-01-10.
  55. "SINGULAR COMPUTING LLC, Plaintiff, v. GOOGLE LLC, Defendant: Amended Complaint for Patent Infringement" (PDF). rpxcorp.com. RPX Corporation. 2020-03-20. Retrieved 2024-01-10.
  56. Wang, Xiaojun; Leeser, Miriam (2010-09-01). "VFloat: A Variable Precision Fixed- and Floating-Point Library for Reconfigurable Hardware". ACM Transactions on Reconfigurable Technology and Systems. 3 (3): 1–34. doi:10.1145/1839480.1839486. Retrieved 2024-01-10.
  57. "Singular Computing LLC v. Google LLC". casetext.com. 2023-04-06. Retrieved 2024-01-10.
  58. Calkins, Laurel Brubaker (January 24, 2024). "Google Settles AI-Chip Suit That Had Sought Over $5 Billion". Bloomberg Law.
  59. Brittain, Blake; Raymond, Ray (January 24, 2024). "Google settles AI-related chip patent lawsuit that sought $1.67 bln". Reuters.

