The Pentium FDIV bug is a hardware bug affecting the floating-point unit (FPU) of the early Intel Pentium processors. Because of the bug, the processor would return incorrect binary floating-point results when dividing certain pairs of high-precision numbers. The bug was discovered in 1994 by Thomas R. Nicely, a professor of mathematics at Lynchburg College.[1] Missing values in a lookup table used by the FPU's floating-point division algorithm caused calculations to pick up small errors. In most use cases these errors occur only rarely and produce small deviations from the correct output values. In certain circumstances, however, the errors can occur frequently and lead to more significant deviations.[2]

66 MHz Intel Pentium (sSpec=SX837) with the FDIV bug

The severity of the FDIV bug is debated. Most users rarely encountered it (Byte magazine estimated that 1 in 9 billion floating-point divides with random parameters would produce inaccurate results),[3] but both the flaw and Intel's initial handling of the matter were heavily criticized by the tech community.

In December 1994, Intel recalled the defective processors in what was the first full recall of a computer chip.[4] In its 1994 annual report, Intel said it incurred "a $475 million pre-tax charge... to recover replacement and write-off of these microprocessors."[5]

Description

Intel wanted floating-point division on the Pentium to be faster than on the 486DX, so it replaced the 486's shift-and-subtract division algorithm with the Sweeney, Robertson, and Tocher (SRT) algorithm. The SRT algorithm can generate two bits of the quotient per clock cycle, whereas the 486's algorithm could generate only one. At each step the next quotient digit is read from a lookup table. That table is implemented as a programmable logic array with 2,048 cells,[citation needed] of which 1,066 cells should have been populated with one of five values: −2, −1, 0, +1, +2. When the original array for the Pentium was compiled, five values were not correctly sent to the equipment that etches the arrays into the chips.[citation needed] As a result, five of the array cells contained zero when they should have contained +2.[6]
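The digit-recurrence idea behind SRT division can be illustrated with a small model. The following is a minimal, idealized radix-4 SRT divider in Python. It assumes normalized operands in [1, 2) and uses exact rational arithmetic, and the function name `srt_radix4_divide` is hypothetical. Each iteration emits one quotient digit from {−2, −1, 0, +1, +2} (two quotient bits) and keeps the partial remainder within ±(2/3)d. The Pentium read each digit from its programmable logic array using truncated remainder and divisor bits. This sketch instead selects digits by exact rounding, so it models the algorithm rather than the chip's table.

```python
from fractions import Fraction


def srt_radix4_divide(x: Fraction, d: Fraction, steps: int) -> tuple[list[int], Fraction]:
    """Idealized radix-4 SRT division of x by d, both assumed to lie in [1, 2).

    Digit selection uses exact arithmetic (round 4r/d, clamped to +-2),
    an assumption standing in for the hardware lookup table.
    """
    assert 1 <= x < 2 and 1 <= d < 2
    r = x / 4                      # pre-shift so that |r| <= (2/3) d holds initially
    digits: list[int] = []
    for _ in range(steps):
        q = max(-2, min(2, round(4 * r / d)))   # quotient digit in {-2..+2}
        r = 4 * r - q * d                        # partial-remainder recurrence
        assert abs(r) <= Fraction(2, 3) * d      # SRT redundancy bound is maintained
        digits.append(q)
    # x/d = 4 * sum(q_j * 4**-j) + 4**(1 - steps) * r / d
    quotient = 4 * sum(Fraction(q, 4 ** (j + 1)) for j, q in enumerate(digits))
    return digits, quotient


# Usage: the operands of the well-known example, normalized into [1, 2).
digits, q = srt_radix4_divide(Fraction(4195835, 2**22), Fraction(3145727, 2**21), 30)
print(digits[:8])
print(float(2 * q))   # undo the normalization: approximates 4,195,835 / 3,145,727
```

After 30 steps the quotient is within (2/3)·4^−29 of the exact value. Each digit depends on the remainder left by the previous step, which is why a wrong table entry can affect every later step of the recurrence.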

As a result, calculations that rely on these five cells pick up errors, and these errors can accumulate because the SRT algorithm is recursive. In pathological cases the error can reach the fourth significant digit of the result, although this is rare. Usually the error is confined to the ninth or tenth significant digit.[3]

Only certain combinations of numerator and denominator trigger the bug. One commonly reported example is dividing 4,195,835 by 3,145,727. Users could perform this calculation in any software that used the floating-point coprocessor, such as Windows Calculator, to find out whether their Pentium chip was affected.[7]

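The correct value of the calculation is:

$$\frac{4{,}195{,}835}{3{,}145{,}727} = 1.333820449136241002\ldots$$

When converted to the hexadecimal value used by the processor, 4,195,835 = 0x4005FB and 3,145,727 = 0x2FFFFF. The "5" in 0x4005FB triggers the access to the "empty" array cells. As a result, the value returned by a flawed Pentium processor is incorrect from the fourth decimal place onward:

$$\frac{4{,}195{,}835}{3{,}145{,}727} \approx 1.333739068902037589\ldots \quad \text{(flawed Pentium)}$$

which is actually the value of $4{,}195{,}579 / 3{,}145{,}727$.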
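The arithmetic in this example can be checked with exact rational numbers. The following is a minimal Python sketch. It assumes, as commonly reported, that a flawed Pentium's result equals 4,195,579/3,145,727, whose numerator is 256 less than the true one. The helper `residual` is hypothetical and is not taken from any historical test program. It computes the classic check x − (x/y)·y, which is zero or within rounding error on a correct FPU.

```python
from fractions import Fraction

X, Y = 4_195_835, 3_145_727


def residual(x: float, y: float) -> float:
    """Hypothetical helper for the classic check x - (x / y) * y."""
    return x - (x / y) * y


# Hexadecimal forms of the operands, as quoted in the text.
print(hex(X), hex(Y))  # 0x4005fb 0x2fffff

# Exact quotients: the correct one, and the flawed result 4,195,579 / 3,145,727
# (numerator 0x4004FB instead of 0x4005FB, i.e. 256 smaller).
correct = Fraction(X, Y)
flawed = Fraction(X - 256, Y)
print(f"{float(correct):.12f}")
print(f"{float(flawed):.12f}")

# Relative error of the flawed result is exactly 256 / 4,195,835 (about 6.1e-5).
rel_err = (correct - flawed) / correct
print(float(rel_err))

# Multiplying the flawed quotient back by Y misses X by exactly 256.
print(X - flawed * Y)  # 256

# On hardware that divides correctly, the floating-point residual is (near) zero.
print(residual(X, Y))
```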

Discovery and response

Thomas Nicely, a professor of mathematics at Lynchburg College, had written code to enumerate primes, twin primes, prime triplets, and prime quadruplets. Nicely noticed some inconsistencies in the calculations on June 13, 1994, shortly after adding a Pentium system to his group of computers. He was unable to eliminate other factors (such as programming errors, motherboard chipsets, etc.) until October 19, 1994.[1] On October 24, 1994, he reported the issue to Intel.[9] Intel had reportedly become aware of the issue independently by June 1994 and had already begun fixing it. However, it chose not to publicly disclose any details or recall affected CPUs.[10]

On October 30, 1994, Nicely sent an email describing the bug to various academic contacts and asked them to test for the flaw on 486-DX4s, Pentiums and Pentium clones.[9] Others quickly verified the bug, and news of it spread rapidly on the Internet. The bug acquired the name "Pentium FDIV bug" from the x86 assembly language mnemonic for floating-point division, the most frequently used instruction affected.[9]

The story first appeared in the press on November 7, 1994, in an article in Electronic Engineering Times, "Intel fixes a Pentium FPU glitch" by Alexander Wolfe,[11] and was subsequently picked up by CNN in a segment aired on November 22. The New York Times and The Boston Globe also reported on it, and the Globe ran the story on its front page.[10][12]

At this point, Intel acknowledged the floating-point flaw but claimed that it was not serious and would not affect most users. Intel offered to replace processors for users who could prove that they were affected. Most independent estimates found that the bug would have a very limited impact on most users, but it still caused significant negative press for the company. During a 2019 talk reflecting on the development of Quake, John Romero described how frequently and persistently Michael Abrash could reproduce this bug. Abrash spent hours tracking down the exact conditions needed to produce it. When triggered, the bug made parts of a game level appear unexpectedly when viewed from certain camera angles.[13] IBM paused the sale of PCs containing Intel CPUs, and Intel's stock price decreased significantly.[14] Some in the industry questioned IBM's motive. IBM produced PowerPC CPUs at the time and potentially stood to benefit from any reputational damage to the Pentium or to Intel as a company. However, the decision led corporate buyers of PC equipment to demand replacements for existing Pentium CPUs. Soon afterwards, other PC manufacturers began offering "no questions asked" replacements of flawed Pentium chips.[4]

Growing dissatisfaction with Intel's response led the company to offer, on December 20, to replace all flawed Pentium processors on request.[15] On January 17, 1995, Intel announced a pre-tax charge of $475 million against earnings, ostensibly the total cost of replacing the flawed processors.[9] This is equivalent to $868 million in 2023.[16] Intel was criticized for barring resellers and OEMs from participating in the recall program, which required end users to replace chips themselves. Intel justified this on its support web page by stating that "it is the individual decision of the end user to determine if the flaw is affecting their application accuracy".[14]

A 1995 article in Science describes the value of number theory problems in discovering computer bugs. It also gives the mathematical background and history of Brun's constant, the problem Nicely was working on when he discovered the bug.[17]

Intel's response to the FDIV bug has been cited as a case in which the public relations impact of a problem eclipsed its practical impact on customers.[18] Most users were unlikely to encounter the flaw in their day-to-day computing. Even so, the company initially refused to replace chips unless customers could show they were affected, and this caused pushback from a vocal minority of industry experts. The resulting publicity shook consumer confidence in the CPUs and led to a demand for action even from people unlikely to be affected by the issue. Andy Grove, Intel's CEO at the time, was quoted in The Wall Street Journal as saying "I think the kernel of the issue we missed... was that we presumed to tell somebody what they should or shouldn't worry about, or should or shouldn't do".[4]

In the aftermath of the bug and subsequent recall, the use of formal verification for hardware floating-point operations increased markedly across the semiconductor industry. Prompted by the discovery of the bug, researchers developed "word-level model checking", a technique applicable to the SRT algorithm, in 1996.[19] Intel went on to use formal verification extensively in the development of later CPU architectures. In the development of the Pentium 4, symbolic trajectory evaluation and theorem proving found a number of bugs that could have led to a similar recall had they gone undetected.[20] The first Intel microarchitecture to use formal verification as the primary method of validation was Nehalem, developed in 2008.[21]

Affected models

The FDIV bug affects the 60 and 66 MHz Pentium P5 800 in stepping levels prior to D1, and the 75, 90, and 100 MHz Pentium P54C 600 in steppings prior to B5. The 120 MHz P54C and P54CQS CPUs are unaffected.[22][23]

Software patches

Manufacturers produced various software patches to work around the bug. One specific algorithm, outlined in a paper in IEEE Computational Science & Engineering, checks for divisors that can trigger access to the programmable logic array cells that erroneously contain zero. If such a divisor is found, the patch multiplies both numerator and denominator by 15/16. This moves the divisor out of the "buggy" range and mathematically leaves the quotient unchanged. The fix carries a measurable speed penalty. In the worst case, a program doing nothing but FDIV operations with bad divisors would run twice as long, because each FDIV would take about 80 instead of 40 clock cycles. With random divisors, the average time per FDIV was approximately 50 clock cycles. About 10 cycles were added to check the divisor, and only 5 out of 1,024 random divisors would trigger the scaling fixup. Since FDIV is a rare operation in most programs, the typical slowdown with the fix installed was a percent or less.[8]
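The divisor check and 15/16 rescaling can be sketched in Python. This is a minimal illustration, not the published patch. The five 10-bit mantissa patterns are our reading of the analysis in [8] and should be treated as an assumption, although they are consistent with the "5 out of 1,024" figure above. The helper names `is_bad_divisor` and `safe_divide` are hypothetical. The sketch uses Python doubles, whereas the Pentium divided in extended precision. On a modern FPU the rescaling is unnecessary, so the sketch only shows the shape of the check.

```python
import math
import struct

# Assumption (after the analysis in [8]): a divisor is at risk when the first
# 10 stored fraction bits of its mantissa are 0001, 0100, 0111, 1010 or 1101
# followed by six 1 bits. That gives 5 risky patterns out of 2**10 = 1024.
BAD_HIGH_NIBBLES = {0b0001, 0b0100, 0b0111, 0b1010, 0b1101}
SCALE = 15 / 16  # exactly representable in binary floating point


def is_bad_divisor(d: float) -> bool:
    """Return True if d's leading mantissa bits match one of the five risky patterns."""
    if not math.isfinite(d) or d == 0.0:
        return False
    bits = struct.unpack(">Q", struct.pack(">d", abs(d)))[0]
    if (bits >> 52) == 0:            # subnormal: no implicit leading 1, not handled here
        return False
    top10 = (bits >> 42) & 0x3FF     # first 10 of the 52 stored fraction bits
    return (top10 & 0x3F) == 0x3F and (top10 >> 6) in BAD_HIGH_NIBBLES


def safe_divide(x: float, y: float) -> float:
    """Sketch of the 15/16 workaround: rescale both operands when the divisor is risky."""
    if is_bad_divisor(y):
        x, y = x * SCALE, y * SCALE  # same quotient in exact arithmetic
    return x / y


print(is_bad_divisor(3145727.0))      # divisor from the well-known example
print(safe_divide(4195835.0, 3145727.0))
```

On hardware that divides correctly, `safe_divide` agrees with plain `/` up to rounding in the two extra multiplications.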

The main challenge faced by software companies was implementing the fix in pre-existing software, much of which relied on libraries outside their control. Some companies, such as Wolfram Research, opted to patch the machine code of existing executables directly, replacing the FDIV opcode with an illegal instruction. This would then trigger an exception that an exception handler (also patched in) would catch. From there, arbitrary code could be executed to work around the bug.[2]

Microsoft offered operating-system-level workarounds in versions of Windows up to Windows XP. Utilities were included with the operating system to check for the presence of the bug and disable the FPU if it was found.[24][25]

References

  1. Edelman, Alan (January 1, 1997). "The Mathematics of the Pentium Division Bug" (PDF). SIAM Review. 39 (1): 54–67. Bibcode:1997SIAMR..39...54E. doi:10.1137/S0036144595293959. Archived (PDF) from the original on August 14, 2024. Retrieved April 11, 2021.
  2. "'A Discussion of and Fix for the Pentium FDIV Bug' from the Notebook Archive (2002)". notebookarchive.org. Wolfram Research, Inc. Retrieved April 11, 2021.
  3. Halfhill, Tom R. (March 1995). "An error in a lookup table created the infamous bug in Intel's latest processor". BYTE. No. March 1995. Archived from the original on February 9, 2006. Retrieved December 19, 2006.
  4. Carlton, Jim; Yoder, Stephen K. (December 21, 1994). "Computers: Humble Pie: Intel to Replace its Pentium Chips". The Wall Street Journal (Eastern ed.). p. B1.
  5. "1994 - Annual Report". Intel. June 20, 2020. Archived from the original on February 26, 2017. Retrieved June 20, 2020.
  6. Sharangpani, H. P.; Barton, M. L. (November 30, 1994). Statistical Analysis of Floating Point Flaw in the Pentium Processor (1994) (PDF) (Report). Intel Corporation. Archived from the original (PDF) on March 19, 2022. Retrieved April 11, 2021.
  7. "Pentium FDIV bug – a Picture". Kansas University Institute for Policy and Social Research. November 30, 1994. Archived from the original on November 3, 2021. Retrieved November 3, 2010.
  8. Coe, T.; Mathisen, T.; Moler, C.; Pratt, V. (1995). "Computational aspects of the Pentium affair" (PDF). IEEE Computational Science and Engineering. 2 (1): 18–30. doi:10.1109/99.372929. Archived (PDF) from the original on June 23, 2021. Retrieved April 13, 2021.
  9. Nicely, Thomas (August 19, 2011). "Pentium FDIV flaw FAQ". trnicely.net. Archived from the original on June 18, 2019. Retrieved June 18, 2019.
  10. Markoff, John (November 24, 1994). "COMPANY NEWS; Flaw Undermines Accuracy of Pentium Chips". The New York Times. Archived from the original on August 14, 2024. Retrieved April 11, 2021.
  11. Wolfe, Alexander (November 9, 1994). "Intel fixes a Pentium FPU glitch". Electronic Engineering Times. Archived from the original on December 18, 2010. Retrieved January 19, 2011.
  12. Moler, Cleve (Winter 1995). "A Tale of Two Numbers" (PDF). MATLAB News and Notes. MathWorks. Archived (PDF) from the original on August 14, 2024. Retrieved April 21, 2021.
  13. "BTD12: The Programming Principles of Id Software". TNG Technology Consulting GmbH. August 6, 2019. Archived from the original on August 25, 2023. Retrieved July 17, 2023.
  14. Yeraswork, Zewde (March 30, 2011). "Lessons Learned: Pentium Flaws Aid Intel In Sandy Bridge Chipset Recall". CRN. Archived from the original on August 14, 2024. Retrieved April 11, 2021.
  15. "Intel adopts upon-request replacement policy on Pentium processors with floating point flaw; Will take Q4 charge against earnings". Business Wire. December 20, 1994. Archived from the original on July 10, 2012. Retrieved December 24, 2006.
  16. Johnston, Louis; Williamson, Samuel H. (2023). "What Was the U.S. GDP Then?". MeasuringWorth. Retrieved November 30, 2023. United States Gross Domestic Product deflator figures follow the MeasuringWorth series.
  17. Cipra, Barry Arthur (January 13, 1995). "How number theory got the best of the Pentium chip". Science. 267 (5195): 175. Bibcode:1995Sci...267..175C. doi:10.1126/science.267.5195.175. PMID 17791336. S2CID 19898103.
  18. Price, D. (April 1995). "Pentium FDIV flaw-lessons learned". IEEE Micro. 15 (2): 86–88. doi:10.1109/40.372360.
  19. Clarke, E. M.; Khaira, M.; Zhao, X. (1996). "Word level model checking---avoiding the Pentium FDIV error". Proceedings of the 33rd annual conference on Design automation conference - DAC '96. pp. 645–648. doi:10.1145/240518.240640. ISBN 0897917790. S2CID 2500033. Archived from the original on April 29, 2021. Retrieved April 29, 2021.
  20. O'Leary, J. (2004). "Formal verification in intel cpu design". Proceedings. Second ACM and IEEE International Conference on Formal Methods and Models for Co-Design, 2004. MEMOCODE '04. p. 152. doi:10.1109/MEMCOD.2004.1459841. ISBN 0-7803-8509-8. Archived from the original on April 29, 2021. Retrieved April 29, 2021.
  21. Kaivola, Roope; Ghughal, Rajnish; Narasimhan, Naren; Telfer, Amber; Whittemore, Jesse; Pandav, Sudhindra; Slobodová, Anna; Taylor, Christopher; Frolov, Vladimir; Reeber, Erik; Naik, Armaghan (2009). "Replacing Testing with Formal Verification in Intel® Core™ i7 Processor Execution Engine Validation". Computer Aided Verification. 5643: 414–429. doi:10.1007/978-3-642-02658-4_32.
  22. "P5 (586) Fifth-Generation Processors | Microprocessor Types and Specifications | InformIT". www.informit.com. June 8, 2001. Archived from the original on April 13, 2021. Retrieved April 13, 2021.
  23. "FDIV Replacement Program: Frequently asked questions". Intel. March 20, 2009. Solution ID CS-012748. Archived from the original on May 11, 2009. Retrieved November 10, 2009.
  24. Slob, Arie. "Windows 95 Troubleshooting: How to Check for a Faulty Math Coprocessor". www.helpwithwindows.com. Archived from the original on August 14, 2024. Retrieved April 23, 2019.
  25. "Pentnt". Microsoft TechNet. Microsoft. September 11, 2009. Archived from the original on February 3, 2018. Retrieved April 23, 2019.