F16C
The F16C[1] (previously and informally known as CVT16) instruction set is an x86 instruction set architecture extension which provides support for converting between half-precision and standard IEEE single-precision floating-point formats.
History
The CVT16 instruction set, announced by AMD on May 1, 2009,[2] is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction sets.
CVT16 is a revision of part of the SSE5 instruction set proposal announced on August 30, 2007, which is supplemented by the XOP and FMA4 instruction sets. This revision makes the binary coding of the proposed new instructions more compatible with Intel's AVX instruction extensions, while the functionality of the instructions is unchanged.
In recent documents, the name F16C is formally used in both Intel and AMD x86-64 architecture specifications.
Technical information
There are variants that convert four floating-point values in an XMM register or eight floating-point values in a YMM register.
The instructions are abbreviations for "vector convert packed half to packed single" and vice versa:
- VCVTPH2PS xmmreg,xmmrm64: convert four half-precision floating-point values in memory or the bottom half of an XMM register to four single-precision floating-point values in an XMM register.
- VCVTPH2PS ymmreg,xmmrm128: convert eight half-precision floating-point values in memory or an XMM register (the bottom half of a YMM register) to eight single-precision floating-point values in a YMM register.
- VCVTPS2PH xmmrm64,xmmreg,imm8: convert four single-precision floating-point values in an XMM register to half-precision floating-point values in memory or the bottom half of an XMM register.
- VCVTPS2PH xmmrm128,ymmreg,imm8: convert eight single-precision floating-point values in a YMM register to half-precision floating-point values in memory or an XMM register.
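As a rough illustration of what VCVTPH2PS computes for each element, the following C sketch widens one half-precision bit pattern to single precision in software. It is not taken from the AMD or Intel documentation: the function names `half_to_float` and `half_to_float_examples` are hypothetical, and the sketch ignores floating-point exception flags and any special handling of signaling NaNs that the hardware may apply.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical software model of one VCVTPH2PS lane: widen an IEEE 754
 * half-precision bit pattern (1 sign, 5 exponent, 10 fraction bits) to a
 * single-precision value. Exception flags are not modelled. */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* move sign to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* biased exponent, bias 15 */
    uint32_t frac = h & 0x3FFu;                    /* 10-bit fraction */
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {
        /* Infinity or NaN: all-ones exponent, fraction kept in the top bits. */
        bits = sign | 0x7F800000u | (frac << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127 (add 112). */
        bits = sign | ((exp + 112u) << 23) | (frac << 13);
    } else if (frac != 0) {
        /* Subnormal half: shift until the implicit leading 1 appears.
         * Every half subnormal is a normal single-precision value. */
        uint32_t shift = 0;
        while ((frac & 0x400u) == 0) {
            frac <<= 1;
            shift++;
        }
        frac &= 0x3FFu; /* drop the now-implicit leading 1 */
        bits = sign | ((113u - shift) << 23) | (frac << 13);
    } else {
        bits = sign; /* positive or negative zero */
    }

    memcpy(&f, &bits, sizeof f); /* reinterpret the bit pattern as a float */
    return f;
}

/* Worked examples, checked with assert(). */
void half_to_float_examples(void)
{
    float nan_value = half_to_float(0x7E00);

    assert(half_to_float(0x3C00) == 1.0f);
    assert(half_to_float(0xC000) == -2.0f);
    assert(half_to_float(0x7BFF) == 65504.0f);   /* largest finite half */
    assert(half_to_float(0x0001) == 0x1p-24f);   /* smallest subnormal half */
    assert(half_to_float(0x7C00) > 65504.0f);    /* +infinity */
    assert(nan_value != nan_value);              /* NaN stays NaN */
}
```

On processors with F16C, compilers commonly expose the packed instructions through intrinsics such as `_mm_cvtph_ps` and `_mm_cvtps_ph` in `<immintrin.h>`; the scalar model above avoids them so that it can run on any machine.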
The 8-bit immediate argument to VCVTPS2PH selects the rounding mode. Values 0 to 3 select round to nearest, down, up, and truncate, respectively; value 4 selects the mode set in MXCSR.RC.
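The mapping from immediate value to rounding mode can be sketched as a small decoder. This is an illustration only, assuming the encoding described in Intel's instruction reference, in which bits 1:0 of the immediate hold the rounding mode, bit 2 selects MXCSR.RC instead, and the upper bits are ignored. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: name the rounding mode that a VCVTPS2PH immediate
 * selects. Assumes bits 1:0 = rounding mode, bit 2 = use MXCSR.RC,
 * and bits 7:3 ignored. */
const char *vcvtps2ph_rounding_name(unsigned imm8)
{
    static const char *const names[4] = {
        "nearest", "down", "up", "truncate"
    };

    if (imm8 & 0x4u)
        return "MXCSR.RC";      /* bit 2 set: defer to the MXCSR setting */
    return names[imm8 & 0x3u];  /* otherwise bits 1:0 pick the mode */
}

/* Worked examples matching the values listed in the text. */
void vcvtps2ph_rounding_examples(void)
{
    assert(strcmp(vcvtps2ph_rounding_name(0), "nearest") == 0);
    assert(strcmp(vcvtps2ph_rounding_name(1), "down") == 0);
    assert(strcmp(vcvtps2ph_rounding_name(2), "up") == 0);
    assert(strcmp(vcvtps2ph_rounding_name(3), "truncate") == 0);
    assert(strcmp(vcvtps2ph_rounding_name(4), "MXCSR.RC") == 0);
}
```

Under this assumed encoding, values 5 to 7 also have bit 2 set and would therefore behave like 4.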
Support for these instructions is indicated by bit 29 of ECX after CPUID with EAX=1.
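A minimal detection sketch in C follows, assuming a GCC- or Clang-compatible compiler that provides `<cpuid.h>` and its `__get_cpuid` helper on x86 targets. The function names and the `HAVE_GNU_CPUID` macro are hypothetical. On other compilers or architectures the sketch simply reports that F16C is absent.

```c
#include <assert.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1 /* hypothetical macro used only in this sketch */
#endif

/* Returns 1 if bit 29 (F16C) is set in the ECX value from CPUID leaf 1. */
int ecx_has_f16c(uint32_t leaf1_ecx)
{
    return (int)((leaf1_ecx >> 29) & 1u);
}

/* Returns 1 if the running CPU reports F16C, 0 otherwise (including when
 * CPUID is not available through <cpuid.h>). This checks only the feature
 * bit; a complete check would also confirm that the operating system has
 * enabled the AVX register state. */
int cpu_reports_f16c(void)
{
#ifdef HAVE_GNU_CPUID
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ecx_has_f16c(ecx);
#endif
    return 0;
}

/* Worked examples for the bit test itself. */
void f16c_cpuid_examples(void)
{
    assert(ecx_has_f16c(UINT32_C(1) << 29) == 1);
    assert(ecx_has_f16c(0) == 0);
    assert(ecx_has_f16c(~(UINT32_C(1) << 29)) == 0);
}
```

A program would typically call `cpu_reports_f16c()` once at start-up and fall back to a software conversion when it returns 0.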
CPUs with F16C
- AMD:
  - Jaguar-based processors
  - Puma-based processors
  - "Heavy Equipment" processors
    - Bulldozer-based processors, Q4 2011[3]
    - Piledriver-based processors, Q4 2012[4]
    - Steamroller-based processors, Q1 2014
    - Excavator-based processors, Q2 2015
  - Zen-based processors, Q1 2017, and newer
- Intel:
  - Ivy Bridge processors and newer
References
1. Chuck Walbourn (September 11, 2012). "DirectXMath: F16C and FMA".
2. "128-Bit and 256-Bit XOP, FMA4 and CVT16 Instructions" (PDF). AMD64 Architecture Programmer's Manual. Vol. 6. 2009-05-01. Archived from the original (PDF) on 2009-05-20. Retrieved 2022-07-05.
3. Dave Christie (2009-05-07). "Striking a balance". AMD Developer blogs. Archived from the original on 2013-11-09. Retrieved 2012-01-17.
4. "New "Bulldozer" and "Piledriver" Instructions" (PDF). AMD. October 2012.
External links
- New Bulldozer and Piledriver Instructions (archived 2013-01-07 at the Wayback Machine)
- DirectX math F16C and FMA
- AMD64 Architecture Programmer's Manual Volume 1 (archived 2013-12-14 at the Wayback Machine)
- AMD64 Architecture Programmer's Manual Volume 2
- AMD64 Architecture Programmer's Manual Volume 3 (archived 2013-12-14 at the Wayback Machine)
- AMD64 Architecture Programmer's Manual Volume 4 (archived 2021-11-14 at the Wayback Machine)
- AMD64 Architecture Programmer's Manual Volume 5 (archived 2013-12-14 at the Wayback Machine)
- IA32 Architectures Software Developer Manual