A COM file is a type of simple executable file. On the Digital Equipment Corporation (DEC) VAX operating systems of the 1970s, .COM was used as a filename extension for text files containing commands to be issued to the operating system (similar to a batch file).[1] With the introduction of Digital Research's CP/M (a microcomputer operating system), the type of files commonly associated with the COM extension changed to that of executable files. This convention was later carried over to DOS. Even when complemented by the more general EXE file format for executables, the compact COM files remained viable and frequently used under DOS.

COM
Filename extension: .COM
Internet media type: application/x-dosexec
Type of format: Executable
Extended to: DOS MZ executable
A number of COM files in IBM PC DOS 1.0

The .COM file name extension has no relation to the .com (for "commercial") top-level Internet domain name. However, this similarity in name has been exploited by malware writers.

DOS binary format

The COM format is the original binary executable format used in CP/M (including SCP and MSX-DOS) as well as DOS. It is very simple; it has no header (with the exception of CP/M 3 files),[2] and contains no standard metadata, only code and data. This simplicity exacts a price: the binary has a maximum size of 65,280 (FF00h) bytes (256 bytes short of 64 KB) and stores all its code and data in one segment.

Since it lacks relocation information, it is loaded by the operating system at a pre-set address, at offset 0100h immediately following the PSP, where it is executed (hence the limitation of the executable's size): the entry point is fixed at 0100h.[nb 1] This was not an issue on 8-bit machines, since they can address at most 64 KB of memory, but 16-bit machines have a much larger address space, which is why the format fell out of use.
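
The loading rule described above can be sketched in a few lines of Python. This is only an illustrative model, not how any DOS or CP/M loader is actually written: the names `load_com_image`, `SEGMENT_SIZE`, `LOAD_OFFSET`, and `MAX_IMAGE_SIZE` are hypothetical, and the 64 KB segment is modelled as a plain byte array.

```python
SEGMENT_SIZE = 0x10000   # one 64 KB segment (65,536 bytes)
LOAD_OFFSET = 0x0100     # fixed load address and entry point
MAX_IMAGE_SIZE = SEGMENT_SIZE - LOAD_OFFSET  # FF00h = 65,280 bytes


def load_com_image(image: bytes) -> bytearray:
    """Copy a COM image into a zero-filled model segment at offset 0100h."""
    if len(image) > MAX_IMAGE_SIZE:
        raise ValueError("image does not fit in one segment above 0100h")
    segment = bytearray(SEGMENT_SIZE)
    # No relocation is applied: the bytes are placed verbatim at 0100h.
    segment[LOAD_OFFSET:LOAD_OFFSET + len(image)] = image
    return segment
```

Because the image is copied verbatim, any absolute address inside the program must already assume the 0100h origin.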

In the Intel 8080 CPU architecture, only 65,536 bytes of memory could be addressed (address range 0000h to FFFFh). Under CP/M, the first 256 bytes of this memory, from 0000h to 00FFh, were reserved for system use by the zero page, and any user program had to be loaded at exactly 0100h to be executed.[nb 1] COM files fit this model perfectly. Before the introduction of MP/M and Concurrent CP/M, there was no possibility of running more than one program or command at a time: the program loaded at 0100h was run, and no other.

Although the file format is the same in DOS and CP/M, .COM files for the two operating systems are not compatible; DOS COM files contain x86 instructions and possibly DOS system calls, while CP/M COM files contain 8080 instructions and CP/M system calls (programs restricted to certain machines could also contain additional instructions for 8085 or Z80).

When DOS loads a .COM file, all x86 segment registers are set to the same value. The SP (stack pointer) register is set to the offset of the last word available in the first 64 KB segment (typically FFFEh), or to the maximum size of memory available in the block the program is loaded into (which must hold the program plus at least 256 bytes of stack), whichever is smaller. The stack thus begins at the very top of the corresponding memory segment and works down from there.[3][4]
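
One way to read the stack-pointer rule above is as a simple minimum. The following Python sketch models that reading only; the function name `initial_sp`, its parameters, and the choice of `MemoryError` are hypothetical, and the block size is assumed to be an even number of bytes that includes the 256-byte PSP.

```python
TOP_WORD_OF_SEGMENT = 0xFFFE  # offset of the last word in a 64 KB segment
PSP_SIZE = 0x100              # 256-byte PSP below the program image
MIN_STACK = 0x100             # at least 256 bytes of stack are required


def initial_sp(block_size: int, image_size: int) -> int:
    """Return the modelled initial SP for a COM program.

    block_size: bytes available in the memory block (assumed even).
    image_size: size of the COM image in bytes.
    """
    if block_size < PSP_SIZE + image_size + MIN_STACK:
        raise MemoryError("not enough memory for program plus stack")
    last_word_in_block = block_size - 2
    # Whichever is smaller: top of the segment or top of the block.
    return min(TOP_WORD_OF_SEGMENT, last_word_in_block)
```

With a full 64 KB or more available, the sketch yields FFFEh; in a smaller block the stack starts at the top of that block instead.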

In the original DOS 1.x API, which was a derivative of the CP/M API, a .COM file terminated by calling the INT 20h (Terminate Program) function or else INT 21h Function 0, which served the same purpose. The programmer also had to ensure that the code and data segment registers contained the same value at program termination to avoid a potential system crash. Although these calls could be used in any DOS version, Microsoft recommended the use of INT 21h Function 4Ch for program termination from DOS 2.x onward, which did not require the data and code segments to be set to the same value.
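
To make the fixed origin and the termination call concrete, here is a hedged Python sketch that assembles a tiny COM image by hand. The helper name `build_hello_com` and the message text are hypothetical. The sketch also relies on one detail not covered above: INT 21h Function 09h, which prints the `$`-terminated string at DS:DX.

```python
import struct

LOAD_OFFSET = 0x0100  # COM images are loaded and entered at 0100h
CODE_LENGTH = 12      # total size of the five instructions below


def build_hello_com(message: str = "Hello from a .COM file") -> bytes:
    """Return the bytes of a minimal COM program that prints a message."""
    text = message.encode("ascii") + b"\r\n$"  # Function 09h stops at '$'
    msg_addr = LOAD_OFFSET + CODE_LENGTH       # absolute offset in segment
    code = (
        b"\xB4\x09"                              # MOV AH, 09h
        + b"\xBA" + struct.pack("<H", msg_addr)  # MOV DX, msg_addr
        + b"\xCD\x21"                            # INT 21h (print string)
        + b"\xB8\x00\x4C"                        # MOV AX, 4C00h
        + b"\xCD\x21"                            # INT 21h (terminate)
    )
    if len(code) != CODE_LENGTH:
        raise AssertionError("CODE_LENGTH is out of sync with the code")
    return code + text
```

Writing the result to a file such as `HELLO.COM` (a hypothetical name) and running it in a DOS emulator is one way to try it. The string address is computed as 0100h plus the code length because the file has no header and no relocation.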

It is possible to make a .COM file that runs under both operating systems in the form of a fat binary. There is no true compatibility at the instruction level; the instructions at the entry point are chosen so that the same bytes are valid on both processors but behave differently under each operating system, making program execution jump to the section for the operating system in use. It is basically two different programs with the same functionality in a single file, preceded by code selecting the one to use.

Under CP/M 3, if the first byte of a COM file is C9h, there is a 256-byte header;[2] since C9h corresponds to the 8080 instruction RET, this means that the COM file will immediately terminate if run on an earlier version of CP/M that does not support this extension. (Because the instruction sets of the 8085 and Z80 are supersets of the 8080 instruction set, this works on all three processors.) C9h is an invalid opcode on the 8088/8086, and it will cause a processor-generated interrupt 6 exception in v86 mode on the 386 and later x86 chips. Because C9h has been the opcode for LEAVE since the 80188/80186, and is therefore not used as the first instruction in a valid program, the executable loader in some versions of DOS rejects COM files that start with C9h, avoiding a crash.

Files may have names ending in .COM, but not be in the simple format described above; this is indicated by a magic number at the start of the file. For example, the COMMAND.COM file in DR DOS 6.0 is actually in DOS executable format, indicated by the first two bytes being MZ (4Dh 5Ah), the initials of Mark Zbikowski.
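
The leading-byte checks mentioned above can be combined into a small classifier. This Python sketch is illustrative only: the function name `classify_com_named_file` and its return strings are hypothetical, and it tests only the two markers described in the text (the MZ signature and the CP/M 3 C9h byte).

```python
def classify_com_named_file(data: bytes) -> str:
    """Guess the real format of a file whose name ends in .COM."""
    if data[:2] == b"MZ":        # 4Dh 5Ah: DOS executable signature
        return "DOS MZ executable"
    if data[:1] == b"\xC9":      # RET on the 8080: CP/M 3 header marker
        return "CP/M 3 COM file with 256-byte header"
    # No recognised marker: treat it as a raw, headerless image.
    return "plain COM image"
```

A plain COM image has no signature of its own, so it is identified only by the absence of the other markers.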

Large programs

Under DOS there is no memory management provided for COM files by the loader or execution environment. All memory is simply available to the COM file. After execution, the operating system command shell, COMMAND.COM, is reloaded. This leaves the possibilities that the COM file can either be very simple, using a single segment, or arbitrarily complex, providing its own memory management system. An example of a complex program is COMMAND.COM, the DOS shell, which provided a loader to load other COM or EXE programs. In the .COM system, larger programs (up to the available memory size) can be loaded and run, but the system loader assumes that all code and data is in the first segment, and it is up to the .COM program to provide any further organization. Programs larger than available memory, or large data segments, can be handled by dynamic linking, if the necessary code is included in the .COM program. The advantage of using the .COM rather than .EXE format is that the binary image is usually smaller and easier to program using an assembler.[5] Once compilers and linkers of sufficient power became available, it was no longer advantageous to use the .COM format for complex programs.

Platform support

The format is still executable on many modern Windows NT-based platforms, but it is run in an MS-DOS-emulating subsystem, NTVDM, which is not present in 64-bit variants. COM files can also be executed on DOS emulators such as DOSBox, on any platform supported by these emulators.

Use for compatibility reasons

Windows NT-based operating systems use the .com extension for a small number of commands carried over from MS-DOS days, although they are in fact presently implemented as .exe files. The operating system will recognize the .exe file header and execute them correctly despite their technically incorrect extension. (In fact any .exe file can be renamed to .com and still execute correctly.) The use of the original extensions for these commands ensures compatibility with older DOS batch files that may refer to them with their full original filenames. These commands are CHCP, DISKCOMP, DISKCOPY, FORMAT, MODE, MORE and TREE.[6]

Execution preference

In DOS, if a directory contains both a COM file and an EXE file with the same name, the COM file is preferentially selected for execution when no extension is specified. For example, if a directory in the system path contains two files named foo.com and foo.exe, the following would execute foo.com:

C:\>foo

A user wishing to run foo.exe can explicitly use the complete filename:

C:\>foo.exe

Taking advantage of this default behaviour, virus writers and other malicious programmers have used names like notepad.com for their creations, hoping that if it is placed in the same directory as the corresponding EXE file, a command or batch file may accidentally trigger their program instead of the text editor notepad.exe. Again, these .com files may in fact contain a .exe format executable.

On Windows NT and derivatives (Windows 2000, Windows XP, Windows Vista, and Windows 7), the PATHEXT variable is used to override the order of preference (and acceptable extensions) for calling files without specifying the extension from the command line. The default value still places .com files before .exe files. This closely resembles a feature previously found in JP Software's line of extended command line processors 4DOS, 4OS2, and 4NT.
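
The effect of the extension order can be modelled with a short Python sketch. It is not a reimplementation of any command processor: the function name `resolve_command` is hypothetical, `EXAMPLE_PATHEXT` is a shortened illustrative value rather than the real default, and the sketch looks at a single directory listing instead of searching the system path.

```python
from typing import Iterable, Optional

EXAMPLE_PATHEXT = ".COM;.EXE;.BAT;.CMD"  # shortened, illustrative value


def resolve_command(name: str, files: Iterable[str],
                    pathext: str = EXAMPLE_PATHEXT) -> Optional[str]:
    """Pick the file that an extensionless command name would run."""
    present = {f.upper() for f in files}  # names compare case-insensitively
    for ext in pathext.upper().split(";"):
        candidate = name.upper() + ext
        if ext and candidate in present:
            return candidate              # first matching extension wins
    return None
```

With the example order, `resolve_command("notepad", ["notepad.exe", "notepad.com"])` selects `NOTEPAD.COM`; passing `pathext=".EXE;.COM"` makes the same call select `NOTEPAD.EXE`, which is the kind of override the variable allows.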

Malicious usage of the extension

Some computer virus writers have hoped to take advantage of modern computer users' likely lack of knowledge of the .com file extension and associated binary format, along with their more likely familiarity with the .com Internet domain name. E-mails have been sent with attachment names similar to "example.com". Unwary Microsoft Windows users clicking on such an attachment would expect to begin browsing a site named http://example.com/, but instead would run the attached binary command file named example.com, giving it full permission to do to their machine whatever its author had in mind.[citation needed]

There is nothing malicious about the COM file format itself; this is an exploitation of the coincidental name collision between .com command files and .com commercial web sites.

Notes

  1. ^ a b In most versions of CP/M, the start of the TPA was at offset +100h, only preceded in memory by the zero page at offset +0h. Some versions differed for hardware reasons, including CP/M for the Heath H89, where it started at offset +4300h (for compatibility, a Magnolia Microsystems hardware modification existed to map out the ROMs at +100h after startup), or CP/M for the TRS-80 Model I and TRS-80 Model III, where programs were loaded at offset +0h.

References

  1. ^ Christian, Brian; Markson, Tom; Skrenta, Rich (eds.). "Section 5.3". The PDP-11 How-To Book (Revision 1 ed.). Archived from the original on 2018-08-01. Retrieved 2018-08-01. (NB. Has a reference for the RT-11 operating system running on the PDP-11 minicomputer, which shows in section 5.3 that .COM is used to refer to a command file.)
  2. ^ a b Elliott, John C.; Lopushinsky, Jim (2002) [1998-04-11]. "CP/M 3 COM file header". Seasip.info. Archived from the original on 2018-08-01.
  3. ^ Paul, Matthias R. (2002-10-07) [2000]. "Re: Run a COM file". Newsgroup: alt.msdos.programmer. Archived from the original on 2017-09-03. Retrieved 2017-09-03. [1] (NB. Has details on the DOS COM program calling conventions.)
  4. ^ Lunt, Benjamin "Ben" D. (2020). "DOS .COM startup registers". Forever Young Software. Archived from the original on 2020-11-12. Retrieved 2021-12-14.
  5. ^ Scanlon, Leo J. (1991). "Chapter 2". Assembly Language Subroutines for MS-DOS (2 ed.). Windcrest Books. p. 16. ISBN 0-8306-7649-X.
  6. ^ "Windows Commands". Microsoft. 2023-04-26.