
| Field | Value |
| --- | --- |
| created_at | 2017-05-27T12:19:13.000Z |
| title | What's the difference between the com and exe extensions? (2008) |
| url | https://blogs.msdn.microsoft.com/oldnewthing/20080324-00/?p=23033 |
| author | empressplay |
| points | 246 |
| num_comments | 89 |
| created_at_i | 1495887553 |
| _tags | story, author_empressplay, story_14429800 |
| objectID | 14429800 |
| year | 2008 |


What's the difference between the COM and EXE extensions? The Old New Thing



Raymond Chen - MSFT, March 24, 2008


Commenter Koro asks why you can rename a COM file to EXE without any apparent ill effects. (James Mastros asked a similar question, though there are additional issues in James' question which I will take up at a later date.)

Initially, the only programs that existed were COM files. The format of a COM file is... um, none. There is no format. A COM file is just a memory image. This "format" was inherited from CP/M. To load a COM file, the program loader merely sucked the file into memory unchanged and then jumped to the first byte. No fixups, no checksum, nothing. Just load and go.
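
A minimal Python sketch of that "load and go" step follows. It is an illustration, not DOS's actual loader: the single 64 KB segment, the 256-byte PSP area at offset 0, and the 0x100 load offset are taken from the comments below, and the names `load_com`, `SEGMENT_SIZE`, and `LOAD_OFFSET` are hypothetical. It also ignores the stack space a real loader has to leave free.

```python
# Hypothetical sketch of COM loading: copy the bytes, no parsing, no fixups.
SEGMENT_SIZE = 0x10000  # one 64 KB real-mode segment
LOAD_OFFSET = 0x100     # image starts after the 256-byte PSP (per the comments)


def load_com(image: bytes) -> tuple[bytearray, int]:
    """Return (segment memory, initial IP) for a raw COM image."""
    if len(image) > SEGMENT_SIZE - LOAD_OFFSET:
        raise ValueError("COM image does not fit in one 64 KB segment")
    memory = bytearray(SEGMENT_SIZE)  # PSP area is simply left zeroed here
    # The file is copied unchanged: no header, no relocation, no checksum.
    memory[LOAD_OFFSET:LOAD_OFFSET + len(image)] = image
    # Execution "jumps to the first byte" of the image.
    return memory, LOAD_OFFSET


memory, ip = load_com(b"\xcd\x20")  # INT 20h: a two-byte program that just exits
print(f"IP={ip:#06x}, first bytes={memory[ip:ip + 2].hex()}")
```

The size check is the only "validation" in the sketch, which mirrors the point of the paragraph: without a header there is nothing else to check.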

The COM file format had many problems, among which was that programs could not be bigger than about 64KB. To address these limitations, the EXE file format was introduced. The header of an EXE file begins with the magic letters "MZ" and continues with other information that the program loader uses to load the program into memory and prepare it for execution.
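
To make "other information" concrete, here is a Python sketch that reads the fixed 28-byte part of an MZ header with the standard `struct` module. The field order follows the conventional DOS header layout (the structure Windows headers call `IMAGE_DOS_HEADER`); the Python field names are descriptive ones chosen for this sketch, and it accepts only "MZ" even though commenters below report that DOS also took "ZM".

```python
import struct
from typing import NamedTuple

# Fixed part of the MZ header: 2-byte signature + 13 little-endian 16-bit words.
MZ_FORMAT = "<2s13H"
MZ_SIZE = struct.calcsize(MZ_FORMAT)  # 28 bytes


class MZHeader(NamedTuple):
    signature: bytes
    bytes_in_last_page: int   # 0 means the last 512-byte page is full
    pages_in_file: int        # file size in 512-byte pages, rounded up
    relocation_count: int     # segment fixups the loader must apply
    header_paragraphs: int    # header size in 16-byte paragraphs
    min_extra_paragraphs: int
    max_extra_paragraphs: int
    initial_ss: int           # relative to the load segment
    initial_sp: int
    checksum: int
    initial_ip: int
    initial_cs: int           # relative to the load segment
    relocation_table_offset: int
    overlay_number: int


def parse_mz_header(data: bytes) -> MZHeader:
    """Decode the fixed MZ header fields, or raise if the signature is absent."""
    if len(data) < MZ_SIZE or data[:2] != b"MZ":
        raise ValueError("no MZ header")
    return MZHeader(*struct.unpack_from(MZ_FORMAT, data))
```

Unlike a COM image, this gives the loader an explicit entry point, stack, and relocation count instead of assuming them.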

And there things lay, with COM files being "raw memory images" and EXE files being "structured", and the distinction was rigidly maintained. If you renamed an EXE file to COM, the operating system would try to execute the header as if it were machine code (which didn't get you very far), and conversely if you renamed a COM file to EXE, the program loader would reject it because the magic MZ header was missing.

So when did the program loader change to ignore the extension entirely and just use the presence or absence of an MZ header to determine what type of program it is? Compatibility, of course.

Over time, programs like FORMAT.COM, EDIT.COM, and even COMMAND.COM grew larger than about 64KB. Under the original rules, that meant that the extension had to be changed to EXE, but doing so introduced a compatibility problem. After all, since the files had been COM files up until then, programs or batch files that wanted to, say, spawn a command interpreter, would try to execute COMMAND.COM. If the command interpreter were renamed to COMMAND.EXE, these programs which hard-coded the program name would stop working since there was no COMMAND.COM any more.

Making the program loader more flexible meant that these "well-known programs" could retain their COM extension while no longer being constrained by the "It all must fit into 64KB" limitation of COM files.
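
The change in the loader's decision can be sketched in Python as two hypothetical functions: `old_rule` trusts the extension, while `new_rule` looks only at the first two bytes, which is what lets an MZ-format COMMAND.COM keep its name. This illustrates the rule as stated here, not the actual DOS code, and it omits the size check described in one of the comments.

```python
from pathlib import PurePath


def old_rule(filename: str) -> str:
    """Original behavior: the extension alone decides how to load the file."""
    return PurePath(filename).suffix.lstrip(".").upper()  # "COM" or "EXE"


def new_rule(filename: str, data: bytes) -> str:
    """Later behavior: the MZ signature decides; `filename` is ignored."""
    return "EXE" if data[:2] == b"MZ" else "COM"


big_command_com = b"MZ" + bytes(62)  # hypothetical stand-in for an EXE-format COMMAND.COM
print(old_rule("COMMAND.COM"), new_rule("COMMAND.COM", big_command_com))
```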

But wait, what if a COM program just happened to begin with the letters MZ? Fortunately, that never happened, because the machine code for "MZ" disassembles as follows:

0100 4D            DEC     BP
0101 5A            POP     DX

The first instruction decrements a register whose initial value is undefined, and the second instruction underflows the stack. No sane program would begin with two undefined operations.
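
The disassembly follows from the 8086's one-byte register forms, in which the low three bits of the opcode select a 16-bit register. A small Python decoder for just that opcode range (0x40 to 0x5F: INC, DEC, PUSH, POP) reproduces the listing above; it is an illustration, not a general disassembler.

```python
REGS16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
ONE_BYTE_GROUPS = {0x40: "INC", 0x48: "DEC", 0x50: "PUSH", 0x58: "POP"}


def decode_one_byte(opcode: int) -> str:
    """Decode the 8086 one-byte INC/DEC/PUSH/POP r16 forms (0x40-0x5F)."""
    base = opcode & 0xF8  # upper five bits pick the operation
    if base not in ONE_BYTE_GROUPS:
        raise ValueError(f"opcode {opcode:#04x} is outside 0x40-0x5F")
    return f"{ONE_BYTE_GROUPS[base]} {REGS16[opcode & 0x07]}"  # low 3 bits: register


# The "MZ" signature, as the CPU would see it at the COM entry point 0x100.
for offset, byte in enumerate(b"MZ", start=0x100):
    print(f"{offset:04X} {byte:02X}  {decode_one_byte(byte)}")
```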



Comments (42)

  1. wades says:

March 24, 2008 at 11:14 am

“To load a COM file, the program loader merely sucked the file into memory unchanged and then jumped to the first byte. No fixups, no checksum, nothing. Just load and go.”

You left out the part that the “first byte” gets loaded at offset 0x100 relative to the value of the segment registers though. And the “no fixups” part meant that the image had to be self-relocating.

[There are plenty of details I left out since they were not relevant to the topic. -Raymond]

  1. Spire says:

March 24, 2008 at 11:30 am

I wonder if Mark Zbikowski ever thought to verify that DEC BP and POP DX were indeed undefined operations at the beginning of a program, just in case Microsoft ever decided to be sneaky and start renaming EXE files to COM files. If not, then that's a pretty happy coincidence.

In retrospect, I can't help but think that something like “É0Σ═!” (90 30 E4 CD 21) would have been a better EXE marker. That disassembles to a NOP followed by XOR AH, AH and INT 21h (a call to DOS to terminate the program).

Optionally: Allow a sequence of bytes to be inserted in between the NOP and the termination call. This would give EXE files the flexibility to contain a stub COM file that could print something like “This is an EXE program.” before terminating.

Now where's that time machine?

[Um, you do realize that your “optionally” means that every COM program would get misdetected as an EXE? -Raymond]

  1. Mike says:

March 24, 2008 at 11:45 am

@wades:

I'd say that COM images aren't self-relocating at all. Self-relocating means (IMO) that you can load them at another address than 0x100, but that really doesn't work with a COM image.

  1. Kalle Olavi Niemitalo says:

March 24, 2008 at 12:12 pm

IIRC, there is a 0000H on the stack when a COM program starts, and an INT 20H at PSP:0000H.  This is so that the program can exit just by doing a RETN.  So the POP DX would not really underflow the stack.
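
Kalle's point can be checked with a small Python model of the initial COM state. The assumptions are: INT 20h (CD 20) sits at PSP:0000 as the comment says, SP starts at 0xFFFE (as Philip Newton says further down), and the word there is zero; `initial_com_state` and `pop_word` are hypothetical helpers. Popping that word, whether by RETN into IP or by POP DX, yields 0, and IP=0 lands on the INT 20h.

```python
SEGMENT_SIZE = 0x10000


def initial_com_state() -> tuple[bytearray, int]:
    """Return (segment memory, SP) with INT 20h at offset 0 and a zero word on the stack."""
    memory = bytearray(SEGMENT_SIZE)
    memory[0:2] = b"\xcd\x20"                       # INT 20h at PSP:0000
    sp = 0xFFFE                                     # assumed initial stack pointer
    memory[sp:sp + 2] = (0).to_bytes(2, "little")   # the 0000h word
    return memory, sp


def pop_word(memory: bytearray, sp: int) -> tuple[int, int]:
    """8086-style POP: read the little-endian word at SP, then add 2 (wrapping)."""
    value = int.from_bytes(memory[sp:sp + 2], "little")
    return value, (sp + 2) & 0xFFFF


memory, sp = initial_com_state()
ip, sp = pop_word(memory, sp)  # what a bare RETN would do
print(f"IP={ip:#06x}, instruction there: {memory[ip:ip + 2].hex()}")
```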

  1. The difference between .COM and .EXE files « Blog do Rocco says:

March 24, 2008 at 12:27 pm

PingBack from http://blogdorocco.wordpress.com/2008/03/24/a-diferenca-entre-arquivos-com-e-exe/

  1. Eber Irigoyen says:

March 24, 2008 at 12:28 pm

but why wasn't the loader modified ONLY for the "well known programs"?

  1. Wex says:

March 24, 2008 at 12:29 pm

But if there really is no prolog and you just jump and execute, how are you guaranteed that there's a 0000h on the top of the stack?

  1. David Walker says:

March 24, 2008 at 12:54 pm

Raymond: Your question "So when did the program loader change" is answered "Compatibility", which leads me to think that by "when", you meant "why".

  1. Stephen Eilert says:

March 24, 2008 at 12:58 pm

"'To load a COM file, the program loader merely sucked the file into memory unchanged and then jumped to the first byte. No fixups, no checksum, nothing. Just load and go.'

You left out the part that the 'first byte' gets loaded at offset 0x100 relative to the value of the segment registers though. And the 'no fixups' part meant that the image had to be self-relocating."

Interestingly, CP/M and its successors (including MSX-DOS and MS-DOS) all loaded their programs at offset 0x100. It is perhaps the only thing that can be called "standard" among .COM files, even when different processor architectures are involved.

The Z80 processor that was pretty common at the time could only address 64KB of RAM, so there were no segment registers to worry about. I think it was no coincidence that the 8086 segments were that size.

  1. James says:

March 24, 2008 at 1:03 pm

Eber: Which well known programs? OK, maybe in the beginning this was only needed by COMMAND.COM and EDIT.COM but that list grew. Better to come up with a generic solution, allowing ANY .COM executable to exceed 64K by being in .EXE format, rather than keep updating a list of file-specific hacks!

Also, Visual C++'s compiler uses a related trick at some point to provide both a GUI and command line version of itself under the same name (excluding file extension) by having both .EXE and .COM versions, with the command line trying to run the .COM version first, unlike the GUI. If the .COM/.EXE hack were filename specific, this wouldn't be possible, at least not without the Visual Studio team getting the OS loader updated specially for them, which would probably irritate a lot of people as well as being bad engineering in principle.

  1. mikeb says:

March 24, 2008 at 1:04 pm

But if there really is no prolog and you just jump and execute, how are you guaranteed that there's a 0000h on the top of the stack?

Now, that's quite a nitpick. Raymond didn't actually say that the loader performed absolutely no preparation for the COM program; he just said that nothing was done to the program image.

  1. mikeb says:

March 24, 2008 at 1:08 pm

Useless trivia for the day: Either "MZ" or "ZM" was a valid EXE header signature, at least in DOS. I'm not sure about Windows.

  1. doynax says:

March 24, 2008 at 1:10 pm

"I'd say that COM images aren't self-relocating at all. Self-relocating means (IMO) that you can load them at another address than 0x100, but that really doesn't work with a COM image."

Sure you can. With segment addresses overlapping the near 16-bit offset you could load it at any 16-byte aligned address in memory.
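
The arithmetic behind this reply: a real-mode address is segment * 16 + offset, so a COM image can sit at any paragraph-aligned address while still seeing itself at offset 0x100, which is Kenny's point below. The Python helpers `linear_address` and `com_cs_ip` are hypothetical and model the original 8086's 20-bit wraparound.

```python
def linear_address(segment: int, offset: int) -> int:
    """8086 real-mode translation: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def com_cs_ip(psp_address: int) -> tuple[int, int]:
    """CS:IP for a COM program whose PSP starts at a given linear address."""
    if psp_address % 16:
        raise ValueError("the PSP must start on a 16-byte paragraph boundary")
    return psp_address >> 4, 0x100  # CS points at the PSP; IP is always 0x100


for psp in (0x0A000, 0x12340):
    cs, ip = com_cs_ip(psp)
    print(f"PSP at {psp:05X}h -> CS:IP = {cs:04X}:{ip:04X}, "
          f"first byte at {linear_address(cs, ip):05X}h")
```

Because every near address in the image is relative to the same segment, moving the segment moves the whole program without patching a single byte.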

  1. Kenny says:

March 24, 2008 at 2:33 pm

Sure you can. With segment addresses overlapping the near 16-bit offset you could load it at any 16-byte aligned address in memory.

Nitpicking. The CS:IP would be xxxx:0100 anyway.

In retrospect, I can't help but think that something like "É0Σ═!" (90 30 E4 CD 21) would have been a better EXE marker. That disassembles to a NOP followed by XOR AH, AH and INT 21h (a call to DOS to terminate the program).

In retrospect, that would be overkill. It's not like everyone was going to rename EXEs to COMs every day.

  1. reader says:

March 24, 2008 at 2:44 pm

Interestingly, CP/M and its successors (including MSX-DOS and MS-DOS) all loaded their programs at offset 0x100. It is perhaps the only thing that can be called "standard" among .COM files, even when different processor architectures are involved.

I believe the historic reason is that the memory from 0x00 to 0xFF is used for the stack, which in turn originated in certain old CPU architecture (Z80 for example I think) where the stack pointer is only 8-bit.  Anyway, MS-DOS was derived from CP/M so naturally it followed the same convention as CP/M for COM images.

Also, Visual C++'s compiler uses a related trick at some point to provide both a GUI and command line version of itself under the same name (excluding file extension) by having both .EXE and .COM versions, with the command line trying to run the .COM version first, unlike the GUI.

That sounds inaccurate to me. 16-bit Windows executables use the NE format, which builds on top of the MS-DOS exe format such that there's an MS-DOS exe "stub" (which is really just arbitrary code) that gets run if you run the program under MS-DOS, and the new NE-specific stuff essentially follows after the stub. It makes much more sense for VC++ to make use of that, rather than .COM/.EXE, to support the dual-UI feature.

  1. Philip Newton says:

March 24, 2008 at 2:56 pm

@reader: The stack started at 0xFFFE and grew downwards.

0x00 to 0xFF was the Program Segment Prefix, which included such things as the command line arguments and two File Control Blocks, at least the first of which was helpfully filled in for you (IIRC) if the first argument looked like a filename.

Another backwards-compatibility tidbit was that address 0x0005 contained a jump to an interrupt routine so that CP/M-like programs which did "CALL 0005" rather than "INT 21H" would also work.
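
Philip's description of the PSP can be sketched in Python. The offsets used are the commonly documented ones (INT 20h at 0x00, the command-tail length byte at 0x80, the tail text from 0x81 ending in a carriage return); the FCBs at 0x5C and 0x6C and the CP/M-style entry near 0x0005 are left out, and `build_psp` and `command_tail` are hypothetical helpers.

```python
PSP_SIZE = 0x100
TAIL_LENGTH = 0x80  # one byte: number of characters in the command tail
TAIL_TEXT = 0x81    # the tail itself, terminated by a carriage return (0x0D)


def build_psp(tail: str) -> bytearray:
    """Lay out a minimal 256-byte PSP holding INT 20h and a command tail."""
    raw = tail.encode("ascii")
    if len(raw) > PSP_SIZE - TAIL_TEXT - 1:  # leave room for the CR
        raise ValueError("command tail too long for the PSP")
    psp = bytearray(PSP_SIZE)
    psp[0:2] = b"\xcd\x20"                   # INT 20h at offset 0
    psp[TAIL_LENGTH] = len(raw)
    psp[TAIL_TEXT:TAIL_TEXT + len(raw)] = raw
    psp[TAIL_TEXT + len(raw)] = 0x0D
    return psp


def command_tail(psp: bytes) -> str:
    """What a COM program would read back starting at offset 0x80."""
    length = psp[TAIL_LENGTH]
    return psp[TAIL_TEXT:TAIL_TEXT + length].decode("ascii")


psp = build_psp(" A:REPORT.TXT")  # hypothetical tail; DOS tails usually keep the leading space
print(repr(command_tail(psp)))
```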

See also http://en.wikipedia.org/wiki/Program_Segment_Prefix .

  1. poochner says:

March 24, 2008 at 3:49 pm

CP/M didn't start out on the Zilog (Z80) CPU. It ran on the 8080 (8-bit precursor to the Intel 8086). It just so happened that the Z80 (coincidentally, you know, not planned or anything… ahem) was a superset of the 8080 and could run 8080 programs just fine, thanks. Plus it had a couple of other registers and a few extra instructions. But CP/M ran on it, and that was the important part. Under CP/M, the OS (such as you could call it one) owned the memory below 0x100. It had BIOS call tables, the command line / default disk buffer, and a bunch of other undocumented things that people depended on not to move or change ever again. (Raymond's compatibility problems go back at least that far.) I don't recall exactly what SP was set to, but if you saved it you could return directly to the "command interpreter," rather than doing a reset that required the interpreter to be reloaded. The stack pointer was most definitely 16 bits. There were some older chips that had 8-bit stack pointers, though. Some of those are still around being used as micro-controllers (in things like toasters and exercise bikes). When you want a bunch of them, getting them for a nickel is a good thing.

Sorry, I'm rambling again. Age will do that.

  1. afaucher says:

March 24, 2008 at 3:58 pm

Thought I would mention: I found the book "Virus Research & Defense" published by Symantec Press (I forget the author's name) to have quite an informative history of how code files evolved, including lots of details on the Windows PE format. It of course focuses on how they were abused over time, but it is still quite relevant.

  1. Mats Gefvert says:

March 24, 2008 at 4:02 pm

I would be curious to know when exactly COMMAND.COM was renamed into CMD.EXE and how that affected compatibility? It seems like a much bigger, breaking change than renaming FORMAT.COM into FORMAT.EXE.

  1. Mats Gefvert says:

March 24, 2008 at 4:05 pm

And, just because of that, I noticed that COMMAND.COM is still around. Huh, never realized that…

  1. josh says:

March 24, 2008 at 4:18 pm

"I believe the historic reason is that the memory from 0x00 to 0xFF is used for the stack, which in turn originated in certain old CPU architecture (Z80 for example I think) where the stack pointer is only 8-bit."

Hm, the Z80 had a 16-bit SP. The 6502 had an 8-bit SP, but its stack was at 0x100-0x1FF, just above zero page.

But 0x000-0x0FF tended to have things hardwired in it, like RST vectors.

"And the 'no fixups' part meant that the image had to be self-relocating."

Well, they could be loaded in any segment, and I recall a lot of push cs/pop ds… or es… I don't remember exactly which registers… But I might be thinking of boot records, where it's just smaller than loading the address you already know you're at.

But anyway, that's a bit different from being really position independent or self-relocating. With only one segment, addressing is flat and fixed, so there's nothing to patch internally.

  1. Blake Coverett says:

March 24, 2008 at 4:19 pm

"That sounds inaccurate to me. 16-bit Windows executables use the NE format, which builds on top of the MS-DOS exe format such that there's an MS-DOS exe "stub" (which is really just arbitrary code) that gets run if you run the program under MS-DOS, and the new NE-specific stuff essentially follows after the stub. It makes much more sense for VC++ to make use of that, rather than .COM/.EXE, to support the dual-UI feature."

James was correct. There were other, older, DOS/Win16 Microsoft tools that used the MZ stub and the NE executable to provide dual-mode behavior, but what devenv did was a different sort of hack. There were both devenv.com and devenv.exe; both were in fact PEs with a standard stub, but because .COM files were found first by CMD.EXE (given the default PATHEXT), when you typed devenv from a command prompt you got devenv.com, the console subsystem PE executable, while the Start menu and other shortcuts pointed to devenv.exe, the Windows subsystem PE executable.

It was a hack.
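
The lookup Blake describes can be sketched in Python: try each PATHEXT extension in order and take the first file that exists. This is a simplification under stated assumptions (a single directory, no PATH walk, no special handling of names that already carry an extension), and `resolve_command` and `demo` are hypothetical helpers, not CMD.EXE's actual code. The default string is the start of the PATHEXT value shown in Brian Reiter's comment below.

```python
import tempfile
from pathlib import Path
from typing import Optional

DEFAULT_PATHEXT = ".COM;.EXE;.BAT;.CMD"


def resolve_command(name: str, directory: Path,
                    pathext: str = DEFAULT_PATHEXT) -> Optional[Path]:
    """Return the first file NAME+EXT in DIRECTORY, trying extensions in PATHEXT order."""
    # Windows file names are case-insensitive, so match on lower-cased names.
    present = {p.name.lower(): p for p in directory.iterdir() if p.is_file()}
    for ext in pathext.split(";"):
        match = present.get((name + ext).lower())
        if match is not None:
            return match
    return None


def demo() -> list[str]:
    """Hypothetical folder holding both devenv.com and devenv.exe."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for filename in ("devenv.exe", "devenv.com"):
            (folder / filename).write_bytes(b"MZ")  # contents don't matter here
        results = [resolve_command("devenv", folder),
                   resolve_command("devenv", folder, ".EXE;.COM"),
                   resolve_command("msdev", folder)]
        return [r.name if r is not None else "not found" for r in results]


print(demo())
```

Swapping the order of .COM and .EXE in the search string is enough to flip which devenv wins, which is why the trick depends on the default PATHEXT.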

  1. Xepol says:

March 24, 2008 at 4:39 pm

Or, they could have provided small .com stubs to launch the .exe files.

Nah, that would have been sane and kept things simple instead of doing something overly complicated and prone to strange side effects.

And hey, as long as it all fits on a floppy, right?

  1. mikeb says:

March 24, 2008 at 5:21 pm

Or, they could have provided small .com stubs to launch the .exe files.

Nah, that would have been sane and kept things simple instead of doing something overly complicated and prone to strange side effects.

How is that simpler than simply having the loader look for the MZ signature?  What strange side effects does what Raymond described have?

Having a small .com stub shell out to the real .exe is probably the first solution that would have come to my mind, but it has the downside that now you have to make sure 2 executables are available (and then you'd have Raymond's article explaining "Why do some standard executables have both a .com file and a .exe file, such as format.com and format.exe?"). Personally, I think having the loader not care about the extension is much cleaner and preferable.

  1. anonymous says:

March 24, 2008 at 5:23 pm

Btw, is Cmd.exe also in "maintenance mode", or is it legacy stuff? Why isn't it being improved?

  1. Spire says:

March 24, 2008 at 6:11 pm

[Um, you do realize that your “optionally” means that every COM program would get misdetected as an EXE? -Raymond]

No, only every COM program that starts with a NOP.

[Ah, right, sorry; I missed that part. It does make parsing the header significantly more difficult, however, since locating the header becomes O(n). -Raymond]
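
Raymond's O(n) remark can be illustrated with a Python sketch of Spire's hypothetical marker: a leading NOP, an arbitrary-length stub, then XOR AH,AH / INT 21h. Because the stub's length isn't fixed, the loader has to scan forward to find where the header would begin, whereas the MZ check reads two bytes at a fixed position. The function names and the choice to stop at the first terminating sequence are assumptions of this sketch.

```python
from typing import Optional

NOP = 0x90
TERMINATE = bytes.fromhex("30e4cd21")  # XOR AH,AH ; INT 21h


def spire_header_offset(data: bytes) -> Optional[int]:
    """Offset just past the stub, or None if the file doesn't use the marker."""
    if not data or data[0] != NOP:
        return None  # no leading NOP: treat as a plain COM file
    end = data.find(TERMINATE, 1)  # linear scan over the stub: O(n)
    return None if end < 0 else end + len(TERMINATE)


def is_mz(data: bytes) -> bool:
    """The real check only ever looks at two bytes: O(1)."""
    return data[:2] == b"MZ"
```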

  1. Dan says:

March 24, 2008 at 6:13 pm

command.com seems to have more compatibility stuff than cmd.exe, which wouldn't need it since cmd.exe wasn't around in DOS/9x/ME.

At least, I recall I prefer cmd.exe over command.com and I think that was the reason why.

I always knew about the "MZ" header but didn't realize it was some guy's initials.

Heres some fun: http://www.eicar.org/anti_virus_test_file.htm

It's an ASCII string which is actually a binary COM file. You can paste it into Notepad, save it as a .COM file, and run it to see if your anti-virus catches it (it's a harmless "Hello World!" style program used as a test for anti-virus products). I always thought it was kinda neat how it didn't have any control codes or characters >= 0x80.

  1. Dan says:

March 24, 2008 at 6:16 pm

anon: There's your problem. Starting a COM file with a NOP is perfectly acceptable since COM files have no syntax. The only way your idea would work is if you started the file with something that would be impossible to use at the beginning of the COM file because it wouldn't work… like "MZ". Of course if you use that or any variant, your idea no longer works since it's based on the idea that the file starts with acceptable COM code!

  1. Neil says:

March 24, 2008 at 8:03 pm

And hey, as long as it all fits on a floppy, right?

To which end it appears that something along the lines of PKSFX was used to compress the executable.

As for using the real-mode stub for a PE executable, I've only seen it done once; I think it was for some old version of Excel that shipped with its own copy of Windows (since most people didn't have Windows then), and the job of the stub was simply to execute "win excel".

I guess fitting on a floppy was the main reason why Windows 95 xcopy.exe launched xcopy32.exe instead of being dual-mode.

  1. Maxie says:

March 24, 2008 at 8:04 pm

The ZM signature was valid in DOS but not in Windows.

  1. Brian Reiter says:

March 24, 2008 at 8:37 pm

Also, Visual C++'s compiler uses a related trick at some point to provide both a GUI and command line version of itself under the same name (excluding file extension) by having both .EXE and .COM versions, with the command line trying to run the .COM version first, unlike the GUI.

That sounds inaccurate to me.

And yet it is true. If you use a command shell to execute "devenv /build mysolution.sln" then you'll get devenv.com and it will be a text mode build. That's because .COM comes before .EXE in the PATHEXT environment variable.

One person's backwards compatibility hack becomes another's feature.

    PS> ($env:PATHEXT)
    .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC

    PS> get-command devenv | fl

    Name            : devenv.com
    CommandType     : Application
    Definition      : C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\devenv.com
    Extension       : .com
    Path            : C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\devenv.com
    FileVersionInfo : File:             C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\devenv.com
                      InternalName:     DEVENV.COM
                      OriginalFilename: DEVENV.COM
                      FileVersion:      8.0.50727.42 built by: RTM
                      FileDescription:  Microsoft Visual Studio Command Line
                      Product:          Microsoft® Visual Studio® 2005
                      ProductVersion:   8.0.50727.42
                      Debug:            False
                      Patched:          False
                      PreRelease:       True
                      PrivateBuild:     True
                      SpecialBuild:     False
                      Language:         English (United States)

    Name            : devenv.exe
    CommandType     : Application
    Definition      : C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\devenv.exe
    Extension       : .exe
    Path            : C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\devenv.exe
    FileVersionInfo : File:             C:\Program Files (x86)\Microsoft Visual Studio 8\Common7\IDE\devenv.exe
                      InternalName:     devenv.exe
                      OriginalFilename: devenv.exe
                      FileVersion:      8.0.50727.867 built by: vsvista
                      FileDescription:  Microsoft Visual Studio 2005
                      Product:          Microsoft® Visual Studio® 2005
                      ProductVersion:   8.0.50727.867
                      Debug:            False
                      Patched:          False
                      PreRelease:       True
                      PrivateBuild:     True
                      SpecialBuild:     False
                      Language:         English (United States)

  1. reader says:

March 24, 2008 at 10:01 pm

@reader: The stack started at 0xFFFE and grew downwards.

0x00 to 0xFF was the Program Segment Prefix,

Oops, you're right, I remembered incorrectly. My bad.

  1. Anon says:

March 24, 2008 at 10:37 pm

I always liked the MZ+LE trick where you could write a VxD whose DOS stub was actually the DOS version of the program. The idea was that if you ran it in DOS, only the MZ part was used. But that part hooks INT 2Fh and uses the hook to load a small VxD with just enough logic to keep it working after Windows has virtualized everything.

http://support.microsoft.com/kb/74516

  1. Gabe says:

March 24, 2008 at 11:03 pm

I think xcopy32 was around just to give 16-bit xcopy long filename support.

  1. Robbie Mosaic says:

March 25, 2008 at 12:45 am

cmd.exe is good enough, right? I don't think it requires much improvement. We can build simple tools that drive cmd.exe such as my project winrosh to make it more fun. OTOH, xcopy, copy and move commands can be improved (to make copying/moving files more capable than using GUI operations), but not replaced with a single robocopy command.

  1. Mike Dimmick says:

March 24, 2008 at 8:41 pm

The .com + .exe trick for Visual Studio was introduced at least as early as VC6 (msdev.com, msdev.exe) and adapted for eMbedded Visual C++ (evc.com/.exe), then adopted also for the unified IDE of VS.NET 2002. VS 2008 still ships devenv.com and devenv.exe. The .com file is a small stub which loads the .exe, passing it a handle to the console that the .com was loaded in, so that Visual Studio's build system can send output to that console. The ability to attach a program not already associated with a console to an existing console, using the AttachConsole function, was only added in Windows XP. The devenv.com program itself is a renamed console-subsystem Windows executable (PE file).

@Mats Gefvert: CMD.EXE is a console-mode subsystem command interpreter. It isn't required for running console-mode programs. x64 systems have a 32-bit version in %SystemRoot%\SysWOW64 and a 64-bit build in %SystemRoot%\System32.

COMMAND.COM is the 16-bit DOS interpreter, and it is loaded for a DOS environment as DOS programs expected it to be there. x64 systems do not contain COMMAND.COM as they have no Virtual DOS Machine environment (ntvdm), as the required processor submode was removed by AMD. (It's still there if you boot the processor in 32-bit protected mode, but a 64-bit OS cannot access it.) If you type COMMAND into the run box rather than CMD, you get a less functional, slower command interpreter on 32-bit, and an error on 64-bit. Use CMD.

  1. Anon says:

March 25, 2008 at 4:51 am

@Mike Dimmick

"x64 systems do not contain COMMAND.COM as they have no Virtual DOS Machine environment (ntvdm), as the required processor submode was removed by AMD. (It's still there if you boot the processor in 32-bit protected mode, but a 64-bit OS cannot access it.)"

True enough. But did you know that HAL in Windows XP x64 actually emulates 16 bit Bios code in software so that video drivers which still need to can call it? I guess by the time Vista 64 shipped the video card vendors had had enough time to find another way to get whatever information they wanted, because the emulator is no longer present there.

  1. John Elliott says:

March 25, 2008 at 5:59 am

When 8-bit CP/M needed an expanded COM file format, the magic number used was 0xC9, which in 8080 machine code is RET. Later another extension was added by third-party developers, and that used 0xC7 (RST 0; it would be like starting a DOS COM file with 0xCD 0x20, INT 20h).

I think there's another criterion for a file being treated as EXE rather than COM: it's got to be big enough to contain the EXE header. So this: 4d 5a ba 0c 01 b1 09 e8 fb fe cd 20 48 65 6c 6c 6f 24 is run as a COM file (at least on XP), despite starting with MZ.
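
John's observation suggests a refinement of the plain MZ test, sketched below in Python. The comment doesn't say what minimum size the loader actually requires, so the 28-byte threshold (the fixed part of the MZ header) is an assumption, as is the name `looks_like_exe`. With it, the 18-byte sample is classified as COM; its last six bytes are the DOS-style string `Hello$`.

```python
# Assumed threshold: the fixed 28-byte portion of an MZ header.
MIN_EXE_SIZE = 0x1C

SAMPLE = bytes.fromhex("4d5aba0c01b109e8fbfecd2048656c6c6f24")  # bytes from the comment


def looks_like_exe(data: bytes) -> bool:
    """MZ signature *and* enough bytes to hold the header."""
    return data[:2] == b"MZ" and len(data) >= MIN_EXE_SIZE


print(len(SAMPLE), looks_like_exe(SAMPLE), SAMPLE[12:].decode("ascii"))
```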

  1. Yuhong Bao says:

March 25, 2008 at 10:55 am

"But did you know that HAL in Windows XP x64 actually emulates 16 bit Bios code in software so that video drivers which still need to can call it? I guess by the time Vista 64 shipped the video card vendors had had enough time to find another way to get whatever information they wanted, because the emulator is no longer present there."

Indeed the call for doing this was removed in WDDM.

  1. me says:

March 25, 2008 at 2:12 pm

Just a little nitpicking: .COM files are not limited to 64 KB. They can grow larger and address all of the space, at least on MS-DOS 3.2 and after. The only problem is that the writer of the .COM file has to handle all the segment arithmetic on his own, as the DOS loader did not perform any adjustments (as it does with .EXE files).

Of course, switching to the MZ-EXE was a good move in the first place.

  1. Ulric says:

March 25, 2008 at 3:20 pm

DEVENV.exe vs DEVENV.com

this is true.

DevEnv.com is the one that correctly handles piping/buffering the output so you can pipe it in something else.  

So when you build at the command line, it uses devenv.com and you see the build output as it progresses. If you used DevEnv.exe, you only got the result at the end.

  1. Wampiryczny blog says:

May 23, 2008 at 4:10 pm

I already know that several files that were used during the infection are not visible in the file system. That means (most likely) that they were deleted… What am I looking for… The files I'm looking for: C:\Documents and Settings\Admini


© 2018 Microsoft