---
created_at: '2014-07-26T23:40:19.000Z'
title: How retiring segmentation in AMD64 long mode broke VMware (2006)
url: http://www.pagetable.com/?p=25
author: userbinator
points: 53
story_text: ''
comment_text:
num_comments: 12
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1406418019
_tags:
- story
- author_userbinator
- story_8091290
objectID: '8091290'
year: 2006
---
[Source](http://www.pagetable.com/?p=25 "Permalink to How retiring segmentation in AMD64 long mode broke VMware | pagetable.com")
# How retiring segmentation in AMD64 long mode broke VMware
UNIX, Windows NT, and all the operating systems in their class rely on virtual memory, or paging, in order to provide every process on the system a complete address space of its own. An easier way to protect processes from each other is segmentation: The 4 GB address space of a 32 bit CPU is divided into segments (consisting of a physical base address and a limit), one for each process, and every process may only access its own segment. This is the model the 286 introduced in its protected mode, though with 16 bit offsets and a 16 MB physical address space.
The 386 then introduced virtual memory, but segmentation was still possible, either instead of, or on top of the paged virtual address space. Today, no modern operating system for the x86 uses segmentation any more, so for every process, the base for the code and data segments is set to 0, and the limit is set to 0xFFFFFFFF.
The AMD64 architecture, while still being fully compatible in 32 bit mode, retired a lot of legacy functionality in the new 64 bit long mode, including most of segmentation. The CS (code), DS (data 1), ES (data 2) and SS (stack) segment registers are practically gone, and the FS and GS segments still support a base (which can be used in tricks to quickly access data at a constant position, like the TCB), but the limit is no longer enforced. Now operating systems don't have to save and restore most of these segment registers any more when switching contexts, making these switches faster.
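To make the difference concrete, here is a minimal Python sketch of how an offset becomes a linear address in each mode. It is a simplified model, not a description of any real implementation: the names `Segment`, `translate_legacy` and `translate_long_mode` are made up for illustration, and details such as expand-down segments, descriptor granularity and canonical-address checks are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    base: int   # linear address where the segment starts
    limit: int  # highest valid offset inside the segment (inclusive)


class SegmentFault(Exception):
    """Stands in for the fault the CPU raises on a limit violation."""


def translate_legacy(seg: Segment, offset: int) -> int:
    """32 bit protected mode: check the limit, then add the base."""
    if offset > seg.limit:
        raise SegmentFault(f"offset {offset:#x} beyond limit {seg.limit:#x}")
    return (seg.base + offset) & 0xFFFFFFFF


def translate_long_mode(seg_name: str, seg: Segment, offset: int) -> int:
    """64 bit mode: no limit check; only FS and GS contribute a base."""
    base = seg.base if seg_name in ("fs", "gs") else 0
    return (base + offset) & 0xFFFFFFFFFFFFFFFF
```

With a flat segment (`Segment(0, 0xFFFFFFFF)`) the legacy translation is the identity, which is how modern 32 bit operating systems use it. In the long mode model, the same descriptor loaded into DS is ignored entirely, while loaded into FS it only shifts the address.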
But this broke VMware. While VMware could still virtualize 32 bit operating systems on AMD64 CPUs, it could not virtualize 64 bit operating systems, because its virtual machine monitor relied on segment limits.
In a nutshell, this is how VMware works: All user mode code of the guest runs in exactly the environment it expects; VMware makes sure the page mappings of the user mode address spaces are correct. All kernel mode code of the guest will be run in user mode, and again, VMware must lay out memory as the guest kernel expects it to be. In both modes of operation, there can be exceptions, like system calls (by guest user mode code) or page table modifications (by guest kernel mode code). These have to be trapped by the virtual machine monitor, and the respective functionality has to be carried out in a modified way, so that they still seem to have the correct effect to the guest, but don't interfere with the host operating system.
The virtual machine monitor's trap handler must reside in the guest's address space, because an exception cannot switch address spaces. So VMware's trap handler sits at the very top of every guest's address space, which is unused by all major operating systems. According to Popek and Goldberg's definition of virtualization, there must be no way for code inside a virtual machine to escape, and modify the host's state in any way not directly controlled by the monitor. Therefore, it must be made sure that the guest code cannot write to the trap handler code. VMware does this using segment limits: The limits of all segment registers are set to something like 0xFFFFEFFF to protect the uppermost 4 KB of the address space where the trap handler resides.
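The arithmetic behind that limit can be sketched in a few lines of Python. This is only an illustration under assumptions: segments are taken to have base 0, and the helper names `truncated_limit` and `guest_may_access` are hypothetical, not anything VMware is known to use.

```python
ADDRESS_SPACE_TOP = 0xFFFFFFFF  # last byte of a 32 bit address space
PAGE_SIZE = 0x1000              # 4 KB


def truncated_limit(reserved_bytes: int) -> int:
    """Segment limit that leaves `reserved_bytes` at the top unreachable."""
    return ADDRESS_SPACE_TOP - reserved_bytes


def guest_may_access(address: int, size: int, limit: int) -> bool:
    """True if every byte of the access lies at or below the segment limit."""
    return address + size - 1 <= limit


limit = truncated_limit(PAGE_SIZE)
print(hex(limit))                                # 0xffffefff
print(guest_may_access(0xFFFFEFFC, 4, limit))    # True: ends at the limit
print(guest_may_access(0xFFFFEFFD, 4, limit))    # False: crosses into the top page
```

Because the check applies to every access through the segment, regardless of privilege level, the guest kernel and guest user code are both kept out of the reserved page.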
With no segment limits any more for 64 bit code, this way to protect the trap handler was impossible. Unable to comply with Popek and Goldberg's security requirement, VMware chose not to support 64 bit virtualization until AMD reintroduced (optional) segment limits on later models of their Opteron and Athlon 64 CPUs. Intel never implemented 64 bit segment limits on their EM64T/Intel64 CPUs, because their 64 bit processors soon implemented VT/Vanderpool, which also worked around the problem. So this is why VMware requires a certain model and stepping of the AMD CPU line or a VT-enabled Intel CPU in order to support 64 bit virtualization.
Now the question is: Why don't they protect the uppermost page using the permission bits in the page table? This is how all operating systems protect themselves from user mode processes. If you have an answer on this, or otherwise have thoughts, please comment on this post. :-)
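As background for the question, here is a rough Python model of the user/supervisor check that x86 paging performs. It is a sketch under stated simplifications: it ignores the read/write and no-execute bits, and the function name `page_access_allowed` is invented for this example. The one hardware fact it encodes is that paging treats privilege levels 0 to 2 as supervisor and only level 3 as user.

```python
def page_access_allowed(cpl: int, user_page: bool) -> bool:
    """Paging knows two classes only: supervisor (CPL 0-2) and user (CPL 3)."""
    supervisor = cpl < 3
    return supervisor or user_page


# A page marked supervisor-only is hidden from ring 3 ...
print(page_access_allowed(cpl=3, user_page=False))  # False
# ... but rings 1 and 2 count as supervisor and can still reach it.
print(page_access_allowed(cpl=1, user_page=False))  # True
# Any page ring 3 code needs must be a user page, visible to all ring 3 code.
print(page_access_allowed(cpl=3, user_page=True))   # True
```

So a monitor that hides itself behind supervisor-only pages has to push the whole guest into ring 3, at which point the page tables alone can no longer separate the guest kernel from guest user code.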
References: [1][7], [2][8], [3][9], [4][10], [5][11]
This entry was posted in [puzzle][12], [trivia][13] on [November 9, 2006][14] by [Michael Steil][15].
## 32 thoughts on “How retiring segmentation in AMD64 long mode broke VMware”
1. **James Ideal** [November 13, 2006 at 09:46][18]
Mmmm… good question. My uninformed guess is that with segment protection you have a finer grained protection model, but in the case of the MMU model you only have one bit per PDT: (U)ser/(S)ystem where 3=U and 0,1,2 = S, so it might come to excessive overhead if VMware also needs to babysit the MMU traps to implement yet another protection level.
[Reply][19] ↓
2. **seppel** [November 20, 2006 at 12:35][20]
You wrote: “The virtual machine monitor’s trap handler must reside in the guest’s address space, because an exception cannot switch address spaces.” I still think that this is wrong. I haven't tried it, but I'm quite sure you can place a task gate in the IDT pointing to a TSS which contains cr3 and so can switch address spaces.
[Reply][21] ↓
3. **Matthias** [December 27, 2006 at 02:55][22]
I think they prefer segments because they can prevent the OS from reading VMware code, which should ideally be impossible. As far as I know, you can't do executable-without-read with the MMU.
[Reply][23] ↓
4. **Ghostwriter** [January 5, 2007 at 22:11][24]
James Ideal is dead on, though he incorrectly implies that you can solve the issue by interposing on MMU traps. x86 may have 4 rings, but as far as paging is concerned, it only has 0 and 1-3. If you use ring0 to protect your monitor, you have no way of protecting guest kernelspace from guest userspace.
Interestingly, this means that Xen/x86-64 (since even AMD64 revD's re-enabling of segmentation doesn't re-enable %fs's limit) runs both guest kernel space and guest user space in ring3 (and the monitor in ring0, of course), with two (mostly) disjoint sets of page tables. This makes system calls quite a bit more expensive, but does provide all the various isolation you'd expect from both the OS and the VMM.
[Reply][25] ↓
5. **dex** [January 16, 2007 at 11:40][26]
Using WinDbg, I dumped the CS GDT descriptors (index 1 and index 3) under VMware; the segment size is 4 GB.
How come?
[Reply][27] ↓
6. [**Myria**][3] [January 27, 2007 at 19:14][28]
dex: WinDbg uses GetThreadSelectorEntry (NtQueryInformationThread), an operating system call. The kernel reads the GDT to get this information for you, which is the virtual GDT.
Try using the “lsl” instruction in user mode, which can't be faked because it doesn't cause an exception. I don't have VMware at home so I can't try this right now.
[Reply][29] ↓
7. **ZPulse** [February 17, 2007 at 08:40][30]
Ghostwriter, isn't it rings 0-2 versus 3 for paging (and not 0 versus 1-3)? Hence, the guest OS needs to reside in ring 3 so as not to affect the VMM.
[Reply][31] ↓
10. **walken** [May 8, 2007 at 01:53][34]
A pure trap handler would not work, because some instructions that expose privileged state do not trap on x86. VMware uses binary translation to work around this, but this requires running translated code in the protected upper area, which is why they don't use permission bits. The ASPLOS paper you cited as your 4th reference gives a small example of binary translation at work.
[Reply][35] ↓
25. **Yuhong Bao** [December 17, 2007 at 22:21][70]
Anyone have a reference to programming info on how to take advantage of the support for the 64-bit segment limits?
[Reply][71] ↓
26. **Interested** [December 19, 2007 at 02:02][72]
So uh, I just happen to be running a virtual x86/amd64 machine in VMware, on a 32-bit host OS. I use my virtual machines to do a lot of tinkering with programs etc., because *nix is obviously the OS of choice for this. Anyway, my question is: could this account for a segfault in some assembly code I'm tinkering with? I've read that when writing asm, the .data segment should be read/write. Knowing this, I couldn't figure out this segfault till I found valgrind for some debugging. I'm new to assembly (very new, even) and I thought for sure I was the one at fault. But then valgrind reports that I don't have proper permissions to modify this memory location (the .data segment; I'm doing a self-modifying string). So I guess my actual question is: am I at fault, is it something particular to amd64 assembly, or is it something to do with running a 64-bit guest OS on a 32-bit host OS in VMware?
[Reply][73] ↓
27. **Interested** [December 19, 2007 at 10:17][74]
I was able to get the results I wanted by compiling, finding my original string, and manually passing the address of .data to ld. This still seems strange to me though: it doesn't seem to matter how I define the .data section in the source, it gets screwed up, so I have to manually examine the compiled binary and then do the whole process over each time I change things. Are there any examples I can follow for nasm syntax to see if I'm messing up the section declaration? I'm already into the nasm docs, but I'm reading slowly over a whole base of knowledge trying to get a good jump into asm. I'm still kind of new at this though, so I'm wondering: is this kind of thing usual? Seems like I might be keeping my headache for a while if it is.
[Reply][75] ↓
28. **adx** [May 13, 2008 at 01:07][76]
Myria, they have to virtualize LSL just like they have to virtualize MOV AX, CS and similar. (Unless they do something extraordinarily tricky, like running Ring-3 with a selector like on Ring-0 … is that possible?)
As far as I know, there is a thing called code scanning (up to branches and then again), although this is not very clear to me. If code scanning inserts INT3 opcodes, then there's again the problem of how to keep the guest from seeing the changed code…
[Reply][77] ↓
29. **adx** [May 13, 2008 at 01:08][78]
I'll have to debug it more with my SoftICE!
[Reply][79] ↓
30. [**alman**][80] [January 31, 2013 at 08:58][81]
Is this a reason why L4Ka Pistachio is buggy on some x64 emulators?
[Reply][82] ↓
31. [**chaoxifer**][83] [October 29, 2013 at 17:05][84]
This raises some questions for me:
1\. Why does segmentation still exist?
2\. Can “segmentation” be supported by other architectures?
3\. Can segmentation add overhead when translating between virtual and physical addresses?
Thanks for posting this good information.
[Reply][85] ↓
32. **MSFT** [July 27, 2014 at 09:17][86]
Retiring the bi-endian-ness flag of the PowerPC, on the PowerPC G5, broke VirtualPC, too.
Virtual machines are arguably the most brutal piece of software you can run. Nothing taxes more obscure parts of your system than a VM.
[Reply][87] ↓
[1]: http://www.pagetable.com/ "pagetable.com"
[2]: http://www.pagetable.com#content "Skip to content"
[3]: http://www.pagetable.com/
[4]: http://www.pagetable.com/?page_id=5
[5]: http://www.pagetable.com/?p=25#comments "Comment on How retiring segmentation in AMD64 long mode broke VMware"
[6]: http://www.pagetable.com/wp-includes/images/smilies/icon_smile.gif
[7]: http://kb.vmware.com/KanisaPlatform/Publishing/73/1901_f.SAL_Public.html
[8]: http://www.linode.com/xen/irc/logs/xen.log-2006-01-12
[9]: http://download3.vmware.com/vmworld/2005/pac346.pdf
[10]: http://www.vmware.com/pdf/asplos235_adams.pdf
[11]: http://x86vmm.blogspot.com/2006/08/blue-pill-is-quasi-illiterate.html
[12]: http://www.pagetable.com/?cat=8 "View all posts in puzzle"
[13]: http://www.pagetable.com/?cat=12 "View all posts in trivia"
[14]: http://www.pagetable.com/?p=25 "00:22"
[15]: http://www.pagetable.com/?author=1 "View all posts by Michael Steil"
[16]: http://www.pagetable.com/?p=24
[17]: http://www.pagetable.com/?p=26
[18]: http://www.pagetable.com/?p=25#comment-1175
[19]: /?p=25&replytocom=1175#respond
[20]: http://www.pagetable.com/?p=25#comment-1176
[21]: /?p=25&replytocom=1176#respond
[22]: http://www.pagetable.com/?p=25#comment-1177
[23]: /?p=25&replytocom=1177#respond
[24]: http://www.pagetable.com/?p=25#comment-1178
[25]: /?p=25&replytocom=1178#respond
[26]: http://www.pagetable.com/?p=25#comment-1179
[27]: /?p=25&replytocom=1179#respond
[28]: http://www.pagetable.com/?p=25#comment-1180
[29]: /?p=25&replytocom=1180#respond
[30]: http://www.pagetable.com/?p=25#comment-1181
[31]: /?p=25&replytocom=1181#respond
[32]: http://gcreativestudios.com/SCtests/1/
[33]: http://imeridia.createmybb.com/index.php
[34]: http://www.pagetable.com/?p=25#comment-1184
[35]: /?p=25&replytocom=1184#respond
[36]: http://www.google.com
[37]: http://www.pagetable.com/?p=25#comment-1185
[38]: /?p=25&replytocom=1185#respond
[39]: http://rtkiiggfnm.com
[40]: http://www.pagetable.com/?p=25#comment-1186
[41]: /?p=25&replytocom=1186#respond
[42]: http://www.gmail.com/
[43]: http://www.pagetable.com/?p=25#comment-1187
[44]: /?p=25&replytocom=1187#respond
[45]: http://www.google.com/71b39b1b9e9b64828c198e6252dea268
[46]: http://www.pagetable.com/?p=25#comment-1189
[47]: /?p=25&replytocom=1189#respond
[48]: http://hairyoldsnatch.com/?id=doremi
[49]: http://oldmomsandteens.com/?id=doremi
[50]: http://www.thehotstop.info
[51]: http://www.pagetable.com/?p=25#comment-1192
[52]: /?p=25&replytocom=1192#respond
[53]: http://www.site.ru
[54]: http://www.pagetable.com/?p=25#comment-1193
[55]: /?p=25&replytocom=1193#respond
[56]: http://www.pagetable.com/?p=25#comment-1194
[57]: http://www.pagetable.com/wp-includes/images/smilies/icon_wink.gif
[58]: /?p=25&replytocom=1194#respond
[59]: http://linkzpage.ueuo.com/
[60]: http://www.pagetable.com/?p=25#comment-1195
[61]: /?p=25&replytocom=1195#respond
[62]: http://deptmanagment.com
[63]: http://www.pagetable.com/?p=25#comment-1196
[64]: http://www.pagetable.com/deptmanagment.com
[65]: /?p=25&replytocom=1196#respond
[66]: http://www.pagetable.com/?p=25#comment-1197
[67]: /?p=25&replytocom=1197#respond
[68]: http://www.pagetable.com/?p=25#comment-1198
[69]: /?p=25&replytocom=1198#respond
[70]: http://www.pagetable.com/?p=25#comment-1199
[71]: /?p=25&replytocom=1199#respond
[72]: http://www.pagetable.com/?p=25#comment-1200
[73]: /?p=25&replytocom=1200#respond
[74]: http://www.pagetable.com/?p=25#comment-1201
[75]: /?p=25&replytocom=1201#respond
[76]: http://www.pagetable.com/?p=25#comment-1202
[77]: /?p=25&replytocom=1202#respond
[78]: http://www.pagetable.com/?p=25#comment-1203
[79]: /?p=25&replytocom=1203#respond
[80]: http://l4os.ru
[81]: http://www.pagetable.com/?p=25#comment-1204
[82]: /?p=25&replytocom=1204#respond
[83]: http://chaoxifer.egloos.com
[84]: http://www.pagetable.com/?p=25#comment-1205
[85]: /?p=25&replytocom=1205#respond
[86]: http://www.pagetable.com/?p=25#comment-1207
[87]: /?p=25&replytocom=1207#respond
[88]: /?p=25#respond
[89]: http://www.pagetable.com/?p=940
[90]: http://www.pagetable.com/?p=926
[91]: http://www.pagetable.com/?p=922
[92]: http://www.pagetable.com/?p=919
[93]: http://www.pagetable.com/?p=904
[94]: http://debugmo.de
[95]: http://www.alex-ionescu.com
[96]: http://virtuallyfun.superglobalmegacorp.com
[97]: http://www.os2museum.com/wp/
[98]: http://www.pagetable.com/?cat=2 "View all posts filed under 6502"
[99]: http://www.pagetable.com/?cat=3 "View all posts filed under archeology"
[100]: http://www.pagetable.com/?cat=4 "View all posts filed under default"
[101]: http://www.pagetable.com/?cat=5 "View all posts filed under digital video"
[102]: http://www.pagetable.com/?cat=6 "View all posts filed under hacks"
[103]: http://www.pagetable.com/?cat=7 "View all posts filed under literature"
[104]: http://www.pagetable.com/?cat=8 "View all posts filed under puzzle"
[105]: http://www.pagetable.com/?cat=9 "View all posts filed under SCUMM"
[106]: http://www.pagetable.com/?cat=10 "View all posts filed under security"
[107]: http://www.pagetable.com/?cat=11 "View all posts filed under tricks"
[108]: http://www.pagetable.com/?cat=12 "View all posts filed under trivia"
[109]: http://www.pagetable.com/?cat=1 "View all posts filed under Uncategorized"
[110]: http://www.pagetable.com/?cat=13 "View all posts filed under whines"
[111]: http://www.pagetable.com/wp-login.php
[112]: http://www.pagetable.com/?feed=rss2 "Syndicate this site using RSS 2.0"
[113]: http://www.pagetable.com/?feed=comments-rss2 "The latest comments to all posts in RSS"
[114]: http://wordpress.org/ "Powered by WordPress, state-of-the-art semantic personal publishing platform."
[115]: http://wordpress.org/ "Semantic Personal Publishing Platform"