The Inquirer-Home

Intel Vanderpool: More roses, roses

Part III Sic quod erat misunderstandum
Sun Feb 27 2005, 15:05
THERE HAS GOT TO BE a better way, and that is where Intel came up with Vanderpool. It aims to plug as many of these 'virtualization holes' as possible with the least amount of pain to the programmer. The solution is VT-x for x86 and VT-i for Itanium; each introduces a new mode for its respective CPU. For now, I will ignore VT-i, but I am told it functions in much the same way as VT-x.

The new mode is called VMX, and it comes in two forms. The Virtual Machine Monitor (VMM) runs in VMX root mode, which sits at a privilege level below ring 0, so you can think of it as ring -1, or possibly as a way of running that steps sideways from the rings. The hosted OS and all its programs run in VMX non-root mode, or simply VMX mode.

Any OS running in VMX mode has all the things an OS running alone on a normal non-VT system has. It is in ring 0, has access to everything it normally does, and is blissfully unaware of anything running beside it. When situations warrant, the CPU drops into VMX root mode, and the VMM can switch to another OS running in another VMX instance. The transition from a hosted OS to the VMM is called a VM exit, and the transition from the VMM back to a hosted OS is called a VM entry.

The magic of VT comes into play when you think about how the entry and exit from VMX to VMX root and back again is handled. There is not only the problem of triggering it; you also have to remember that as far as the hosted OS is concerned, it is alone in its own world. You must save the entire state of that virtual universe and reload it when you come back. While there is a lot to do in VT, it was designed for the task, so it is a fairly straightforward and painless process.

Since each OS instance is running in the 'right' place, and is not being watched like a misbehaving child, the four big problems from before go away. The associated workarounds go away also, as does the overhead of looking out for them. This can speed things up considerably, but it is not free, simply much less costly.

To initiate a new hosted OS, you need to set aside a 4KB memory block and pass it to a VMPTRLD command. This chunk becomes the place where all of the state and important bits of the OS instance are stored when it is not active. This area stays alive as long as the OS instance does, basically until a VMCLEAR is run on it. This sets up the virtual machine instance.

If you want to pass control to the virtual machine, you run either of two commands which enter VMX non-root, or simply VMX mode. These commands, collectively referred to as a VM-Entry, are VMLAUNCH and VMRESUME. It doesn't take a genius to figure out the difference between these two. VMRESUME simply loads up the processor state from the guest state area, that 4K block that you initialized earlier, and passes control on to the hosted OS. VMLAUNCH does the same, but also sets the Virtual Machine Control Structure (VMCS) to 'launched'. This involves some behind the scenes bookkeeping intended only for setting up the VM, and since these things cost time, they are avoided in subsequent entries by using VMRESUME.
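To make the VMLAUNCH versus VMRESUME distinction concrete, here is a toy Python model of the launch state. It is only a sketch: the class and function names (`ToyVmcs`, `enter_guest`) are made up for illustration, the 4KB `bytearray` merely stands in for the real memory block, and the error checks reflect my reading of Intel's later published manuals (VMLAUNCH expects a 'clear' VMCS, VMRESUME expects a 'launched' one) rather than anything stated above.

```python
from enum import Enum

VMCS_SIZE = 4096  # the 4KB block described in the article


class LaunchState(Enum):
    CLEAR = "clear"
    LAUNCHED = "launched"


class ToyVmcs:
    """Toy stand-in for one VMCS region; not the real hardware layout."""

    def __init__(self):
        self.region = bytearray(VMCS_SIZE)
        self.state = LaunchState.CLEAR

    def vmclear(self):
        # VMCLEAR returns the structure to the "clear" state.
        self.state = LaunchState.CLEAR

    def vmlaunch(self):
        # First entry: only accepted while the VMCS is still "clear".
        if self.state is not LaunchState.CLEAR:
            raise RuntimeError("VMLAUNCH on a VMCS that is already launched")
        self.state = LaunchState.LAUNCHED
        return "guest running"

    def vmresume(self):
        # Later entries: only accepted once the VMCS has been launched.
        if self.state is not LaunchState.LAUNCHED:
            raise RuntimeError("VMRESUME on a VMCS that was never launched")
        return "guest running"


def enter_guest(vmcs):
    """Pick the entry instruction that matches the current launch state."""
    if vmcs.state is LaunchState.CLEAR:
        return vmcs.vmlaunch()
    return vmcs.vmresume()
```

Calling `enter_guest` twice on a fresh `ToyVmcs` takes the VMLAUNCH path once and the VMRESUME path afterwards, which is the pattern a VMM would follow for each hosted OS.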

From this point on, the hosted OS goes about its merry way, running as it would, blissfully unaware of the other things that may or may not be going on around it. As was planned, it is in its own world, and runs at full speed, or pretty close thereto. Life for this OS is good. The only problem is how do you break out of this blissful state and shunt the OS off to a corner so the other OSes running on the box can actually run? That is where the complex parts of VT come into play, specifically several bitmaps in the VMCS.

These bitmaps are several 32-bit fields where each bit flags a certain event. If that event is triggered, and the relevant bit is set, the CPU triggers a VM-Exit, and passes control back to the VMM running in VMX root mode. The VMM does whatever it wants, and then passes a VMRESUME to either the next hosted OS, or the one it just left. This OS runs on its merry way in blissful ignorance until it triggers another VM-Exit. Rinse and repeat, thousands of times a second.
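The rinse-and-repeat cycle can be sketched as a small simulation. This is a toy model, not how a real VMM is written: the event numbers, `causes_exit` and `run_vmm` are all hypothetical, and each hosted OS is reduced to a list of events it will raise.

```python
from collections import deque

# Hypothetical event numbers for this sketch; not real VMCS bit positions.
EVENT_TIMER = 0
EVENT_IO = 1
EVENT_SYSCALL = 2


def causes_exit(exit_bitmap, event):
    """True if the bit for this event is set in the 32-bit control field."""
    return (exit_bitmap >> event) & 1 == 1


def run_vmm(guests, exit_bitmap):
    """Round-robin VMM loop. `guests` maps a name to the events it raises.

    Returns the list of (guest, event) pairs that caused a VM-Exit.
    """
    queue = deque((name, deque(events)) for name, events in guests.items())
    exits = []
    while queue:
        name, events = queue.popleft()       # VM-Entry: resume this guest
        while events:
            event = events.popleft()
            if causes_exit(exit_bitmap, event):
                exits.append((name, event))  # VM-Exit: back in the VMM
                break
            # Bit clear: the guest handles the event itself and keeps running.
        if events:
            queue.append((name, events))     # work left over; reschedule
    return exits
```

With only the timer bit set, `run_vmm({"A": [EVENT_SYSCALL, EVENT_TIMER, EVENT_IO], "B": [EVENT_TIMER, EVENT_SYSCALL]}, 1 << EVENT_TIMER)` hands control back to the VMM once per guest, on the timer event, and lets everything else run undisturbed.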

What are these triggers? They are pin-based, processor-based, exception and page fault based events, all of which can trigger a VM-Exit. Part of the beauty of VT is the flexibility of the system: you can have it swap tasks on just about as granular a level as you would like. A good analogy is setting a breakpoint in a debugger. You can set it on dozens of events, or none, take your pick.

Pin-based events do just about what they sound like they would do: they trigger an exit on either an external interrupt or a Non-Maskable Interrupt. Theoretically, with this level of control, you could put a big red button on the side of your computer, and it would switch to a different hosted OS every time you pressed the button. You could do this tens of times a minute, and it would be pretty much an utter waste of time, effort and engineering resources. It would make a really cool parlor trick though. I see an IDF demo in the making.

The next class of exit triggers is processor based. If you set any bit in this field, when the corresponding processor state is reached, it triggers a VM-Exit. While most instructions only cause an exit if their bit is set, there are a few that unconditionally cause a VM-Exit. This is a very granular level of control over the VM, allowing you to make the entry and exit happen just about as often as you need.
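A rough sketch of the conditional versus unconditional split might look like the following. The bit positions and the choice of CPUID and VMCALL as always-exiting instructions come from Intel's later published manuals as I understand them, not from the briefing described here, and `instruction_exits` is a hypothetical helper.

```python
# Bit positions in the processor-based control field, per Intel's later
# published manuals (an assumption; the article names no specific bits).
HLT_EXITING = 1 << 7
RDTSC_EXITING = 1 << 12

# Instructions whose exit depends on a control bit.
CONDITIONAL = {"HLT": HLT_EXITING, "RDTSC": RDTSC_EXITING}

# Instructions that exit no matter how the controls are set.
UNCONDITIONAL = {"CPUID", "VMCALL"}


def instruction_exits(proc_controls, mnemonic):
    """Decide whether executing `mnemonic` in the guest causes a VM-Exit."""
    if mnemonic in UNCONDITIONAL:
        return True
    bit = CONDITIONAL.get(mnemonic)
    # Instructions with no entry in either table never exit in this model.
    return bit is not None and (proc_controls & bit) != 0
```

So a VMM that wants to catch an idle guest would set `HLT_EXITING` and leave `RDTSC_EXITING` clear, while CPUID lands in the VMM regardless.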

The exception bitmap is simply a 32-bit field with one bit for each IA-32 exception. If the bit is set, and the exception is thrown, it causes a VM-Exit. If the bit is clear, the hosted OS goes about its merry way the way it always does. This is a very low overhead way to drop out of VMX mode into VMX root mode.
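The exception bitmap check boils down to a single bit test. Here is a minimal sketch; the vector numbers are the standard IA-32 ones, while the helper names `make_exception_bitmap` and `exception_causes_exit` are invented for this example.

```python
# Standard IA-32 exception vector numbers.
VEC_DE = 0   # divide error
VEC_UD = 6   # invalid opcode
VEC_GP = 13  # general protection fault
VEC_PF = 14  # page fault


def make_exception_bitmap(vectors):
    """Build a 32-bit bitmap with one bit set per listed exception vector."""
    bitmap = 0
    for vector in vectors:
        if not 0 <= vector < 32:
            raise ValueError(f"exception vector out of range: {vector}")
        bitmap |= 1 << vector
    return bitmap


def exception_causes_exit(bitmap, vector):
    """True if the guest raising this exception would trigger a VM-Exit."""
    return (bitmap >> vector) & 1 == 1
```

A VMM that only cares about invalid opcodes and general protection faults would use `make_exception_bitmap([VEC_UD, VEC_GP])`, and a divide error in the guest would then be delivered to the hosted OS as usual.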

Lastly we have the page fault based exits. These are very much like the exception exits but have two 32-bit fields to control the behavior. The bits in these fields map to each of the possible page fault error codes, so you can again pick and choose what you want to exit on. Once again, granular and low overhead.
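For the curious, here is how the two page fault fields combine, at least as Intel's later published manuals describe it: they act as a mask and a match value applied to the page fault error code, with the page fault bit (bit 14) of the exception bitmap deciding what a match means. Treat this as a sketch under that assumption; the function name is hypothetical.

```python
PF_VECTOR = 14  # page fault's bit in the exception bitmap

# Page fault error code bits (architectural IA-32 definitions).
PFEC_PRESENT = 1 << 0  # 1 = protection violation, 0 = page not present
PFEC_WRITE = 1 << 1    # 1 = fault on a write
PFEC_USER = 1 << 2     # 1 = fault came from user mode


def page_fault_causes_exit(exception_bitmap, pfec, pfec_mask, pfec_match):
    """Apply the mask/match rule to one page fault error code (pfec)."""
    bit14 = (exception_bitmap >> PF_VECTOR) & 1 == 1
    matched = (pfec & pfec_mask) == pfec_match
    # On a match, bit 14 is followed as-is; otherwise its meaning is reversed.
    return bit14 if matched else not bit14
```

With bit 14 set and both mask and match equal to `PFEC_WRITE`, only write faults exit to the VMM. Setting mask and match to zero with bit 14 set makes every page fault exit.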

In a nutshell, VT works by making a mode more privileged than the traditional highest privilege state, ring 0. Any hosted OS can run in the old structure unchanged and unaware of the control programs running above it. When certain user-set triggers are encountered, control is passed to the VMM running in the higher VMX root state. Because the exit is passively triggered, not something that is actively watched for, overhead is greatly reduced.

VT provides a fairly easy to set up and take down environment that provides far greater stability than the old VM model. If you need to run virtualized OSes, there is no good reason not to do it on a Vanderpool capable CPU. Software VMs are suddenly vastly less attractive in all regards.

It may be good, but you have to remember, it is not free. Each entry means loading up a portion of the 4K block of memory, and each exit means up to a 4K write. This may seem excessive, but compared to the old way, it is amazingly speedy. Intel would not say specifically how much of the 4K block needs to be saved, but strongly implied that it is not a large percentage. The 4K block came about because it is a 'nice' computer number, and a lot of it is not currently used, but may be with future enhancements to VT.

This is part three of four parts. Part the first can be found here. Part 2 can be found here. Part 4 cannot be found until Monday - a holiday in Old Taipei.

