Xen HVM Guest configuration for Windows Vista

I will try to summarize the steps needed to run an HVM Vista machine on an Ubuntu 9.04 distribution with a 2.6.31.5 paravirt-ops Dom0 kernel under Xen 3.4.1. I will note a few specific pitfalls, especially with networking over a Point-to-Point Ethernet link (a USB modem), and some errors I encountered along the way, in the hope that this will help you save some time and better understand the Xen requirements.

  • Hardware Virtualization Processor requirement

An HVM guest needs support from the processor so that the virtualization software can catch privileged machine instructions executed in guest space, and so that I/O accesses from the device firmware can be redirected to the correct guest addresses (through some sort of IOMMU configured by the hypervisor). To check whether virtualization support exists and has been activated on your CPUs, do the following from the Dom0 Ubuntu distribution:

# for an Intel CPU
matmih@Express2:~$ grep vmx /proc/cpuinfo
# or, for an AMD CPU
matmih@Express2:~$ grep svm /proc/cpuinfo

If nothing is shown then most likely the virtualization features have not been enabled. So reboot, enter the BIOS (the F8 key on my laptop after the initial boot image) and look for the option to enable virtualization. Note: on my Intel T9600 dual-core processor it was something like an "Intel Virtualization Support" option in the Processor Preferences tab, disabled by default, that I had to enable and save; the option name may differ depending on the BIOS type and version. If you are certain that your processor has hardware virtualization support but your BIOS does not show any such option, you may want to upgrade your BIOS firmware: Xen and Dom0 will not be aware of the feature if they cannot read it from the BIOS, just as the Dom0 kernel exposes it through /proc/cpuinfo.
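
Once Xen itself is up, you can also confirm from Dom0 that the hypervisor detected the hardware virtualization support. A minimal check (the exact capability strings depend on your CPU and Xen version):

# the xen_caps line should list hvm-* entries if HVM guests are supported
matmih@Express2:~$ sudo xm info | grep xen_caps
# the hypervisor boot log should also report that VMX (or SVM) was enabled
matmih@Express2:~$ sudo xm dmesg | grep -i hvm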

  • Xen HVM configuration

The Xen install already provides 2 templates in the /etc/xen directory: xmexample.hvm for running an HVM guest under a userspace Qemu emulator, and xmexample.hvm-stubdom for having the guest I/O emulation take place in a dedicated Qemu domain/guest. You can check the comments for the options in the template files, but I will comment on my selection. For my initial needs I created an HVM guest for a Windows Vista system, running under a userspace Qemu emulator, with 1 virtual CPU, 2 GB RAM, 1 Ethernet link, USB enabled, and 10 GB of hard disk space backed by a file on Dom0's ext3 filesystem. Though I placed my xen_vista.cfg config file in the ~/Work/XenImages directory, if you want Xen to automatically launch the domain when Dom0 boots you can place the config, or create a symbolic link to it, in the /etc/xen/auto directory. This auto-start behaviour, as well as other common domain options, especially those related to state transitions, migration and so on, can be configured in the /etc/sysconfig/xendomains file. In the following sections I will describe and comment on the format of my xen_vista.cfg HVM guest domain file:

a) Initial Domain Builder

name=XenVista
import os, re
arch_libdir = 'lib'
arch = os.uname()[4]
if os.uname()[0] == 'Linux' and re.search('64', arch):
   arch_libdir = 'lib64'
kernel = "/usr/lib/xen/boot/hvmloader"
builder='hvm'

The initial options check which library directory (the arch_libdir variable) is used on your Linux Dom0 distribution, lib or lib64, to be used later for selecting the correct I/O device model. They also specify the firmware that the domain builder will load (the kernel option – you can find the sources for the HVM Qemu firmware in xen3.4.1/tools/firmware/hvmloader) as well as which domain builder function the xm python tool should use for creating this guest domain – an HVM guest uses the 'hvm' builder specified by the builder parameter. You can check the configure_hvm function in the xen3.4.1/tools/python/xen/xm/create.py script to see the creation flow: it will basically look for certain options in the config file (such as apic or acpi), copy the firmware into the guest domain's address space and launch the Qemu emulation for the guest domain using the config variables described below.
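
Once the whole configuration file described below is in place, the domain is built and started with the xm tool; a typical invocation, with the paths from my setup, looks like this:

matmih@Express2:~/Work/XenImages$ sudo xm create xen_vista.cfg
# check that the new domain shows up and is in the running state
matmih@Express2:~/Work/XenImages$ sudo xm list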

b) Guest layout – cpu, ram, controllers

vcpus = 1
cpus  = "1"
memory = 2048
pae  = 0
acpi = 1
apic = 1

This configures the firmware, through Xen hypercalls, to let the guest see one virtual CPU, pinned to physical CPU 1 (out of the 2 available on this Intel dual-core processor), with 2 GB of RAM. We let Xen choose the default size for the shadow_memory parameter, the non-swappable memory the hypervisor keeps for internal per-domain information such as cached and active guest TLB/page tables. Physical Address Extension is disabled (the pae option, matching our 32-bit Vista kernel), the Advanced Configuration and Power Interface (ACPI) BIOS functions are enabled, and the Advanced Programmable Interrupt Controller (APIC) is enabled so the guest sees something more capable than the default 8259 controller. These settings are reflected in the internal configuration of the guest firmware/BIOS, which is set up by Xen through hypercalls – instead of probing the hardware, the firmware makes hypercalls to Xen to check its capabilities.
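
After the domain has been created you can verify from Dom0 that the guest layout matches these settings; a quick sketch using the standard xm commands:

# show the virtual CPU of XenVista and the physical CPU it is allowed to run on
matmih@Express2:~$ sudo xm vcpu-list XenVista
# dump the full domain configuration, including memory and shadow_memory, as an s-expression
matmih@Express2:~$ sudo xm list -l XenVista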

c) Disk and Ethernet settings

vif  = [ 'type=ioemu, ip=192.168.1.2' ]
disk = [ 'tap:aio:/home/matmih/Work/XenImages/xen_vista.img,hda,w', 'phy:/dev/scd0,hdc:cdrom,r' ]
boot = "dc"

The first entry creates a virtual Ethernet interface that the guest domain will have access to. The type=ioemu parameter tells the xend tool that this interface is not a netfront paravirtualized ethernet driver but will be emulated by Qemu itself. This means that when Qemu is started by xend its command line will contain "-net nic,vlan=0 -net tap,vlan=0,ifname=tap1.0", meaning that a tap virtual Ethernet adapter will be created in the Dom0 Linux and linked by the Qemu emulation to the actual guest ethernet interface. In order to configure the newly created tap1.0 adapter the /etc/xen/qemu-ifup script will be called. You can always check /var/log/xen/xend.log and /var/log/xen/qemu-dm-{domain_name}.log to see the actual command passed to Qemu. The ip parameter is the IP address that the guest system will use for its static network configuration, and will be used locally, in Dom0 Linux, to set up routes to the tap1.0 interface, depending on the network type chosen for the virtual machines (I will elaborate more in the networking section).

When it comes to the disk drives that the guest sees, things become more complicated. Qemu emulates basic IDE hardware that the guest Vista will configure. What we can configure in the xen_vista.cfg domain file is the backend storage behind what Qemu emulates. It can be a physically mounted drive (the phy: prefix), in which case the Qemu emulator forwards the emulated disk I/O requests to the native device drivers of the Dom0 Linux, or it can be a file-backed VBD (Virtual Block Device). For Qemu to use the latter storage type, which has the advantage of being more flexible (it can even be a network file) but can carry performance penalties for intensive I/O, there must be a driver loaded in the Dom0 kernel that supports this functionality. At the moment there are 2 different ways this can be done:

i) Loopback driver supporting raw files

'file:/home/matmih/Work/XenImages/xen_vista.img,hda,w'

This has the advantage that the loopback driver is precompiled in most kernels, but the default number of supported loop devices is 8 (mounted as /dev/loop*) and it is known to buffer a lot and be quite slow for heavy I/O workloads. You may also find the commands below useful:

# to manually mount the disk image in Dom0 Linux
mount -o loop /home/matmih/Work/XenImages/xen_vista.img /mnt/VistaHda
# to create an additional loop device
mknod -m660 /dev/loopNew b 7 8

ii) Blktap driver supporting raw files or qcow Qemu images

'tap:aio:/home/matmih/Work/XenImages/xen_vista.img,hda,w'   # for raw images
'tap:qcow:/home/matmih/Work/XenImages/xen_vista.qcow,hda,w' # for Qemu images

Even though it has to be specifically ported to a Xen kernel and recompiled, it offers higher performance than the loopback driver, is more scalable and is strongly recommended by the Xen team. You can also use this driver to mount the image into your Dom0 Linux filesystem:

xm block-attach 0 tap:aio:/home/matmih/Work/XenImages/xen_vista.img /dev/blkVista   # to create a blktap device node for a raw image
xm block-attach 0 tap:qcow:/home/matmih/Work/XenImages/xen_vista.qcow /dev/blkVista # to create a blktap device node for a qcow image
# to actually mount the device at a filesystem location
mount /dev/blkVista /mnt/VistaHda
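
Whichever driver you pick, the backing image itself has to be created before the domain is first started. A sketch of how a sparse 10GB raw image, or alternatively a qcow image, could be created (the qcow variant assumes the qemu-img utility is installed):

# sparse 10GB raw image - blocks are only allocated as the guest writes them
matmih@Express2:~/Work/XenImages$ dd if=/dev/zero of=xen_vista.img bs=1M count=1 seek=10239
# or a qcow image for the tap:qcow backend
matmih@Express2:~/Work/XenImages$ qemu-img create -f qcow xen_vista.qcow 10G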

d) Device emulation

device_model = '/usr/' + arch_libdir + '/xen/bin/qemu-dm'
sdl = 1
vnc = 0
opengl = 0
stdvga = 0
monitor = 1
usb = 1
usbtablet = 1
serial = 'pty'

The last part of the domain configuration file contains the device-specific options that are passed to Qemu for I/O emulation. The device_model option indicates the Qemu process binary to be launched. In order to be able to talk to Qemu itself we will launch the Vista domain in an SDL window (the libsdl package is required), instead of a VNC client. Both SDL and VNC work very well for displaying Windows in a graphical console, although VNC has some distinct advantages over SDL. Firstly, VNC provides greater flexibility than SDL for remote access to the DomU graphical console: with VNC it is possible to connect to the graphical console from other systems, either on the local network or even over the internet. Secondly, when you close a VNC viewer window the guest domain continues to run, allowing you to simply reconnect and carry on where you left off; closing an SDL window, however, immediately terminates the guest DomU system, resulting in possible data loss. In the SDL window, to switch from the Vista domain view to the Qemu monitor console (the monitor = 1 option) use Ctrl + Alt + 2, and you can issue the commands best described in the Qemu User Manual. We also chose to emulate a Cirrus Logic device (stdvga = 0) instead of a VESA one, and did not enable any graphical acceleration in the SDL window (opengl = 0) because I am missing the fglrx module, described in another post, corresponding to the ATI drivers for my video card. We also ask Qemu to emulate a UHCI controller (usb = 1) for the guest domain in order to be able to add new USB devices, and we emulate a USB tablet pointing device for guest mouse input (usbtablet = 1), a PS/2 keyboard being already emulated by default.
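
For reference, a few monitor commands I find useful in that Ctrl + Alt + 2 console; these are standard Qemu monitor commands, so check the Qemu User Manual of your Qemu version for the exact syntax:

info block              # list the emulated block devices and their backing files
info network            # show the emulated NIC and the tap interface it is wired to
change <device> <iso>   # swap the emulated CD image, using the device name reported by info block
sendkey ctrl-alt-delete # send a key combination that the SDL window would otherwise intercept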

  • Virtual machines networking

In order to have a working Ethernet interface in a Xen-powered virtual machine one must first decide what the network between the virtual machines and the privileged Dom0 host will look like, and how the outgoing network interface(s) will be used. Xen can be configured to let its virtual machines reuse an existing network and its IP addresses (bridged networking) or to create a new private network for internal guest use (NAT and routed networking). The 3 types of network configuration already have pre-customized scripts in the /etc/xen/scripts directory, and either configuration can be selected in the /etc/xen/xend-config.sxp config file prior to starting the xend utility. I will briefly describe the 3 configurations and the problems I encountered and overcame to make the HVM's network work with the Point-to-Point Ethernet link of my USB modem. One thing to mention is that the network-script entry in the xend configuration file is called once, when xend is started, to create the necessary bridges and configure the forwarding rules for the private virtual network, if any, while the script on the vif-script line is called each time a new virtual machine is created, to configure the newly added network interface in the privileged guest that corresponds to the virtual machine's interface. Note that when type=ioemu is specified in the vif configuration, the Qemu emulator brings up a tap virtual adapter connected to the emulated guest domain Ethernet interface, and the /etc/xen/qemu-ifup script is called to configure the tap interface (the first parameter is the newly created tap name and the second one the bridge option specified in the vif domain config). A very good description of the networking explained below can also be found at http://wiki.kartbuilding.net/index.php/Xen_Networking.

a) Bridge Networking

This allows the virtual machines to be on the same network as the bridged outgoing interface, so all guests will have an IP address in the IP range and subnet mask of the original network. This requires the bridge-utils package, as a new Ethernet bridge is created for the original interface and the virtual interfaces created for each virtual machine are added to that bridge as well. The default bridge name is xenbr0 and the default outgoing interface is eth0, but the names can be overridden in xend-config.sxp:

# /etc/xen/xend-config.sxp
(network-script network-bridge)
(vif-script vif-bridge)
# ~/Work/XenImages/xen-vista.cfg
vif  = [ 'ip=192.168.0.6' ]

Because the network-bridge script tries to take down (ifdown) my ppp0 interface in order to create a temporary bridge and rename it to the default xenbr0 name, the script failed and I was left with only an unconfigured tmpbridge interface (as shown by the ifconfig command) instead of the default xenbr0. The following message also appeared when I tried to start xend:

matmih@Express2:~/Work/XenImages$ sudo xend start
ifdown: interface ppp0 not configured
RTNETLINK answers: Device or resource busy

To correct this error I would have had to manually edit the network-bridge script, which seemed a little too complicated and not worth the time for my initial needs. So, basically, the network-bridge script is run when xend starts and creates a xenbr0 bridge, adding the outgoing interface to it. The vif-bridge script is run when a new paravirtualized backend net interface is added to Dom0 (with a vifx.x name); that interface is configured with the provided IP (which must be in the same network as the outgoing interface) and added to the xenbr0 bridge as well. This vifx.x backend is used by the paravirtualized frontend found in the guest domain. For HVM domains the newly created tapx.x virtual net adapter is added to the bridge in the qemu-ifup script, and there is no need for a vifx.x interface to be created.
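
If the bridged setup does work on your system, you can inspect the result with the bridge-utils tools, for example:

# list the bridges and the interfaces enslaved to them (peth0, vifx.x, tapx.x)
matmih@Express2:~$ brctl show
# show the MAC addresses learned on the Xen bridge
matmih@Express2:~$ brctl showmacs xenbr0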

b) Routed Networking with NAT

This allows the virtual machines to be in a private network and communicate with each other through the private network that Dom0 exposes via the configured vifx.x backend interfaces. Traffic from the vifx.x interfaces will be NATed to the outgoing interface, meaning that the virtual machines' IP addresses will be hidden behind the outgoing interface's IP address.

# /etc/xen/xend-config.sxp
(network-script network-nat)
(vif-script vif-nat)
# ~/Work/XenImages/xen-vista.cfg
vif  = [ 'ip=10.0.0.2' ]

The first script configures a NAT rule for all IP addresses that leave through the outgoing interface. To see the actual NAT rule and the routes, do the following:

iptables -nvL -t nat ; route -n

The vif-nat script will add a random private network IP address to the vifx.x interface. Even if that address differs from the IP assigned in the domain config, you should use the one from the domain file when configuring the guest domain's interface, as the routes for the 10.0.0.2 address are the ones configured in Dom0's routing tables. One thing to check is that forwarding has been enabled in Dom0 Linux, so uncomment the following line in your sysctl file, if present:

# /etc/sysctl.conf
net.ipv4.conf.default.forwarding=1
root@Express2$~> sysctl -p

c) Two-way Routed Network

As in the NAT configuration, the virtual machines are in a private network and can access each other and the outgoing network. The only difference is that their IP is no longer NATed and hidden behind the Dom0 IP address. So any outgoing packet from the guests will carry the private network address, and hosts on the public outgoing network can reach the virtual machines at their private IPs if a routing rule for the private guest network is added on the default gateway of the public network. This is done in the vif-route script by configuring the new vifx.x network interface backend in Dom0 for a new guest with an IP address in the public network range, and configuring a route for the actual 10.0.0.2 private guest network to go over the vifx.x interface.

# /etc/xen/xend-config.sxp
(network-script network-route)
(vif-script vif-route)
# ~/Work/XenImages/xen-vista.cfg
vif  = [ 'ip=10.0.0.2' ]
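
For completeness, the routing rule mentioned above, which would have to be added on the default gateway of the public network so that replies find their way back to the guests, would look roughly like this (192.168.0.100 is a made-up example for Dom0's address on the public network):

# on the public network's default gateway, not on Dom0
route add -net 10.0.0.0 netmask 255.255.255.0 gw 192.168.0.100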

d) Custom solution for HVM guests

Unfortunately scenario a) does not work for my Point-to-Point USB modem connection, and the b) and c) scripts are broken for Qemu HVM guests because Qemu itself brings up a new tap virtual adapter interface (tapx.x) to emulate the guest's Ethernet, while the scripts bring up a vifx.x backend meant to work with the paravirtualized netfront driver found in paravirtualized guests. To work around this issue and keep the NATed configuration for the xend tool, so that I can still add additional paravirtualized guests later, I will manually configure my tapx.x interface to work with my USB modem, making the vifx.x backend unnecessary. First let's look at my current network configuration:

matmih@Express2:~$ ifconfig
lo        Link encap:Local Loopback
 inet addr:127.0.0.1  Mask:255.0.0.0
 inet6 addr: ::1/128 Scope:Host
 UP LOOPBACK RUNNING  MTU:16436  Metric:1
 RX packets:0 errors:0 dropped:0 overruns:0 frame:0
 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:0
 RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
ppp0      Link encap:Point-to-Point Protocol
 inet addr:10.81.110.227  P-t-P:10.64.64.64  Mask:255.255.255.255
 UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
 RX packets:14 errors:0 dropped:0 overruns:0 frame:0
 TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:3
matmih@Express2:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.64.64.64     0.0.0.0         255.255.255.255 UH    0      0        0 ppp0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 ppp0
0.0.0.0         10.64.64.64     0.0.0.0         UG    0      0        0 ppp0

We will keep the NATed configuration described in b) with the following changes:

# /etc/xen/xend-config.sxp
(network-script 'network-nat netdev=ppp0')
(vif-script vif-nat)
# ~/Work/XenImages/xen-vista.cfg
vif  = [ 'type=ioemu, bridge=192.168.1.254/24' ]

The xend-config.sxp change replaces the default eth0 interface with my ppp0 USB modem Point-to-Point interface. It is used by the /etc/xen/scripts/network-nat script to add the following NAT rule for all IP addresses that leave through that interface:

 iptables -t nat -A POSTROUTING -o ppp0 -j MASQUERADE

Next, one must modify the /etc/xen/qemu-ifup script that is called for the tapx.x virtual tap adapter created by Qemu because of the type=ioemu vif option above:

#!/bin/sh
# $1 - tapx.x name
# $2 - bridge domain config vif option (vif = ['type=ioemu, bridge=$2'])
echo 'config qemu network with xen interface ' $*
# configure the tapx.x interface to have the ip provided in the bridge option
ip link set "$1" up arp on
ip addr add "$2" dev "$1"
# add a route for the Qemu private network to go to the tapx.x interface
# (the "ip addr add" above normally creates this connected route already)
ip_only=$(echo "$2" | awk -F/ '{print $1}')
ip route add "$2" dev "$1" src "$ip_only" 2>/dev/null
# enable proxy ARP so the tapx.x interface answers ARP requests on behalf of the virtual machines' packets
# this makes the tapx.x interface act as a gateway for them
echo 1 >/proc/sys/net/ipv4/conf/$1/proxy_arp
# add the iptables rules, in case firewall is enabled, to allow all connection in/out of the tapx.x interface
iptables -I FORWARD -m physdev --physdev-in "$1" -j ACCEPT 2>/dev/null
iptables -I FORWARD -m state --state RELATED,ESTABLISHED -m physdev --physdev-out "$1" -j ACCEPT 2>/dev/null

The idea behind these settings is that the tapx.x interface acts as a gateway for the virtual machine network, and all packets that leave through the outgoing network interface, ppp0, are NATed. This means that the guest configuration can use any IP address in the private network range, and must have the tapx.x interface address set up as its gateway:

# the configuration of my HVM Vista Guest:
C:\Users\VistaOnXen>ipconfig
Windows IP Configuration
Ethernet Adapter Local Area Connection:
ip         192.168.1.2
netmask    255.255.255.0
gateway    192.168.1.254
DNS        193.230.161.3, 193.230.161.4
# the configuration of my Ubuntu Dom0 priviledged guest after the vista guest booted
matmih@Express2:~$ cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 193.230.161.3
nameserver 193.230.161.4
matmih@Express2:~/Work/XenImages$ ifconfig
ppp0      Link encap:Point-to-Point Protocol
 inet addr:10.81.110.227  P-t-P:10.64.64.64  Mask:255.255.255.255
 UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
 RX packets:8440 errors:0 dropped:0 overruns:0 frame:0
 TX packets:8180 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:3
 RX bytes:8675386 (8.6 MB)  TX bytes:1711305 (1.7 MB)
tap1.0    Link encap:Ethernet  HWaddr f2:be:7d:4d:a1:65
 inet addr:192.168.1.254  Bcast:0.0.0.0  Mask:255.255.255.0
 inet6 addr: fe80::f0be:7dff:fe4d:a165/64 Scope:Link
 UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
 RX packets:217 errors:0 dropped:0 overruns:0 frame:0
 TX packets:34 errors:0 dropped:0 overruns:0 carrier:0
 collisions:0 txqueuelen:500
 RX bytes:41115 (41.1 KB)  TX bytes:4905 (4.9 KB)
matmih@Express2:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.64.64.64     0.0.0.0         255.255.255.255 UH    0      0        0 ppp0
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 tap2.0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 ppp0
0.0.0.0         10.64.64.64     0.0.0.0         UG    0      0        0 ppp0
matmih@Express2:~$ sudo iptables -nvL
Chain FORWARD (policy ACCEPT 20 packets, 1569 bytes)
 pkts bytes target     prot opt in     out     source               destination
 0     0    ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           PHYSDEV match --physdev-in tap1.0 
 0     0    ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED PHYSDEV match --physdev-out tap1.0 
matmih@Express2:~$ sudo iptables -nvL -t nat
Chain POSTROUTING (policy ACCEPT 7 packets, 1046 bytes)
 pkts bytes target     prot opt in     out     source               destination
 457 27563 MASQUERADE  all  --  *      ppp0    0.0.0.0/0            0.0.0.0/0

One thing to note is that some other entries may exist for the vif1.0 interface, but that interface is not really used. It will be brought down, and its iptables rules removed, when the domain is destroyed, but the tap1.0 iptables rules have to be removed by hand.
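
Since the qemu-ifup script above only inserts the rules, a sketch of the manual cleanup after the domain is destroyed is simply to delete the same two rules (adjust the tap name to your own):

sudo iptables -D FORWARD -m physdev --physdev-in tap1.0 -j ACCEPT
sudo iptables -D FORWARD -m state --state RELATED,ESTABLISHED -m physdev --physdev-out tap1.0 -j ACCEPT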

  • Xen Troubleshooting

In this last chapter I will summarize some errors I encountered along the way, together with their causes and the solutions I found. The first things to know about are the log files and commands to check when things go wrong:

/var/log/xen/qemu-dm-{DomainName}.log - contains the Qemu emulator logs, including errors from the /etc/xen/qemu-ifup script
/var/log/xen/xend.log                 - xend python tool logs, xm logs
/var/log/syslog                       - usually various kernel messages + errors from the /etc/xen/scripts/* scripts
sudo xm dmesg | grep VMX              - the Xen hypervisor messages relevant to your HVM deployment
udevadm monitor                       - will output the UDEV events received by the udevd daemon

1) Graphical issues

i) First, if you intend to use the opengl = 1 option in your HVM domain configuration file for the SDL display, you must make sure that OpenGL is correctly configured on your Dom0 system (mine was not):

matmih@Express2:~/Work/XenImages$ glxgears
X Error of failed request: BadRequest (invalid request code or no such operation)
Major opcode of failed request: 135 (GLX)
Minor opcode of failed request: 19 (X_GLXQueryServerString)
Serial number of failed request: 14
Current serial number in output stream: 14

ii) The domain could not be launched when the xm script was run directly as root, because it could not open an SDL window. This was probably related to the fact that there was no X server configured for the root user, so you should always run as a normal user with sudo rights:

matmih@Express2:~/Work/XenImages$ sudo xm create xen_vista.cfg

2) ACPI/APIC issues

Initially I tried to install Vista using 2 virtual CPUs. It all ran OK until, some minutes into the CD installation, the setup failed with an ACPI/APIC-related error.

The only solution I could find was to limit the guest to 1 VCPU pinned to physical CPU 1 (vcpus = 1, cpus = "1").

3) I/O Devices issues

i) Loopback problems – when using the loopback driver to set up a virtual disk backed by a filesystem file (using the file: disk option in the domain's configuration) I got the following error:

Error: Device 768 (vbd) could not be connected. Failed to find an unused loop device

To correct the issue you can either create additional loopback devices or modify the default maximum number of loop devices and restart the system:

mknod -m660 /dev/loopXXX b 7 8
# add the line below to /etc/modprobe.d/local-loop - this only works if the loopback driver is built as a module, not compiled into the kernel
options loop max-loop=64

ii) Networking issues – one of the most common errors when trying to create an HVM domain is the following:

Error: Device 0 (vif) could not be connected. Hotplug scripts not working
/sbin/hotplug need to be on your computer

The error can be caused by many things. It is a xend python tool error raised when, after the vif paravirtualized backend for the virtual domain has been brought up, its hotplug-status entry is not updated. Usually that Xen Store entry is updated by Xen's virtual interface configuration scripts (vif-bridge, vif-nat, vif-route), but only if no error was detected (via their success function). Apparently those scripts were not even called when I tried to create an HVM guest, or any guest for that matter. These scripts should be called through the udev configuration discussed in an earlier post.
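
When the scripts do run, you can read the hotplug-status entry straight from the Xen Store to see whether they reported success (the domain id 14 below is just the one from the udev log further down; substitute your own):

# list the vif backends that Dom0 knows about
sudo xenstore-ls /local/domain/0/backend/vif
# read the status of the first vif of domain 14 - it should say "connected"
sudo xenstore-read /local/domain/0/backend/vif/14/0/hotplug-status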

To debug what was happening, apart from viewing the Xen and Qemu logs, one thing you must do is check that all devices are brought up correctly. To do that I used the udev events; for example, this is a sample of a correct log, or at least one that works, for booting an HVM Vista guest with bridged networking:

root@Express2:/var/log/xen# udevadm monitor
 monitor will print the received events for:
 UDEV - the event which udev sends out after rule processing
 KERNEL - the kernel uevent
 KERNEL[1258317695.583288] add /devices/vbd-14-768 (xen-backend)
 KERNEL[1258317695.603290] add /devices/vbd-14-5632 (xen-backend)
 KERNEL[1258317695.625653] add /devices/vif-14-0 (xen-backend)
 UDEV [1258317695.626504] add /devices/vif-14-0 (xen-backend)
 KERNEL[1258317695.724992] add /devices/virtual/net/tap14.0 (net)
 UDEV [1258317695.764755] add /devices/virtual/net/tap14.0 (net)
 KERNEL[1258317695.882795] add /devices/console-14-0 (xen-backend)
 UDEV [1258317695.883452] add /devices/console-14-0 (xen-backend)

In my case I could not see the vif backend added to the system (the add /devices/vif-* entry). Finally I discovered that the network backend driver was not compiled into the kernel, so take care with the defaults of the paravirt-ops kernel configuration:

matmih@Express2:~/Work/linux-2.6-xen$ make menuconfig
# and enable the following
# Device Drivers --->
#    [*] Backend driver support --->
#           [*] Block-device backend driver
#           [*] Xen backend network device
Then recompile and install the kernel as described in my previous post.

4) Xend debugging

One other thing that you can do, before really searching Xen's wiki pages, is to dump the Xen Store database to see which devices have been added to it. Most of xend's programming logic is based on polling entries in this Xen Store. Usually, to check whether a device has been successfully added to a guest machine's layout, xend looks for the hotplug-status information associated with adding the device. For example, when successfully booting a Vista HVM guest I can see the backend devices for the guest registered under Domain 0 in the Xen Store, such as the blktap disk backend:

/local/domain/0/backend/tap/1/768/hotplug-status=connected

In order to dump the Xen Store information I am using the following script:

#!/bin/bash
function dumpkey() {
    local param=${1}
    local key
    local result
    result=$(xenstore-list ${param})
    if [ "${result}" != "" ] ; then
        for key in ${result} ; do dumpkey ${param}/${key} ; done
    else
        echo -n ${param}'='
        xenstore-read ${param}
    fi
}
for key in /vm /local/domain /tool ; do dumpkey ${key} ; done
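
Saved as, say, xsdump.sh (the name is just my choice), it can be used to quickly check the device status entries:

matmih@Express2:~$ chmod +x xsdump.sh
matmih@Express2:~$ sudo ./xsdump.sh | grep hotplug-status
/local/domain/0/backend/tap/1/768/hotplug-status=connected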

I hope this article will speed up your HVM guest deployment. You can find more, and probably more accurate, information on the Xen wiki. In the following posts I will present how this HVM guest setup has helped me with what I wanted to do for my master's thesis.

By Mihai Matei on November 23, 2009 | Xen | 2 comments

HVM Xen Architecture

  • Xen Hardware Virtual Machine Guests

Until virtualization support was introduced into recent hardware, the only way to let concurrent operating systems share the same machine was through methods such as paravirtualization or even binary rewriting. Both of these involve a deep understanding of the architecture of the virtualized guest system, and in the case of PV guests changes are made to the original source code, meaning that the operating system kernel has to be rebuilt. Xen supports guest domains running unmodified guest operating systems, such as closed-source Windows systems, using the virtualization extensions available on recent processors, such as Intel's Virtualization Technology (Intel VT) or AMD's extension (AMD-V).

The main difference between a paravirtualized guest and an HVM guest is in the way they access hardware resources, especially I/O devices such as disks. A PV guest's paravirtualized kernel makes hypercalls (in a very similar fashion to a basic system call), handled by the Xen hypervisor (in ring 0), which forwards the requests to the real device drivers in the Linux Dom0 kernel, which in turn make the actual I/O requests to the hardware. A higher-level abstraction of this mechanism is implemented as event channels and shared memory buffers between the PV guest and Dom0, used to communicate requests and data. The shared memory services are offered by the hypervisor, which controls the entire memory layout of the machine, including the mappings from guest physical/virtual to true physical addresses, through the management of the TLB entries in the MMU.

Paravirtualized Guest Architecture

When it comes to fully virtualized guests things get trickier. One thing to keep in mind is that on the x86 and ia64 architectures all exceptions and external interrupts are configured by Xen to go to the hypervisor's software in ring 0. One of the basic tasks of a minimalist hypervisor is to separate the guests, keeping their memory layouts isolated to prevent unallowed access between virtual machines (just as a normal operating system does with userspace processes), and to emulate the instructions that a virtualized guest running at an unprivileged level is not allowed to execute. For example, a guest operating system can no longer use a hypercall to set up a required TLB mapping, as is usually done with paravirtualized guests; instead it has to rely on the hypervisor to emulate its TLB mapping instruction. Let's suppose a Windows Vista process accesses an unmapped address. Normally this page miss would trap into the Windows NT kernel, the MMU module for the running architecture would walk the PTE entries for that process, and a ring 0 instruction would add the TLB entry from guest virtual (the userspace address) to the guest physical address the Windows kernel knows about, after copying that memory in from swap if needed. But now the Windows kernel is running in VM non-root mode (in Intel's terminology), and the virtual CPU the hardware presents does not have access to the TLB mapping instruction, so this causes a VM exit that is handled by the hypervisor software. The hypervisor emulates the instruction by looking into its own page tables for the current guest (virtual machine) and adds a TLB entry from the guest's original userspace virtual address to the true machine physical address. There is a catch though: this is not the normal case, and the page miss exception does not go directly to the Vista kernel. Instead it goes to the hypervisor software, which can either add the correct TLB entry transparently, if it is already aware of the translation, or inject a page miss exception into the Windows kernel in VM space, by jumping to the Windows page miss handler and switching the current context to the one of the virtual machine.

The same kind of emulation must be done for some I/O devices. Because some hardware devices must be shared by all the virtual machines running on the platform, access to the actual I/O space must be multiplexed. In order to do this there has to be one central place where all requests are received and, in Xen's case, this is Dom0 Linux. But, as before, an HVM guest instruction that accesses this I/O space, such as PCI addresses and ports, must be caught by the hypervisor software and emulated. In order for the hypervisor and the guest operating system to have a common view of the virtual machine's interrupt and device layout and I/O addresses, a virtual firmware is added to the HVM guest at boot time, by copying the custom BIOS into the physical address space of the guest. The guest firmware (BIOS) provides the boot services and run-time services required by the OS in the HVM guest. This guest firmware does not see any real physical devices; it operates on the virtual devices provided by the device models. You can find the source code for the default firmware used by Xen in the xen3.4.1/tools/firmware/hvmloader directory. As you can see from the source code, things like the PCI bus, ACPI, the virtual APIC and so on are set up using paravirtualized code (Xen hypercalls) in the firmware BIOS code. In this way the hypervisor knows the addresses where the Windows kernel installed interrupt handlers, the guest physical addresses where the operating system has set up the emulated devices, and so on. The hypervisor software can then restrict the guest from accessing these addresses (in guest physical space) so it can trap and emulate the instruction using the virtual drivers it provides. So if, for example, the Windows kernel block module wants to write some data to a physical drive, it will issue a DMA operation by writing to a certain PCI address where it has mapped that physical device, as read from the BIOS. The hypervisor traps this access and decodes the I/O request that the guest has issued. When it comes to emulating it, things follow different paths: VMware's solution is to keep the actual device emulation inside the hypervisor software for performance reasons, while Xen's solution is to use a specially compiled Qemu emulator, since one purpose of a userspace emulator is precisely to handle I/O requests from the target by decoding the instruction the guest ran that trapped into the emulator code.

Hardware Virtual Machine Guest Architecture

One thing to note is that Qemu already had code for emulating basic I/O hardware: IDE hard drives, several network cards (PCI or ISA cards on the PC target), a PCI UHCI USB controller for guests using USB devices, a VESA video card for graphics and much more. This was originally done by setting up a custom BIOS in the guest physical image/address space in order to get the guest to detect and install drivers for the emulated hardware Qemu knows about, just as it would when booting on that specific machine. So what needed to be done for Qemu to work with Xen was to add support for the interrupts coming from the hypervisor (from the real devices) to Qemu, whether running as a Dom0 process or in a custom I/O guest domain, and for Qemu to use the actual backend drivers in Dom0 or in a DomU, depending on its compilation options. You can see the 2 new custom targets added by the Xen development team: xen-3.4.1/tools/ioemu-qemu-xen/i386-dm for the Qemu port that runs in a Dom0 process, emulating the I/O for a single HVM guest domain, and xen-3.4.1/tools/ioemu-qemu-xen/i386-stubdom for the port that runs in a custom I/O guest domain that can be used to emulate more than one HVM guest. Both ports use paravirtualized code to talk to Xen through hypercalls. You may also take a look at the xen-3.4.1/stubdom directory to see how the custom I/O guest domain is set up, including a custom Grub image for booting just Qemu in a minimal kernel environment. Unfortunately the HVM machine will only see the basic hardware emulated by Qemu, like the Cirrus Logic GD5446 video card or the VESA VBE virtual graphics card (the '-std-vga' option) for higher resolutions, so no games on my HVM guest, at least not yet :).

By Mihai Matei on November 11, 2009 | Xen | 5 comments

Deploying Xen for a Ubuntu Dom0

Now, after building the paravirtualized Linux kernel for the Dom0 requirements, I will talk about what needs to be done in order to compile the Xen hypervisor, including the needed tools, and the configuration required for booting a privileged guest (an Ubuntu 9.04 Dom0 Linux image) on top of the ring 0 Xen hypervisor software.

  • Building Xen from sources

First we need to get the official release of xen sources from www.xen.org:

matmih@Express2:~/Work$ wget http://bits.xensource.com/oss-xen/release/3.4.1/xen-3.4.1.tar.gz
matmih@Express2:~/Work$ tar -xzvf xen-3.4.1.tar.gz

Before compiling the hypervisor binaries and tools we must make sure that we have all the required dependency packages. An easy way to do this on Ubuntu, using apt, is to search for all the packages that start with xen, and then use apt's build-dep option for those xen packages to install their build dependencies:

matmih@Express2:~$ sudo apt-cache search xen
# now look for the packages that seem to be related to the Xen hypervisor and do the following:
matmih@Express2:~$ sudo apt-get build-dep xen-docs-3.3 xen-hypervisor-3.3 xen-shell xen-tools xen-utils-3.3
# a small list of required packages are listed below
matmih@Express2:~$ sudo apt-get install build-essential fakeroot crash kexec-tools makedumpfile git-core libncurses5 \
libncurses5-dev python python2.6-dev iproute bridge-utils libcurl-dev3 xserver-xorg-dev mercurial gitk uuid-dev gawk gettext texinfo bcc

Now we are ready to compile the Xen hypervisor, stubdoms and tools needed in Dom0 image:

# before compiling Xen make sure you set the following in the xen3.4.1/Config.mk file:
# PYTHON = python
# PYTHON_PREFIX_ARG =
# Tuning Config.mk results in the Xen packages being placed into /usr/local/lib/python2.6/dist-packages, due to changeset 19594 in xen-3.4-testing.hg.
# Xen packages should go to /usr/lib/python2.6/site-packages, which is not the default location for python 2.6 on Ubuntu 9.04 (vs. F11),
# otherwise the xend daemon will not start. You can always manually copy the python Xen libraries from the install location to the correct
# directory for your python version if things are still not properly installed
matmih@Express2:~/Work/xen-3.4.1$ sudo make install-xen
matmih@Express2:~/Work/xen-3.4.1$ sudo make install-tools
matmih@Express2:~/Work/xen-3.4.1$ sudo make install-stubdom

The installed files and newly created directories for your local file system are also staged in xen-3.4.1/dist. You can also use the dist/install.sh script to install the dist directory content into your root filesystem.

  • Deploying Xen

Now you should have everything installed; the binary kernel images, including Xen, should be in your /boot directory as below:

matmih@Express2:~$ ls /boot
abi-2.6.28-11-generic     config-2.6.31.5          initrd-2.6-xen.img            System.map-2.6.18.8-xen       vmcoreinfo-2.6.28-11-generic  vmlinuz-2.6.31.5  xen-3.gz
config-2.6.18.8-xen       grub                     initrd.img-2.6.28-11-generic  System.map-2.6.28-11-generic  vmlinuz-2.6.18.8-xen          xen-3.4.1.gz      xen.gz
config-2.6.28-11-generic  initrd-2.6.18.8-xen.img  memtest86+.bin                System.map-2.6.31.5           vmlinuz-2.6.28-11-generic     xen-3.4.gz        xen-syms-3.4.1

You can notice the hypervisor binaries with their corresponding symbolic links (xen-3.4.1.gz), the original 2.6.28-11 kernel that came with Ubuntu 9.04, the 2.6.18.8-xen version that was built using "make world" in the Xen sources, and finally 2.6.31.5, corresponding to the paravirt_ops kernel version that I will use from now on. The (hopefully) last thing that you need to do before booting Xen is to add the correct entries to the Grub boot loader configuration file, so:

matmih@Express2:~$ sudo vim /boot/grub/menu.lst
# This boot entry contains the kernel image from Xen sources, ext3 root filesystem found on SATA drive partition /dev/sda5
title               Ubuntu 9.04, Xen Default
kernel              /boot/xen-3.4.1.gz
module              /boot/vmlinuz-2.6.18.8-xen root=/dev/sda5 ro console=tty0
module              /boot/initrd-2.6.18.8-xen.img
# This boot entry contains the paravirt_ops kernel compiled earlier, ext3 root filesystem found on drive uuid=505b3e6c-d4c1-4f48-9e4e-b54c2c46c41f=/dev/sda5
title               Ubuntu 9.04, Xen ParaVirtOps
uuid                505b3e6c-d4c1-4f48-9e4e-b54c2c46c41f
kernel              /boot/xen-3.4.1.gz
module              /boot/vmlinuz-2.6.31.5 root=UUID=505b3e6c-d4c1-4f48-9e4e-b54c2c46c41f ro console=tty0
module              /boot/initrd-2.6-xen.img
# This entry represents the original systems as found in the initial menu.lst file
title               Ubuntu 9.04, kernel 2.6.28-11-generic
uuid                505b3e6c-d4c1-4f48-9e4e-b54c2c46c41f
kernel              /boot/vmlinuz-2.6.28-11-generic root=UUID=505b3e6c-d4c1-4f48-9e4e-b54c2c46c41f ro quiet splash
initrd              /boot/initrd.img-2.6.28-11-generic

# This entry automatically added by the Debian installer for a non-linux OS on /dev/sda3
title Windows Vista (loader)
rootnoverify (hd0,2)
savedefault
makeactive
chainloader +1

You can now restart and try to boot one of the Xen configurations. For me only the second one worked with Ubuntu 9.04. The default Xen configuration worked only if no dom0_mem parameter was specified, and it had some boot errors (especially regarding udev compatibility with the /sys sysfs filesystem layout, and some memory window errors) which led to the boot process taking a long time while mounting the file system and to the X server becoming unresponsive after some seconds/minutes spent in Dom0. I did not spend any more time investigating the problems because the newer paravirt_ops kernel worked like a charm. So basically you must always make sure that you use the same, or at least a newer, kernel version modified for a Xen Dom0 image than the one your system image came with. This will most likely reduce your deployment time by several days.

Another problem that I ran across was listing the current machines in my Xen Linux Dom0:

matmih@Express2:~$ sudo xm list
Error: Unable to connect to xend: No such file or directory. Is xend running?

Apparently this was caused by the xend daemon not even running, which you can easily check. When I tried to manually start it I got the following:

matmih@Express2:~$ ps ax | grep xend # to check whether xend is running
matmih@Express2:~$ sudo /usr/sbin/xend start # xend should start automatically, check the xend script in /etc/init.d directory
ERROR Internal error: Could not obtain handle on privileged command interface (2 = No such file or directory)

Depending on the Xen/xend version you are running, the python script will check either the sysfs-exposed /sys/hypervisor directory or xenfs's /proc/xen. In the current unstable branch of xen-tools, the file libxc/xc_linux.c looks for /proc/xen/privcmd.
But this may or may not work. First, the latest tip of the Dom0 kernel makes the whole /proc/xen entity optional. Presumably it will be superseded by /sys/hypervisor, which is itself optional. So make sure that you booted the Xen Linux kernel and that you have the above directories for the corresponding filesystems mounted. If you do not see any contents in the /sys/hypervisor directory then you may need to get an earlier version of Xen, or Xen support has not been added to your Dom0 kernel. If there is nothing in /proc/xen you may need to mount the xenfs filesystem on that location:

matmih@Express2:~$ cat /etc/fstab
# <file system>                            <mount point>   <type>        <options>                    <dump>  <pass>
proc                                        /proc           proc          defaults                     0       0
# / was on /dev/sda5 during installation
UUID=505b3e6c-d4c1-4f48-9e4e-b54c2c46c41f   /               ext3          relatime,errors=remount-ro   0       1
/dev/scd0                                   /media/cdrom0   udf,iso9660   user,noauto,exec,utf8        0       0
none                                        /proc/xen       xenfs         defaults                     0       0
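
Alternatively, instead of rebooting with the new fstab entry, you can mount it by hand (assuming your Dom0 kernel was built with xenfs support):

matmih@Express2:~$ sudo mount -t xenfs none /proc/xen
matmih@Express2:~$ ls /proc/xen/
capabilities  privcmd  xenbus  xsd_kva  xsd_port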

A current view of my system’s mounts and the content of the hypervisor new filesystem entries can be found below:

matmih@Express2:~$ ls /sys/hypervisor/
compilation  properties  type  uuid  version
matmih@Express2:~$ ls /proc/xen/
capabilities  privcmd  xenbus  xsd_kva  xsd_port
matmih@Express2:~$  df -ah
Filesystem            Size  Used Avail Use% Mounted on
tmpfs                 1.5G     0  1.5G   0% /lib/init/rw
proc                     0     0     0   -  /proc
sysfs                    0     0     0   -  /sys
varrun                1.5G  108K  1.5G   1% /var/run
varlock               1.5G  4.0K  1.5G   1% /var/lock
udev                  1.5G  184K  1.5G   1% /dev
tmpfs                 1.5G   76K  1.5G   1% /dev/shm
devpts                   0     0     0   -  /dev/pts
fusectl                  0     0     0   -  /sys/fs/fuse/connections
none                     0     0     0   -  /proc/xen
securityfs               0     0     0   -  /sys/kernel/security
binfmt_misc              0     0     0   -  /proc/sys/fs/binfmt_misc
gvfs-fuse-daemon         0     0     0   -  /home/matmih/.gvfs
/dev/sr1               37M   37M     0 100% /media/VMC LITE 9.3.0.9237

The only problem that I can see is that even though my 1920×1080 resolution was kept (I had to install the ATI drivers to get the 16:9 aspect ratio working on Ubuntu 9.04), the refresh rate seems low, as if I were running a VESA driver for my video card. The reason seems to be that the ATI Radeon driver, fglrx, has not been installed for my current kernel (lsmod | grep fglrx). The ati-driver-installer-9-9-x86.x86_64.run installer, apart from configuring my X server, installed the drivers for my previous kernel (2.6.28-11). I need to rebuild and install the kernel module for my current paravirt_ops Xen 2.6.31.5 kernel, but running the installer again did not succeed in building the fglrx module, due to the following error:

matmih@Express2:~$ cat /usr/share/ati/fglrx-install.log
......
Building modules, stage 2.
 MODPOST 1 modules
 WARNING: "find_task_by_vpid" [/lib/modules/fglrx/build_mod/2.6.x/fglrx.ko] undefined!
.....

It seems that the find_task_by_vpid function was removed starting with kernel version 2.6.30. To fix this go to the fglrx driver sources location and do the following:

matmih@Express2:~$ cd /lib/modules/fglrx/build_mod
matmih@Express2:~:/lib/modules/fglrx/build_mod$ sudo wget http://launchpadlibrarian.net/29698859/find_task_by_vpid.patch
matmih@Express2:~:/lib/modules/fglrx/build_mod$ sudo patch -p1 < find_task_by_vpid.patch
matmih@Express2:~:/lib/modules/fglrx/build_mod$ sudo ./make.sh
matmih@Express2:~:/lib/modules/fglrx/build_mod$ cd ..
matmih@Express2:~:/lib/modules/fglrx$ sudo ./make_install.sh

Unfortunately this did not fix my initial problem. Running the dmesg command I could see that the fglrx module could not be loaded, due to the following error message: "[fglrx] CPU does not support PAT".

By Mihai Matei on November 5, 2009 | Ubuntu, Xen | 1 comment

Building a Linux kernel for Xen’s Dom0 image

There are several ways of building a paravirtualized Linux kernel that can be used for a Dom0 or DomU.

  • Default kernel from xen package

In the xen3.4.1 sources the modified kernel is based on the 2.6.18.8 version. In order to build everything, including downloading the kernel sources, you will need to go to the xen directory and run:

matmih@Express2:~/Work/xen-3.4.1$ make world

Note: you will need the mercurial package in order to download the kernel.

You can use the default settings for the kernel configuration (including PCI passthrough in order to keep the same virtual <-> physical mappings of the PCI devices between Dom0 and guest partitions). The built kernel image will be placed at dist/install/boot/vmlinuz-2.6.18.8-xen. You will also need to configure a ramdisk for the kernel to boot with your Linux distribution. If you need to rebuild the kernel or modify some configs you can use:

make linux-2.6-xen-config CONFIGMODE=menuconfig     (or xconfig)
make linux-2.6-xen-build
make linux-2.6-xen-install

This should be the easiest and safest way to compile a paravirtualized kernel for the Xen hypervisor, and should work for most distributions and filesystem images. But for my Ubuntu 9.04 this kernel is a bit old with respect to the udev version, and it was quite unstable: the boot log had some problems, it could not boot without being assigned the whole RAM (specifying the dom0_mem parameter in the grub entry caused it to crash during boot with "Out of Memory: Kill process 2 (migration/0) score 0 and children"), and even when I assigned all the physical memory it booted (though it took a very long time) but after a few seconds in the gnome desktop it no longer responded (no input was possible, only a hard reset).

So, as a personal rule, I should always use, if not the same, at least a newer kernel version than the one that came with my Ubuntu 9.04 distribution.

  • Vanilla kernel + Opensuse Xen patches

First we need to get the kernel sources for the version we want to patch using Andrew Lyon's maintained patches for a Xen Dom0 kernel. A good thing to do is to check the Gentoo Xen patches to see the latest Xen patch version and the vanilla kernel we want to download from the Linux Kernel Archive. From Andrew Lyon's download page we'll get the latest xen patches, 2.6.31-7, based on Opensuse's 2.6.31-5 kernel sources. So we do the following:

matmih@Express2:~/Work$ wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.31.5.tar.bz2
matmih@Express2:~/Work$ wget http://gentoo-xen-kernel.googlecode.com/files/xen-patches-2.6.31-7.tar.bz2
matmih@Express2:~/Work$ bzcat linux-2.6.31.5.tar.bz2 | tar xf -
matmih@Express2:~/Work$ cd linux-2.6.31.5 ; mkdir xen-patches ; cd xen-patches
matmih@Express2:~/Work/linux-2.6.31.5/xen-patches$ bzcat ../../xen-patches-2.6.31-7.tar.bz2 | tar xf - ; cd ..
# now apply the patches ... they should be alphabetically ordered in the archive and patch will only print errors
matmih@Express2:~/Work/linux-2.6.31.5$ cat xen-patches/6*.patch* | patch -p1 -s -l

You now have a, hopefully correct, kernel source tree that you can configure, build and install for a Dom0 image.

  • Xen paravirt_ops Linux kernel

This is a new kernel infrastructure that allows the kernel to run paravirtualized on a hypervisor such as Xen, VMware's VMI or lguest. The infrastructure, supporting the x86_32, x86_64 and ia64 architectures, allows you to compile a single kernel binary which will either boot natively on bare hardware (or in HVM mode under Xen), or boot fully paravirtualized in any of the environments you have enabled in the kernel configuration. Xen pv_ops (DomU) support has been in mainline Linux since 2.6.23, and it is the basis of all ongoing Linux/Xen development. In Xen 3.5 it will be the default paravirtualized Dom0 kernel that comes with the hypervisor package. Although we are using the stable 3.4 version of Xen, we are going to use this kernel, as it may well become the most common way of building a Dom0 kernel in the future. You can find more information on the XenParavirtOps wiki page. First we need to get the latest sources using git, found in the git-core package:

matmih@Express2:~/Work$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git linux-2.6-xen
matmih@Express2:~/Work$ cd linux-2.6-xen
matmih@Express2:~/Work/linux-2.6-xen$ git pull

Now that we have a valid kernel source tree, obtained with either method (though I recommend the last one), we should begin configuring the kernel to be built for your image and hardware requirements, as well as for the requirements of a paravirtualized kernel that can run as a Dom0 kernel, or even in a DomU guest partition. Normally you would copy the kernel configuration that came with your distribution (found in /boot/config-2.6.28-11-generic for my 64-bit Ubuntu 9.04) to linux-2.6-xen/.config and start from there, adding Xen features as you find them, as well as any device drivers that may not be included, either built-in or as modules (use the lspci command to see which drivers you need on your machine). If you are using the paravirt-ops kernel, as I did, the only thing that you must enable is Dom0 support, so:

matmih@Express2:~/Work/linux-2.6-xen$ make menuconfig
# and enable the following
# Processor type and features --->
#    [*] Paravirtualized guest support --->
#           [*] Enable Xen privileged domain support (NEW)
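
If you seed the configuration from the distribution kernel as suggested above, a minimal sketch (the config file name is the one from my Ubuntu 9.04 install), run before the menuconfig step, is:

matmih@Express2:~/Work/linux-2.6-xen$ cp /boot/config-2.6.28-11-generic .config
# accept the defaults for the options that are new since 2.6.28, then adjust things in menuconfig
matmih@Express2:~/Work/linux-2.6-xen$ yes '' | make oldconfig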

Then build and install the kernel image. Do not forget to also build the initial ramdisk, which will contain some of the modules required for Dom0 to boot.

matmih@Express2:~/Work/linux-2.6-xen$ make
matmih@Express2:~/Work/linux-2.6-xen$ sudo make modules_install install
matmih@Express2:~/Work/linux-2.6-xen$ ls /lib/modules ; ls /boot # to see what was the name of the kernel newly installed
matmih@Express2:~/Work/linux-2.6-xen$ sudo mkinitramfs -o /boot/initrd-2.6-xen.img 2.6.31.5

You should now have the kernel image and initial ramdisk located in your local /boot directory. Also, especially if you have a SATA disk drive, do not forget to force-add the following modules to your ramdisk, if they are not already built into the kernel image: aacraid, sd_mod, scsi_mod. A detailed description of the SCSI subsystem, for the 2.4 kernel at least, can be found here, and an architectural overview of the implementation can be seen below:

SCSI architecture

So basically you must do the following to make sure that the above modules are in your ramdisk, depending on whether you are using the mkinitrd or the mkinitramfs utility, where 2.6.31.5 is the kernel version you want to create the image for:

matmih@Express2:~$ sudo depmod 2.6.31.5
matmih@Express2:~$ sudo mkinitrd -v -f --with=aacraid --with=sd_mod --with=scsi_mod /boot/initrd-xen.img 2.6.31.5
# or add the modules to mkinitramfs utility configuration file like I did and run the utility command
matmih@Express2:~$ cat /etc/initramfs-tools/modules
aacraid
sd_mod
scsi_mod
matmih@Express2:~$ sudo mkinitramfs -o /boot/initrd-2.6-xen.img 2.6.31.5

A common error message when the kernel fails to boot is "VFS: Cannot open root device "sda1" or unknown-block(0,0)". If the boot log indicates that it cannot mount the filesystem, you should also check that you added (or built into the kernel image) the required filesystem driver (like ext3). You can force-add this module as well, as I did above for the SCSI SATA disk requirements, but running depmod for the installed modules of the built kernel should pull in the modules above (you can see what depmod generated by looking at, for example, the /lib/modules/2.6.31.5/modules.dep file, used by the initial ramdisk creation utilities to add the modules to the compressed initrd image).

There are several ways of building a paravirtualized Linux kernel that can be used for a Dom0 or DomU.

  • Default kernel from xen package.

In xen3.4.1 sources the modified kernel is based on a 2.6.18.8 version. In order to build everything, including downloading the kernel sources you will need to go to xen directory and:

matmih@Express2:~/Work/xen-3.4.1$ make world

Note*: you will need mercurial packages in order to download the kernel.

You can use the default settings for the kernel configuration (including PCI passthrough in order to keep the same virtual <-> physical mappings of the PCI devices between Dom0 and guest partitions). The built kernel image will be placed at dist/install/boot/vmlinuz-2.6.18.8-xen. Also you will need to configure a ramdisk for the kernel to boot with your Linux distribution. If you need to rebuild the kernel of modify some configs you can use:

make linux-2.6-xen-config CONFIGMODE=menuconfig     (or xconfig)
make linux-2.6-xen-build
make linux-2.6-xen-install

This should be the easiest and safest way to compile a paravirtualized kernel for the Xen hypervisor, and it should work for most distributions and filesystem images. But for my Ubuntu 9.04 this kernel is a bit old with respect to the udev version and it was quite unstable: the boot log showed several warnings and the system could not boot unless it was given the entire RAM (specifying the dom0_mem parameter in the GRUB entry caused it to crash during boot).
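
For reference, booting this kernel under the hypervisor on Ubuntu 9.04 means adding a stanza to GRUB legacy's /boot/grub/menu.lst along the following lines; the file names, the (hd0,0) root and the root=/dev/sda1 device are only illustrative and must match what make install actually placed in /boot and your own disk layout:

title  Xen 3.4.1 / Linux 2.6.18.8-xen Dom0
root   (hd0,0)
kernel /boot/xen-3.4.1.gz
module /boot/vmlinuz-2.6.18.8-xen root=/dev/sda1 ro console=tty0
module /boot/initrd-2.6.18.8-xen.img
# dom0_mem=1024M could be appended to the kernel line, but see the note above about crashes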

  • Vanilla kernel + Opensuse Xen patches.

First we need to get the kernel sources for the version we want to patch, using Andrew Lyon's maintained patches for a Xen Dom0 kernel. A good thing to do is to check the Gentoo Xen Patches page for the latest Xen patch set and the matching vanilla kernel to download from the Linux Kernel Archive. From Andrew Lyon's download page we'll get the latest xen patches, 2.6.31-7, based on Opensuse's kernel sources, to apply on top of the vanilla 2.6.31.5 kernel. So we do the following:

matmih@Express2:~/Work$ wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.31.5.tar.bz2
matmih@Express2:~/Work$ wget http://gentoo-xen-kernel.googlecode.com/files/xen-patches-2.6.31-7.tar.bz2
matmih@Express2:~/Work$ bzcat linux-2.6.31.5.tar.bz2 | tar xf -
matmih@Express2:~/Work$ cd linux-2.6.31.5 ; mkdir xen-patches ; cd xen-patches
matmih@Express2:~/Work/linux-2.6.31.5/xen-patches$ bzcat ../../xen-patches-2.6.31-7.tar.bz2 | tar xf - ; cd ..
# now apply the patches ... they should be alphabetically ordered in the archive, and patch will only print errors
matmih@Express2:~/Work/linux-2.6.31.5$ cat xen-patches/6*.patch* | patch -p1 -s -l

You should now have a correctly patched kernel source tree that you can configure, build and install for a Dom0 image.
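
Since patch -s only prints errors, it can be hard to tell which patch actually failed when something goes wrong. As a convenience (not part of the original procedure), the patches can also be applied one at a time so the first failing one is reported:

matmih@Express2:~/Work/linux-2.6.31.5$ for p in xen-patches/6*.patch*; do
>   patch -p1 -s -l < "$p" || { echo "FAILED: $p"; break; }
> done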

  • Xen paravirt_ops Linux kernel

paravirt_ops is a new kernel infrastructure that allows the kernel to run paravirtualized under a hypervisor such as Xen, VMware's VMI or lguest. The infrastructure, supporting the x86_32, x86_64 and ia64 architectures, allows you to compile a single kernel binary which will either boot natively on bare hardware (or in HVM mode under Xen), or boot fully paravirtualized in any of the environments you have enabled in the kernel configuration. Xen pv_ops (DomU) support has been in mainline Linux since 2.6.23 and is the basis of all ongoing Linux/Xen development. In xen 3.5 it will be the default paravirtualized Dom0 kernel that comes with the hypervisor package. Although we are using the stable xen 3.4 release, we are going to use this kernel, as it may well become the most common way of building a Dom0 kernel in the future. You can find more information on the XenParavirtOps page. First we need to get the latest sources using git (found in the git-core package):

matmih@Express2:~/Work$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git linux-2.6-xen
matmih@Express2:~/Work$ cd linux-2.6-xen
matmih@Express2:~/Work/linux-2.6-xen$ git pull
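
The repository contains several branches, and the XenParavirtOps page documents which one currently carries Dom0 support. Listing the remote branches and checking one out looks roughly like this (the branch name below is only an illustration, use the one recommended by the wiki):

matmih@Express2:~/Work/linux-2.6-xen$ git branch -r
matmih@Express2:~/Work/linux-2.6-xen$ git checkout -b dom0 origin/xen/master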

Now that we have a valid kernel source tree, obtained with any of the above methods (I recommend the last one), we can begin configuring the kernel to be built for your image and hardware requirements, as well as for the requirements of a paravirtualized kernel that can run as a Dom0 kernel, or even in a DomU guest partition. Normally you should copy the kernel configuration that came with your distribution (found in /boot/config-2.6.28-11-generic for my 64bit Ubuntu 9.04) to linux-2.6-xen/.config and start from there, adding Xen features as you find them, as well as enabling any device drivers that may not be included, either built in or as modules (use the lspci command to see which drivers you need on your machine). If you are using the paravirt-ops kernel, as I did, the only thing that you must enable is Dom0 support:

matmih@Express2:~/Work/linux-2.6-xen$ make menuconfig
# and enable the following
# Processor type and features --->
#    [*] Paravirtualized guest support --->
#           [*] Enable Xen privileged domain support (NEW)

Then build and install the kernel image. Do not forget to also build the initial ramdisk that will contain some of the modules required for Dom0 to boot.

matmih@Express2:~/Work/linux-2.6-xen$ make
matmih@Express2:~/Work/linux-2.6-xen$ sudo make modules_install install
matmih@Express2:~/Work/linux-2.6-xen$ ls /lib/modules ; ls /boot # to see the name of the newly installed kernel
matmih@Express2:~/Work/linux-2.6-xen$ sudo mkinitramfs -o /boot/initrd-2.6-xen.img 2.6.31.5

You should now have the kernel image and the initial ramdisk in your local /boot directory. Also, especially if you have a SATA disk drive, do not forget to force adding the following modules to your ramdisk, if they are not already built into the kernel image: aacraid, sd_mod, scsi_mod. A detailed description of the SCSI subsystem, for the 2.4 kernel at least, can be found here, and an architectural overview of the implementation is shown below:

SCSI architecture

So, to make sure that the above modules end up in your ramdisk, do one of the following, depending on whether you are using the mkinitrd or the mkinitramfs utility, where 2.6.31.5 is the kernel version you want to create the image for:

matmih@Express2:~$ sudo depmod 2.6.31.5
matmih@Express2:~$ sudo mkinitrd -v -f --with=aacraid --with=sd_mod --with=scsi_mod /boot/initrd-xen.img 2.6.31.5
# or add the modules to the mkinitramfs configuration file, like I did, and run the utility
matmih@Express2:~$ cat /etc/initramfs-tools/modules
aacraid
sd_mod
scsi_mod
matmih@Express2:~$ sudo mkinitramfs -o /boot/initrd-2.6-xen.img 2.6.31.5

A common error message when the kernel fails to boot is "VFS: Cannot open root device "sda1" or unknown-block(0,0)". Also, if the boot log indicates that it cannot mount the filesystem, you should check that you added (or built into the kernel image) the required filesystem driver (like ext3). You can force adding that module as well, just like above for the SCSI/SATA disk requirements, but running depmod for the installed modules of the built kernel should normally be enough (you can see what depmod generated by looking at, for example, the /lib/modules/2.6.31.5/modules.dep file, which the initial ramdisk creation utilities use to add the modules to the compressed initrd image).
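
To double-check that the forced modules really ended up in the generated ramdisk, you can list its contents; this assumes the default gzip-compressed cpio format that mkinitramfs produces on Ubuntu, and uses the image name created above:

matmih@Express2:~$ zcat /boot/initrd-2.6-xen.img | cpio -it | grep -E 'aacraid|sd_mod|scsi_mod'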


By Mihai Matei on November 2, 2009 | Ubuntu, Xen

Xen, Linux and udev

udev is the device manager for the Linux 2.6 kernel series. Primarily, it manages device nodes in /dev. It is the successor of devfs and hotplug, which means that it handles the /dev directory and all user space actions when adding/removing devices, including firmware load. The system is divided into three parts:

  • The library libsysfs, that allows access to device information (dropped since version 080)
  • The daemon udevd, in user space, that manages the virtual /dev
  • The administrative command udevadm for diagnostics
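
To watch these user space actions as they happen (for example while a Xen guest is created and its backend devices appear), udevadm can print both the raw kernel uevents and the events after udev's rule processing:

matmih@Express2:~$ sudo udevadm monitor --kernel --udev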

When trying to boot Ubuntu 9.04 with the 2.6.18.8 Xen-modified kernel that comes with the default Xen sources, the following error occurred (look in the /var/log/syslog file for the boot log):

Oct 25 22:52:26 Express2 kernel: udev: starting version 141
Oct 25 22:52:26 Express2 kernel: udev: deprecated sysfs layout; update the kernel or disable CONFIG_SYSFS_DEPRECATED; some udev features will not work correctly

My current distribution's udev version is 141. To get the udev, distribution and kernel versions you can do the following:

matmih@Express2:~$ udevadm info -V ; cat /etc/issue ; uname -a
141
Ubuntu 9.04 \n \l
Linux Express2 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:58:03 UTC 2009 x86_64 GNU/Linux

This means that the kernel version that comes with xen 3.4.1 may be too old for my Ubuntu. As you can see from the above, the original modified kernel (with Ubuntu patches) was based on a 2.6.28 version.

But let's take a short look at how a custom Linux Dom0 makes use of udev services. After you have compiled xen 3.4.1 you can take a look in:

matmih@Express2:~/Work/xen-3.4.1$ ls dist/install
boot  etc  lib  usr  var

This is where the xen build system places all the binaries and configuration files that need to be deployed in the Dom0 image. The udev-related files can be found in the etc/hotplug and etc/udev directories.

Let's take a look at the custom udev rules the xen system adds to the Linux Dom0 image:

matmih@Express2:~/Work/xen-3.4.1/dist/install/etc$ cat udev/xen-backend.rules
SUBSYSTEM=="xen-backend", KERNEL=="tap*", RUN+="/etc/xen/scripts/blktap $env{ACTION}"
SUBSYSTEM=="xen-backend", KERNEL=="vbd*", RUN+="/etc/xen/scripts/block $env{ACTION}"
SUBSYSTEM=="xen-backend", KERNEL=="vtpm*", RUN+="/etc/xen/scripts/vtpm $env{ACTION}"
SUBSYSTEM=="xen-backend", KERNEL=="vif*", ACTION=="online", RUN+="$env{script} online"
SUBSYSTEM=="xen-backend", KERNEL=="vif*", ACTION=="offline", RUN+="$env{script} offline"
SUBSYSTEM=="xen-backend", KERNEL=="vscsi*", RUN+="/etc/xen/scripts/vscsi $env{ACTION}"
SUBSYSTEM=="xen-backend", ACTION=="remove", RUN+="/etc/xen/scripts/xen-hotplug-cleanup"
KERNEL=="evtchn", NAME="xen/%k"
KERNEL=="blktap[0-9]*", NAME="xen/%k"
matmih@Express2:~/Work/xen-3.4.1/dist/install/etc$ cat udev/xend.rules
SUBSYSTEM=="pci", RUN+="socket:/org/xen/xend/udev_event"
SUBSYSTEM=="scsi", RUN+="socket:/org/xen/xend/udev_event"
#SUBSYSTEM=="net", KERNEL!="vif[0-9]*.[0-9]*|tap[0-9]*.[0-9]*", RUN+="socket:/org/xen/xend/udev_event"

We can see that for all xen-backend devices, such as tap or vscsi, a specific script is run for each driver event. In addition, all pci and scsi device events are sent through the /org/xen/xend/udev_event socket to be processed by the xend daemon, in the same format the kernel uses for the event, usually a netlink message.

For example the udev rule line:

SUBSYSTEM=="xen-backend", KERNEL=="tap*", RUN+="/etc/xen/scripts/blktap $env{ACTION}"

calls the /etc/xen/scripts/blktap bash script for every tap* block device added to a virtual machine (a partition backed by a local filesystem file, the blktap device functionality). The script checks the xenstore database to see whether the backing file has already been assigned to another virtual block device and whether the access rights on the file are correct. If the script fails, the udev system does not allow the file to be used (the device operation is forbidden).
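
The xenstore entries that such scripts consult can be inspected from Dom0 with the xenstore-ls tool; for example, listing the backend subtree shows the virtual devices currently assigned to guests (the exact layout depends on the Xen version and the running domains):

matmih@Express2:~$ sudo xenstore-ls /local/domain/0/backend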

By Mihai Matei on October 29, 2009 | Xen

Starting on the yellow brick road…

“Tzutzu, we’re not in Kansas any more!”

After several long years in school dedicated to becoming an IT specialist and two years of working in multinational firms with no raise, I finally decided to take some time off to finish my master's degree and perhaps to start a new company where I will certainly not be promoted.

My current hardware configuration can be found at: Dellnb/StudioXPS16/2.8G/C2D/16HDready/2×2048/320FFS7.2k/ATIHD3670/3yNBD/VistaHomeP-v2, which is basically a T9600 2.8 GHz Core 2 Duo processor with Intel VT (IVT), 4GB of DDR3 memory, a 320GB SATA hard drive and an ATI Radeon HD3670 video board.

The machine comes with a Vista Home Premium system, and my intention is to install the Xen hypervisor with an Ubuntu Dom0 running the Xen-patched 2.6.18 kernel, to move my Windows system into a DomU guest with most of the PCI devices assigned directly to it, and to have another DomU running Linux CentOS (RHEL5), without an X server, for my Linux development needs, as I intend to use the Windows system for my GUI needs (including remote source editing on my Linux DomU box). I do not want to use the Dom0 system directly, for security reasons, since tampering with that system can crash the physical machine. The Windows guest partition will be a fully virtualized machine, using the IVT features of the processor, and will basically run in a Qemu process in the Dom0 Linux, while the DomU Linux partition will be a paravirtualized guest, calling hypervisor services directly through hypercalls and shared memory pages.

By Mihai Matei on October 27, 2009 | Xen
