QEMU weekly news: Jan 22, 2008 - Jan 28, 2008

1 Mailing list

1.1 Jan 22, 2008 - [PATCH][RESEND] e1000 device emulation
1.2 Jan 23, 2008 - [PATCH 0/5] SCSI passthrough cleanup
1.3 Jan 24, 2008 - [PATCH] avoid name clashes due to LIST_*
1.4 Jan 24, 2008 - Slow clock in guest OS
1.5 Jan 25, 2008 - [PATCH][RFC] To mount qemu disk image on the host
1.6 Jan 25, 2008 - [PATCH] Allow AF_UNIX sockets to be disabled on non-Windows
1.7 Jan 25, 2008 - Merging KVM QEMU changes upstream
1.8 Jan 27, 2008 - Compilation error on Ubuntu 6.06 and 7.10 with gcc-3.4
1.9 Jan 28, 2008 - Mouse click simple queue
1.10 Jan 28, 2008 - APIC: add NMI and SMI IPI support

Lots of patches got submitted (and resubmitted) this week. Ian Jackson of the Xen project posted a half-dozen patches (which were mostly negatively received or outright ignored), and Anthony Liguori inquired about pushing KVM patches upstream implementing ambitious new features (currently with x86-specific implementations). Aurelien Jarno ended the week with another set of patches, mostly bug fixes.

1.1 Jan 22, 2008 - [PATCH][RESEND] e1000 device emulation

Dor Laor posted a new version of his gigabit adapter device emulation:

It supports TCP/UDP and IP transmit checksum, as well as TSO.
It has been tested with Linux (2.6.18|22|23|24)++ and Windows XP (using
the driver supplied at the intel download site).
Windows Vista also works with driver downloaded from Intel.

Checksum calculation is currently naïve and unoptimized (the host
kernel does it better). But when working in conjunction to tso
the performance is drastically better.

Some figures (using kvm): Linux rx 350Mbps, tx 150Mbps, Windows rx
700Mbps, tx 100Mbps.
(using qemu): Linux rx 51Mbps, tx 113Mbps.

The e1000_hw.h is copied from Linux kernel and after requests by list
members it has been reduced by 2/3.
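
For readers unfamiliar with what the "naïve" transmit checksum involves, here is a minimal sketch of the standard Internet checksum (RFC 1071) that IP, TCP and UDP all use. It is not taken from Dor's patch; the function name inet_checksum is made up for illustration, and a real device model would also have to deal with pseudo-headers and checksum offsets, which this leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum of 16-bit big-endian words, as described in
 * RFC 1071.  Hypothetical helper, not code from the e1000 patch.
 * The 32-bit accumulator is safe for buffers up to 128 KB.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(len <= 131072);

    /* Add up all complete 16-bit words. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with a zero low byte. */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Running this over a header whose checksum field is zero yields the value to store; running it over a header with a correct checksum already filled in yields 0.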

1.2 Jan 23, 2008 - [PATCH 0/5] SCSI passthrough cleanup

Laurent Vivier posted a series of 5 patches to upgrade SCSI support:

This series of patches makes some cleanups in SCSI passthrough and
adds functionality.

[PATCH 1/5] reverse scsi-generic
Reverse previous implementation and restore block-raw-posix.c.

[PATCH 2/5] Move AIO
This patch moves raw AIO part from block-raw-posix.c to qemu-aio-raw.c.

[PATCH 3/5] Add block SG interface
This patch re-implement scsi-generic.c using a new block interface.

[PATCH 4/5] DVD movie support
This patch allows to read a protected/encrypted movie from a DVD.

[PATCH 5/5] SCSI device DMA split
This patch allows to split a READ or WRITE into several READ or WRITE.
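
To illustrate the idea behind patch 5/5, here is a minimal sketch of splitting one large transfer into several smaller ones. It is not Laurent's code: split_transfer, xfer_fn and count_blocks are hypothetical names, and the per-request block limit is simply passed in as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback: issue one READ or WRITE of `count` blocks at `lba`. */
typedef int (*xfer_fn)(uint64_t lba, uint32_t count, void *opaque);

/*
 * Split a transfer of `count` blocks starting at `lba` into consecutive
 * requests of at most `max_blocks` each.  Returns the number of
 * sub-requests issued, or -1 if one of them failed.
 */
int split_transfer(uint64_t lba, uint32_t count, uint32_t max_blocks,
                   xfer_fn issue, void *opaque)
{
    int issued = 0;

    assert(max_blocks > 0);

    while (count > 0) {
        uint32_t chunk = count < max_blocks ? count : max_blocks;

        if (issue(lba, chunk, opaque) < 0)
            return -1;
        lba += chunk;
        count -= chunk;
        issued++;
    }
    return issued;
}

/* Example callback: just adds up how many blocks were requested. */
int count_blocks(uint64_t lba, uint32_t count, void *opaque)
{
    (void)lba;
    *(uint64_t *)opaque += count;
    return 0;
}
```

With a limit of 4 blocks, a 10-block request becomes three requests of 4, 4 and 2 blocks.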

1.3 Jan 24, 2008 - [PATCH] avoid name clashes due to LIST_*

Ian Jackson posted a patch to make it easier for the Xen project to copy code from QEMU:

qemu's audio subdirectory contains a copy of BSD's sys-queue.h, which
defines a bunch of LIST_ macros.  This makes it difficult to build a
program made partly out of qemu and partly out of the Linux kernel[1],
since Linux has a different set of LIST_ macros.  It might also cause
trouble when mixing with BSD-derived code.

Under the circumstances it's probably best to rename the versions in
qemu.  The attached patch does this.

[1] You might well ask why anyone would want to do this.  In Xen we
are moving our emulation of IO devices from processes which run on the
host into a dedicated VM (one per actual VM) which we call a `stub
domain'.  This dedicated VM runs a very cut-down `operating system'
which uses some code from Linux.

The general response from QEMU developers was negative. Anthony Liguori replied:

That doesn't seem like a very good justification.  If you're mixing QEMU 
code with other code, it's easier for you to maintain these merge 
conflict fixes as normal QEMU developers would have no idea what it 
wasn't okay to just use LIST_xxx

Ian then attempted to defend his patch, to which Johannes Schindelin responded:

> Well, surely with something like qemu one might expect to mix the code 
> with other things ?

Read what you wrote.  By that reasoning you cannot use _any_ name in qemu, 
because qemu should bend over to be mixable with other code.

The thread continued on a bit from there, but did not become more sympathetic to Ian's patch.

1.4 Jan 24, 2008 - Slow clock in guest OS

Sergey Bychkov had a problem running a Windows guest under Linux:

I can't understand why clock in guest OS (Windows 2003) goes very slow.

Mulyadi Santosa suggested:

Are you sure the rtc freq has been made to 1024?
# cat /proc/sys/dev/rtc/max-user-freq
should yield 1024 before you ran qemu.

Sergey said that helped, but didn't fix it. The next suggestion was the -clock command line option, and Sergey reported:

After some investigations I can say that with the latest (2008/01/30) qemu 
from cvs, compiled with gcc-3.4 on linux x86_64 host, guest OS win2k3 works 
not too good.
With "-clock dynticks" clock in OS is very slow - and windows time server 
can't adjust.
With "-clock rtc" hung periodically - for up to 5 minutes, 300 seconds. This 
could happen at bootstrap - when no OS, only BIOS. Then it resumes and works 
for some random period of time, then hangs again, and so on. This behaviour 
doesn't depend on guest OS, and was reproduced with Knoppix live CD.

Sergey decided to ignore rtc and focus on dynticks, and eventually found the cause of his slowdown:

I have found that slow clock was inspired by working UltraVNC server 
installed in guest OS.
Possibly, often queries to video driver force qemu to "forget" to send clock 
IRQs to guest.
At this time I didn't find more details about this problem, but stopping 
UltraVNC service completely remove it.

1.5 Jan 25, 2008 - [PATCH][RFC] To mount qemu disk image on the host

Laurent Vivier posted a patch allowing qemu-img to act as a network block device server for qcow images:

this patch allows to mount qemu disk images on the host.

It is based on the Network Block Device protocol and allows qemu-img to
become an NBD server (Yes, Anthony, userspace block device is the right
way to do that... :-P ).

Once you've applied the attached patch to Qemu and build the binaries,
you can use it like that:

# ./qemu-img server -d 1234 etch.qcow2

This starts an NBD server on port 1234. This server will expose
the disk image etch.qcow2. "-d" means it will be daemonize and will run
in background.

Then you need to connect the block device to the server:

# nbd-client localhost 1234 /dev/nbd0
Negotiation: ..size = 4194304KB
bs=1024, sz=4194304

This will link etch.qcow2 to /dev/nbd0.

Then to see partitions, you can use kpartx, as explained Daniel, or my
patched loop modules (I can send an updated and bug free version).
...
# kpartx -a /dev/nbd0
...
or
...
# rmmod loop
# insmod drivers/block/loop.ko max_part=64
# losetup -f /dev/nbd0
...
# mount /dev/loop0p1 /mnt
# ls /mnt
bench  cdrom    etc     initrd.img  media  proc  selinux  tmp  vmlinuz
bin    clients  home    lib         mnt    root  srv      usr
boot   dev      initrd  lost+found  opt    sbin  sys      var
# cd
# umount /mnt
# losetup -d  /dev/loop0
# nbd-client -d /dev/nbd0

Initial feedback suggested that this feature might someday be useful on hosts other than Linux, and Laurent agreed and posted an updated patch.

Anthony Liguori noted:

FYI, I've been maintaining qemu-nbd out of tree for a while now.  
http://hg.codemonkey.ws/qemu-nbd

It also includes some nice features like read-only mount and exposing an 
individual partition.

And also said:

Note, the general problem with this approach is that mounting a NBD 
device locally with write access can lead to dead locks.  If you look 
through the mailing list archives, you'll find a number of conversations 
on the topic.

A discussion ensued about potential alternate implementations, but petered out without resolution. (Using an actual qemu process to export the filesystem apparently works.)

1.6 Jan 25, 2008 - [PATCH] Allow AF_UNIX sockets to be disabled on non-Windows

Ian Jackson posted another Xen patch:

The patch below makes it possible to disable AF_UNIX (unix-domain)
sockets in host environments which do not define _WIN32, by adding
-DNO_UNIX_SOCKETS to the compiler flags.  This is useful in the
effectively-embedded qemu host which are going to be using for device
emulation in Xen.

The backstory is that over the past year, the Linux kernel developers have largely lost interest in Xen in favor of the less-intrusive KVM. In response, the Xen project is reducing its reliance on Linux as a host, instead trying to run on the bare hardware via a thin OS layer called "MINIOS". Like Windows, this OS layer doesn't support Unix domain sockets, so Ian wanted to genericize one of QEMU's Windows workarounds for other less-than-POSIX operating systems. (Both Xen and KVM use QEMU's device emulation code to provide virtual I/O devices for their guest operating systems to interact with.)

The interesting part of the thread was Johannes Schindelin's advice on how to go about genericizing the workaround:

> changing it to something like
> 
>  #if !(defined(_WIN32) || defined(MINIOS)
> 
> seems very ugly.

Yes, that is very ugly.  But changing it to

#ifndef NO_AF_UNIX_SOCKETS

it actually gives you a bit of documentation what the code does, in 
addition to controlling what is compiled and what not.

Like in the patch we saw today where there were a lot of "#ifdef 
__linux__", it is always good if you can see _why_ some code is enabled or 
disabled, instead of for what platform.

Samuel Thibault elaborated:

> It should just check a define for _MINIOS.

That's exactly what we wanted to avoid.

> That makes it a lot more obvious why it's not being included.

But it doesn't necessarily make obvious _what_ is not being included
(here, local sockets). To my mind, something like

#if !(defined(_WIN32) || defined(_MINIOS))
#define DO_UNIX_SOCKET
#endif

And then in the code, #ifdef DO_UNIX_SOCKET, is much nicer than
repeating the if (!def||def) everywhere (and have to change them all if
another system needs that too)
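
Spelled out a little further, Samuel's suggestion looks roughly like the sketch below: the platform test appears once, and everything else asks about the feature. The macro names come from his mail; the have_unix_sockets helper is made up here purely to show a use site and is not part of any posted patch.

```c
#include <assert.h>

/* Decide in one place whether the host offers AF_UNIX sockets. */
#if !(defined(_WIN32) || defined(_MINIOS))
#define DO_UNIX_SOCKET
#endif

/*
 * Hypothetical use site: code tests the feature macro, not the
 * platform, so adding another platform only touches the block above.
 */
int have_unix_sockets(void)
{
#ifdef DO_UNIX_SOCKET
    return 1;
#else
    return 0;
#endif
}
```

A use site written this way also documents what is being left out (local sockets), which was Johannes's point.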

1.7 Jan 25, 2008 - Merging KVM QEMU changes upstream

Anthony Liguori asked:

As most probably know, the KVM project has been maintaining a QEMU tree 
for some time now.  Beyond support for the KVM kernel interface, the 
tree also contains a number of useful features like live migration, 
virtio, and extboot.  Some of these things have been posted to 
qemu-devel already but were not included.

I would like to work on merging the KVM changes into upstream QEMU but 
before I started that work, I wanted to get a read on how difficult it 
would be.  A lot of these things were designed specifically for KVM on 
x86.  Only now are other architectures starting to be considered.  
Certainly, cross-architecture emulation hasn't really been considered.

I wouldn't expect anything to be merged that caused a regression for 
cross-architecture emulation, but I don't really have the time to get a 
lot of the new features working for the cross-architecture case.  I 
would expect, though, that if these things were merged, it would make it 
relatively easy for someone else to do that though.

Is this a reasonable merge strategy?  We won't introduce regressions but 
I can't guarantee these new things will work cross-architecture.

Paul Brook replied:

I think it depends to some extent whether things will need rewriting to be 
made cross-architecture. In particular if this requires interface changes.  
This means either breaking existing guests, or having to support both 
interfaces.

To which Anthony said:

That's a reasonable stance to take.  I don't think anything in the tree 
right now presents that problem.  I'll start sending out some patches 
and if you have specific concerns, we can talk about them 1-by-1.

1.8 Jan 27, 2008 - Compilation error on Ubuntu 6.06 and 7.10 with gcc-3.4

One of the reasons qemu doesn't build with gcc 4 is that x86 is a very register-starved architecture, and dyngen reserves several registers for its own use. Sometimes a seemingly innocuous change to the C code (or a change of compiler version) means gcc can't allocate enough registers to compile a chunk of code, as in the case below.

Brad Campbell reported:

gcc-3.4 -Wall -O2 -g -fno-strict-aliasing -fomit-frame-pointer -I. -I.. 
-I/tracks/src/src/qemu/target-i386 -I/tracks/src/src/qemu -MMD -MP -DNEED_CPU_H -D_GNU_SOURCE 
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -I/tracks/src/src/qemu/fpu -DHAS_AUDIO -DHAS_AUDIO_CHOICE 
-I/tracks/src/src/qemu/slirp    -c -o cpu-exec.o /tracks/src/src/qemu/cpu-exec.c
/tracks/src/src/qemu/cpu-exec.c: In function `cmp1':
/tracks/src/src/qemu/cpu-exec.c:143: error: unable to find a register to spill in class `DIREG'
/tracks/src/src/qemu/cpu-exec.c:143: error: this is the insn:
(insn:HI 15 62 16 0 /tracks/src/src/qemu/cpu-exec.c:140 (parallel [
             (set (reg:SI 2 cx [64])
                 (unspec:SI [
                         (mem:BLK (reg/f:SI 66 [ s2 ]) [0 A8])
                         (reg:QI 0 ax [68])
                         (const_int 1 [0x1])
                         (reg:SI 2 cx [67])
                     ] 20))
             (use (reg:SI 19 dirflag))
             (clobber (reg/f:SI 66 [ s2 ]))
             (clobber (reg:CC 17 flags))
         ]) 632 {*strlenqi_1} (insn_list 11 (insn_list 12 (insn_list 13 (insn_list 14 (nil)))))
     (expr_list:REG_DEAD (reg:SI 19 dirflag)
         (expr_list:REG_DEAD (reg:SI 2 cx [67])
             (expr_list:REG_DEAD (reg:QI 0 ax [68])
                 (expr_list:REG_DEAD (reg/f:SI 66 [ s2 ])
                     (expr_list:REG_UNUSED (reg:CC 17 flags)
                         (expr_list:REG_UNUSED (reg/f:SI 66 [ s2 ])
                             (expr_list:REG_EQUAL (unspec:SI [
                                         (mem:BLK (reg/f:SI 66 [ s2 ]) [0 A8])
                                         (reg:QI 0 ax [68])
                                         (const_int 1 [0x1])
                                         (reg:SI 2 cx [67])
                                     ] 20)
                                 (nil)))))))))
/tracks/src/src/qemu/cpu-exec.c:143: confused by earlier errors, bailing out
make[1]: *** [cpu-exec.o] Error 1
make[1]: Leaving directory `/tracks/src/src/qemu/i386-softmmu'
make: *** [subdir-i386-softmmu] Error 2

In this case, gcc 3.4.6 was unable to find a spill register. Stefano Stabellini confirmed that the problem also occurred on gcc 3.3. Carlo Marcelo Arenas Belon explained:

architectural limitation for x86 triggered by cpu-exec.c version 1.131,
reverting to 1.130 allows the compilation to proceed

Brad Campbell said that a second file (vl.c) also needed to revert to an earlier version.

This thread matters because it shows the problems dyngen causes. Not only is gcc 4.x unable to build QEMU, but seemingly innocent changes cause the existing compiler to unpredictably spew extremely complicated error messages from deep in the bowels of the optimizer, which look more like Lisp than C. In this case the only fix offered was to revert the code to a version that worked.

1.9 Jan 28, 2008 - Mouse click simple queue

Stefano Stabellini posted a patch:

qemu doesn't enqueue mouse events, just records the latest mouse state.
This can cause some lost mouse double clicks if the events are not 
processed fast enough.
I am attaching a patch that implements a simple queue for left mouse 
click events.
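
The difference between "record the latest state" and "enqueue events" can be sketched with a small fixed-size ring buffer. This is an illustration only, not Stefano's patch: the type and function names and the queue capacity are invented, and the patch itself queues only left-click events rather than full mouse events.

```c
#include <assert.h>
#include <stddef.h>

#define MOUSE_QUEUE_SIZE 16   /* hypothetical capacity */

typedef struct {
    int dx, dy, dz;
    int buttons;              /* bit 0 = left button */
} MouseEvent;

typedef struct {
    MouseEvent ev[MOUSE_QUEUE_SIZE];
    size_t head;              /* index of the oldest event */
    size_t count;             /* number of queued events */
} MouseQueue;

void mouse_queue_init(MouseQueue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Append an event; returns -1 if the queue is full. */
int mouse_queue_put(MouseQueue *q, MouseEvent e)
{
    if (q->count == MOUSE_QUEUE_SIZE)
        return -1;
    q->ev[(q->head + q->count) % MOUSE_QUEUE_SIZE] = e;
    q->count++;
    return 0;
}

/* Remove the oldest event; returns -1 if the queue is empty. */
int mouse_queue_get(MouseQueue *q, MouseEvent *out)
{
    if (q->count == 0)
        return -1;
    *out = q->ev[q->head];
    q->head = (q->head + 1) % MOUSE_QUEUE_SIZE;
    q->count--;
    return 0;
}
```

With a queue, a quick press, release, press, release sequence reaches the guest as four separate events. With only a "latest state" variable, the guest may see nothing more than the final released state.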

1.10 Jan 28, 2008 - APIC: add NMI and SMI IPI support

Aurelien Jarno posted a patch adding NMI support:

While testing KGDB (yeah, it actually seem to make it into mainline!)
under QEMU, I failed to get it running in SMP mode. Reason: NMI IPIs are
not correctly handled by QEMU's emulated APIC.

To overcome this, the patch below introduces a new interruption request,
CPU_INTERRUPT_NMI, so that a VCPU can cleanly send this special
interrupt to other VCPUs. It also introduces HF_NMI_MASK which shall
ensure that NMIs are not recursively triggered, but I must confess that
this particular property was not really tested yet.

CPU_INTERRUPT_NMI is then trivially exploited by apic_bus_deliver to
send out both NMIs and (for the sake of completeness - it's untested as
well SMIs).

With this patch applied, I'm finally able to run (and potentially debug)
KGDB for Linux SMP guests.
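
The interplay between the pending-NMI request and the mask flag can be sketched as below. Only the names CPU_INTERRUPT_NMI and HF_NMI_MASK come from the patch description; the bit values, the VCpu structure and the three helper functions are placeholders, and the unmasking rule (clear the mask when the handler returns) is an assumption based on how x86 hardware blocks nested NMIs, not something the posting spells out.

```c
#include <assert.h>

/* Names from the patch description; the bit values are placeholders. */
#define CPU_INTERRUPT_NMI 0x01
#define HF_NMI_MASK       0x02

/* Hypothetical, heavily reduced per-VCPU state. */
typedef struct {
    unsigned interrupt_request;   /* pending interrupt kinds */
    unsigned hflags;              /* hidden CPU state flags */
} VCpu;

/* Another VCPU (or the APIC) posts an NMI to `target`. */
void vcpu_post_nmi(VCpu *target)
{
    assert(target);
    target->interrupt_request |= CPU_INTERRUPT_NMI;
}

/*
 * Called from the execution loop.  Takes a pending NMI only if no NMI
 * handler is already running; returns 1 if the NMI was delivered.
 */
int vcpu_accept_nmi(VCpu *cpu)
{
    assert(cpu);
    if ((cpu->interrupt_request & CPU_INTERRUPT_NMI) &&
        !(cpu->hflags & HF_NMI_MASK)) {
        cpu->interrupt_request &= ~CPU_INTERRUPT_NMI;
        cpu->hflags |= HF_NMI_MASK;   /* block nested NMIs */
        return 1;
    }
    return 0;
}

/* Assumed unmask point: the guest's NMI handler returns. */
void vcpu_nmi_return(VCpu *cpu)
{
    assert(cpu);
    cpu->hflags &= ~HF_NMI_MASK;
}
```

In this sketch an NMI posted while the handler is running stays pending and is delivered once the handler returns.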

Robi Yangel wondered if the original NMI patch could be extended for watchdog support, and Jan posted two more patches.

2 Source control

Slow week. The only patch of note was the new "-translation=no-cache" debugging option to disable the translation buffer cache.

This week's commits: