Lots of patches were submitted (and resubmitted) this week. Ian Jackson of the Xen project posted a half-dozen patches (which were mostly received negatively or ignored outright), and Anthony Liguori asked about merging the KVM project's QEMU changes upstream, which add ambitious new features so far implemented only for x86. Aurelien Jarno ended the week with another set of patches, mostly bug fixes.
Dor Laor posted a new version of his gigabit adapter device emulation:
It supports TCP/UDP and IP transmit checksum, as well as TSO. It has been tested with Linux (2.6.18|22|23|24)++ and Windows XP (using the driver supplied at the Intel download site). Windows Vista also works with the driver downloaded from Intel. Checksum calculation is currently naïve and unoptimized (the host kernel does it better), but when working in conjunction with TSO the performance is drastically better. Some figures (using kvm): Linux rx 350Mbps, tx 150Mbps; Windows rx 700Mbps, tx 100Mbps. (using qemu): Linux rx 51Mbps, tx 113Mbps. The e1000_hw.h is copied from the Linux kernel and, after requests by list members, it has been reduced by 2/3.
Laurent Vivier posted a series of 5 patches to upgrade SCSI support:
This series of patches makes some cleanups in SCSI passthrough and adds functionality.

[PATCH 1/5] reverse scsi-generic: Reverse the previous implementation and restore block-raw-posix.c.
[PATCH 2/5] Move AIO: This patch moves the raw AIO part from block-raw-posix.c to qemu-aio-raw.c.
[PATCH 3/5] Add block SG interface: This patch re-implements scsi-generic.c using a new block interface.
[PATCH 4/5] DVD movie support: This patch allows reading a protected/encrypted movie from a DVD.
[PATCH 5/5] SCSI device DMA split: This patch allows splitting a READ or WRITE into several READs or WRITEs.
Ian Jackson posted a patch to make it easier for the Xen project to copy code from QEMU:
qemu's audio subdirectory contains a copy of BSD's sys-queue.h, which defines a bunch of LIST_ macros. This makes it difficult to build a program made partly out of qemu and partly out of the Linux kernel[1], since Linux has a different set of LIST_ macros. It might also cause trouble when mixing with BSD-derived code. Under the circumstances it's probably best to rename the versions in qemu. The attached patch does this. [1] You might well ask why anyone would want to do this. In Xen we are moving our emulation of IO devices from processes which run on the host into a dedicated VM (one per actual VM) which we call a `stub domain'. This dedicated VM runs a very cut-down `operating system' which uses some code from Linux.
The general response from QEMU developers was negative. Anthony Liguori replied:
That doesn't seem like a very good justification. If you're mixing QEMU code with other code, it's easier for you to maintain these merge-conflict fixes, as normal QEMU developers would have no idea why it wasn't okay to just use LIST_xxx.
Ian then attempted to defend his patch, to which Johannes Schindelin responded:
> Well, surely with something like qemu one might expect to mix the code
> with other things ?

Read what you wrote. By that reasoning you cannot use _any_ name in qemu, because qemu should bend over to be mixable with other code.
The thread continued on a bit from there, but did not become more sympathetic to Ian's patch.
Sergey Bychkov had a problem running a Windows guest under Linux:
I can't understand why clock in guest OS (Windows 2003) goes very slow.
Are you sure the rtc freq has been made to 1024?

# cat /proc/sys/dev/rtc/max-user-freq

should yield 1024 before you ran qemu.
Sergey said that helped, but didn't fix it. The next suggestion was the -clock command line option, and Sergey reported:
After some investigations I can say that with the latest (2008/01/30) qemu from cvs, compiled with gcc-3.4 on linux x86_64 host, guest OS win2k3 works not too good. With "-clock dynticks" clock in OS is very slow - and windows time server can't adjust. With "-clock rtc" hung periodically - for up to 5 minutes, 300 seconds. This could happen at bootstrap - when no OS, only BIOS. Then it resumes and works for some random period of time, then hangs again, and so on. This behaviour doesn't depend on guest OS, and was reproduced with Knoppix live CD.
Sergey decided to ignore rtc and focus on dynticks, and eventually found the cause of his slowdown:
I have found that slow clock was inspired by working UltraVNC server installed in guest OS. Possibly, often queries to video driver force qemu to "forget" to send clock IRQs to guest. At this time I didn't find more details about this problem, but stopping UltraVNC service completely remove it.
Laurent Vivier posted a patch allowing qemu-img to act as a network block device server for qcow images:
this patch allows you to mount qemu disk images on the host. It is based on the Network Block Device protocol and allows qemu-img to become an NBD server (Yes, Anthony, userspace block device is the right way to do that... :-P ).

Once you've applied the attached patch to Qemu and built the binaries, you can use it like this:

# ./qemu-img server -d 1234 etch.qcow2

This starts an NBD server on port 1234. This server will expose the disk image etch.qcow2. "-d" means it will daemonize and run in the background.

Then you need to connect the block device to the server:

# nbd-client localhost 1234 /dev/nbd0
Negotiation: ..size = 4194304KB
bs=1024, sz=4194304

This will link etch.qcow2 to /dev/nbd0.

Then to see partitions, you can use kpartx, as Daniel explained, or my patched loop modules (I can send an updated and bug-free version).

...
# kpartx -a /dev/nbd0
...

or

...
# rmmod loop
# insmod drivers/block/loop.ko max_part=64
# losetup -f /dev/nbd0
...
# mount /dev/loop0p1 /mnt
# ls /mnt
bench  cdrom    etc     initrd.img  media  proc  selinux  tmp  vmlinuz
bin    clients  home    lib         mnt    root  srv      usr
boot   dev      initrd  lost+found  opt    sbin  sys      var
# cd
# umount /mnt
# losetup -d /dev/loop0
# nbd-client -d /dev/nbd0
Initial feedback suggested that this feature might someday be useful on hosts other than Linux; Laurent agreed and posted an updated patch.
FYI, I've been maintaining qemu-nbd out of tree for a while now. http://hg.codemonkey.ws/qemu-nbd It also includes some nice features like read-only mount and exposing an individual partition.
Note, the general problem with this approach is that mounting an NBD device locally with write access can lead to deadlocks. If you look through the mailing list archives, you'll find a number of conversations on the topic.
A discussion ensued about potential alternate implementations, but petered out without resolution. (Using an actual qemu process to export the filesystem apparently works.)
Ian Jackson posted another Xen patch:
The patch below makes it possible to disable AF_UNIX (unix-domain) sockets in host environments which do not define _WIN32, by adding -DNO_UNIX_SOCKETS to the compiler flags. This is useful in the effectively-embedded qemu host which we are going to be using for device emulation in Xen.
The backstory is that over the past year, the Linux kernel developers have largely lost interest in Xen in favor of the less-intrusive KVM. In response, the Xen project is reducing its reliance on Linux as a host, instead trying to run on the bare hardware via a thin OS layer called "MINIOS". Like Windows, this OS layer doesn't support Unix domain sockets, so Ian wanted to genericize one of QEMU's Windows workarounds to cover other less-than-POSIX operating systems. (Both Xen and KVM use QEMU's device emulation code to provide virtual I/O devices for their guest operating systems to interact with.)
The interesting part of the thread was Johannes Schindelin's advice on how to go about genericizing the workaround:
> changing it to something like
>
> #if !(defined(_WIN32) || defined(MINIOS))
>
> seems very ugly.

Yes, that is very ugly. But changing it to

#ifndef NO_AF_UNIX_SOCKETS

it actually gives you a bit of documentation what the code does, in addition to controlling what is compiled and what not. Like in the patch we saw today where there were a lot of "#ifdef __linux__", it is always good if you can see _why_ some code is enabled or disabled, instead of for what platform.
> It should just check a define for _MINIOS.

That's exactly what we wanted to avoid.

> That makes it a lot more obvious why it's not being included.

But it doesn't necessarily make obvious _what_ is not being included (here, local sockets). To my mind, something like

#if !(defined(_WIN32) || defined(_MINIOS))
#define DO_UNIX_SOCKET
#endif

And then in the code, #ifdef DO_UNIX_SOCKET, is much nicer than repeating the if (!def||def) everywhere (and have to change them all if another system needs that too)
Anthony Liguori asked about merging the KVM tree's QEMU changes:

As most probably know, the KVM project has been maintaining a QEMU tree for some time now. Beyond support for the KVM kernel interface, the tree also contains a number of useful features like live migration, virtio, and extboot. Some of these things have been posted to qemu-devel already but were not included. I would like to work on merging the KVM changes into upstream QEMU but before I started that work, I wanted to get a read on how difficult it would be. A lot of these things were designed specifically for KVM on x86. Only now are other architectures starting to be considered. Certainly, cross-architecture emulation hasn't really been considered. I wouldn't expect anything to be merged that caused a regression for cross-architecture emulation, but I don't really have the time to get a lot of the new features working for the cross-architecture case. I would expect, though, that if these things were merged, it would make it relatively easy for someone else to do that. Is this a reasonable merge strategy? We won't introduce regressions but I can't guarantee these new things will work cross-architecture.
I think it depends to some extent whether things will need rewriting to be made cross-architecture. In particular if this requires interface changes. This means either breaking existing guests, or having to support both interfaces.
To which Anthony said:
That's a reasonable stance to take. I don't think anything in the tree right now presents that problem. I'll start sending out some patches and if you have specific concerns, we can talk about them 1-by-1.
One of the reasons qemu doesn't build with gcc 4 is that x86 is a very register-starved architecture, and dyngen reserves several registers for its own use. Sometimes a seemingly innocuous change to the C code (or to the compiler version) means gcc can't allocate enough registers to compile a chunk of code, as in this case:
gcc-3.4 -Wall -O2 -g -fno-strict-aliasing -fomit-frame-pointer -I. -I.. -I/tracks/src/src/qemu/target-i386 -I/tracks/src/src/qemu -MMD -MP -DNEED_CPU_H -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -I/tracks/src/src/qemu/fpu -DHAS_AUDIO -DHAS_AUDIO_CHOICE -I/tracks/src/src/qemu/slirp -c -o cpu-exec.o /tracks/src/src/qemu/cpu-exec.c
/tracks/src/src/qemu/cpu-exec.c: In function `cmp1':
/tracks/src/src/qemu/cpu-exec.c:143: error: unable to find a register to spill in class `DIREG'
/tracks/src/src/qemu/cpu-exec.c:143: error: this is the insn:
(insn:HI 15 62 16 0 /tracks/src/src/qemu/cpu-exec.c:140 (parallel [
            (set (reg:SI 2 cx [64])
                (unspec:SI [
                        (mem:BLK (reg/f:SI 66 [ s2 ]) [0 A8])
                        (reg:QI 0 ax [68])
                        (const_int 1 [0x1])
                        (reg:SI 2 cx [67])
                    ] 20))
            (use (reg:SI 19 dirflag))
            (clobber (reg/f:SI 66 [ s2 ]))
            (clobber (reg:CC 17 flags))
        ]) 632 {*strlenqi_1} (insn_list 11 (insn_list 12 (insn_list 13 (insn_list 14 (nil)))))
    (expr_list:REG_DEAD (reg:SI 19 dirflag)
        (expr_list:REG_DEAD (reg:SI 2 cx [67])
            (expr_list:REG_DEAD (reg:QI 0 ax [68])
                (expr_list:REG_DEAD (reg/f:SI 66 [ s2 ])
                    (expr_list:REG_UNUSED (reg:CC 17 flags)
                        (expr_list:REG_UNUSED (reg/f:SI 66 [ s2 ])
                            (expr_list:REG_EQUAL (unspec:SI [
                                        (mem:BLK (reg/f:SI 66 [ s2 ]) [0 A8])
                                        (reg:QI 0 ax [68])
                                        (const_int 1 [0x1])
                                        (reg:SI 2 cx [67])
                                    ] 20)
                                (nil)))))))))
/tracks/src/src/qemu/cpu-exec.c:143: confused by earlier errors, bailing out
make[1]: *** [cpu-exec.o] Error 1
make[1]: Leaving directory `/tracks/src/src/qemu/i386-softmmu'
make: *** [subdir-i386-softmmu] Error 2
In this case, gcc 3.4.6 was unable to find a spill register. Stefano Stabellini confirmed that the problem also occurred on gcc 3.3. Carlo Marcel Arenas Belon explained:
architectural limitation for x86 triggered by cpu-exec.c version 1.131, reverting to 1.130 allows the compilation to proceed
Brad Campbell said that a second file (vl.c) also needed to revert to an earlier version.
This thread illustrates the problems dyngen causes. Not only is gcc 4.x unable to build QEMU, but seemingly innocent changes can make the existing compiler unpredictably spew extremely complicated error messages from deep in the bowels of the optimizer, which look more like Lisp than C. The only workaround was to revert the code to a version that worked.
Stefano Stabellini posted a patch:
qemu doesn't enqueue mouse events, just records the latest mouse state. This can cause some lost mouse double clicks if the events are not processed fast enough. I am attaching a patch that implements a simple queue for left mouse click events.
Aurelien Jarno posted a patch adding NMI support:
While testing KGDB (yeah, it actually seems to make it into mainline!) under QEMU, I failed to get it running in SMP mode. Reason: NMI IPIs are not correctly handled by QEMU's emulated APIC. To overcome this, the patch below introduces a new interrupt request, CPU_INTERRUPT_NMI, so that a VCPU can cleanly send this special interrupt to other VCPUs. It also introduces HF_NMI_MASK, which shall ensure that NMIs are not recursively triggered, but I must confess that this particular property has not really been tested yet. CPU_INTERRUPT_NMI is then trivially exploited by apic_bus_deliver to send out both NMIs and (for the sake of completeness, though this is untested as well) SMIs. With this patch applied, I'm finally able to run (and potentially debug) KGDB for Linux SMP guests.
Robi Yangel wondered if the original NMI patch could be extended for watchdog support, and Jan posted two more patches.
Slow week. The only patch of note was the new "-translation=no-cache" debugging option to disable the translation buffer cache.
This week's commits: