[Openstack-operators] qemu 1.x to 2.0

Daniele Venzano daniele.venzano at eurecom.fr
Tue Oct 21 05:56:22 UTC 2014


The openvswitch version we are using is:
1.10.2-0ubuntu2~cloud0

The version that was not working for us is:
2.0.1+git20140120-0ubuntu2~cloud1

Network:
Intel Corporation I350 Gigabit Network Connection (igb module)

We were seeing the problem, strangely enough, at the application level, 
inside the VMs, where Hadoop was reporting corrupted data on TCP 
connections. There were no other messages on the hypervisor or in the VM 
kernel. Hadoop makes lots of connections to lots of different VMs, moving 
lots (terabytes) of data as fast as possible. Also, the problem was 
non-deterministic: Hadoop would try several times to transfer the data, 
sometimes succeeding, sometimes giving up. I tried some quick iperf 
tests, but they worked fine.
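
A quick iperf run mainly measures throughput, so it may not reveal 
corrupted payloads. For anyone wanting to reproduce this outside Hadoop, 
here is a minimal sketch of a payload-integrity check over TCP. It is not 
what we ran: the function names, the block size and the SHA-256 framing 
are all assumptions made for illustration. Each block is sent followed by 
its digest, and the receiver counts blocks whose digest does not match.

```python
import hashlib
import os
import socket
import threading

BLOCK_SIZE = 64 * 1024                      # payload bytes per block (arbitrary choice)
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def recv_exact(conn, n):
    """Read exactly n bytes; return None if the peer closes before n bytes arrive."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            return None
        buf.extend(chunk)
    return bytes(buf)


def send_blocks(conn, n_blocks, block_size=BLOCK_SIZE):
    """Send n_blocks random payloads, each followed by its SHA-256 digest."""
    for _ in range(n_blocks):
        payload = os.urandom(block_size)
        conn.sendall(payload + hashlib.sha256(payload).digest())


def verify_blocks(conn, block_size=BLOCK_SIZE):
    """Receive blocks until EOF; return (blocks_received, blocks_corrupted)."""
    received = corrupted = 0
    while True:
        frame = recv_exact(conn, block_size + DIGEST_SIZE)
        if frame is None:
            return received, corrupted
        payload, digest = frame[:block_size], frame[block_size:]
        received += 1
        if hashlib.sha256(payload).digest() != digest:
            corrupted += 1


def loopback_selftest(n_blocks=16, block_size=4096):
    """Run sender and receiver in one process over 127.0.0.1 as a sanity check."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))           # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def sender():
        with socket.create_connection(("127.0.0.1", port)) as c:
            send_blocks(c, n_blocks, block_size)

    t = threading.Thread(target=sender)
    t.start()
    conn, _ = server.accept()
    with conn:
        result = verify_blocks(conn, block_size)
    t.join()
    server.close()
    return result


if __name__ == "__main__":
    print(loopback_selftest())
```

To exercise a real path, one would run verify_blocks on an accepted 
connection in one VM and send_blocks from another, with many connections 
in parallel and a large block count. A single stream may well pass, just 
as the iperf tests did.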

Daniele

On 10/20/14 18:46, Manish Godara wrote:
> > We had to do the same downgrade with openvswitch; the newest 
> version, under heavy load, corrupts packets in transit, but we do not 
> have the time to investigate the issue further.
>
> Daniele, what was the openvswitch version before and after the 
> upgrade?  And which ethernet drivers do you have?  The corruption 
> may be related to the drivers you have (the issues may be triggered by 
> the way openvswitch flows are configured in Icehouse vs Havana).
>
> Thanks.
>
> From: Daniele Venzano <daniele.venzano at eurecom.fr 
> <mailto:daniele.venzano at eurecom.fr>>
> Organization: Eurecom
> Date: Sunday, October 19, 2014 11:46 PM
> To: "openstack-operators at lists.openstack.org 
> <mailto:openstack-operators at lists.openstack.org>" 
> <openstack-operators at lists.openstack.org 
> <mailto:openstack-operators at lists.openstack.org>>
> Subject: Re: [Openstack-operators] qemu 1.x to 2.0
>
> We have the same setup (Icehouse on Ubuntu 12.04) and had similar 
> issues. We downgraded qemu from 2.x to 1.x, as we cannot terminate all 
> VMs for all users. We had non-resumable VMs also in the middle of the 
> 1.x series and nothing was documented in the changelog.
> We had to do the same downgrade with openvswitch; the newest version, 
> under heavy load, corrupts packets in transit, but we do not have the 
> time to investigate the issue further.
>
> We plan to warn our users in time for the next major upgrade to Juno 
> that all VMs need to be terminated, probably during the Christmas 
> holidays. I do not think they will be happy.
> Seeing also all the problems we had upgrading Neutron from OVS to ML2, 
> terminating all VMs is probably the best policy anyway during an 
> OpenStack upgrade. Or you do lots of migrations and upgrade qemu one 
> compute host at a time, but if something goes wrong you end up with 
> an angry user and a stuck VM.
>
> It certainly is a big deal.
>
> On 10/20/14 00:59, Joe Topjian wrote:
>> Hello,
>>
>> We recently upgraded an OpenStack Grizzly environment to Icehouse 
>> (doing a quick stop-over at Havana). This environment is still 
>> running Ubuntu 12.04.
>>
>> The Ubuntu 14.04 release notes 
>> <https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes#Ubuntu_Server> make 
>> mention of incompatibilities with 12.04 and moving to 14.04 and qemu 
>> 2.0. I didn't think that this would apply for upgrades staying on 
>> 12.04, but it indeed does.
>>
>> We found that existing instances could not be live migrated (as per 
>> the release notes). Additionally, instances that were hard-rebooted 
>> and had the libvirt xml file rebuilt could no longer start, either.
>>
>> The exact error message we saw was:
>>
>> "Length mismatch: vga.vram: 1000000 in != 800000"
>>
>> I found a few bugs that are related to this, but I don't think 
>> they're fully relevant to the issue I ran into:
>>
>> https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1308756
>> https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1291321
>> https://bugs.launchpad.net/nova/+bug/1312133
>>
>> We ended up downgrading to the stock Ubuntu 12.04 qemu 1.0 packages 
>> and everything is working nicely.
>>
>> I'm wondering if anyone else has run into this issue and how they 
>> dealt with it or plan to deal with it.
>>
>> Also, I'm curious as to why exactly qemu 1.x and 2.0 are incompatible 
>> with each other. Is this just an Ubuntu issue? Or is it inherent to qemu?
>>
>> Unless I'm missing something, this seems like a big deal. If we 
>> continue to use Ubuntu's OpenStack packages, we're basically stuck at 
>> 12.04 and Icehouse unless we have all users snapshot their instance 
>> and re-launch in a new cloud.
>>
>> Thanks,
>> Joe
>>
>>
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>

