[Openstack-operators] Problem with Heavy Network IO and Dnsmasq

Thomas Vachon vachon at sessionm.com
Wed Aug 15 12:24:01 UTC 2012

On Wed, Aug 15, 2012 at 8:00 AM, Narayan Desai <narayan.desai at gmail.com> wrote:
> On Wed, Aug 15, 2012 at 6:19 AM, Thomas Vachon <vachon at sessionm.com> wrote:
>> I reported this as a bug here: https://bugs.launchpad.net/nova/+bug/1037065
>> However, I was looking to see if anyone else has seen this.
>> <snip>
>> I was running a load test against a 4 node Cassandra cluster in
>> Openstack. I have separate tenancy for each node to ensure there was
>> no funny contention. Running the test 3 times produced the same
>> results each time.
>> About 1/3 of the way through the test, the dnsmasq process crashes
>> (with no warning or error in any log). The instance will continue
>> "working" but only inside of the VNC console as all outside
>> connectivity is now unroutable.
>> Here is a log from the dnsmasq process. The first two rows show that
>> dnsmasq was working, then it just fails to route correctly back to the
>> instance.
> I ran into some sort of virtio-net bug that manifested itself in a
> similar fashion recently (Ubuntu Precise, fwiw). Basically, it showed
> up when moving large quantities of network traffic into VMs on some
> node types (but not others, oddly enough). In my case, it looked like dnsmasq was
> failing, but the process was still running; it had just stopped
> getting requests from the clients. Rebooting instances would bring
> them back into service.
> Can you bring the network back up via VNC? If so, this isn't the same
> as the issue I saw. If you can't then something is stuck in virtio. I
> was able to work around the problem by enabling the vhost_net module
> in the hypervisor.
>  -nld

I am also on Precise. Bringing the interface down and back up via VNC
did work. How exactly did you set up vhost_net? Can you share your
libvirt configuration?
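
For reference, here is a minimal sketch of how one might check whether
the vhost_net module is currently loaded on the hypervisor, by reading
/proc/modules. The function names are made up for illustration, and this
is not necessarily how the workaround above was applied. If the module
is missing, it can usually be loaded as root with `modprobe vhost_net`.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each non-empty line of /proc/modules starts with the module name,
    followed by size, reference count, dependants, state and address.
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def vhost_net_loaded(proc_modules_text):
    """True if a module named exactly 'vhost_net' is listed."""
    return "vhost_net" in loaded_modules(proc_modules_text)


if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            text = handle.read()
    except OSError:
        # Not on Linux, or /proc is not readable here.
        print("could not read /proc/modules")
    else:
        state = "loaded" if vhost_net_loaded(text) else "not loaded"
        print("vhost_net is " + state)
```

Run on the compute node itself, not inside the guest, since the module
lives in the hypervisor's kernel.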

