[OpenStack-Infra] Slow tempest runs F20 gate jobs

Daniel P. Berrange berrange at redhat.com
Wed Aug 13 11:08:46 UTC 2014


On Wed, Aug 13, 2014 at 11:04:14AM +0100, Daniel P. Berrange wrote:
> On Wed, Aug 13, 2014 at 03:23:55PM +1000, Ian Wienand wrote:
> > Hi,
> > 
> > I'm looking at Sean's change to run more of tempest on F20 [1].  To
> > test it I'm using [2] -- it has shown that tempest on F20 is *way*
> > slower than on Ubuntu, to the point it times out.
> > 
> > I compared logs from [3] & [4] to come up with a comparison [5],
> > e.g. SecurityGroupsTestXML.test_server_security_groups goes from 28
> > seconds to 315 seconds (interestingly, it also showed up a bug that
> > Ubuntu isn't preparing the images for boto tests [6]).
> > 
> > I feel this has got to be due to some combination of nested virt v
> > binary translation.  Before I waste too much time, I thought I'd check
> > with the experts for any thoughts?  Is this a known issue?
> 
> I'm not sure I'd agree with the idea that the nested virt / binary
> translation is relevant. Presumably the Fedora and Ubuntu VMs are
> running on equivalent cloud infrastructure, so should exhibit the
> same approximate performance from the hypervisor. I'd be more
> inclined to look at differences in the software stack or setup
> between the two. As an example, Fedora is possibly exercising the
> libguestfs codepath for file injection while Ubuntu probably does
> the nbd/losetup codepath for file injection. The first run of
> libguestfs is always slow due to the need to build its appliance.
> This is cached thereafter in normal use, but since VMs for the
> gate are throw-away, the first VM spawn will always be slow with
> libguestfs.
> 
> I'd suggest analysing the nova compute log file to try and identify
> which stage of the test run is showing the delay. It might let you
> narrow it down to a specific part of VM spawn process for example.

Looking at the logs, the first notable delay I see on the Fedora 20 VM is:

2014-08-13 00:34:31.239 INFO nova.virt.libvirt.firewall [req-88f904d9-b999-4445-85be-4264b37cdc0b SecurityGroupsTestXML-1188243981 SecurityGroupsTestXML-1642553084] [instance: ec20f21f-896c-40fd-8f8d-353c864d62c9] Ensuring static filters
2014-08-13 00:35:26.394 DEBUG nova.virt.firewall [req-88f904d9-b999-4445-85be-4264b37cdc0b SecurityGroupsTestXML-1188243981 SecurityGroupsTestXML-1642553084] [instance: ec20f21f-896c-40fd-8f8d-353c864d62c9] Security Groups....

i.e. it takes nearly 1 minute (about 55 seconds between the two
timestamps) to create the basic iptables filters. Looking further into
the logs shows more delays at other points involving iptables filters.
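
To find the other delays without reading the whole log by eye, something
like the following could help. It is only a minimal sketch: it assumes
every log record starts with a timestamp in the format shown above, and
the names find_gaps, parse_timestamp and the 10 second threshold are
made up for illustration, not part of nova. The two sample lines are
shortened versions of the ones quoted above.

```python
from datetime import datetime

# Assumption: each log record begins with a timestamp like
# "2014-08-13 00:34:31.239" (the format seen in the nova compute log above).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TIMESTAMP_LENGTH = len("2014-08-13 00:34:31.239")


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, threshold_seconds=10.0):
    """Yield (gap_seconds, previous_line, current_line) for each large gap."""
    previous_time = None
    previous_line = None
    for line in lines:
        current_time = parse_timestamp(line)
        if current_time is None:
            continue  # continuation line without a timestamp; skip it
        if previous_time is not None:
            gap = (current_time - previous_time).total_seconds()
            if gap >= threshold_seconds:
                yield gap, previous_line, line
        previous_time, previous_line = current_time, line


# Shortened versions of the two log lines quoted above.
sample = [
    "2014-08-13 00:34:31.239 INFO nova.virt.libvirt.firewall Ensuring static filters",
    "2014-08-13 00:35:26.394 DEBUG nova.virt.firewall Security Groups....",
]
for gap, before, after in find_gaps(sample):
    print(f"{gap:.3f}s gap after: {before}")
```

Running the same function over the full compute log (for example by
passing an open file object as lines) should list each point where the
log goes quiet for longer than the threshold.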

I'm practically certain that this is due to Fedora 20 using the 'firewalld'
daemon by default. The way libvirt talks to firewalld is very inefficient
(18x slower than the non-firewalld code path) and so could easily explain
the difference vs Ubuntu. See this blog post for more info:

  https://www.berrange.com/posts/2014/05/02/improving-libvirt-firewall-performance/

I did work to address this, which is in libvirt 1.2.4 IIRC; that will be
in Fedora 21 (or the virt-preview repository).

As a quick workaround, you can try changing the Fedora 20 VM image you are
using to disable firewalld.service. If you disable firewalld before libvirtd
is started (or restart libvirtd.service afterwards), then it will fall back
to the plain iptables codepath, which should match Ubuntu for performance.
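
As a rough sketch of that workaround, assuming the image build can run a
script as root: the unit names come from the paragraph above, while the
explicit "stop" step and the apply_workaround helper are my own
additions for illustration (disabling a unit does not stop one that is
already running). It defaults to a dry run that only prints the commands.

```python
import subprocess

# Must run as root on the Fedora 20 image. The "stop" step is an assumption:
# "disable" alone only takes effect on the next boot.
WORKAROUND_COMMANDS = [
    ["systemctl", "stop", "firewalld.service"],
    ["systemctl", "disable", "firewalld.service"],
    ["systemctl", "restart", "libvirtd.service"],
]


def apply_workaround(dry_run=True):
    """Print the workaround commands, or run them when dry_run is False."""
    for command in WORKAROUND_COMMANDS:
        if dry_run:
            print("would run:", " ".join(command))
        else:
            # check=True raises CalledProcessError if a command fails.
            subprocess.run(command, check=True)
    return WORKAROUND_COMMANDS


apply_workaround()  # dry run: only prints what would be executed
```

Calling apply_workaround(dry_run=False) would actually execute the
commands, so that is only something to do on a throw-away image.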


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


