Open Stack

Thu Jul 21 13:22:30 UTC 2016

Hi everyone,

Following [1], a few of us sat down during the last day of the Austin
Summit and discussed the possibility of adding formal support for
Tooz, specifically for the locking mechanism it provides. The
conclusion we reached was that benchmarks should be done to show
whether and how Tooz affects the normal operation of Neutron (e.g. if
locking a resource using Zookeeper takes 3 seconds, it's not
worthwhile at all).

We've finally finished the benchmarks and they are available at [2].
They test a specific case: when creating an HA router a lock-free
algorithm is used to assign a vrid to a router (this is later used for
keepalived), and the benchmark specifically checks the effects of
locking that function with either Zookeeper or Etcd, using the no-Tooz
case as a baseline. The locking was checked in two different ways:
one with no contention (acquire() always succeeds immediately) and
one with contention (acquire() may block until a similar process for
the invoking tenant is complete).
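
To make the two scenarios concrete, here is a minimal sketch. It is
not the benchmark code from [2]: it uses a process-local
threading.Lock as a stand-in for the Zookeeper/etcd lock, and the
names get_lock, allocate_vrid, run_scenario and the "tenant-..." lock
names are hypothetical. With Tooz, the lock object would come from
coordinator.get_lock(name) and be used the same way. The sketch
assumes the no-contention case can be modelled by giving every
request its own lock name.

```python
import threading
import time

# Process-local stand-in for a distributed lock manager (assumption: this
# replaces Zookeeper/etcd purely for illustration). With Tooz, the lock
# object would come from coordinator.get_lock(name) instead.
_locks = {}
_locks_guard = threading.Lock()


def get_lock(name):
    """Return the lock registered under `name`, creating it on first use."""
    with _locks_guard:
        return _locks.setdefault(name, threading.Lock())


def allocate_vrid(lock_name, work_seconds):
    """Hypothetical guarded section; returns how long acquire() blocked."""
    lock = get_lock(lock_name)
    started = time.monotonic()
    with lock:  # acquire() on entry, release() on exit
        waited = time.monotonic() - started
        time.sleep(work_seconds)  # pretend to pick a free vrid
    return waited


def run_scenario(lock_names, work_seconds=0.05):
    """Issue one concurrent request per lock name; return the wait times."""
    waits = []

    def worker(name):
        waits.append(allocate_vrid(name, work_seconds))

    threads = [threading.Thread(target=worker, args=(n,)) for n in lock_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return waits


def no_contention(requests=3):
    # Every request uses its own lock name, so acquire() never blocks.
    return run_scenario(["tenant-%d" % i for i in range(requests)])


def contention(requests=3):
    # All requests belong to the same tenant and share one lock name.
    return run_scenario(["tenant-a"] * requests)


if __name__ == "__main__":
    print("no contention, max wait: %.3fs" % max(no_contention()))
    print("contention,    max wait: %.3fs" % max(contention()))
```

In the contended run, later requests wait roughly one work interval
for each request ahead of them. That waiting is the cost the
contention scenario measures, on top of the round trip to the lock
backend.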

The benchmarks show that while using Tooz does raise the cost of an
operation, the effects are not as bad as we initially feared. In the
simple case of a single request at a time, using Zookeeper raised the
average time it took to create a router by 1.5% (from 11.811 to 11.988
seconds). In the more realistic case of 6 simultaneous requests,
Zookeeper raised it by 3.74% (from 16.533 to 17.152 seconds).
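
For anyone who wants to re-derive the percentages, the relative
overhead is (locked - baseline) / baseline. A small sketch using only
the four averages quoted above (the function name overhead_percent is
just for illustration):

```python
def overhead_percent(baseline_seconds, locked_seconds):
    """Relative slowdown of the locked run over the baseline, in percent."""
    return (locked_seconds - baseline_seconds) / baseline_seconds * 100.0


# Average router-creation times quoted in this mail, in seconds:
# {simultaneous requests: (no-Tooz baseline, Zookeeper)}
averages = {
    1: (11.811, 11.988),
    6: (16.533, 17.152),
}

if __name__ == "__main__":
    for requests, (baseline, zookeeper) in sorted(averages.items()):
        print("%d request(s): +%.2f%%"
              % (requests, overhead_percent(baseline, zookeeper)))
```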

It is important to note that the setup itself was overloaded: it was
built on a single bare-metal host running 5 VMs (4 of which were
controllers), so we were unable to go further. For example, 10
concurrent requests overloaded the server and caused some race
conditions to appear in the L3 scheduler (bugs will be opened soon).
For this reason we haven't tested heavier loads and limited ourselves
to 6 simultaneous requests.

It is also important to note that some kind of race condition was
noticed in tooz's etcd driver. We've discussed this with the tooz devs
and provided a patch that is supposed to fix it [3].

Lastly, races in the L3 HA Scheduler were found; we have yet to dig
into them and find their cause. Bugs will be opened for these as
well.

I've opened the summary [2] for comments, so you're welcome to
discuss the results either on the ML or on the doc itself.

(CC to all those who attended the Austin Summit meeting and other
interested parties)
Happy locking,

[1]: http://lists.openstack.org/pipermail/openstack-dev/2016-April/093199.html
[2]: https://docs.google.com/document/d/1jdI8gkQKBE0G9koR0nLiW02d5rwyWv_-gAp7yavt4w8
[3]: https://review.openstack.org/#/c/342096/

--
John Schwarz,
Senior Software Engineer,
Red Hat.


[openstack-dev] [neutron] [tooz] DLM benchmark results
