[openstack-dev] [oslo][devstack][all] ZooKeeper vs etcd for Tooz/DLM

Clint Byrum clint at fewbar.com
Wed Mar 15 06:51:57 UTC 2017

Excerpts from Jay Pipes's message of 2017-03-14 22:13:32 -0400:
> On 03/14/2017 05:01 PM, Clint Byrum wrote:
> > Excerpts from Jay Pipes's message of 2017-03-14 15:30:32 -0400:
> >> On 03/14/2017 02:50 PM, Julien Danjou wrote:
> >>> On Tue, Mar 14 2017, Jay Pipes wrote:
> >>>
> >>>> Not tooz, because I'm not interested in a DLM nor leader election library
> >>>> (that's what the underlying etcd3 cluster handles for me), only a fast service
> >>>> liveness/healthcheck system, but it shows usage of etcd3 and Google Protocol
> >>>> Buffers implementing a simple API for liveness checking and host maintenance
> >>>> reporting.
> >>>
> >>> Cool cool. So that's the same feature that we implemented in tooz 3
> >>> years ago. It's called "group membership". You create a group, make
> >>> nodes join it, and you know who's dead/alive and get notified when their
> >>> status changes.
> >>
> >> The point of os-lively is not to provide a thin API over ZooKeeper's
> >> group membership interface. The point of os-lively is to remove the need
> >> to have a database (RDBMS) record of a service in Nova.
> >
> > That's also the point of tooz's group membership API:
> >
> > https://docs.openstack.org/developer/tooz/compatibility.html#grouping
> Did you take a look at the code I wrote in os-lively? What part of the 
> tooz group membership API do you think I would have used?
> Again, this was a weekend project that I was moving fast on. I looked at 
> tooz and didn't see how I could use it for my purposes, which was to 
> store a versioned object in a consistent key/value store with support 
> for transactional semantics when storing index and data records at the 
> same time [1]
> [1] https://github.com/jaypipes/os-lively/blob/master/os_lively/service.py#L468-L511
> etcd3 -- and specifically etcd3, not etcd2 -- supports the transactional 
> semantics in a consistent key/value store that I needed.

ZK has all the primitives necessary, and the client libs behind tooz for
it have transaction support basically identical to etcd3's:


> tooz is cool, but it's not what I was looking for. It's solving a 
> different problem than I was trying to solve.
> This isn't a case of NIH, despite what Julien is trying to intimate in 
> his emails.

Yeah I don't want to imply that either. I'm trying to figure out how we
can add what you're doing to tooz, not why you didn't see something or
why you reinvented something.

> >> tooz simply abstracts a group membership API across a number of drivers.
> >> I don't need that. I need a way to maintain a service record (with
> >> maintenance period information, region, and an evolvable data record
> >> format) and query those service records in an RDBMS-like manner but
> >> without the RDBMS being involved.
> >>
> >>>> servicegroup API with os-lively and eliminate Nova's use of an RDBMS for
> >>>> service liveness checking, which should dramatically reduce the amount of both
> >>>> DB traffic as well as conductor/MQ service update traffic.
> >>>
> >>> Interesting. Joshua and Vilob tried to push usage of tooz group
> >>> membership a couple of years ago, but it got nowhere. Well, no, they got
> >>> 2 specs written IIRC:
> >>>
> >>>   https://specs.openstack.org/openstack/nova-specs/specs/liberty/approved/service-group-using-tooz.html
> >>>
> >>> But then it died for whatever reasons on Nova side.
> >>
> >> It died because it didn't actually solve a problem.
> >>
> >> The problem is that even if we incorporate tooz, we would still need to
> >> have a service table in the RDBMS and continue to query it over and over
> >> again in the scheduler and API nodes.
> >
> > Most likely it was designed with hesitance to have a tooz requirement
> > to be a source of truth. But it's certainly not a problem for most tooz
> > backends to be a source of truth. Certainly not for etcd or ZK, which
> > are both designed to be that.
> >
> >> I want all service information in the same place, and I don't want to
> >> use an RDBMS for that information. etcd3 provides an ideal place to
> >> store service record information. Google Protocol Buffers is an ideal
> >> data format for evolvable versioned objects. os-lively presents an API
> >> that solves the problem I want to solve in Nova. tooz didn't.
> >
> > Was there something inherent in tooz's design that prevented you from
> > adding it to tooz's group API? Said API already includes liveness (watch
> > the group that corresponds to the service you want).
> See above about transactional semantics.
> I'm actually happy to add an etcd3 group membership driver to tooz, 
> though. After the experience gained this weekend using etcd3, I'd like 
> to do that.


> Still doesn't mean that tooz would be the appropriate choice for what I 
> was trying to do with os-lively, though.

Monty makes a pretty strong case for hiding what you're doing behind tooz
while we sort out operator/developer dissonance. I think 15 months since
Tokyo means we probably should take the operator community's temperature
again about etcd. It has likely shown up in many shops since then and
hasn't let them down.

> > The only thing missing is being able to get groups and group members
> > by secondary indexes. etcd3's built in indexes by field are pretty nice
> Not sure what you're talking about. etcd3 doesn't have any indexing by 
> field. I built the os-lively library primarily as a well-defined set of 
> index overlays (by uuid, by host, by service type, and by region) over 
> etcd3's key/value store.

I skimmed your code a bit too fast and thought it was maintaining indexes
for you. Derp.

> > for that, but ZK can likely do it too by maintaining the index in
> > the driver.
> Maybe, I'm not sure, I didn't spend much time this weekend looking at 
> ZooKeeper.

I'm certain it can, now that I've taken a closer look at your update.

> > I understand abstractions can seem pretty cumbersome when you're moving
> > fast. It's not something I want to see stand in your way. But it would
> > be nice to see where there's deficiency in tooz so we can be there for
> > the next project that needs it and maybe eventually factor out direct
> > etcd3 usage so users who have maybe chosen ZK as their tooz backend can
> > also benefit from your work.
> It's not a deficiency in tooz. It's a different problem domain. Look at 
> the os-lively API and show me how you think I could have used tooz to 
> implement that API.

I believe it is a deficiency in tooz, because tooz exists to decouple
coordinator deployment choices from developer needs. It didn't suit your
needs, therefore it is deficient. And to be clear, I don't think there's
anything wrong with just going deep down into a real service like etcd3.
Abstractions are best when they're added to solve a real problem, not
when they're inserted to plan for a perceived one. Tooz, though, does
solve a real problem, which is that we haven't yet been able to pick just
one coordination service (which is what we should say instead of DLM).

Currently the coordination API only has member_id, which is effectively
the primary key, and group_id, which is how one gets the members of a
group. But
I believe arbitrary fields could be added to the API and maintained
as secondary indexes, exactly as you've done in os-lively with etcd3.
Some optional filter argument for get_* and watch_* would allow using
the indexes for reads.
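
To make that a bit more concrete, here is a rough in-memory sketch of the
idea. It is not tooz code and not os-lively code: IndexedGroup, join, leave
and the keyword filters on get_members are all hypothetical names I made up
for illustration, and the lock merely stands in for the transaction a real
etcd3 or ZK driver would use to write the data record and its index entries
together.

```python
import threading


class IndexedGroup:
    """Toy stand-in for one group with driver-maintained secondary indexes.

    Hypothetical API: none of these names exist in tooz today.
    """

    def __init__(self, indexed_fields):
        self._lock = threading.Lock()  # stands in for a backend transaction
        self._indexed_fields = tuple(indexed_fields)
        self._members = {}  # member_id -> dict of fields (primary records)
        self._index = {}    # (field, value) -> set of member_ids

    def _unindex(self, member_id):
        # Drop any index entries that point at the member's old record.
        old = self._members.get(member_id)
        if old is None:
            return
        for field in self._indexed_fields:
            if field in old:
                key = (field, old[field])
                ids = self._index.get(key)
                if ids is not None:
                    ids.discard(member_id)
                    if not ids:
                        del self._index[key]

    def join(self, member_id, **fields):
        # Data record and index entries change together, or not at all.
        with self._lock:
            self._unindex(member_id)
            self._members[member_id] = dict(fields)
            for field in self._indexed_fields:
                if field in fields:
                    key = (field, fields[field])
                    self._index.setdefault(key, set()).add(member_id)

    def leave(self, member_id):
        with self._lock:
            self._unindex(member_id)
            self._members.pop(member_id, None)

    def get_members(self, **filters):
        # With no filters this is a plain membership listing; with filters
        # it intersects the matching index entries instead of scanning.
        with self._lock:
            if not filters:
                return sorted(self._members)
            result = None
            for field, value in filters.items():
                if field not in self._indexed_fields:
                    raise ValueError("no index on %r" % field)
                ids = self._index.get((field, value), set())
                result = set(ids) if result is None else result & ids
            return sorted(result)


group = IndexedGroup(indexed_fields=("host", "region", "service_type"))
group.join("svc-1", host="compute1", region="us-east",
           service_type="nova-compute")
group.join("svc-2", host="compute2", region="us-east",
           service_type="nova-compute")
group.join("svc-3", host="ctl1", region="us-west",
           service_type="nova-scheduler")
```

A caller would then ask for something like
group.get_members(region="us-east") rather than listing every member and
filtering client-side. A watch_* variant could presumably take the same
filter, but that part is not sketched here.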

So yes, I believe if you add an etcd3 driver, we should be able to grow
tooz's API to accommodate transactional updates and reads for secondary
indexes, and I think ZK would also be able to do it. I don't know about
consul, but that one seems to have fallen out of favor.
