[openstack-dev] [nova] Service group foundations and features
Attila Fazekas
afazekas at redhat.com
Mon May 11 13:13:15 UTC 2015
----- Original Message -----
> From: "John Garbutt" <john at johngarbutt.com>
> To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
> Sent: Saturday, May 9, 2015 1:18:48 PM
> Subject: Re: [openstack-dev] [nova] Service group foundations and features
>
> On 7 May 2015 at 22:52, Joshua Harlow <harlowja at outlook.com> wrote:
> > Hi all,
> >
> > In seeing the following:
> >
> > - https://review.openstack.org/#/c/169836/
> > - https://review.openstack.org/#/c/163274/
> > - https://review.openstack.org/#/c/138607/
> >
> > Vilobh and I are starting to come to the conclusion that the service group
> > layers in nova really need to be cleaned up (without adding more features
> > that only work in one driver), or removed or other... Spec[0] has
> > interesting findings on this:
> >
> > A summary/highlights:
> >
> > * The zookeeper service driver in nova has probably been broken for one
> > or more releases: it depends, via the evzookeeper[1] library, on eventlet
> > attributes that no longer exist. Evzookeeper only works with eventlet <
> > 0.17.1. Please refer to [0] for details.
> > * The memcache service driver really only uses memcache for a tiny piece of
> > the service liveness information (and does a database service table scan to
> > get the list of services). Please refer to [0] for details.
> > * Nova-manage service disable (the CLI admin API) does go through the
> > service group layer for 'is_up'[3], but it also does a database service
> > table scan[4] to get the list of services, so it is inconsistent with the
> > service group driver's 'get_all'[2] view of what is enabled/disabled.
> > Please refer to [9][10] for the nova-manage service enable/disable
> > details.
> > * Nova service delete (the REST API) seems to follow a similar broken pattern
> > (it also avoids calling into the service group layer to delete a service,
> > which means it only works with the database layer[5], and therefore is
> > inconsistent with the service group 'get_all'[2] API).
> >
> > ^^ The net effect is that both disable and delete are oblivious to any
> > other backend that may be managing service group data (zookeeper,
> > memcache, redis, etc.); please refer to [6][7] for details. Ideally the
> > API should follow the model used in [8], so that the extension, the admin
> > interface, and the REST API all go through the same servicegroup
> > interface, which should be *fully* responsible for managing services.
> > That way we get a consistent view of service data, liveness,
> > enabled/disabled state, and so on...
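The split Josh describes looks roughly like this; a condensed sketch with
hypothetical names, not the actual nova code:

    # Liveness and membership come from two different places.
    class MemcacheServiceGroupDriver(object):
        def __init__(self, mc, db):
            self.mc = mc  # memcache client
            self.db = db  # nova DB API

        def is_up(self, service_ref):
            # Liveness is read from memcache...
            return self.mc.get('compute:%s' % service_ref['host']) is not None

        def get_all(self, context, topic):
            # ...but membership is still a full DB table scan, so the two
            # views can disagree.
            return self.db.service_get_all_by_topic(context, topic)

    def delete_service(context, db, service_id):
        # REST delete and nova-manage write straight to the DB and never
        # tell the driver, so a memcache/zookeeper backend keeps stale
        # members.
        db.service_destroy(context, service_id)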
> >
> > So with no disrespect to the authors of 169836 and 163274 (or anyone else
> > involved), I am wondering if we can put a request in to figure out how to
> > get the foundation of the service group concepts stabilized (or other...)
> > before adding more features (that only work with the DB layer).
> >
> > What is the path to requesting some kind of larger coordination effort
> > by the nova folks to fix the service group layers (and the concepts that
> > do not carry consistently across them) before continuing to add features
> > on top of a 'shaky' foundation?
> >
> > If I could propose something it would probably work out like the following:
> >
> > Step 0: Figure out if the service group API + layer(s) should be
> > maintained/tweaked at all (nova-core decides?)
> >
> > If maintain it:
> >
> > - Have an agreement that the nova service extension, the admin
> > interface (nova-manage), and the REST API go through a common path for
> > update/delete/read.
> > * This common path should likely be the servicegroup API, so as to have
> > a consistent view of the data; it also helps nova to plug in different
> > data stores (keeping the service data in a DB and receiving liveness
> > updates every few seconds from N compute nodes, where N is pretty high,
> > can be detrimental to Nova's performance).
> > - At the same time, allow 163274 to be worked on (since it fixes an
> > edge case that was asked about when the delete API was first added in
> > its initial code commit @ https://review.openstack.org/#/c/39998/).
> > - Delay 169836 until the above two/three are fixed (and stabilized);
> > its 'down' concept (and all the other service usages mentioned above
> > that hit the database directly) will need to go through the same service
> > group foundation that is currently being bypassed.
> >
> > Else:
> > - Discard 138607 and start removing the service group code (and just use
> > the DB for all the things).
> > - Allow 163274 and 169836 (since those would be additions on top of
> > the DB layer that will be preserved).
> >
> > Thoughts?
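If we keep the layer, the 'common path' in the first branch could be as
simple as routing every operation through one object; a rough sketch with
illustrative names, not a concrete proposal:

    class ServiceGroupAPI(object):
        """Single entry point for service membership and liveness."""

        def __init__(self, driver):
            # driver: db, memcache, zookeeper, tooz, ... chosen by config
            self.driver = driver

        def join(self, member, group):
            return self.driver.join(member, group)

        def is_up(self, member):
            return self.driver.is_up(member)

        def get_all(self, group):
            return self.driver.get_all(group)

        def delete(self, member, group):
            # The REST extension and nova-manage would both land here, so
            # every backend sees the same membership changes.
            return self.driver.delete(member, group)

The REST extension and nova-manage would then stop touching the DB API
directly.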
>
> I wonder about this approach:
>
> * I think we need to go back and document what we want from the
> "service group" concept.
> * Then we look at the best approach to implement that concept.
> * Then look at the best way to get to a happy place from where we are now,
> ** Noting we will need "live" upgrade for (at least) the most widely
> used drivers
>
> Does that make any sense?
>
> Things that pop into my head, include:
> * The operators have been asking questions like: "Should new services
> not be "disabled" by default?" and "Can't my admins tell you that I
> just killed it?"
> * And from the scheduler point of view, how do we interact with the
> provider that tells us if something is alive or not?
> * From the RPC API point of view, do we want to send a cast to
> something we know is dead? Maybe we do. Should we wait for calls to
> time out, or give up more quickly?
How to fail sooner:
https://bugs.launchpad.net/oslo.messaging/+bug/1437955
We do not need a dedicated is_up check just for this.
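A caller can already shrink the damage with a per-call timeout instead of
consulting is_up first; a sketch assuming oslo.messaging with an
already-configured transport and context:

    import oslo_messaging as messaging

    target = messaging.Target(topic='compute', server='node-1')
    client = messaging.RPCClient(transport, target)

    try:
        # Fail after 5s instead of the default 60s if the host is dead.
        client.prepare(timeout=5).call(ctxt, 'ping')
    except messaging.MessagingTimeout:
        pass  # treat the service as down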
> * Polling the DB kinda sucks, although it sorta works for small
> deploys (and cells-based deploys). Using a DB separate from Nova's
> would help some, but should we force another external dependency on
> all users? It's hard enough to set things up already.
If the extra dependency can be set up in a working way even in an all-in-one
deployment, I think it is OK to `force` it.
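For reference, [0] would put this behind tooz, where the backend is just a
URL, so an all-in-one box can point it at a local driver; a sketch against
the tooz group API:

    from tooz import coordination

    # zookeeper://, memcached://, redis://, or a local driver for
    # all-in-one setups.
    coordinator = coordination.get_coordinator(
        'memcached://127.0.0.1:11211', b'compute-node-1')
    coordinator.start()

    group = b'nova-compute'
    try:
        coordinator.create_group(group).get()
    except coordination.GroupAlreadyExist:
        pass
    coordinator.join_group(group).get()

    coordinator.heartbeat()  # call periodically; members that stop
                             # beating drop out of the group
    members = coordinator.get_members(group).get()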
>
> Thanks,
> John
>
> > - Josh (and Vilobh, who is spending the most time on this recently)
> >
> > [0] Replace service group with tooz:
> > https://review.openstack.org/#/c/138607/
> > [1] https://pypi.python.org/pypi/evzookeeper/
> > [2]
> > https://github.com/openstack/nova/blob/stable/kilo/nova/servicegroup/api.py#L93
> > [3]
> > https://github.com/openstack/nova/blob/stable/kilo/nova/servicegroup/api.py#L87
> > [4] https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L711
> > [5]
> > https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/services.py#L106
> > [6]
> > https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/services.py#L107
> > [7] https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3436
> > [8]
> > https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/services.py#L61
> > [9] Nova manage enable:
> > https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L742
> > [10] Nova manage disable:
> > https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L756