[openstack-dev] [nova] Service group foundations and features

John Garbutt john at johngarbutt.com
Sat May 9 11:18:48 UTC 2015


On 7 May 2015 at 22:52, Joshua Harlow <harlowja at outlook.com> wrote:
> Hi all,
>
> In seeing the following:
>
> - https://review.openstack.org/#/c/169836/
> - https://review.openstack.org/#/c/163274/
> - https://review.openstack.org/#/c/138607/
>
> Vilobh and I are starting to come to the conclusion that the service group
> layers in nova really need to be cleaned up (without adding more features
> that only work in one driver), or removed or other... Spec[0] has
> interesting findings on this:
>
> A summary/highlights:
>
> * The zookeeper service driver in nova has probably been broken for one or
> more releases, because the evzookeeper[1] library it relies on uses eventlet
> attributes that no longer exist. Evzookeeper only works with eventlet <
> 0.17.1. Please refer to [0] for details.
> * The memcache service driver really only uses memcache for a tiny piece of
> the service liveness information (and does a database service table scan to
> get the list of services). Please refer to [0] for details.
> * Nova-manage service disable (CLI admin api) does interact with the service
> group layer for the 'is_up'[3] API (but it also does a database service
> table scan[4] to get the list of services, so this is inconsistent with the
> service group driver API 'get_all'[2] view on what is enabled/disabled).
> Please refer to [9][10] for details on nova-manage service enable/disable.
>   * Nova service delete (REST api) seems to follow a similar broken pattern
> (it also avoids calling into the service group layer to delete a service,
> which means it only works with the database layer[5], and therefore is
> inconsistent with the service group 'get_all'[2] API).
>
> ^^ As a result, both disable and delete are unaware of any other backend
> that might manage service group data (for example zookeeper, memcache,
> redis, etc.). Please refer to [6][7] for details. Ideally the API
> should follow the model used in [8] so that the extension, admin interface
> as well as the API interface use the same servicegroup interface which
> should be *fully* responsible for managing services. Doing so would give us a
> consistent view of services data, liveness, disabled/enabled and so on.
>
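A minimal sketch of the "one servicegroup interface for everything" idea
described above. It is not nova's servicegroup code. The class names
'InMemoryDriver' and 'ServiceGroup' and all driver methods are hypothetical,
and the meaning given to 'get_all' here (enabled hosts only) is an assumption
for illustration. It only shows that disable, delete and list stay consistent
when they share one path:

```python
class InMemoryDriver:
    """Hypothetical backend. A DB, ZooKeeper or memcache driver would
    expose the same four methods."""

    def __init__(self):
        self._disabled = {}  # host -> disabled flag

    def add(self, host):
        self._disabled[host] = False

    def set_disabled(self, host, disabled):
        if host not in self._disabled:
            raise KeyError(host)
        self._disabled[host] = disabled

    def remove(self, host):
        del self._disabled[host]

    def list(self):
        return dict(self._disabled)


class ServiceGroup:
    """Single entry point that the REST API, extension and CLI would share."""

    def __init__(self, driver):
        self._driver = driver

    def join(self, host):
        self._driver.add(host)

    def disable(self, host):
        self._driver.set_disabled(host, True)

    def enable(self, host):
        self._driver.set_disabled(host, False)

    def delete(self, host):
        self._driver.remove(host)

    def get_all(self):
        """Return the enabled hosts, sorted, as seen by the driver."""
        return sorted(h for h, off in self._driver.list().items() if not off)


group = ServiceGroup(InMemoryDriver())
for host in ("compute-1", "compute-2", "compute-3"):
    group.join(host)
group.disable("compute-2")  # what a CLI disable would call
group.delete("compute-1")   # what a REST delete would call
print(group.get_all())      # ['compute-3']
```

Because every caller goes through the same object, swapping the driver
changes where the data lives without changing what any caller sees.
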
> So with no disrespect to the authors of 169836 and 163274 (or anyone else
> involved), I am wondering if we can put a request in to figure out how to
> get the foundation of the service group concepts stabilized (or other...)
> before adding more features (that only work with the DB layer).
>
> What is the path to request some kind of larger coordination effort by the
> nova folks to fix the service group layers (and the concepts that are not
> disjoint/don't work across them) before continuing to add features on top of
> a 'shaky' foundation?
>
> If I could propose something it would probably work out like the following:
>
> Step 0: Figure out if the service group API + layer(s) should be
> maintained/tweaked at all (nova-core decides?)
>
> If maintain it:
>
>  - Have an agreement that nova service extension, admin
> interface(nova-manage) and API go through a common path for
> update/delete/read.
>   * This common path should likely be the servicegroup API, so as to have a
> consistent view of data. That also helps nova support different
> data-stores (keeping the services data in a DB and receiving liveness
> updates every few seconds from N compute nodes, where N is pretty
> high, can be detrimental to Nova's performance)
>  - At the same time allow 163274 to be worked on (since it fixes an edge-case
> that was asked about in the initial addition of the delete API in its
> initial code commit @ https://review.openstack.org/#/c/39998/)
>  - Delay 169836 until the above two/three are fixed (and stabilized); its
> 'down' concept (and all other usages of services that are hitting a database
> mentioned above) will need to go through the same service group foundation
> that is currently being skipped.
>
> Else:
>   - Discard 138607 and start removing the service group code (and just use
> the DB for all the things).
>   - Allow 163274 and 169836 (since those would be additions on top of the DB
> layer that will be preserved).
>
> Thoughts?

I wonder about this approach:

* I think we need to go back and document what we want from the
"service group" concept.
* Then we look at the best approach to implement that concept.
* Then look at the best way to get to a happy place from where we are now.
** Noting we will need "live" upgrade for (at least) the most widely
used drivers.

Does that make any sense?

Things that pop into my head, include:
* The operators have been asking questions like: "Should new services
not be "disabled" by default?" and "Can't my admins tell you that I
just killed it?"
* And from the scheduler point of view, how do we interact with the
provider that tells us if something is alive or not?
* From the RPC api point of view, do we want to send a cast to
something that we know is dead? Maybe we do. Should we wait for
calls to time out, or give up quicker?
* Polling the DB kinda sucks, although it sorta works for small
deploys (and cells based deploys). Using a separate DB from Nova's would
help some, but should we force another external dependency on all users?
It's hard enough to set things up already.
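
To make the RPC question above a bit more concrete, here is a rough sketch
of one possible policy, not a recommendation and not anything nova does
today. The function names 'is_up' (borrowed from the servicegroup API name
mentioned in the thread) and 'rpc_timeout', and the 60 and 5 second values,
are assumptions for illustration: casts are always sent, and calls to a
service that looks dead get a shorter timeout instead of the full wait.

```python
def is_up(last_heartbeat, now, down_after=60.0):
    """True if the last heartbeat is no older than down_after seconds."""
    return (now - last_heartbeat) <= down_after


def rpc_timeout(kind, alive, normal=60.0, quick=5.0):
    """Pick a timeout in seconds, or None for fire-and-forget casts."""
    if kind == "cast":
        # Nothing waits on a cast, so there is no timeout to choose.
        return None
    if kind != "call":
        raise ValueError("kind must be 'cast' or 'call'")
    # Give up quicker when the liveness provider says the target is down.
    return normal if alive else quick


now = 1000.0
alive = is_up(last_heartbeat=900.0, now=now)  # 100s of silence: treated as down
print(alive, rpc_timeout("call", alive))      # False 5.0
```

Whether a cast to a dead service should be sent at all is exactly the open
question. The sketch sends it anyway, on the assumption that the message can
sit in the queue until the service comes back.
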

Thanks,
John

> - Josh (and Vilobh, who is spending the most time on this recently)
>
> [0] Replace service group with tooz :
> https://review.openstack.org/#/c/138607/
> [1] https://pypi.python.org/pypi/evzookeeper/
> [2]
> https://github.com/openstack/nova/blob/stable/kilo/nova/servicegroup/api.py#L93
> [3]
> https://github.com/openstack/nova/blob/stable/kilo/nova/servicegroup/api.py#L87
> [4] https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L711
> [5]
> https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/services.py#L106
> [6]
> https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/services.py#L107
> [7] https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3436
> [8]
> https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/contrib/services.py#L61
> [9] Nova manage enable :
> https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L742
> [10] Nova manage disable :
> https://github.com/openstack/nova/blob/master/nova/cmd/manage.py#L756