Open Stack

Thu Jun 11 19:31:27 UTC 2015

On 11/06/2015 18:52, Vilobh Meshram wrote:
> A few more places which can trigger inconsistent behaviour.
>
> - https://github.com/openstack/nova/blob/stable/kilo/nova/api/openstack/compute/contrib/services.py#L44
> - https://github.com/openstack/nova/blob/stable/kilo/nova/api/openstack/compute/contrib/hypervisors.py#L98
> - https://github.com/openstack/nova/blob/stable/kilo/nova/availability_zones.py#L130
> - https://github.com/openstack/nova/blob/stable/kilo/nova/api/openstack/compute/contrib/availability_zone.py#L68
> - https://github.com/openstack/nova/blob/stable/kilo/nova/api/openstack/compute/contrib/hosts.py#L88-L89
> - https://github.com/openstack/nova/blob/stable/kilo/nova/compute/api.py#L3399-L3421
>
>
> Blueprint which plans to fix this:
> https://blueprints.launchpad.net/nova/+spec/servicegroup-api-control-plane
>
> Related specs:
> 1) https://review.openstack.org/#/c/190322/
> 2) https://review.openstack.org/#/c/138607/
>
> -Vilobh
>
>

tl;dr: checking a Service (is_up) should only tell us whether we can 
send a message to it, not whether the related hypervisor(s) is/are up. 
Having a reference in the services table mapping 1:1 to a reference in a 
separate datastore is fine by me.

So, I'm going to review the specs above and leave my comments there.
That said, I also want to share my humble opinion about what the 
relationship should be between a Service and what could be called the 
"ServiceGroup API" (badly named IMHO, since it only checks a service, 
not a group ;-) )

From my perspective, the Service object is related to the AMQP service 
tied to the queue and... that's it.
It has nothing to do with a hypervisor (since several hypervisors can sit 
behind a single service). It only represents the single point of failure 
for messages sent to a nova-compute service (and not to a compute node, 
remember the distributed stuff), and since it is the only way to 
communicate with the related hypervisor(s), we have to know its status.

Again, that doesn't necessarily imply that if the service (which listens 
to the AMQP queue) is up, the hypervisors will be up as well, but it is 
strong enough to say that if it's down, we are sure that the 
hypervisor(s) won't receive messages.
Whether the hypervisor keeps working while the service is down is a 
corner case that the service status should not cover, IMHO.

That's exactly why we need to consider the service as a reference which 
can be used as-is for any relationship with a list of hypervisors (now 
called ComputeNode), and checking its state (using any driver for it) 
should only tell us whether a message can be sent to it, *and not whether 
the related hypervisor(s) are running or not*.
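To make the distinction concrete, here is a minimal sketch, assuming a heartbeat-based check: `Service`, `ComputeNode`, `is_up` and `SERVICE_DOWN_TIME` are hypothetical stand-ins (Nova has a similar `service_down_time` option, but this is not Nova's code). The check answers only "is something still consuming this service's queue?" and never looks at the compute nodes behind it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical threshold, similar in spirit to Nova's service_down_time.
SERVICE_DOWN_TIME = timedelta(seconds=60)


@dataclass
class ComputeNode:
    hypervisor_hostname: str
    running: bool  # real hypervisor state; is_up() never looks at it


@dataclass
class Service:
    host: str
    topic: str  # AMQP topic the (hypothetical) service consumes from
    last_heartbeat: datetime
    compute_nodes: List[ComputeNode] = field(default_factory=list)


def is_up(service: Service, now: Optional[datetime] = None) -> bool:
    """Only answers: is something still consuming this service's queue?

    compute_nodes is deliberately ignored: a fresh heartbeat means a message
    can be delivered, not that every hypervisor behind the service works.
    """
    now = now or datetime.now(timezone.utc)
    return now - service.last_heartbeat <= SERVICE_DOWN_TIME


now = datetime(2015, 6, 11, 19, 0, tzinfo=timezone.utc)
busy = Service("cpu1", "compute", now - timedelta(seconds=10),
               [ComputeNode("node-a", running=False)])
stale = Service("cpu2", "compute", now - timedelta(minutes=5))
print(is_up(busy, now))   # True, even though node-a is not running
print(is_up(stale, now))  # False: messages sent to cpu2 won't be consumed
```

A `True` here is only a delivery guarantee; knowing whether `node-a` actually runs would need a separate, hypervisor-level check.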

Given that disclaimer (which implies that we need to be very clear about 
when to check is_up(service)), I'm fine with considering the reference 
stored in the DB (i.e. the services table) as only a list of references 
pointing to separate objects which can be stored in any datastore 
(DB/Memcache/ZK/pick your favorite).

The only thing we need to make sure of is that there is a 1:1 mapping 
between the 2 objects (e.g. the DB "service" item and the "datastored" 
object), which can only be enforced logically.
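A minimal sketch of that logical 1:1 mapping, assuming the services row only carries identity and derives one datastore key from it. `StatusBackend`, `InMemoryBackend`, `ServiceRecord` and `ServiceRegistry` are hypothetical names; the in-memory backend stands in for DB/Memcache/ZK, and the uniqueness is enforced in application code rather than by a database constraint.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


class StatusBackend(ABC):
    """Stand-in for a servicegroup datastore (DB, Memcache, ZooKeeper...)."""

    @abstractmethod
    def put(self, key: str, alive: bool) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bool: ...


class InMemoryBackend(StatusBackend):
    """Toy backend; a real one would talk to Memcache, ZK, etc."""

    def __init__(self) -> None:
        self._data: Dict[str, bool] = {}

    def put(self, key: str, alive: bool) -> None:
        self._data[key] = alive

    def get(self, key: str) -> bool:
        return self._data.get(key, False)  # an unknown key counts as down


@dataclass(frozen=True)
class ServiceRecord:
    """A row of the services table: only a reference, no liveness data."""
    id: int
    host: str
    binary: str

    @property
    def status_key(self) -> str:
        # Each row derives exactly one key into the datastore.
        return f"{self.binary}:{self.host}"


class ServiceRegistry:
    """Keeps the row <-> datastore entry mapping 1:1, in application code."""

    def __init__(self, backend: StatusBackend) -> None:
        self.backend = backend
        self._owner: Dict[str, int] = {}  # status_key -> owning row id

    def register(self, record: ServiceRecord) -> None:
        # Refuse a second row claiming a key that another row already owns.
        owner = self._owner.setdefault(record.status_key, record.id)
        if owner != record.id:
            raise ValueError(
                f"{record.status_key} is already mapped to service {owner}")

    def heartbeat(self, record: ServiceRecord) -> None:
        if self._owner.get(record.status_key) != record.id:
            raise KeyError(record.status_key)
        self.backend.put(record.status_key, True)

    def is_up(self, record: ServiceRecord) -> bool:
        return (self._owner.get(record.status_key) == record.id
                and self.backend.get(record.status_key))


registry = ServiceRegistry(InMemoryBackend())
row = ServiceRecord(1, "cpu1", "nova-compute")
registry.register(row)
print(registry.is_up(row))  # False: registered, but no heartbeat yet
registry.heartbeat(row)
print(registry.is_up(row))  # True
```

Swapping `InMemoryBackend` for another `StatusBackend` changes where liveness lives without touching the services row, which is the separation described above.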

My 2 cts,
-Sylvain

>
> On Mon, May 11, 2015 at 8:08 AM, Chris Friesen 
> <chris.friesen at windriver.com <mailto:chris.friesen at windriver.com>> wrote:
>
>     On 05/11/2015 07:13 AM, Attila Fazekas wrote:
>
>             From: "John Garbutt" <john at johngarbutt.com
>             <mailto:john at johngarbutt.com>>
>
>
>             * From the RPC API point of view, do we want to send a cast to
>             something that we know is dead? Maybe we want to? Should
>             we wait for calls to time out, or give up quicker?
>
>
>         How to fail sooner:
>         https://bugs.launchpad.net/oslo.messaging/+bug/1437955
>
>         We do not need a dedicated is_up just for this.
>
>
>     Is that really going to help?  As I understand it if nova-compute
>     dies (or is isolated) then the queue remains present on the server
>     but nothing will process messages from it.
>
>     Chris
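Chris's point about the queue outliving its consumer, and John's question about waiting versus giving up quicker, can be illustrated with a toy sketch. It does not use oslo.messaging; `FakeBroker`, `cast`, `call`, `call_checked` and `ServiceDown` are hypothetical names, with a plain `queue.Queue` standing in for an AMQP queue. A cast "succeeds" even with nobody listening, a call only fails once its timeout expires, and consulting an is_up-style result first lets the caller fail immediately.

```python
import queue
from typing import Dict


class ServiceDown(Exception):
    """Raised when the servicegroup check already reports the service down."""


class FakeBroker:
    """Toy broker: a queue outlives its consumer, like a durable AMQP queue."""

    def __init__(self) -> None:
        self.queues: Dict[str, "queue.Queue[str]"] = {}

    def declare(self, name: str) -> "queue.Queue[str]":
        return self.queues.setdefault(name, queue.Queue())


def cast(broker: FakeBroker, topic: str, msg: str) -> None:
    # Fire-and-forget: "succeeds" as long as the queue exists,
    # whether or not anything will ever consume it.
    broker.declare(topic).put(msg)


def call(broker: FakeBroker, topic: str, msg: str, timeout: float) -> str:
    # Request/reply: only a live consumer would ever fill the reply queue.
    reply_q = broker.declare(topic + ".reply")
    cast(broker, topic, msg)
    try:
        return reply_q.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError(f"no reply on {topic!r} after {timeout}s") from None


def call_checked(broker: FakeBroker, topic: str, msg: str,
                 timeout: float, service_up: bool) -> str:
    # Fail sooner: skip the wait when is_up() already says nobody listens.
    if not service_up:
        raise ServiceDown(topic)
    return call(broker, topic, msg, timeout)


broker = FakeBroker()
cast(broker, "compute.cpu1", "reboot")  # no error; the message just sits there
try:
    call(broker, "compute.cpu1", "get_state", timeout=0.1)
except TimeoutError as exc:
    print(exc)
```

The trade-off is visible in the sketch: without a liveness check the caller only learns about the dead consumer after `timeout` seconds, while `call_checked` gives up at once but is only as accurate as the `service_up` value it is handed.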

[openstack-dev] [nova] Service group foundations and features
