Open Stack

Thu Dec 12 18:10:02 UTC 2013

On 12/12/2013 12:53 PM, Kyle Mestery wrote:
> On Dec 12, 2013, at 11:44 AM, Jay Pipes <jaypipes at gmail.com> wrote:
>> On 12/12/2013 12:36 PM, Clint Byrum wrote:
>>> Excerpts from Russell Bryant's message of 2013-12-12 09:09:04 -0800:
>>>> On 12/12/2013 12:02 PM, Clint Byrum wrote:
>>>>> I've been chasing quite a few bugs in the TripleO automated bring-up
>>>>> lately that have to do with failures because either there are no valid
>>>>> hosts ready to have servers scheduled, or there are hosts listed and
>>>>> enabled, but they can't bind to the network because for whatever reason
>>>>> the L2 agent has not checked in with Neutron yet.
>>>>>
>>>>> This is only a problem in the first few minutes of a nova-compute host's
>>>>> life. But it is critical for scaling up rapidly, so it is important for
>>>>> me to understand how this is supposed to work.
>>>>>
>>>>> So I'm asking, is there a standard way to determine whether or not a
>>>>> nova-compute is definitely ready to have things scheduled on it? This
>>>>> can be via an API, or even by observing something on the nova-compute
>>>>> host itself. I just need a definitive signal that "the compute host is
>>>>> ready".
>>>>
>>>> If a nova compute host has registered itself to start having instances
>>>> scheduled to it, it *should* be ready.  AFAIK, we're not doing any
>>>> network sanity checks on startup, though.
>>>>
>>>> We already do some sanity checks on startup.  For example, nova-compute
>>>> requires that it can talk to nova-conductor.  nova-compute will block on
>>>> startup until nova-conductor is responding if they happened to be
>>>> brought up at the same time.
>>>>
>>>> We could do something like this with a networking sanity check if
>>>> someone could define what that check should look like.
>>>>
>>> Could we ask Neutron if our compute host has an L2 agent yet? That seems
>>> like a valid sanity check.
>>
>> ++
>>
> This makes sense to me as well. Although, not all Neutron plugins have
> an L2 agent, so I think the check needs to be more generic than that.
> For example, the OpenDaylight MechanismDriver we have developed
> doesn't need an agent. I also believe the Nicira plugin is agent-less,
> and perhaps there are others as well.
>
> And I should note, does this sort of integration also happen with cinder,
> for example, when we're dealing with storage? Any other services which
> have a requirement on startup around integration with nova as well?

Right, it's more general than "is the L2 agent alive and running". It's 
more about having each service understand the relative dependencies it 
has on other supporting services.

For instance, have each service implement a:

GET /healthcheck

that would return either a 200 OK, or a 409 Conflict whose body
contains a list of the service types it is still waiting to hear back
from before it can provide a 200 OK for itself.
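
Very roughly, and purely as a sketch of the idea (the function name, the
"waiting_on" key and the service type names below are made up for
illustration; none of this is an existing Nova or Neutron API), the
handler logic could look something like:

```python
import json
from http import HTTPStatus


def healthcheck(dependencies):
    """Build a (status_code, body) pair for a hypothetical GET /healthcheck.

    `dependencies` maps a service type name to True if that service has
    already responded to us, or False if we are still waiting on it.
    """
    # Collect the service types we have not heard back from yet.
    waiting = sorted(name for name, ready in dependencies.items() if not ready)

    # 200 OK only when nothing is outstanding, otherwise 409 Conflict.
    status = HTTPStatus.CONFLICT if waiting else HTTPStatus.OK

    # The body always carries the list, which is empty in the 200 case.
    body = json.dumps({"waiting_on": waiting})
    return int(status), body
```

So a compute service that has reached its conductor but not yet the
networking service would answer healthcheck({"conductor": True,
"network": False}) with a 409 and "network" in the list, and a caller
could simply poll until it gets a 200 before scheduling onto that host.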

Anyway, just some thoughts...

-jay

[openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?