[openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?

Russell Bryant rbryant at redhat.com
Thu Dec 12 20:16:32 UTC 2013


On 12/12/2013 01:36 PM, Clint Byrum wrote:
> Excerpts from Kyle Mestery's message of 2013-12-12 09:53:57 -0800:
>> On Dec 12, 2013, at 11:44 AM, Jay Pipes <jaypipes at gmail.com> wrote:
>>> On 12/12/2013 12:36 PM, Clint Byrum wrote:
>>>> Excerpts from Russell Bryant's message of 2013-12-12 09:09:04 -0800:
>>>>> On 12/12/2013 12:02 PM, Clint Byrum wrote:
>>>>>> I've been chasing quite a few bugs in the TripleO automated bring-up
>>>>>> lately that have to do with failures because either there are no valid
>>>>>> hosts ready to have servers scheduled, or there are hosts listed and
>>>>>> enabled, but they can't bind to the network because for whatever reason
>>>>>> the L2 agent has not checked in with Neutron yet.
>>>>>>
>>>>>> This is only a problem in the first few minutes of a nova-compute host's
>>>>>> life. But it is critical for scaling up rapidly, so it is important for
>>>>>> me to understand how this is supposed to work.
>>>>>>
>>>>>> So I'm asking, is there a standard way to determine whether or not a
>>>>>> nova-compute is definitely ready to have things scheduled on it? This
>>>>>> can be via an API, or even by observing something on the nova-compute
>>>>>> host itself. I just need a definitive signal that "the compute host is
>>>>>> ready".
>>>>>
>>>>> If a nova compute host has registered itself to start having instances
>>>>> scheduled to it, it *should* be ready.  AFAIK, we're not doing any
>>>>> network sanity checks on startup, though.
>>>>>
>>>>> We already do some sanity checks on startup.  For example, nova-compute
>>>>> requires that it can talk to nova-conductor.  nova-compute will block on
>>>>> startup until nova-conductor is responding if they happened to be
>>>>> brought up at the same time.
>>>>>
>>>>> We could do something like this with a networking sanity check if
>>>>> someone could define what that check should look like.
>>>>>
>>>> Could we ask Neutron if our compute host has an L2 agent yet? That seems
>>>> like a valid sanity check.
>>>
>>> ++
>>>
>> This makes sense to me as well. Although, not all Neutron plugins have
>> an L2 agent, so I think the check needs to be more generic than that.
>> For example, the OpenDaylight MechanismDriver we have developed
>> doesn't need an agent. I also believe the Nicira plugin is agent-less,
>> perhaps there are others as well.
>>
>> And I should note, does this sort of integration also happen with cinder,
>> for example, when we're dealing with storage? Any other services which
>> have a requirement on startup around integration with nova as well?
>>
> 
> Does cinder actually have per-compute-host concerns? I admit to being a
> bit cinder-stupid here.

No, it doesn't.

> Anyway, it seems to me that any service that is compute-host aware
> should be able to tell the compute host whether it is a) aware of
> that host, and b) ready to serve on it.
> 
> For agent-less drivers that is easy, you just always return True. And
> for drivers with agents, you return False unless you can find an agent
> for the host.
> 
> So something like:
> 
> GET /host/%(compute-host-name)
> 
> And then in the response include a "ready" attribute that would signal
> whether all networks that should work there, can work there.
> 
> As a first pass, just polling until that is "ready" before nova-compute
> enables itself would solve the problems I see (and that I think users
> would see as a cloud provider scales out compute nodes). Longer term
> we would also want to aim at having notifications available for this
> so that nova-compute could subscribe to that notification bus and then
> disable itself if its agent ever goes away.
> 
> I opened this bug to track the issue. I suspect there are duplicates of
> it already reported, but would like to start clean to make sure it is
> analyzed fully and then we can use those other bugs as test cases and
> confirmation:
> 
> https://bugs.launchpad.net/nova/+bug/1260440

Sounds good.  I'm happy to do this in Nova, but we'll have to get the
Neutron API bit sorted out first.
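
For illustration only, the first-pass idea quoted above (a per-host
"ready" answer, polled before nova-compute enables itself) could be
sketched roughly as follows. None of this is an existing Nova or Neutron
API: host_ready, wait_until_ready, and all of their parameters are
hypothetical names, and the real check would be whatever the Neutron
side ends up exposing.

```python
import time


def host_ready(host, agent_based, live_agent_hosts):
    """Hypothetical readiness rule from the thread (not a real API).

    Agent-less drivers are always ready; agent-based drivers are ready
    only once an agent for this host has been seen.
    """
    if not agent_based:
        return True
    return host in live_agent_hosts


def wait_until_ready(check, timeout=60.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or the timeout expires.

    Returns True if the check passed, False if we gave up. The sleep
    and clock arguments exist only so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In this sketch the compute service would call wait_until_ready() with a
check that asks the networking service about its own host, and only
enable itself for scheduling once that returns True. The longer-term
notification-based approach is not covered here.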

-- 
Russell Bryant


