[openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

Vishvananda Ishaya vishvananda at gmail.com
Fri Aug 24 22:42:32 UTC 2012


I would investigate changing the capabilities to key off of something other than hostname. It looks from the table structure like compute_nodes could have a many-to-one relationship with services. You would just have to use a little more than hostname. Perhaps (hostname, hypervisor_hostname) could be used to update the entry?
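
A minimal sketch of that idea, assuming the scheduler keeps the reported capabilities in a plain dictionary. The function names and the sample reports are hypothetical and are not taken from nova; the sketch only contrasts keying on the host name alone with keying on the (hostname, hypervisor_hostname) pair.

```python
def update_by_host(store, host, capabilities):
    """Key on host only: a later report replaces the earlier one."""
    store[host] = dict(capabilities)


def update_by_node(store, host, capabilities):
    """Key on (host, hypervisor_hostname): one entry per reported node."""
    # Fall back to the host name when a report carries no node name (assumption).
    node = capabilities.get("hypervisor_hostname", host)
    store[(host, node)] = dict(capabilities)


# Two bare-metal nodes reported by the same nova-compute host (made-up data).
reports = [
    {"hypervisor_hostname": "node-a", "memory_mb": 4096},
    {"hypervisor_hostname": "node-b", "memory_mb": 8192},
]

by_host, by_node = {}, {}
for report in reports:
    update_by_host(by_host, "bespin101", report)
    update_by_node(by_node, "bespin101", report)

print(len(by_host))  # 1: the second report overwrote the first
print(len(by_node))  # 2: both nodes are kept
```

With the composite key the compute service can keep a single host name in the services table while still reporting one capabilities entry per node.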

Vish

On Aug 24, 2012, at 11:23 AM, David Kang <dkang at isi.edu> wrote:

> 
>  Vish,
> 
>  I've tested your code and did more testing.
> There are a couple of problems.
> 1. The host name must be unique. If it is not, repeated capability updates with the same host name simply overwrite one another.
> 2. We cannot generate arbitrary host names on the fly.
>   The scheduler (I tested filter scheduler) gets host names from db.
>   So, if a host name is not in the 'services' table, it is not considered by the scheduler at all.
> 
> So, to make your suggestions possible, nova-compute should register N different host names in 'services' table,
> and N corresponding entries in 'compute_nodes' table.
> Here is an example:
> 
> mysql> select id, host, binary, topic, report_count, disabled, availability_zone from services;
> +----+-------------+----------------+-----------+--------------+----------+-------------------+
> | id | host        | binary         | topic     | report_count | disabled | availability_zone |
> +----+-------------+----------------+-----------+--------------+----------+-------------------+
> |  1 | bespin101   | nova-scheduler | scheduler |        17145 |        0 | nova              |
> |  2 | bespin101   | nova-network   | network   |        16819 |        0 | nova              |
> |  3 | bespin101-0 | nova-compute   | compute   |        16405 |        0 | nova              |
> |  4 | bespin101-1 | nova-compute   | compute   |            1 |        0 | nova              |
> +----+-------------+----------------+-----------+--------------+----------+-------------------+
> 
> mysql> select id, service_id, hypervisor_hostname from compute_nodes;
> +----+------------+------------------------+
> | id | service_id | hypervisor_hostname    |
> +----+------------+------------------------+
> |  1 |          3 | bespin101.east.isi.edu |
> |  2 |          4 | bespin101.east.isi.edu |
> +----+------------+------------------------+
> 
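
The layout in the two listings above can be reproduced in a small sketch, assuming a simplified schema that contains only some of the columns shown (the real nova tables have more). It uses an in-memory SQLite database rather than MySQL, and joins compute_nodes to services through service_id to list what a scheduler reading the database would see.

```python
import sqlite3

# Simplified, hypothetical schema: only columns visible in the listings above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE services (id INTEGER PRIMARY KEY, host TEXT, topic TEXT);
    CREATE TABLE compute_nodes (
        id INTEGER PRIMARY KEY,
        service_id INTEGER REFERENCES services(id),
        hypervisor_hostname TEXT
    );
    INSERT INTO services VALUES
        (1, 'bespin101', 'scheduler'),
        (2, 'bespin101', 'network'),
        (3, 'bespin101-0', 'compute'),
        (4, 'bespin101-1', 'compute');
    INSERT INTO compute_nodes VALUES
        (1, 3, 'bespin101.east.isi.edu'),
        (2, 4, 'bespin101.east.isi.edu');
""")

# One row per compute node, together with the service host that owns it.
rows = conn.execute("""
    SELECT s.host, c.hypervisor_hostname
    FROM compute_nodes AS c
    JOIN services AS s ON s.id = c.service_id
    WHERE s.topic = 'compute'
    ORDER BY s.host
""").fetchall()

for host, node in rows:
    print(host, node)
```

In this layout each compute_nodes row has its own services row, so the host name stays unique per node at the cost of registering N services.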
>  Then, nova db (compute_nodes table) has entries of all bare-metal nodes.
> What do you think of this approach?
> Do you have any better approach?
> 
>  Thanks,
>  David
> 
> 
> 
> ----- Original Message -----
>> To elaborate, something like the below. I'm not absolutely sure you need to
>> be able to set service_name and host, but this gives you the option to
>> do so if needed.
>> 
>> diff --git a/nova/manager.py b/nova/manager.py
>> index c6711aa..c0f4669 100644
>> --- a/nova/manager.py
>> +++ b/nova/manager.py
>> @@ -217,6 +217,8 @@ class SchedulerDependentManager(Manager):
>> 
>>      def update_service_capabilities(self, capabilities):
>>          """Remember these capabilities to send on next periodic update."""
>> +        if not isinstance(capabilities, list):
>> +            capabilities = [capabilities]
>>          self.last_capabilities = capabilities
>> 
>>      @periodic_task
>> @@ -224,5 +226,8 @@ class SchedulerDependentManager(Manager):
>>          """Pass data back to the scheduler at a periodic interval."""
>>          if self.last_capabilities:
>>              LOG.debug(_('Notifying Schedulers of capabilities ...'))
>> -            self.scheduler_rpcapi.update_service_capabilities(context,
>> -                    self.service_name, self.host, self.last_capabilities)
>> +            for capability_item in self.last_capabilities:
>> +                name = capability_item.get('service_name', self.service_name)
>> +                host = capability_item.get('host', self.host)
>> +                self.scheduler_rpcapi.update_service_capabilities(context,
>> +                        name, host, capability_item)
>> 
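
The behaviour of the patch above can be tried outside nova with a small stand-in, sketched here under stated assumptions: FakeSchedulerAPI and this Manager class are hypothetical test doubles, and the name publish_service_capabilities is an assumption because the hunk does not show the method's def line. The sketch only shows the list normalization and the per-item fallback to the manager's own service_name and host.

```python
class FakeSchedulerAPI:
    """Stand-in for scheduler_rpcapi: records calls instead of sending RPCs."""

    def __init__(self):
        self.calls = []

    def update_service_capabilities(self, context, service_name, host,
                                    capabilities):
        self.calls.append((service_name, host, capabilities))


class Manager:
    """Hypothetical reduction of the patched manager logic."""

    def __init__(self, service_name, host, scheduler_rpcapi):
        self.service_name = service_name
        self.host = host
        self.scheduler_rpcapi = scheduler_rpcapi
        self.last_capabilities = None

    def update_service_capabilities(self, capabilities):
        # Accept either one dict or a list of dicts, as in the patch.
        if not isinstance(capabilities, list):
            capabilities = [capabilities]
        self.last_capabilities = capabilities

    def publish_service_capabilities(self, context):
        if self.last_capabilities:
            for item in self.last_capabilities:
                # Each item may override the service name and host.
                name = item.get("service_name", self.service_name)
                host = item.get("host", self.host)
                self.scheduler_rpcapi.update_service_capabilities(
                    context, name, host, item)


api = FakeSchedulerAPI()
manager = Manager("compute", "bespin101", api)
manager.update_service_capabilities([
    {"host": "bespin101-0", "memory_mb": 4096},
    {"memory_mb": 8192},  # no override: falls back to the manager's host
])
manager.publish_service_capabilities(context=None)

targets = [(name, host) for name, host, _ in api.calls]
print(targets)  # [('compute', 'bespin101-0'), ('compute', 'bespin101')]
```

Because each item is sent as its own dictionary, the receiving side can keep treating capabilities as a single dictionary per call.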
>> On Aug 21, 2012, at 1:28 PM, David Kang <dkang at isi.edu> wrote:
>> 
>>> 
>>>  Hi Vish,
>>> 
>>>  We are trying to change our code according to your comment.
>>> I want to ask a question.
>>> 
>>>>>> a) modify driver.get_host_stats to be able to return a list of
>>>>>> host
>>>>>> stats instead of just one. Report the whole list back to the
>>>>>> scheduler. We could modify the receiving end to accept a list as
>>>>>> well
>>>>>> or just make multiple calls to
>>>>>> self.update_service_capabilities(capabilities)
>>> 
>>>  Modifying driver.get_host_stats to return a list of host stats is
>>>  easy.
>>> Making multiple calls to
>>> self.update_service_capabilities(capabilities) doesn't seem to work,
>>> because 'capabilities' is overwritten each time.
>>> 
>>>  Modifying the receiving end to accept a list seems to be easy.
>>> However, 'capabilities' is assumed to be a dictionary by all other
>>> scheduler routines, so
>>> it looks like we would have to change all of them to handle
>>> 'capabilities' as a list of dictionaries.
>>> 
>>>  If my understanding is correct, it would affect many parts of the
>>>  scheduler.
>>> Is it what you recommended?
>>> 
>>>  Thanks,
>>>  David
>>> 
>>> 
>>> ----- Original Message -----
>>>> This was an immediate goal. The bare-metal nova-compute node could
>>>> keep an internal database, but report capabilities through nova in
>>>> the
>>>> common way with the changes below. Then the scheduler wouldn't need
>>>> access to the bare metal database at all.
>>>> 
>>>> On Aug 15, 2012, at 4:23 PM, David Kang <dkang at isi.edu> wrote:
>>>> 
>>>>> 
>>>>> Hi Vish,
>>>>> 
>>>>> Is this discussion about a long-term goal or about this Folsom release?
>>>>> 
>>>>> We still believe that a bare-metal database is needed
>>>>> because there is no automated way for bare-metal nodes to report
>>>>> their capabilities
>>>>> to their bare-metal nova-compute node.
>>>>> 
>>>>> Thanks,
>>>>> David
>>>>> 
>>>>>> 
>>>>>> I am interested in finding a solution that enables bare-metal and
>>>>>> virtualized requests to be serviced through the same scheduler
>>>>>> where
>>>>>> the compute_nodes table has a full view of schedulable resources.
>>>>>> This
>>>>>> would seem to simplify the end-to-end flow while opening up some
>>>>>> additional use cases (e.g. dynamic allocation of a node from
>>>>>> bare-metal to hypervisor and back).
>>>>>> 
>>>>>> One approach would be to have a proxy running a single
>>>>>> nova-compute
>>>>>> daemon fronting the bare-metal nodes. That nova-compute daemon
>>>>>> would
>>>>>> report up many HostState objects (1 per bare-metal node) to
>>>>>> become
>>>>>> entries in the compute_nodes table and accessible through the
>>>>>> scheduler HostManager object.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> The HostState object would set cpu_info, vcpus, memory_mb and
>>>>>> local_gb
>>>>>> values to be used for scheduling with the hypervisor_host field
>>>>>> holding the bare-metal machine address (e.g. for IPMI based
>>>>>> commands)
>>>>>> and hypervisor_type = NONE. The bare-metal Flavors are created
>>>>>> with
>>>>>> an
>>>>>> extra_spec of hypervisor_type= NONE and the corresponding
>>>>>> compute_capabilities_filter would reduce the available hosts to
>>>>>> those
>>>>>> bare_metal nodes. The scheduler would need to understand that
>>>>>> hypervisor_type = NONE means you need an exact fit (or best-fit)
>>>>>> host
>>>>>> vs weighting them (perhaps through the multi-scheduler). The
>>>>>> scheduler
>>>>>> would cast out the message to the <topic>.<service-hostname>
>>>>>> (code
>>>>>> today uses the HostState hostname), with the compute driver
>>>>>> having
>>>>>> to
>>>>>> understand if it must be serviced elsewhere (but does not break
>>>>>> any
>>>>>> existing implementations since it is 1 to 1).
>>>>>> 
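
The filtering and exact-fit (or best-fit) selection described above can be sketched as follows. This is a hedged illustration, not nova's scheduler: the host and flavor dictionaries, the function name select_bare_metal_host, and the sample values are all hypothetical. Hosts are first reduced to those whose hypervisor_type matches the flavor's extra_spec, and the candidate with the least leftover resources is then chosen instead of weighting.

```python
def select_bare_metal_host(hosts, flavor):
    """Return the tightest-fitting host for the flavor, or None."""
    wanted_type = flavor["extra_specs"].get("hypervisor_type")
    candidates = [
        h for h in hosts
        if h["hypervisor_type"] == wanted_type
        and h["vcpus"] >= flavor["vcpus"]
        and h["memory_mb"] >= flavor["memory_mb"]
        and h["local_gb"] >= flavor["local_gb"]
    ]
    if not candidates:
        return None

    def waste(host):
        # Leftover resources if the flavor is placed here; all zeros is an exact fit.
        return (host["memory_mb"] - flavor["memory_mb"],
                host["vcpus"] - flavor["vcpus"],
                host["local_gb"] - flavor["local_gb"])

    return min(candidates, key=waste)


hosts = [
    {"hypervisor_host": "kvm-1", "hypervisor_type": "QEMU",
     "vcpus": 16, "memory_mb": 32768, "local_gb": 1000},
    {"hypervisor_host": "bm-1", "hypervisor_type": "NONE",
     "vcpus": 8, "memory_mb": 16384, "local_gb": 500},
    {"hypervisor_host": "bm-2", "hypervisor_type": "NONE",
     "vcpus": 4, "memory_mb": 8192, "local_gb": 250},
]
flavor = {"vcpus": 4, "memory_mb": 8192, "local_gb": 250,
          "extra_specs": {"hypervisor_type": "NONE"}}

chosen = select_bare_metal_host(hosts, flavor)
print(chosen["hypervisor_host"])  # bm-2
```

Preferring the smallest leftover keeps larger bare-metal nodes free for larger flavors, which is the reason to avoid plain weighting here.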
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Does this solution seem workable? Anything I missed?
>>>>>> 
>>>>>> The bare metal driver already is proxying for the other nodes so
>>>>>> it
>>>>>> sounds like we need a couple of things to make this happen:
>>>>>> 
>>>>>> 
>>>>>> a) modify driver.get_host_stats to be able to return a list of
>>>>>> host
>>>>>> stats instead of just one. Report the whole list back to the
>>>>>> scheduler. We could modify the receiving end to accept a list as
>>>>>> well
>>>>>> or just make multiple calls to
>>>>>> self.update_service_capabilities(capabilities)
>>>>>> 
>>>>>> 
>>>>>> b) make a few minor changes to the scheduler to make sure
>>>>>> filtering
>>>>>> still works. Note the changes here may be very helpful:
>>>>>> 
>>>>>> 
>>>>>> https://review.openstack.org/10327
>>>>>> 
>>>>>> 
>>>>>> c) we have to make sure that instances launched on those nodes
>>>>>> take
>>>>>> up
>>>>>> the entire host state somehow. We could probably do this by
>>>>>> making
>>>>>> sure that the instance_type ram, mb, gb etc. matches what the
>>>>>> node
>>>>>> has, but we may want a new boolean field "used" if those aren't
>>>>>> sufficient.
>>>>>> 
>>>>>> 
>>>>>> This approach seems pretty good. We could potentially get rid
>>>>>> of
>>>>>> the
>>>>>> shared bare_metal_node table. I guess the only other concern is
>>>>>> how
>>>>>> you populate the capabilities that the bare metal nodes are
>>>>>> reporting.
>>>>>> I guess an api extension that rpcs to a baremetal node to add the
>>>>>> node. Maybe someday this could be autogenerated by the bare metal
>>>>>> host
>>>>>> looking in its arp table for dhcp requests! :)
>>>>>> 
>>>>>> 
>>>>>> Vish
>>>>>> 
>>>>>> _______________________________________________
>>>>>> OpenStack-dev mailing list
>>>>>> OpenStack-dev at lists.openstack.org
>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



