[openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

Vishvananda Ishaya vishvananda at gmail.com
Wed Aug 22 04:58:37 UTC 2012


To elaborate, something like the below. I'm not absolutely sure you need to be able to set service_name and host, but this gives you the option to do so if needed.

diff --git a/nova/manager.py b/nova/manager.py
index c6711aa..c0f4669 100644
--- a/nova/manager.py
+++ b/nova/manager.py
@@ -217,6 +217,8 @@ class SchedulerDependentManager(Manager):
 
     def update_service_capabilities(self, capabilities):
         """Remember these capabilities to send on next periodic update."""
+        if not isinstance(capabilities, list):
+            capabilities = [capabilities]
         self.last_capabilities = capabilities
 
     @periodic_task
@@ -224,5 +226,8 @@ class SchedulerDependentManager(Manager):
         """Pass data back to the scheduler at a periodic interval."""
         if self.last_capabilities:
             LOG.debug(_('Notifying Schedulers of capabilities ...'))
-            self.scheduler_rpcapi.update_service_capabilities(context,
-                    self.service_name, self.host, self.last_capabilities)
+            for capability_item in self.last_capabilities:
+                name = capability_item.get('service_name', self.service_name)
+                host = capability_item.get('host', self.host)
+                self.scheduler_rpcapi.update_service_capabilities(context,
+                        name, host, capability_item)
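
For illustration, a minimal sketch (not existing nova code) of how a proxying bare-metal driver and its manager could feed the patched method above; the class name, the capability keys, and report_capabilities() are assumptions made up for this example.

class FakeBareMetalDriver(object):
    """Stands in for a bare-metal virt driver fronting several nodes."""

    def get_host_stats(self, refresh=False):
        # One capability dict per bare-metal node instead of a single dict.
        # An optional 'host' key lets each node be reported separately,
        # falling back to the proxy's own host when omitted.
        return [
            {'host': 'bm-node-1', 'vcpus': 8, 'memory_mb': 16384,
             'local_gb': 250, 'hypervisor_type': 'NONE'},
            {'host': 'bm-node-2', 'vcpus': 16, 'memory_mb': 32768,
             'local_gb': 500, 'hypervisor_type': 'NONE'},
        ]


def report_capabilities(manager, driver):
    # With the diff above, update_service_capabilities() accepts either a
    # single dict or a list, so the whole list can be handed over at once;
    # each item is then sent to the scheduler on the next periodic update.
    manager.update_service_capabilities(driver.get_host_stats(refresh=True))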

On Aug 21, 2012, at 1:28 PM, David Kang <dkang at isi.edu> wrote:

> 
>  Hi Vish,
> 
>  We are trying to change our code according to your comment.
> I want to ask a question.
> 
>>>> a) modify driver.get_host_stats to be able to return a list of host
>>>> stats instead of just one. Report the whole list back to the
>>>> scheduler. We could modify the receiving end to accept a list as
>>>> well
>>>> or just make multiple calls to
>>>> self.update_service_capabilities(capabilities)
> 
>  Modifying driver.get_host_stats to return a list of host stats is easy.
> Making multiple calls to self.update_service_capabilities(capabilities) doesn't seem to work,
> because 'capabilities' is overwritten each time.
> 
>  Modifying the receiving end to accept a list seems to be easy.
> However, since 'capabilities' is assumed to be a dictionary by all other scheduler routines,
> it looks like we would have to change all of them to handle 'capabilities' as a list of dictionaries.
> 
>  If my understanding is correct, it would affect many parts of the scheduler.
> Is that what you recommended?
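
A rough sketch of the receiving-end normalization being discussed; HostManagerStub and its service_states mapping are illustrative stand-ins for this example, not the actual scheduler code.

class HostManagerStub(object):
    def __init__(self):
        # maps (service_name, host) -> capability dict, so the rest of the
        # scheduler keeps working with one plain dictionary per host
        self.service_states = {}

    def update_service_capabilities(self, service_name, host, capabilities):
        # Accept either a single capability dict or a list of them.
        if not isinstance(capabilities, list):
            capabilities = [capabilities]
        for cap in capabilities:
            # Each bare-metal node may override which host it reports as.
            node_host = cap.get('host', host)
            self.service_states[(service_name, node_host)] = cap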
> 
>  Thanks,
>  David
>  
> 
> ----- Original Message -----
>> This was an immediate goal: the bare-metal nova-compute node could
>> keep an internal database but report capabilities through nova in the
>> common way with the changes below. Then the scheduler wouldn't need
>> access to the bare-metal database at all.
>> 
>> On Aug 15, 2012, at 4:23 PM, David Kang <dkang at isi.edu> wrote:
>> 
>>> 
>>> Hi Vish,
>>> 
>>> Is this discussion for long-term goal or for this Folsom release?
>>> 
>>> We still believe that the bare-metal database is needed,
>>> because there is no automated way for bare-metal nodes to report
>>> their capabilities to their bare-metal nova-compute node.
>>> 
>>> Thanks,
>>> David
>>> 
>>>> 
>>>> I am interested in finding a solution that enables bare-metal and
>>>> virtualized requests to be serviced through the same scheduler
>>>> where
>>>> the compute_nodes table has a full view of schedulable resources.
>>>> This
>>>> would seem to simplify the end-to-end flow while opening up some
>>>> additional use cases (e.g. dynamic allocation of a node from
>>>> bare-metal to hypervisor and back).
>>>> 
>>>> One approach would be to have a proxy running a single nova-compute
>>>> daemon fronting the bare-metal nodes. That nova-compute daemon would
>>>> report up many HostState objects (1 per bare-metal node) to become
>>>> entries in the compute_nodes table and accessible through the
>>>> scheduler HostManager object.
>>>>
>>>> The HostState object would set cpu_info, vcpus, memory_mb and local_gb
>>>> values to be used for scheduling, with the hypervisor_host field
>>>> holding the bare-metal machine address (e.g. for IPMI-based commands)
>>>> and hypervisor_type = NONE. The bare-metal Flavors are created with an
>>>> extra_spec of hypervisor_type = NONE, and the corresponding
>>>> compute_capabilities_filter would reduce the available hosts to those
>>>> bare_metal nodes. The scheduler would need to understand that
>>>> hypervisor_type = NONE means you need an exact fit (or best-fit) host
>>>> vs. weighting them (perhaps through the multi-scheduler). The scheduler
>>>> would cast out the message to the <topic>.<service-hostname> (code
>>>> today uses the HostState hostname), with the compute driver having to
>>>> understand if it must be serviced elsewhere (but this does not break
>>>> any existing implementations since it is 1 to 1).
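
As a rough illustration of the extra_spec matching described above; the function and the dicts below are assumptions for this sketch, not the actual compute_capabilities_filter code.

def host_passes(host_capabilities, flavor_extra_specs):
    # Keep a host only if every extra_spec key matches its capabilities.
    for key, required in flavor_extra_specs.items():
        if str(host_capabilities.get(key)) != str(required):
            return False
    return True

# A bare-metal flavor carrying hypervisor_type = NONE in its extra_specs.
flavor_extra_specs = {'hypervisor_type': 'NONE'}

# One HostState-like capability dict per node.
bm_node = {'hypervisor_type': 'NONE', 'vcpus': 8,
           'memory_mb': 16384, 'local_gb': 250}
kvm_node = {'hypervisor_type': 'QEMU', 'vcpus': 24,
            'memory_mb': 98304, 'local_gb': 2000}

assert host_passes(bm_node, flavor_extra_specs)
assert not host_passes(kvm_node, flavor_extra_specs)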
>>>>
>>>> Does this solution seem workable? Anything I missed?
>>>> 
>>>> The bare metal driver already is proxying for the other nodes so it
>>>> sounds like we need a couple of things to make this happen:
>>>> 
>>>> 
>>>> a) modify driver.get_host_stats to be able to return a list of host
>>>> stats instead of just one. Report the whole list back to the
>>>> scheduler. We could modify the receiving end to accept a list as
>>>> well
>>>> or just make multiple calls to
>>>> self.update_service_capabilities(capabilities)
>>>> 
>>>> 
>>>> b) make a few minor changes to the scheduler to make sure filtering
>>>> still works. Note the changes here may be very helpful:
>>>> 
>>>> 
>>>> https://review.openstack.org/10327
>>>> 
>>>> 
>>>> c) we have to make sure that instances launched on those nodes take
>>>> up
>>>> the entire host state somehow. We could probably do this by making
>>>> sure that the instance_type ram, mb, gb etc. matches what the node
>>>> has, but we may want a new boolean field "used" if those aren't
>>>> sufficient.
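
A hypothetical sketch of the "take up the entire host state" check from point c; the field names and the "used" flag are assumptions for illustration.

def node_fits(node, instance_type):
    # Treat a bare-metal node as usable only when it is unused and the
    # requested instance_type exactly matches its resources.
    if node.get('used'):
        return False
    return (instance_type['memory_mb'] == node['memory_mb'] and
            instance_type['vcpus'] == node['vcpus'] and
            instance_type['root_gb'] == node['local_gb'])

node = {'memory_mb': 16384, 'vcpus': 8, 'local_gb': 250, 'used': False}
exact_fit = {'memory_mb': 16384, 'vcpus': 8, 'root_gb': 250}
too_small = {'memory_mb': 4096, 'vcpus': 2, 'root_gb': 40}

assert node_fits(node, exact_fit)
assert not node_fits(node, too_small)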
>>>> 
>>>> 
>>>> This approach seems pretty good. We could potentially get rid of the
>>>> shared bare_metal_node table. I guess the only other concern is how
>>>> you populate the capabilities that the bare metal nodes are reporting.
>>>> I guess an api extension that rpcs to a baremetal node to add the node.
>>>> Maybe someday this could be autogenerated by the bare metal host
>>>> looking in its arp table for dhcp requests! :)
>>>> 
>>>> 
>>>> Vish
>>>> 
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



