[openstack-dev] Discussion about where to put database for bare-metal provisioning (review 10726)

David Kang dkang at isi.edu
Mon Aug 27 16:40:58 UTC 2012


 Vish,

 I don't think I fully understand your statement.
Unless we use different hostnames, (hostname, hypervisor_hostname) must be the 
same for all bare-metal nodes under a bare-metal nova-compute.

 Could you elaborate on the following statement a little more?

> You would just have to use a little more than hostname. Perhaps
> (hostname, hypervisor_hostname) could be used to update the entry?
> 

 Thanks,
 David



----- Original Message -----
> I would investigate changing the capabilities to key off of something
> other than hostname. It looks from the table structure like
> compute_nodes could have a many-to-one relationship with services.
> You would just have to use a little more than hostname. Perhaps
> (hostname, hypervisor_hostname) could be used to update the entry?
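>
> To make that concrete, here is a rough sketch of the kind of keyed update I
> mean (plain Python; the dict just stands in for the compute_nodes table, the
> node names are made up, and none of this is nova's actual db API):
>
> # Rough sketch only: key capability updates on (host, hypervisor_hostname)
> # rather than host alone, so nodes behind one nova-compute don't clobber
> # each other.
> compute_nodes = {}
>
> def update_node_capabilities(host, hypervisor_hostname, capabilities):
>     key = (host, hypervisor_hostname)
>     entry = compute_nodes.setdefault(key, {})
>     entry.update(capabilities)
>
> # Two bare-metal nodes reported by the same service host stay distinct as
> # long as their hypervisor_hostnames differ:
> update_node_capabilities('bespin101', 'node-0.east.isi.edu', {'memory_mb': 4096})
> update_node_capabilities('bespin101', 'node-1.east.isi.edu', {'memory_mb': 8192})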
> 
> Vish
> 
> On Aug 24, 2012, at 11:23 AM, David Kang <dkang at isi.edu> wrote:
> 
> >
> >  Vish,
> >
> >  I've tested your code and done some more testing.
> > There are a couple of problems.
> > 1. The host name must be unique. If it is not, repeated updates of new
> >    capabilities under the same host name simply overwrite one another.
> > 2. We cannot generate arbitrary host names on the fly.
> >    The scheduler (I tested the filter scheduler) gets host names from the db.
> >    So, if a host name is not in the 'services' table, it is not
> >    considered by the scheduler at all.
> >
> > So, to make your suggestion work, nova-compute should register
> > N different host names in the 'services' table,
> > and N corresponding entries in the 'compute_nodes' table
> > (a rough sketch of this registration step follows the example below).
> > Here is an example:
> >
> > mysql> select id, host, binary, topic, report_count, disabled,
> > availability_zone from services;
> > +----+-------------+----------------+-----------+--------------+----------+-------------------+
> > | id | host        | binary         | topic     | report_count | disabled | availability_zone |
> > +----+-------------+----------------+-----------+--------------+----------+-------------------+
> > |  1 | bespin101   | nova-scheduler | scheduler |        17145 |        0 | nova              |
> > |  2 | bespin101   | nova-network   | network   |        16819 |        0 | nova              |
> > |  3 | bespin101-0 | nova-compute   | compute   |        16405 |        0 | nova              |
> > |  4 | bespin101-1 | nova-compute   | compute   |            1 |        0 | nova              |
> > +----+-------------+----------------+-----------+--------------+----------+-------------------+
> >
> > mysql> select id, service_id, hypervisor_hostname from
> > compute_nodes;
> > +----+------------+------------------------+
> > | id | service_id | hypervisor_hostname    |
> > +----+------------+------------------------+
> > |  1 |          3 | bespin101.east.isi.edu |
> > |  2 |          4 | bespin101.east.isi.edu |
> > +----+------------+------------------------+
> >
> >  Then, the nova db (compute_nodes table) has entries for all bare-metal
> >  nodes.
> > What do you think of this approach?
> > Do you have a better approach?
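> >
> > Here is that rough sketch (the register_* helpers below are hypothetical
> > stand-ins for whatever db layer is used, not nova's actual API):
> >
> > # Hypothetical sketch of how a bare-metal nova-compute could register one
> > # 'services' row and one 'compute_nodes' row per bare-metal node it
> > # manages, producing tables like the ones above.
> > def register_bare_metal_nodes(base_host, nodes,
> >                               register_service, register_compute_node):
> >     for i, node in enumerate(nodes):
> >         host = '%s-%d' % (base_host, i)  # e.g. bespin101-0, bespin101-1
> >         service_id = register_service(host=host, binary='nova-compute',
> >                                       topic='compute')
> >         register_compute_node(service_id=service_id,
> >                               hypervisor_hostname=node['hypervisor_hostname'])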
> >
> >  Thanks,
> >  David
> >
> >
> >
> > ----- Original Message -----
> >> To elaborate, something like the below. I'm not absolutely sure you need
> >> to
> >> be able to set service_name and host, but this gives you the option
> >> to
> >> do so if needed.
> >>
> >> diff --git a/nova/manager.py b/nova/manager.py
> >> index c6711aa..c0f4669 100644
> >> --- a/nova/manager.py
> >> +++ b/nova/manager.py
> >> @@ -217,6 +217,8 @@ class SchedulerDependentManager(Manager):
> >>
> >>      def update_service_capabilities(self, capabilities):
> >>          """Remember these capabilities to send on next periodic update."""
> >> +        if not isinstance(capabilities, list):
> >> +            capabilities = [capabilities]
> >>          self.last_capabilities = capabilities
> >>
> >>      @periodic_task
> >> @@ -224,5 +226,8 @@ class SchedulerDependentManager(Manager):
> >>          """Pass data back to the scheduler at a periodic interval."""
> >>          if self.last_capabilities:
> >>              LOG.debug(_('Notifying Schedulers of capabilities ...'))
> >> -            self.scheduler_rpcapi.update_service_capabilities(context,
> >> -                self.service_name, self.host, self.last_capabilities)
> >> +            for capability_item in self.last_capabilities:
> >> +                name = capability_item.get('service_name', self.service_name)
> >> +                host = capability_item.get('host', self.host)
> >> +                self.scheduler_rpcapi.update_service_capabilities(context,
> >> +                    name, host, capability_item)
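> >>
> >> With that change in place, the bare-metal driver just needs to hand back
> >> one capability dict per node, each carrying its own 'host'. A rough
> >> sketch of what that might look like (the field names and host-naming
> >> scheme here are assumptions, not final driver code):
> >>
> >> # Sketch only: a get_host_stats()-style helper that returns one dict per
> >> # bare-metal node, so the loop above issues one
> >> # update_service_capabilities() call per node.
> >> def get_bare_metal_host_stats(nodes, base_host):
> >>     capabilities = []
> >>     for i, node in enumerate(nodes):
> >>         capabilities.append({
> >>             'host': '%s-%d' % (base_host, i),   # assumed naming scheme
> >>             'hypervisor_hostname': node['address'],
> >>             'vcpus': node['cpus'],
> >>             'memory_mb': node['memory_mb'],
> >>             'local_gb': node['local_gb'],
> >>         })
> >>     return capabilities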
> >>
> >> On Aug 21, 2012, at 1:28 PM, David Kang <dkang at isi.edu> wrote:
> >>
> >>>
> >>>  Hi Vish,
> >>>
> >>>  We are trying to change our code according to your comment.
> >>> I want to ask a question.
> >>>
> >>>>>> a) modify driver.get_host_stats to be able to return a list of host
> >>>>>> stats instead of just one. Report the whole list back to the
> >>>>>> scheduler. We could modify the receiving end to accept a list as well
> >>>>>> or just make multiple calls to
> >>>>>> self.update_service_capabilities(capabilities)
> >>>
> >>>  Modifying driver.get_host_stats to return a list of host stats is
> >>>  easy.
> >>> Making multiple calls to
> >>> self.update_service_capabilities(capabilities) doesn't seem to work,
> >>> because 'capabilities' is overwritten each time.
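> >>>
> >>> To show what we mean by "overwritten": with the unpatched manager, each
> >>> call simply replaces self.last_capabilities, so only the last node's
> >>> dict survives. A toy illustration:
> >>>
> >>> # Toy illustration of the overwrite problem: the unpatched manager only
> >>> # keeps the dict from the most recent call.
> >>> class UnpatchedManager(object):
> >>>     def update_service_capabilities(self, capabilities):
> >>>         self.last_capabilities = capabilities
> >>>
> >>> m = UnpatchedManager()
> >>> m.update_service_capabilities({'host': 'bespin101-0', 'memory_mb': 4096})
> >>> m.update_service_capabilities({'host': 'bespin101-1', 'memory_mb': 8192})
> >>> print(m.last_capabilities)  # only bespin101-1's capabilities remain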
> >>>
> >>>  Modifying the receiving end to accept a list seems easy.
> >>> However, since 'capabilities' is assumed to be a dictionary by all other
> >>> scheduler routines, it looks like we would have to change all of them
> >>> to handle 'capabilities' as a list of dictionaries.
> >>>
> >>>  If my understanding is correct, it would affect many parts of the
> >>>  scheduler.
> >>> Is that what you recommended?
> >>>
> >>>  Thanks,
> >>>  David
> >>>
> >>>
> >>> ----- Original Message -----
> >>>> This was an immediate goal: the bare-metal nova-compute node could
> >>>> keep an internal database, but report capabilities through nova in the
> >>>> common way with the changes below. Then the scheduler wouldn't need
> >>>> access to the bare-metal database at all.
> >>>>
> >>>> On Aug 15, 2012, at 4:23 PM, David Kang <dkang at isi.edu> wrote:
> >>>>
> >>>>>
> >>>>> Hi Vish,
> >>>>>
> >>>>> Is this discussion about a long-term goal, or about this Folsom
> >>>>> release?
> >>>>>
> >>>>> We still believe that the bare-metal database is needed,
> >>>>> because there is no automated way for bare-metal nodes to report
> >>>>> their capabilities to their bare-metal nova-compute node.
> >>>>>
> >>>>> Thanks,
> >>>>> David
> >>>>>
> >>>>>>
> >>>>>> I am interested in finding a solution that enables bare-metal and
> >>>>>> virtualized requests to be serviced through the same scheduler, where
> >>>>>> the compute_nodes table has a full view of schedulable resources. This
> >>>>>> would seem to simplify the end-to-end flow while opening up some
> >>>>>> additional use cases (e.g. dynamic allocation of a node from
> >>>>>> bare-metal to hypervisor and back).
> >>>>>>
> >>>>>> One approach would be to have a proxy running a single nova-compute
> >>>>>> daemon fronting the bare-metal nodes. That nova-compute daemon would
> >>>>>> report up many HostState objects (one per bare-metal node) that become
> >>>>>> entries in the compute_nodes table and are accessible through the
> >>>>>> scheduler HostManager object.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> The HostState object would set cpu_info, vcpus, memory_mb, and
> >>>>>> local_gb values to be used for scheduling, with the hypervisor_host
> >>>>>> field holding the bare-metal machine address (e.g. for IPMI-based
> >>>>>> commands) and hypervisor_type = NONE. The bare-metal Flavors are
> >>>>>> created with an extra_spec of hypervisor_type=NONE, and the
> >>>>>> corresponding compute_capabilities_filter would reduce the available
> >>>>>> hosts to those bare-metal nodes. The scheduler would need to
> >>>>>> understand that hypervisor_type = NONE means you need an exact-fit
> >>>>>> (or best-fit) host vs weighting them (perhaps through the
> >>>>>> multi-scheduler). The scheduler would cast out the message to the
> >>>>>> <topic>.<service-hostname> (the code today uses the HostState
> >>>>>> hostname), with the compute driver having to understand whether it
> >>>>>> must be serviced elsewhere (but this does not break any existing
> >>>>>> implementations since it is 1 to 1).
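> >>>>>>
> >>>>>> As a rough sketch of the filter-plus-exact-fit idea (not the actual
> >>>>>> compute_capabilities_filter code; field names are assumptions):
> >>>>>>
> >>>>>> # Sketch: keep only hosts whose reported capabilities match the flavor
> >>>>>> # extra_specs, and for hypervisor_type NONE require an exact resource
> >>>>>> # fit rather than weighting.
> >>>>>> def host_passes(host_caps, flavor):
> >>>>>>     extra_specs = flavor.get('extra_specs', {})
> >>>>>>     for key, value in extra_specs.items():
> >>>>>>         if str(host_caps.get(key)) != str(value):
> >>>>>>             return False
> >>>>>>     if extra_specs.get('hypervisor_type') == 'NONE':
> >>>>>>         return (host_caps.get('memory_mb') == flavor.get('memory_mb') and
> >>>>>>                 host_caps.get('local_gb') == flavor.get('local_gb'))
> >>>>>>     return True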
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Does this solution seem workable? Anything I missed?
> >>>>>>
> >>>>>> The bare metal driver is already proxying for the other nodes, so it
> >>>>>> sounds like we need a couple of things to make this happen:
> >>>>>>
> >>>>>>
> >>>>>> a) modify driver.get_host_stats to be able to return a list of host
> >>>>>> stats instead of just one. Report the whole list back to the
> >>>>>> scheduler. We could modify the receiving end to accept a list as well
> >>>>>> or just make multiple calls to
> >>>>>> self.update_service_capabilities(capabilities)
> >>>>>>
> >>>>>>
> >>>>>> b) make a few minor changes to the scheduler to make sure filtering
> >>>>>> still works. Note the changes here may be very helpful:
> >>>>>>
> >>>>>>
> >>>>>> https://review.openstack.org/10327
> >>>>>>
> >>>>>>
> >>>>>> c) we have to make sure that instances launched on those nodes take
> >>>>>> up the entire host state somehow. We could probably do this by making
> >>>>>> sure that the instance_type ram, mb, gb, etc. match what the node
> >>>>>> has, but we may want a new boolean field "used" if those aren't
> >>>>>> sufficient.
> >>>>>>
> >>>>>>
> >>>>>> This approach seems pretty good. We could potentially get rid of the
> >>>>>> shared bare_metal_node table. I guess the only other concern is how
> >>>>>> you populate the capabilities that the bare metal nodes are reporting.
> >>>>>> I guess an api extension that rpcs to a baremetal node to add the node
> >>>>>> would work. Maybe someday this could be autogenerated by the bare
> >>>>>> metal host looking in its arp table for dhcp requests! :)
> >>>>>>
> >>>>>>
> >>>>>> Vish
> >>>>>>


