<html><body>

<p><tt><font size="2">vishvananda@gmail.com wrote on 08/15/2012 06:54:58 PM:<br>

<br>

> From: Vishvananda Ishaya <vishvananda@gmail.com></font></tt><br>

<tt><font size="2">> To: OpenStack Development Mailing List <openstack-dev@lists.openstack.org>, </font></tt><br>

<tt><font size="2">> Cc: "openstack@lists.launchpad.net \(openstack@lists.launchpad.net<br>

> \)" <openstack@lists.launchpad.net></font></tt><br>

<tt><font size="2">> Date: 08/15/2012 06:58 PM</font></tt><br>

<tt><font size="2">> Subject: Re: [Openstack] [openstack-dev] Discussion about where to <br>

> put database for bare-metal provisioning (review 10726)</font></tt><br>

<tt><font size="2">> Sent by: openstack-bounces+mjfork=us.ibm.com@lists.launchpad.net</font></tt><br>

<tt><font size="2">> <br>

> On Aug 15, 2012, at 3:17 PM, Michael J Fork <mjfork@us.ibm.com> wrote:</font></tt><br>

<tt><font size="2">> <br>

> > I am interested in finding a solution that enables bare-metal and <br>

> > virtualized requests to be serviced through the same scheduler where<br>

> > the compute_nodes table has a full view of schedulable resources.  <br>

> > This would seem to simplify the end-to-end flow while opening up <br>

> > some additional use cases (e.g. dynamic allocation of a node from <br>

> > bare-metal to hypervisor and back).  <br>

> > <br>

> > One approach would be to have a proxy running a single nova-compute <br>

> > daemon fronting the bare-metal nodes.  That nova-compute daemon <br>

> > would report up many HostState objects (1 per bare-metal node) to <br>

> > become entries in the compute_nodes table and accessible through the<br>

> > scheduler HostManager object.</font></tt><br>

<tt><font size="2">> > The HostState object would set cpu_info, vcpus, memory_mb and <br>

> > local_gb values to be used for scheduling with the hypervisor_host <br>

> > field holding the bare-metal machine address (e.g. for IPMI based <br>

> > commands) and hypervisor_type = NONE.  The bare-metal Flavors are <br>

> > created with an extra_spec of hypervisor_type= NONE and the <br>

> > corresponding compute_capabilities_filter would reduce the available<br>

> > hosts to those bare_metal nodes.  The scheduler would need to <br>

> > understand that hypervisor_type = NONE means you need an exact fit <br>

> > (or best-fit) host vs weighting them (perhaps through the multi-<br>

> > scheduler).  The scheduler would cast out the message to the <br>

> > <topic>.<service-hostname> (code today uses the HostState hostname),<br>

> > with the compute driver having to understand if it must be serviced <br>

> > elsewhere (but does not break any existing implementations since it <br>

> > is 1 to 1).</font></tt><br>

<tt><font size="2">> > <br>

> > Does this solution seem workable? Anything I missed?</font></tt><br>

<tt><font size="2">> <br>

> The bare metal driver already is proxying for the other nodes so it <br>

> sounds like we need a couple of things to make this happen:</font></tt><br>

<tt><font size="2">> <br>

> a) modify driver.get_host_stats to be able to return a list of host <br>

> stats instead of just one. Report the whole list back to the <br>

> scheduler. We could modify the receiving end to accept a list as <br>

> well or just make multiple calls to </font></tt><br>

<tt><font size="2">> self.update_service_capabilities(capabilities)</font></tt><br>

<tt><font size="2">> <br>

> b) make a few minor changes to the scheduler to make sure filtering <br>

> still works. Note the changes here may be very helpful:</font></tt><br>

<tt><font size="2">> <br>

> <a href="https://review.openstack.org/10327">https://review.openstack.org/10327</a></font></tt><br>

<tt><font size="2">> <br>

> c) we have to make sure that instances launched on those nodes take <br>

> up the entire host state somehow. We could probably do this by <br>

> making sure that the instance_type ram, mb, gb etc. matches what the<br>

> node has, but we may want a new boolean field "used" if those aren't<br>

> sufficient.</font></tt><br>

<br>

<tt><font size="2">My initial thought is that showing the actual resources the guest requested as being consumed in HostState would enable use cases like migrating a guest running on a too-big machine to a right-sized one.  However, that would require the bare-metal node to store the state of the requested guest when that information could be obtained from the instance_type. </font></tt><br>

<br>

<tt><font size="2">For now, the simplest approach is probably to have the bare-metal virt driver set disk_available = 0 and host_memory_free = 0 so the scheduler removes allocated nodes from consideration, with vcpus, disk_total and host_memory_total set to the physical machine values.  If the requested guest size is easily accessible, the _used values could be set to the guest's requested values (although it is not clear whether anything would break with _total != _free + _used, in which case setting _used = _total would seem acceptable for now).  </font></tt><br>
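
<br>

<tt><font size="2">To make that concrete, here is a rough Python sketch of the stats the bare-metal driver could report for one node.  This is not actual nova code: the function name, the node dictionary keys (address, vcpus, memory_mb, local_gb), the disk_used key and the string "NONE" for hypervisor_type are my assumptions for illustration, and it takes the simple route of reporting the node as fully used once an instance is on it.</font></tt><br>

<pre>
```python
def bare_metal_host_stats(node, allocated):
    """Build a capabilities dict for one bare-metal node.

    node is a plain dict with hypothetical keys:
    address, vcpus, memory_mb, local_gb.
    allocated is True once an instance has been placed on the node.
    """
    stats = {
        # Marks the entry as bare metal so flavors can match on it.
        "hypervisor_type": "NONE",
        # Address used to reach the machine (e.g. for IPMI commands).
        "hypervisor_host": node["address"],
        # Totals always reflect the physical machine.
        "vcpus": node["vcpus"],
        "disk_total": node["local_gb"],
        "host_memory_total": node["memory_mb"],
    }
    if allocated:
        # Report nothing free so the scheduler skips this node.
        stats["disk_used"] = node["local_gb"]
        stats["disk_available"] = 0
        stats["host_memory_free"] = 0
    else:
        stats["disk_used"] = 0
        stats["disk_available"] = node["local_gb"]
        stats["host_memory_free"] = node["memory_mb"]
    return stats
```
</pre>

<tt><font size="2">The proxy nova-compute would build one such dict per node and report the whole list, so an allocated node still shows its physical totals but never passes a RAM or disk check.</font></tt><br>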

<br>

<tt><font size="2">Another option is to add num_instances to HostState and have the bare-metal filter remove hosts with hypervisor_type = NONE and num_instances > 0.  The scheduler would never see them, so there would be no need to show them as fully consumed.  The drawback is that the num_instances call is marked as being expensive and would incur some overhead.</font></tt><br>
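
<br>

<tt><font size="2">A minimal sketch of that filter logic, again only as an illustration: the HostState below is a stand-in namedtuple rather than the scheduler's real class, num_instances is the proposed new field, and the function names are hypothetical.</font></tt><br>

<pre>
```python
from collections import namedtuple

# Stand-in for the scheduler's HostState; num_instances is the proposed field.
HostState = namedtuple("HostState", ["host", "hypervisor_type", "num_instances"])


def bare_metal_host_passes(host_state):
    """Return False for a bare-metal node that already runs an instance."""
    if host_state.hypervisor_type != "NONE":
        # Virtualized hosts are left to the other filters.
        return True
    return host_state.num_instances == 0


def filter_hosts(hosts):
    """Keep only the hosts that pass the bare-metal check."""
    return [h for h in hosts if bare_metal_host_passes(h)]
```
</pre>

<tt><font size="2">For example, given an occupied bare-metal node, a free bare-metal node and a hypervisor host with three guests, only the occupied bare-metal node is dropped.</font></tt><br>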

<tt><font size="2"> <br>

> This approach seems pretty good. We could potentially get rid of <br>

> the shared bare_metal_node table. I guess the only other concern is <br>

> how you populate the capabilities that the bare metal nodes are <br>

> reporting. I guess an api extension that rpcs to a baremetal node to<br>

> add the node. Maybe someday this could be autogenerated by the bare <br>

> metal host looking in its arp table for dhcp requests! :)</font></tt><br>

<tt><font size="2">> <br>

> Vish</font></tt><br>

<tt><font size="2">

</font></tt><font size="2" face="sans-serif"><br>

Michael<br>

<br>

-------------------------------------------------<br>

Michael Fork<br>

Cloud Architect, Emerging Solutions<br>

IBM Systems & Technology Group</font></body></html>