[openstack-dev] Bare-metal node scheduling

Mark McLoughlin markmc at redhat.com
Mon Oct 8 13:50:29 UTC 2012


On Mon, 2012-10-08 at 14:21 +0100, John Garbutt wrote:
> Interesting ideas.
> 
> > What we're doing is allowing the scheduler to choose a compute node
> > based on the details of the individual bare-metal nodes available via the
> > compute node. However, the compute node is still responsible for choosing
> > which bare-metal node to provision.
> 
> While I don't like this approach, it could be used for hypervisor pools.
> We did wonder about this for XenServer pools, but it just seemed too
> messy, for example when you want to live migrate between two members of
> the pool using nova.

Yeah, I'm not loving the idea of the nova scheduler knowing much, if
anything, about the details of the resources available to a
virt-driver-layer scheduler.

Another example: if there was a virt driver for oVirt, I'd much rather
that nova knew nothing about individual oVirt hosts and instead had the
admin configure a bunch of compute slots representing the resources
which nova is allowed to consume from an oVirt cluster.

> > As for terminology, rather than the scheduler considering "nodes" I think
> > "slots" would be less confusing.
> > 
> > You could imagine extending this scheme to other virt drivers to give
> > providers the option of a much more simple and predictable scheduling
> > strategy. You could configure a compute node to have e.g. 10 medium size
> > "slots" and the scheduler would only ever schedule 10 medium size
> > instances to that node. This could potentially be a way for providers to
> > simplify their capacity planning.
> 
> This sounds like a good idea.

Cool.
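
To make the "slots" idea concrete, I'm picturing something roughly like
this (purely illustrative, none of these names exist in nova):

    # A compute node advertises a fixed set of instance-type slots and
    # the scheduler simply counts them down; e.g. a node configured with
    # 10 m1.medium slots will only ever receive 10 m1.medium instances.
    class SlotState(object):

        def __init__(self, slots):
            self.free = dict(slots)      # e.g. {'m1.medium': 10}

        def can_host(self, flavour):
            return self.free.get(flavour, 0) > 0

        def consume(self, flavour):
            assert self.can_host(flavour)
            self.free[flavour] -= 1

    node = SlotState({'m1.medium': 10})
    assert node.can_host('m1.medium')    # schedulable
    node.consume('m1.medium')            # 9 medium slots left

The nice property is that capacity planning becomes simple arithmetic:
the provider knows exactly how many instances of each size a node will
ever hold.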

> I have wondered about an alternative scheduler where each nova-compute
> node is configured with a supported set of flavours, and it reports to
> the scheduler how many of each flavour it still has the capacity to
> run (e.g. a full-ish hypervisor reports: 4 tiny instances or 1 small
> instance, 0 large instances, etc., but bare metal reports: 0 tiny, 3
> small, 10 large, etc.). That seems to unify the two cases.

Yeah, that's the way I'm thinking.
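
The same shape of report covers both cases, something like this rough
sketch (helper names invented, and only counting RAM to keep it short):

    def hypervisor_capacity(free_ram_mb, flavours):
        # A full-ish hypervisor: how many of each flavour still fit?
        return {name: free_ram_mb // ram_mb
                for name, ram_mb in flavours.items()}

    def baremetal_capacity(enlisted_nodes, flavours):
        # Bare metal: each enlisted node matches exactly one flavour.
        report = {name: 0 for name in flavours}
        for node_flavour in enlisted_nodes:
            report[node_flavour] += 1
        return report

    flavours = {'m1.tiny': 512, 'm1.small': 2048, 'm1.large': 8192}

    hypervisor_capacity(2048, flavours)
    # -> {'m1.tiny': 4, 'm1.small': 1, 'm1.large': 0}

    baremetal_capacity(['m1.small'] * 3 + ['m1.large'] * 10, flavours)
    # -> {'m1.tiny': 0, 'm1.small': 3, 'm1.large': 10}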

The issue with making this about configuring a compute node with a set
of flavours is that we're working towards having the compute node not
access the DB at all.

This means the "compute slots" configuration would need to live in the
DB. I guess that's pretty nice in a way, because we can have a proper
admin API for it.
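
If the configuration lives in the DB, the record itself could be tiny.
A hypothetical sketch (nova has no such table, the model is made up)
just to show the kind of thing an admin API would manage:

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class ComputeSlot(Base):
        __tablename__ = 'compute_slots'
        id = Column(Integer, primary_key=True)
        host = Column(String(255), nullable=False)     # compute node
        flavour = Column(String(255), nullable=False)  # e.g. 'm1.medium'
        total = Column(Integer, nullable=False)        # configured slots
        used = Column(Integer, nullable=False, default=0)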

> For the above, I was thinking about GPU pass-through. You probably
> don't want to fill up GPU pass-through enabled hypervisors with
> standard instances, unless there is no other option. So you could use
> the above information to write such a scheduler. Once you have used
> the GPUs, you might want to fill up the server with tiny instances to
> maybe save on power.

You could use slots for this, but the simple version wouldn't have the
flexibility to allow GPU slots to be used for standard instances when
there's no room elsewhere.
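
The flexible behaviour John describes is roughly this (a sketch with
invented host structures, not real nova scheduler code), and a fixed
slot count on its own can't express it:

    # Standard instances avoid GPU-capable hosts unless nothing else has
    # room; GPU instances only ever land on GPU-capable hosts. Among the
    # remaining candidates, pick the fullest host to pack instances in
    # (and maybe save on power).
    def pick_host(hosts, wants_gpu):
        candidates = [h for h in hosts
                      if h['free_slots'] > 0
                      and (h['has_gpu'] or not wants_gpu)]
        if not candidates:
            return None

        def cost(host):
            gpu_penalty = 1 if host['has_gpu'] and not wants_gpu else 0
            return (gpu_penalty, host['free_slots'])

        return min(candidates, key=cost)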

Cheers,
Mark.



