[openstack-dev] [OpenStack-Dev][Nova][VMWare] Enable live migration with one nova compute

Jay Pipes jaypipes at gmail.com
Wed Apr 9 13:05:02 UTC 2014


On Mon, 2014-04-07 at 15:47 +0100, Matthew Booth wrote:
> On 07/04/14 06:20, Jay Pipes wrote:
> > On Sun, 2014-04-06 at 06:59 +0000, Nandavar, Divakar Padiyar wrote:
> >>>> Well, it seems to me that the problem is the above blueprint and the code it introduced. This is an anti-feature IMO, and probably the best solution would be to remove the above code and go back to having a single nova-compute managing a single vCenter cluster, not multiple ones.
> >>
> >> The problem is not introduced by managing multiple clusters from a single nova-compute proxy node.
> > 
> > I strongly disagree.
> > 
> >> Internally this proxy driver is still presenting the "compute-node" for each of the clusters it's managing.
> > 
> > In what way?
> > 
> >> What we need to think about is the applicability of the live migration use case when a "cluster" is modelled as a compute. Since the "cluster" is modelled as a compute, it is assumed that a typical live-move use case is taken care of by the underlying "cluster" itself. With this, there are other use cases which are no-ops today, such as host maintenance mode, live move, setting instance affinity, etc. In order to resolve this I was thinking of
> >> "A way to expose operations on individual ESX hosts, such as putting a host in maintenance mode, live move, instance affinity, etc., by introducing a Parent - Child compute node concept. Scheduling can be restricted to the Parent compute node, and the Child compute node can be used for providing more drill-down on compute and also to enable additional compute operations." Any thoughts on this?
> > 
> > The fundamental problem is that hacks were put in place in order to make
> > Nova defer control to vCenter, when the designs of Nova and vCenter are
> > not compatible, and we're paying the price for that right now.
> > 
> > All of the operations you describe above -- putting a host in
> > maintenance mode, live-migration of an instance, ensuring a new instance
> > is launched near or not-near another instance -- depend on a fundamental
> > design feature in Nova: that a nova-compute worker fully controls and
> > manages a host that provides a place to put server instances. We have
> > internal driver interfaces for the *hypervisor*, not for the *manager of
> > hypervisors*, because, you know, that's what Nova does.
> 
> I'm going to take you to task here for use of the word 'fundamental'.
> What does Nova do? Apparently: 'OpenStack Nova provides a cloud
> computing fabric controller, supporting a wide variety of virtualization
> technologies, including KVM, Xen, LXC, VMware, and more. In addition to
> its native API, it includes compatibility with the commonly encountered
> Amazon EC2 and S3 APIs.' There's nothing in there about the ratio of
> Nova instances to hypervisors: that's an implementation detail. Now this
> change may or may not sit well with design decisions which have been
> made in the past, but the concept of managing multiple clusters from a
> single Nova instance is certainly not fundamentally wrong. It may not be
> pragmatic; it may require further changes to Nova which were not made,
> but there is nothing about it which is fundamentally at odds with the
> stated goals of the project.
> 
> Why did I bother with that? I think it's in danger of being lost. Nova
> has been around for a while now and it has a lot of code and a lot of
> developers behind it. We need to remember, though, that it's all for
> nothing if nobody wants to use it. VMware is different, but not wrong.
> Let's stay fresh.

Please see my previous email to Juan about this. I'm not anti-VMware.
I'm just opposed to changing an important part of the implementation of
Nova just so that certain vCenter operations can be supported.

> > The problem with all of the vCenter stuff is that it is trying to say to
> > Nova "don't worry, I got this" but unfortunately, Nova wants and needs
> > to manage these things, not surrender control to a different system that
> > handles orchestration and scheduling in its own unique way.
> 
> Again, I'll flip that round. Nova *currently* manages these things, and
> working efficiently with a platform which also does these things would
> require rethinking some design above the driver level. It's not
> something we want to do naively, which the VMware driver is suffering
> from in this area. It may take time to get this right, but we shouldn't
> write it off as fundamentally wrong. It's useful to users and not
> fundamentally at odds with the project's goals.

I'm not writing off vCenter or its capabilities. I am arguing that the
bar for modifying a fundamental design decision in Nova -- that of being
horizontally scalable by having a single nova-compute worker responsible
for managing a single provider of compute resources -- was WAY too low,
and that this decision should be revisited in the future (and possibly
as part of the vmware driver refactoring efforts currently underway by
the good folks at RH and VMware).

> > If a shop really wants to use vCenter for scheduling and orchestration
> > of server instances, what exactly is the point of using OpenStack Nova
> > to begin with? What exactly is the point of trying to use OpenStack Nova
> > for scheduling and host operations when you've already shelled out US
> > $6,000 for vCenter Server and a boatload more money for ESX licensing?
> 
> I confess I wondered this myself. However, I have now spoken to real
> people who are spending real money doing exactly this. The drivers seem
> to be:
> 
> * The external API
> * A heterogeneous cloud
> 
> vSphere isn't really designed for the former and doesn't do it well. It
> obviously doesn't help with the latter at all. For example, users want
> to be able to give non-admin customers the ability to deploy across both
> KVM and VMware.
> 
> To my mind, a VMware cluster is an obvious deployment target. I think
> it's reasonable for Nova to treat it like a single hypervisor, but with
> non-uniform resources (which would require a change in, or at least an
> addition to, the current data model).

That is a big and important change, and one that should not be taken
lightly.
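
To make concrete why "non-uniform resources" touches the data model, here is
a minimal sketch. The class and method names are hypothetical and are not
Nova or vCenter code; the sketch only assumes that a cluster is reported as
one compute node and that an instance must fit on a single ESX host.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EsxHost:
    """Hypothetical stand-in for one ESX host inside a cluster."""
    name: str
    free_ram_mb: int


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster exposed as one compute node."""
    name: str
    hosts: List[EsxHost]

    def aggregate_free_ram_mb(self) -> int:
        # What a uniform, single-hypervisor view of the cluster would report.
        return sum(h.free_ram_mb for h in self.hosts)

    def largest_free_ram_mb(self) -> int:
        # What actually bounds the largest instance that can be placed.
        return max((h.free_ram_mb for h in self.hosts), default=0)

    def fits_by_aggregate(self, ram_mb: int) -> bool:
        return ram_mb <= self.aggregate_free_ram_mb()

    def fits_on_some_host(self, ram_mb: int) -> bool:
        return ram_mb <= self.largest_free_ram_mb()


cluster = Cluster("cluster-a", [EsxHost("esx-1", 8192), EsxHost("esx-2", 8192)])
request_mb = 12288

print(cluster.fits_by_aggregate(request_mb))   # True
print(cluster.fits_on_some_host(request_mb))   # False
```

The aggregate view accepts a 12 GB request that no individual host can hold,
which is the kind of gap a per-cluster resource model would have to express.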

> Maybe something like extreme NUMA.
> I don't think it makes sense for Nova to concern itself with migrating
> VMs between hosts in a cluster. Putting a cluster into maintenance mode
> would involve the whole cluster, but the vSphere administrator obviously
> has other options.
> 
> The fact that we can't migrate a VM between 2 clusters is clearly a bug.

Is it?

> Whether a single Nova should manage multiple clusters is an open
> question, but it should be able to treat them as multiple targets. It's
> not fundamentally wrong, though.
> 
> Matt
> 




