[openstack-dev] [Nova] virt driver architecture
Alessandro Pilotti
ap at pilotti.it
Thu May 9 20:14:49 UTC 2013
On the Hyper-V side we also need to manage clustering (fault tolerance at the node level). This was somewhat further down my personal priority list for Hyper-V, but at the design summit we heard a great many requests to support it ASAP.
The main idea behind it is that in a lot of scenarios customers want to move their VMs to OpenStack from environments where HA at the host level is considered an obvious feature (System Center VMM, vSphere, etc.).
As far as Hyper-V is concerned, this can be done by putting the hosts in a Microsoft cluster (MSCS) with the VMs' local storage on shared storage (CSV), and that's it: HA is ready. It's very simple and cost-effective, with the main advantage that in case of a node failure the cluster service takes care of failing over the VM resources to another node.
Now, I really agree that adding a "sub-controller" under Nova adds unnecessary complexity, but IMO a solution to this issue is required (possibly in the Havana timeframe).
IMO, like most failover clusters, the Microsoft cluster service can be split very roughly into two components: a scheduler and a heartbeat service.
The scheduler is the component that obviously overlaps with the Nova one, while the heartbeat is (AFAIK) totally missing. Also missing is a feature that lets the scheduler restart an instance on another node when the heartbeat component signals that the node it used to run on has failed.
I suppose this type of discussion comes up over and over and I might have missed some of them, but what about a "nova-heartbeat" service, maybe with a plugin / driver model that would enable different solutions (e.g. HALinux, Microsoft Cluster Server, etc.)?
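To make the idea a bit more concrete, here is a minimal sketch of the two missing pieces: a pluggable heartbeat driver and a loop that asks a scheduler to move instances off failed nodes. Everything in it is hypothetical: the names HeartbeatDriver, TimeoutHeartbeatDriver and HeartbeatService, the plain dict standing in for instance placement, and the pick_host callback standing in for the Nova scheduler are illustration only, not existing Nova code. A real driver would query MSCS or another cluster stack instead of tracking timestamps.

```python
import abc
import time
from typing import Callable, Dict, List, Set


class HeartbeatDriver(abc.ABC):
    """Hypothetical plugin interface: each backend reports failed hosts."""

    @abc.abstractmethod
    def failed_hosts(self) -> List[str]:
        """Return the names of hosts currently considered down."""


class TimeoutHeartbeatDriver(HeartbeatDriver):
    """Toy driver: a host is down if it has not sent a beat within `timeout`."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def beat(self, host: str) -> None:
        # Record the time of the latest heartbeat received from this host.
        self._last_seen[host] = self._clock()

    def failed_hosts(self) -> List[str]:
        now = self._clock()
        return sorted(host for host, seen in self._last_seen.items()
                      if now - seen > self._timeout)


class HeartbeatService:
    """Moves instances off failed hosts using a scheduler callback."""

    def __init__(self, driver: HeartbeatDriver, placement: Dict[str, str],
                 pick_host: Callable[[str, Set[str]], str]) -> None:
        self._driver = driver
        self._placement = placement  # instance name -> host name
        self._pick_host = pick_host  # stand-in for the scheduler

    def check_once(self) -> Dict[str, str]:
        """Run one check; return {instance: new_host} for every move made."""
        failed = set(self._driver.failed_hosts())
        moves: Dict[str, str] = {}
        for instance, host in list(self._placement.items()):
            if host in failed:
                target = self._pick_host(instance, failed)
                self._placement[instance] = target
                moves[instance] = target
        return moves
```

The point of the split is that only failed_hosts() is backend specific, so placement decisions stay with the scheduler rather than with a second controller underneath Nova.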
Note: there's a completely different business scenario that requires running OpenStack alongside vSphere, System Center VMM, etc.
In this case the user wants to manage VMs with both solutions at the same time, typically for partitioned workloads, e.g. VDI desktops with OpenStack and accounting servers with System Center / vSphere. This is also something that customers keep asking for, and it means creating a driver for System Center VMM, unrelated to the Hyper-V one.
On May 9, 2013, at 20:44, Russell Bryant <rbryant at redhat.com> wrote:
> On 05/09/2013 12:30 PM, Devananda van der Veen wrote:
>> On Thu, May 9, 2013 at 8:45 AM, Sean Dague <sean at dague.net> wrote:
>>
>> On 05/09/2013 10:53 AM, Russell Bryant wrote:
>>
>> Greetings,
>>
>> I've been growing concerned with the evolution of Nova's architecture in
>> terms of the virt drivers and the impact they have on the rest of Nova.
>> I've heard these concerns from others in private conversation. Another
>> thread on the list today pushed me to where I think it's time we talk
>> about it:
>>
>> http://lists.openstack.org/pipermail/openstack-dev/2013-May/008801.html
>>
>> At our last design summit, there was a discussion of adding a new virt
>> driver to support oVirt (RHEVM). That seems inappropriate for Nova to
>> me. oVirt is a full virt management system and uses libvirt+KVM
>> hypervisors. We use libvirt+KVM directly. Punting off to yet another
>> management system that wants to manage all of the same things as
>> OpenStack seems like a broken architecture. In fact, oVirt has done a
>> lot of work to *consume* OpenStack resources (glance, quantum), which
>> seems completely appropriate.
>>
>>
>> +1
>>
>>
>> Things get more complicated if we take that argument and apply it to
>> other drivers that we already have in Nova. In particular, I think this
>> applies to the VMware (vCenter mode, not ESX mode) and Hyper-V drivers.
>> I'm not necessarily proposing that those drivers work significantly
>> different. I don't think that's practical if we want to support these
>> systems.
>>
>> We now have two different types of drivers: those that manage individual
>> hypervisor nodes, and those that proxy to much more complex systems.
>>
>> We need to be very aware of what's going on in all virt drivers, even
>> the ones we don't care about as much because we don't use them. We also
>> need to continue to solidify the virt driver interface and be extremely
>> cautious when these drivers require changes to other parts of Nova.
>> Above all, let's make sure that evolution in this area is well thought
>> out and done by conscious decision.
>>
>> Comments airing more specific concerns in this area would be appreciated.
>>
>>
>> I think we learned a really important lesson in baremetal: putting a
>> different complex management system underneath the virt driver
>> interface is a bad fit, requires nova to do unnatural things, and
>> just doesn't make anyone happy at the end of the day. That's since
>> resulted in baremetal spinning out to a new incubated project,
>> Ironic, which I think is really the right long term approach.
>>
>> I think we need to take that lesson for what it was, and realize
>> these virt cluster drivers are very much the same kind of problem.
>> They are better served living in some new incubated effort instead
>> of force fitting into the nova-compute virt layer and driving a lot
>> more complexity into nova.
>>
>>
>> I don't feel like a new project is needed here -- the ongoing discussion
>> about moving scheduling/orchestration logic out of nova-compute and into
>> conductor-or-something-else seems to frame this discussion, too.
>>
>> The biggest change to Nova that I recall around adding the Baremetal
>> code was the addition of the "node" aka "hypervisor_hostname" concept --
>> that a single nova compute host might control more than one discrete
>> thing, which thereby need to be identified as (host, node). That change
>> opened the door for other cluster drivers to fit under the virt
>> interface. IMBW, but I believe this is exactly what the vCenter and
>> Hyper-V folks are looking at. It's also my current plan for Ironic.
>> However, I also believe that this logic doesn't necessarily have to live
>> under the virt API layer; I think it's a really good fit for the
>> orchestration/conductor discussions....
>
> Yep, that was the change that had the most impact on the rest of Nova.
> I think there's a big difference between baremetal and these other
> drivers. In the case of baremetal, the Nova component is still in full
> control of all nodes. There's not another system that is also (or
> instead of Nova) in control of the individual nodes.
>
>> We were talking about this a few days ago in -nova, particularly how
>> moving some of the ComputeManager logic out to conductor might fit
>> together with simplifying the (host, node) complexities, and help make
>> nova-compute just a thin virt API layer. Here is a very poor summary of
>> what I recall...
>> * AMQP topic is based on "nodename", not "hostname"
>> * for local hypervisors (KVM, etc), the topic identifies the local host,
>> and the local nova-compute agent subscribes to it
>> * for clustered hypervisors (ironic, vCenter, etc), the topic identifies
>> the unique resource, and any nova-compute which can manage that resource
>> subscribes to the topic.
>>
>> This would also remove the SPoF that nova-compute currently has for any
>> cluster-of-discrete-things it manages today (eg, baremetal).
>
> Totally agreed with this. However, I'm not sure having clustered
> hypervisors expose individual resources is something they want to do.
> It's in conflict with what the underlying system we're talking to wants
> to be in control of.
>
> --
> Russell Bryant