[openstack-dev] [TripleO] State preserving upgrades working, next MVP selection?

Dan Prince dprince at redhat.com
Mon Jan 27 17:22:21 UTC 2014



----- Original Message -----
> From: "Robert Collins" <robertc at robertcollins.net>
> To: "OpenStack Development Mailing List" <openstack-dev at lists.openstack.org>
> Sent: Sunday, January 26, 2014 3:30:22 PM
> Subject: [openstack-dev] [TripleO] State preserving upgrades working, next MVP selection?
> 
> So great news - we've now got state preserving upgrades actually
> working - we can now upgrade a deployed cloud (with downtime) without
> tossing away all the users' valuable data. Yay. This isn't entirely
> done as we have a couple of outstanding patches we're running early
> versions of, but - still, it's time to pick a new focus.
> 
> So we need to pick the next step to focus on. In our roadmap we have:
> 
> MVP4: Keep VMs running during deploys.
> 
> This to my mind means two things:
>  - VMs need to not be interrupted
>  - network traffic needs to be uninterrupted
> 
> Now, as the kernel can be upgraded in a deploy, this still requires us
> to presume that we may have to reboot a machine - so we're not at the
> point of focusing on high performance updates yet.
> 
> Further consequences - we'll need two network nodes and two
> hypervisors, and live migration. 10m (the times-two reboot time for a
> server) is too long for the central DB to be down if we want
> neutron-agents not to get unhappy as well, so we'll really need two
> control plane nodes.
> 
> So I think the next MVP needs the following cards:
>  - HA DB
>  - HA APIs
>  - rolling upgrades
>  - nova live migration

It seems a bit fuzzy whether live migration violates the rules above (no VM interruption, no network disruption). Live migration is certainly a good feature to have in general, but wiring it into our upgrade strategy seems like a bad idea. I would much rather see us put the effort into an upgrade path that lets VMs persist, uninterrupted, on the compute host while the upgrade takes place. Live migrating things back and forth all the time strikes me as thrashing: cool for a demo, but a bad idea in production.


>  - neutron agent migration *or* neutron distributed-HA setup
>  - scale the heat template to have 2 control plane nodes
>  - scale the heat template to have 2 hypervisor nodes

This is cool, especially for bare metal setups. For developers, though, I would like us to consider a hybrid approach where the devtest scripts still support a single control plane node and a single compute (hypervisor) node. Resources are just too limited to force everyone to use HA setups by default, always. While HA is certainly important, it is only one part of TripleO, and there are many things you might want to work on without using it. So let's keep this as an optional, production-focused component.
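
To make the "optional HA" idea concrete, here is a minimal sketch in Python. It is not taken from the devtest scripts or the heat templates; `Topology`, `topology_for`, and `supports_rolling_upgrade` are hypothetical names used only for illustration. The default stays at one control plane node and one hypervisor, and the two-of-each layout is something you opt into.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Node counts for a deployment (hypothetical illustration only)."""
    control_nodes: int
    compute_nodes: int


def topology_for(ha: bool = False) -> Topology:
    """Return a single-node layout by default, two of each when HA is requested."""
    if ha:
        return Topology(control_nodes=2, compute_nodes=2)
    return Topology(control_nodes=1, compute_nodes=1)


def supports_rolling_upgrade(topology: Topology) -> bool:
    """A rolling upgrade needs a second node of each role to stay up
    while the first one is rebooted."""
    return topology.control_nodes >= 2 and topology.compute_nodes >= 2


if __name__ == "__main__":
    dev = topology_for()
    prod = topology_for(ha=True)
    print(dev, supports_rolling_upgrade(dev))
    print(prod, supports_rolling_upgrade(prod))
```

The point of the sketch is only that the single-node case remains the default path, and the rolling-upgrade requirement is checked against whichever layout was chosen rather than assumed everywhere.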

> 
> as a minimum - are these too granular or about right? I broke the heat
> template change into two because we can scale hypervisors right now,
> whereas control plane scaling will need changes and testing so that we
> only have one HA database created, not two non-HA setups in parallel
> :).
> 
> I'm going to put this into trello now, and will adjust as we discuss
> 
> -Rob
> 
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud
> 