[openstack-dev] [TripleO] State preserving upgrades working, next MVP selection?
Dan Prince
dprince at redhat.com
Mon Jan 27 18:15:37 UTC 2014
----- Original Message -----
> From: "Clint Byrum" <clint at fewbar.com>
> To: "openstack-dev" <openstack-dev at lists.openstack.org>
> Sent: Monday, January 27, 2014 12:48:23 PM
> Subject: Re: [openstack-dev] [TripleO] State preserving upgrades working, next MVP selection?
>
> Excerpts from Dan Prince's message of 2014-01-27 09:22:21 -0800:
> >
> > ----- Original Message -----
> > > From: "Robert Collins" <robertc at robertcollins.net>
> > > To: "OpenStack Development Mailing List"
> > > <openstack-dev at lists.openstack.org>
> > > Sent: Sunday, January 26, 2014 3:30:22 PM
> > > Subject: [openstack-dev] [TripleO] State preserving upgrades working,
> > > next MVP selection?
> > >
> > > So great news - we've now got state preserving upgrades actually
> > > working - we can now upgrade a deployed cloud (with downtime) without
> > > tossing away all the users' valuable data. Yay. This isn't entirely
> > > done as we have a couple of outstanding patches we're running early
> > > versions of, but - still, it's time to pick a new focus.
> > >
> > > So we need to pick the next step to focus on. In our roadmap we have:
> > >
> > > MVP4: Keep VMs running during deploys.
> > >
> > > This to my mind means two things:
> > > - VMs need to not be interrupted
> > > - network traffic needs to be uninterrupted
> > >
> > > Now, as the kernel can be upgraded in a deploy, this still requires us
> > > to presume that we may have to reboot a machine - so we're not at the
> > > point of focusing on high performance updates yet.
> > >
> > > Further consequences - we'll need two network nodes and two
> > > hypervisors, and live migration. 10m (the times-two reboot time for a
> > > server) is too long for the central DB to be down if we want
> > > neutron-agents not to get unhappy as well, so we'll really need two
> > > control plane nodes.
> > >
> > > So I think the next MVP needs the following cards:
> > > - HA DB
> > > - HA APIs
> > > - rolling upgrades
> > > - nova live migration
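
To make the "nova live migration" card concrete: draining a compute node
before its reboot could look roughly like the sketch below, using
python-novaclient. This is only an illustration; the credentials, endpoint,
and host name are made-up placeholders, and host=None just lets the
scheduler pick a target.

# Rough sketch: drain a compute node with live migration before a
# kernel-upgrade reboot. All names/credentials below are placeholders.
from novaclient import client

nova = client.Client('1.1', 'admin', 'PASSWORD', 'admin',
                     'http://keystone.example.com:5000/v2.0')

compute_host = 'overcloud-novacompute0'  # node about to be rebooted

# Admin query: every instance currently scheduled onto that host.
servers = nova.servers.list(search_opts={'host': compute_host,
                                         'all_tenants': 1})
for server in servers:
    # host=None lets the scheduler choose a target; block_migration=True
    # avoids assuming shared storage between the two hypervisors.
    server.live_migrate(host=None, block_migration=True)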
> >
> > It seems a bit fuzzy whether live migration violates the rules above
> > (no VM interruption, no network disruption). Live migration is certainly a
> > good feature to have in general... but wiring it into our upgrade strategy
> > seems like a bad idea. I would much rather see us put the effort into
> > an upgrade path which allows VMs to persist on the compute host machine
> > (uninterrupted) while the upgrade takes place. Live migrating things
> > back and forth all the time just seems like thrashing: cool for a demo,
> > but a bad-idea-in-production sort of thing to me.
> >
>
> I'm not sure I understand. You must terminate the VMs when you update
> the kernel. Are you saying we should not handle kernel upgrades, or that
> we should just focus on evacuation for kernel upgrades?
I missed that the focus here was only on kernel upgrades. I thought this was just a general upgrades thread, with kernel upgrades being optional.
So... if the subject here is really just "State preserving kernel upgrades" then carry on, I guess...
>
> Evacuation has the same problem, but with VM interruption added in. I
> think we should actually offer either as an option, but we have to offer
> one of them, or we'll have no way to update compute node kernels.
>
> The non-kernel upgrade path is an order of magnitude easier, and we have
> discussed optimizations for it quite a bit already in other threads. We'll
> get to it. But leaving people without a way to upgrade the kernel on
> their compute nodes is not something I think we want to do.
If it is easier I would say let's go ahead and do it first then. From a priority standpoint, an application redeployment of just OpenStack (without a kernel upgrade) is certainly going to be more useful on a day-to-day basis. Some shops may already have ways of performing hand-cut, in-place kernel upgrades anyway, so while an automated approach is valuable I'm not sure it is the most useful first order of business.
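
For what it's worth, telling the two paths apart on a node is cheap. A
naive sketch (it assumes a vmlinuz-<version> naming scheme in /boot and
does a plain string compare, so a real tool would want proper version
parsing):

# Rough sketch: does this node need a reboot to pick up a new kernel?
import glob
import os

def newest_installed_kernel():
    # Assumes /boot/vmlinuz-<version> naming; distro-specific in practice.
    versions = [os.path.basename(p)[len('vmlinuz-'):]
                for p in glob.glob('/boot/vmlinuz-*')]
    # Naive string max; real code would parse and compare versions.
    return max(versions) if versions else None

running = os.uname()[2]
installed = newest_installed_kernel()

if installed and installed != running:
    print('kernel changed (%s -> %s): take the reboot path'
          % (running, installed))
else:
    print('no kernel change: in-place service restart is enough')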
>
> >
> > > - neutron agent migration *or* neutron distributed-HA setup
> > > - scale the heat template to have 2 control plane nodes
> > > - scale the heat template to have 2 hypervisor nodes
> >
> > This is cool, especially for bare metal sorts of setups. For developers,
> > though, I would sort of like to consider a hybrid approach where we
> > still support a single control plane and compute (hypervisor) node for
> > the devtest scripts. Resources are just too limited to force everyone to
> > use HA setups by default, always. While HA is certainly important, it is
> > only part of TripleO, and there are many things you might want to work
> > on without using it. So let's keep this as an optional, production-focused
> > sort of component.
> >
>
> I think that goes without saying. We can just develop in degraded mode. :)
>
> > >
> > > as a minimum - are these too granular or about right? I broke the heat
> > > template change into two because we can scale hypervisors right now,
> > > whereas control plane scaling will need changes and testing so that we
> > > only have one HA database created, not two non-HA setups in parallel
> > > :).
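
For what it's worth, the scale-out itself should just be a stack update
with a bigger count. A hand-wavy python-heatclient sketch; the stack name,
template file, and the Scale parameter names are illustrative placeholders,
not the real template interface:

# Hand-wavy sketch: grow an existing overcloud stack via stack-update.
# 'ControlScale'/'ComputeScale' are illustrative parameter names only.
from heatclient.client import Client

heat = Client('1', endpoint='http://heat.example.com:8004/v1/TENANT_ID',
              token='KEYSTONE_TOKEN')

with open('overcloud.yaml') as f:
    template = f.read()

heat.stacks.update('overcloud',
                   template=template,
                   parameters={'ControlScale': 2, 'ComputeScale': 2})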
> > >
> > > I'm going to put this into Trello now, and will adjust as we discuss.
> > >
> > > -Rob
> > >
> > > --
> > > Robert Collins <rbtcollins at hp.com>
> > > Distinguished Technologist
> > > HP Converged Cloud
> > >
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>