[openstack-dev] [TripleO] our update story: can people live with it?

Jay Pipes jaypipes at gmail.com
Wed Jan 22 18:53:14 UTC 2014


On Wed, 2014-01-22 at 13:15 -0500, Dan Prince wrote:
> 
> ----- Original Message -----
> > From: "Clint Byrum" <clint at fewbar.com>
> > To: "openstack-dev" <openstack-dev at lists.openstack.org>
> > Sent: Wednesday, January 22, 2014 12:45:45 PM
> > Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it?
> > 
> > Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800:
> > > I've been thinking a bit more about how TripleO updates are developing,
> > > specifically with regard to compute nodes. This is what is commonly
> > > called the "update story", I think.
> > > 
> > > As I understand it, we expect people to actually have to reboot a compute
> > > node in the cluster in order to deploy an update. This really worries me
> > > because it seems like way overkill for such a simple operation. Let's say
> > > all I need to deploy is a simple change to Nova's libvirt driver, and
> > > I need to deploy it to *all* my compute instances. Do we really expect
> > > people to actually have to reboot every single compute node in their
> > > cluster for such a thing, and then do this again and again for each
> > > update they deploy?
> > > 
> > 
> > Agreed, if we make everybody reboot to push out a patch to libvirt, we
> > have failed. And thus far, that is exactly how we are failing, but with
> > good reason.
> > 
> > Right at this very moment, we are leaning on 'rebuild' in Nova, which
> > reboots the instance. But this is so that we handle the hardest thing
> > well first (rebooting to have a new kernel).
> > 
> > For small updates we need to decouple things a bit more. There is a
> > notion of the image ID in Nova, versus the image ID that is actually
> > running. Right now we update it with a nova rebuild command only.
> > 
> > But ideally we would give operators a tool to optimize and avoid the
> > reboot when it is appropriate. The heuristic should be as simple as
> > comparing kernels.
> 
> When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. Perhaps this should be addressed when building the image (by using the older kernel)... but still. I could see a case for explicitly not wanting to reboot here as well.

++
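
To make the two ideas above concrete, here is a minimal sketch of a kernel
comparison with an explicit operator opt-out. It is only an illustration:
needs_reboot and the SKIP_REBOOT variable are hypothetical names, and it
assumes the mounted image keeps its kernel modules under
lib/modules/<release>.

```shell
# Hypothetical helper: succeed (exit 0) if the new image needs a reboot.
# $1 = running kernel release, e.g. "$(uname -r)"
# $2 = root of the mounted new image, e.g. /tmp/new_image
needs_reboot() {
    # SKIP_REBOOT is a made-up operator override, not an existing setting.
    if [ "${SKIP_REBOOT:-0}" = "1" ]; then
        return 1
    fi
    # Assumption: if the image ships modules for the running kernel,
    # the kernel is unchanged and no reboot is needed.
    if [ -d "$2/lib/modules/$1" ]; then
        return 1
    fi
    return 0
}

# Example: needs_reboot "$(uname -r)" /tmp/new_image && echo "reboot required"
```

Keeping the override separate from the comparison means the operator's
explicit choice always wins over the heuristic.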

> > Once we have determined that a new image does not
> > need a reboot, we can just change the ID in Metadata, and an
> > os-refresh-config script will do something like this:
> > 
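> > # download_new_image and mount_image are placeholders, not real commands
> > if [ "$(cat /etc/image_id)" != "$(os-apply-config --key image_id)" ] ; then
> >     download_new_image
> >     mount_image /tmp/new_image
> >     mount / -o remount,rw # Assuming we've achieved ro root
> >     # Copy the new image's root over the running root, staying on one filesystem
> >     rsync --one-file-system -a /tmp/new_image/ /
> >     mount / -o remount,ro # ditto
> > fi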
> > 
> > No reboot required. This would run early in configure.d, so that any
> > pre-configure.d scripts will have run to quiesce services that can't
> > handle having their binaries removed out from under them (read:
> > non-Unix services). Then configure.d runs as usual, configures things,
> > restarts services, and we are now running the new image.
> 
> Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova.

Right.
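
One way to gauge how much an rsync-based update would actually touch is a
dry run. The sketch below assumes the new image is already mounted, as in
the quoted snippet; preview_image_changes is a hypothetical helper name.
Note that rsync skips files whose size and modification time already match,
so a one-file change rewrites one file on disk, although the full image
still has to be downloaded first.

```shell
# Hypothetical helper: list what rsync would change without writing anything.
# $1 = root of the mounted new image, $2 = target root (normally /)
preview_image_changes() {
    # --dry-run makes no changes; --itemize-changes prints one line per
    # file or directory that differs between source and target.
    rsync --one-file-system -a --dry-run --itemize-changes "$1/" "$2"
}

# Example: preview_image_changes /tmp/new_image /
```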

> > 
> > > I understand the whole read only images thing plays into this too... but
> > > I'm wondering if there is a middle ground where things might work
> > > better. Perhaps we have a mechanism where we can tar up individual venvs
> > > from /opt/stack/ or perhaps also this is an area where real OpenStack
> > > packages could shine. It seems like we could certainly come up with some
> > > simple mechanisms to deploy these sorts of changes with Heat such that
> > > compute host reboot can be avoided for each new deploy.
> > 
> > Given the scenario above, that would be a further optimization. I don't
> > think it makes sense to specialize for venvs or openstack services
> > though, so just "ensure the root filesystems match" seems like a
> > workable, highly efficient system. Note that we've talked about having
> > highly efficient ways to widely distribute the new images as well.
> 
> Yes. Optimization! In the big scheme of things I could see 3 approaches being useful:
> 
> 1) Deploy a full image and reboot if you have a kernel update. (entire image is copied)
> 
> 2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied)
> 
> 3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed)

++. FWIW, #3 happens a heck of a lot more often than #1 or #2 in CD
environments, so this level of optimization will be frequently used.
And, as I've said before, optimizing for frequently-used scenarios is
worth spending the time on. Optimizing for infrequently-occurring
things... not so much. :)
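
For illustration only, the three approaches could be selected by a small
helper like the one below. The function name, its arguments, and the
file-count threshold are all assumptions made for this sketch, not part of
any existing TripleO tool.

```shell
# Hypothetical helper mapping an update onto the three approaches above.
# $1 = "yes" if the kernel changed, anything else otherwise
# $2 = number of files that differ between the old and new image
# $3 = threshold above which a full image copy is preferred (assumed, tunable)
choose_update_strategy() {
    if [ "$1" = "yes" ]; then
        echo "full-image-reboot"        # approach 1
    elif [ "$2" -gt "$3" ]; then
        echo "full-image-no-reboot"     # approach 2
    else
        echo "package-or-tarball"       # approach 3
    fi
}

# Example: choose_update_strategy no 1 100
```

With the example arguments (no kernel change, one changed file, threshold
of 100) the helper picks the package-or-tarball path, which is the common
case in CD environments.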

Best,
-jay



