[openstack-dev] [TripleO] our update story: can people live with it?

Fox, Kevin M Kevin.Fox at pnnl.gov
Wed Jan 22 18:32:02 UTC 2014

Another tricky bit left: how do we handle service restarts when they're needed?
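One way the "which services need a restart?" question could be answered after an in-place sync (a hypothetical sketch, not anything TripleO ships): on Linux, a process still executing a binary or library that was replaced on disk shows that mapping as "(deleted)" in /proc/PID/maps, so those processes are the restart candidates.

```shell
#!/bin/sh
# Hypothetical sketch: after rsyncing new files over /, find processes
# still running replaced binaries/libraries. Linux marks a mapped file
# that was replaced on disk as "(deleted)" in /proc/PID/maps.

needs_restart() {
    # $1: path to a /proc/PID/maps file
    grep -q '(deleted)' "$1"
}

for pid in /proc/[0-9]*; do
    if needs_restart "$pid/maps" 2>/dev/null; then
        printf 'restart candidate: %s (pid %s)\n' \
            "$(cat "$pid/comm")" "${pid##*/}"
    fi
done
```

This is roughly what tools like needrestart do; an os-refresh-config script could use the list to restart only the affected services.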

From: Dan Prince [dprince at redhat.com]
Sent: Wednesday, January 22, 2014 10:15 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it?

----- Original Message -----
> From: "Clint Byrum" <clint at fewbar.com>
> To: "openstack-dev" <openstack-dev at lists.openstack.org>
> Sent: Wednesday, January 22, 2014 12:45:45 PM
> Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it?
> Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800:
> > I've been thinking a bit more about how TripleO updates are developing
> > specifically with regards to compute nodes. What is commonly called the
> > "update story" I think.
> >
> > As I understand it we expect people to actually have to reboot a compute
> > node in the cluster in order to deploy an update. This really worries me
> > because it seems like way overkill for such a simple operation. Let's say
> > all I need to deploy is a simple change to Nova's libvirt driver, and
> > I need to deploy it to *all* my compute instances. Do we really expect
> > people to reboot every single compute node in their cluster for such a
> > thing? And then do this again and again for each update they deploy?
> >
> Agreed: if we make everybody reboot to push out a patch to libvirt, we
> have failed. And thus far that is exactly what we do, but with good
> reason.
> Right at this very moment, we are leaning on 'rebuild' in Nova, which
> reboots the instance. But this is so that we handle the hardest thing
> well first (rebooting to get a new kernel).
> For small updates we need to decouple things a bit more. There is a
> notion of the image ID in Nova, versus the image ID that is actually
> running. Right now we update it with a nova rebuild command only.
> But ideally we would give operators a tool to optimize and avoid the
> reboot when it is appropriate. The heuristic should be as simple as
> comparing kernels.

When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. Perhaps this should be addressed when building the image (by using the older kernel)... but still. I could see a case for explicitly not wanting to reboot here as well.
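The "compare kernels" heuristic could be as small as this (a hypothetical sketch; where the new image's kernel version comes from is an assumption, e.g. reading it out of the downloaded image):

```shell
#!/bin/sh
# Hypothetical sketch of the reboot heuristic: reboot only if the new
# image carries a different kernel than the one currently running.

needs_reboot() {
    # $1: kernel version inside the new image; $2: running kernel version
    [ "$1" != "$2" ]
}

new_kernel="3.12.0-tripleo"     # assumption: read from the downloaded image
running_kernel="$(uname -r)"

if needs_reboot "$new_kernel" "$running_kernel"; then
    echo "kernel changed: reboot required"
else
    echo "kernel unchanged: in-place update is safe"
fi
```

An explicit operator flag to force or forbid the reboot, as suggested above, would just override the result of this check.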

> Once we have determined that a new image does not
> need a reboot, we can just change the ID in Metadata, and an
> os-refresh-config script will do something like this:
> if [ "$(cat /etc/image_id)" != "$(os-apply-config --key image_id)" ]; then
>     download_new_image
>     mount_image /tmp/new_image
>     mount / -o remount,rw   # assuming we've achieved a read-only root
>     rsync --one-file-system -a /tmp/new_image/ /
>     mount / -o remount,ro   # ditto
> fi
> No reboot required. This would run early in configure.d, so that any
> pre-configure.d scripts will have run to quiesce services that can't
> handle having their binaries removed out from under them (read:
> non-Unix services). Then configure.d runs as usual, configures things,
> restarts services, and we are now running the new image.

Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova.

> > I understand the whole read only images thing plays into this too... but
> > I'm wondering if there is a middle ground where things might work
> > better. Perhaps we have a mechanism where we can tar up individual venvs
> > from /opt/stack/ or perhaps also this is an area where real OpenStack
> > packages could shine. It seems like we could certainly come up with some
> > simple mechanisms to deploy these sorts of changes with Heat such that
> > compute host reboot can be avoided for each new deploy.
> Given the scenario above, that would be a further optimization. I don't
> think it makes sense to specialize for venvs or openstack services
> though, so just "ensure the root filesystems match" seems like a
> workable, highly efficient system. Note that we've talked about having
> highly efficient ways to widely distribute the new images as well.

Yes. Optimization! In the big scheme of things I could see 3 approaches being useful:

1) Deploy a full image and reboot if you have a kernel update. (entire image is copied)

2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied)

3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed)
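Approach 3 could look something like this (a hypothetical sketch; the names, paths, and tarball layout are all assumptions, not an existing TripleO mechanism):

```shell
#!/bin/sh
# Hypothetical sketch of approach 3: push a single venv tarball rather
# than a whole image, keeping the old venv around for rollback.

deploy_venv() {
    # $1: tarball containing a top-level "nova" venv dir; $2: install root
    tarball=$1; root=$2
    tmp=$(mktemp -d)
    tar -xzf "$tarball" -C "$tmp"
    # keep the previous venv for a quick rollback
    if [ -d "$root/nova" ]; then
        mv "$root/nova" "$root/nova.prev"
    fi
    mv "$tmp/nova" "$root/nova"
    rmdir "$tmp"
}

# illustrative usage; the service restart is the part that still needs care:
#   deploy_venv /tmp/nova-venv.tar.gz /opt/stack/venvs
#   service openstack-nova-compute restart
```

Real OpenStack packages would give you the same per-application granularity with dependency handling thrown in.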

> I would call your e-mail a documentation/roadmap bug.

Fair enough. Thanks for the info.

> This plan may have been recorded somewhere, but for me it has just always been in my
> head as the end goal (thanks to Robert Collins for drilling the hole
> and pouring it in there btw ;).
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
