[openstack-dev] [TripleO] our update story: can people live with it?

Fox, Kevin M Kevin.Fox at pnnl.gov
Wed Jan 22 20:19:56 UTC 2014


I think most of the time taken to reboot is spent bringing the services down and back up, though, so I'm not sure what skipping the reboot really buys you if you restart everything anyway. It may let you skip the crazy long bootup time on "enterprise" hardware, but that could be worked around with kexec in the full-reboot method too.

Thanks,
Kevin
________________________________________
From: Clint Byrum [clint at fewbar.com]
Sent: Wednesday, January 22, 2014 10:55 AM
To: openstack-dev
Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it?

Agreed, it is tricky if we try to only restart what we've changed.

OR, just restart everything. We can make endpoints HA and use rolling
updates to avoid spurious faults.
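
A rolling restart along these lines could be sketched as below. This is only
an illustration under stated assumptions: `rolling_restart`, the
restart/health commands passed to it, and the `MAX_TRIES` and `RETRY_DELAY`
variables are hypothetical names, not part of any TripleO tool.

```shell
# rolling_restart RESTART_CMD HEALTH_CMD NODE...
# Restarts one node at a time and waits until it reports healthy before
# moving on, so the other (HA) endpoints keep serving in the meantime.
# RESTART_CMD and HEALTH_CMD are hypothetical commands that take a node name.
rolling_restart() {
    local restart_cmd="$1" health_cmd="$2" node tries
    shift 2
    for node in "$@"; do
        "$restart_cmd" "$node" || return 1
        tries=0
        # Poll the health check; give up after MAX_TRIES failed checks.
        until "$health_cmd" "$node"; do
            tries=$((tries + 1))
            [ "$tries" -lt "${MAX_TRIES:-30}" ] || return 1
            sleep "${RETRY_DELAY:-1}"
        done
    done
}
```

Stopping at the first node that never becomes healthy keeps a bad update from
spreading to the rest of the cluster.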

There are more complex ways to handle things even more smoothly, but I go
back to "What does complexity cost?"

Excerpts from Fox, Kevin M's message of 2014-01-22 10:32:02 -0800:
> Another tricky bit left is how to handle service restarts as needed?
>
> Thanks,
> Kevin
> ________________________________________
> From: Dan Prince [dprince at redhat.com]
> Sent: Wednesday, January 22, 2014 10:15 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it?
>
> ----- Original Message -----
> > From: "Clint Byrum" <clint at fewbar.com>
> > To: "openstack-dev" <openstack-dev at lists.openstack.org>
> > Sent: Wednesday, January 22, 2014 12:45:45 PM
> > Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it?
> >
> > Excerpts from Dan Prince's message of 2014-01-22 09:17:24 -0800:
> > > I've been thinking a bit more about how TripleO updates are developing
> > > specifically with regards to compute nodes. What is commonly called the
> > > "update story" I think.
> > >
> > > As I understand it we expect people to actually have to reboot a compute
> > > node in the cluster in order to deploy an update. This really worries me
> > > because it seems like overkill for such a simple operation. Let's say
> > > all I need to deploy is a simple change to Nova's libvirt driver, and
> > > I need to deploy it to *all* my compute instances. Do we really expect
> > > people to reboot every single compute node in their
> > > cluster for such a thing, and then do this again and again for each
> > > update they deploy?
> > >
> >
> > Agreed, if we make everybody reboot to push out a patch to libvirt, we
> > have failed. And thus far, we are failing in exactly that way, but with
> > good reason.
> >
> > Right at this very moment, we are leaning on 'rebuild' in Nova, which
> > reboots the instance. But this is so that we handle the hardest thing
> > well first (rebooting to have a new kernel).
> >
> > For small updates we need to decouple things a bit more. There is a
> > notion of the image ID in Nova, versus the image ID that is actually
> > running. Right now we update it with a nova rebuild command only.
> >
> > But ideally we would give operators a tool to optimize and avoid the
> > reboot when it is appropriate. The heuristic should be as simple as
> > comparing kernels.
>
> When we get to implementing such a thing I might prefer it not to be auto-magic. I can see a case where I want the new image but maybe not the new kernel. Perhaps this should be addressed when building the image (by using the older kernel)... but still. I could see a case for explicitly not wanting to reboot here as well.
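
The kernel comparison, together with the explicit opt-out asked for above,
could be sketched roughly as follows. The names `image_kernel_version`,
`needs_reboot` and `FORCE_NO_REBOOT` are hypothetical, and the sketch assumes
the mounted image ships its kernel as boot/vmlinuz-<version>, which may not
hold for every image.

```shell
# image_kernel_version IMAGE_ROOT
# Prints the kernel version found under IMAGE_ROOT/boot (the last one in
# sort order if there are several), or nothing if the image has no kernel.
image_kernel_version() {
    local image_root="$1" f
    for f in "$image_root"/boot/vmlinuz-*; do
        [ -e "$f" ] || continue
        basename "$f" | sed 's/^vmlinuz-//'
    done | sort | tail -n 1
}

# needs_reboot RUNNING_VERSION IMAGE_ROOT
# Returns 0 (true) when the image's kernel differs from the running one,
# unless the operator has set FORCE_NO_REBOOT=1 (hypothetical override).
needs_reboot() {
    local running="$1" image_root="$2" new
    [ "${FORCE_NO_REBOOT:-0}" = "1" ] && return 1
    new=$(image_kernel_version "$image_root")
    # No kernel in the image: nothing new to boot into.
    [ -n "$new" ] || return 1
    [ "$new" != "$running" ]
}
```

It would be called as `needs_reboot "$(uname -r)" /tmp/new_image`, with the
override left to the operator rather than decided automatically.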
>
> > Once we have determined that a new image does not
> > need a reboot, we can just change the ID in Metadata, and an
> > os-refresh-config script will do something like this:
> >
> > if [ "$(cat /etc/image_id)" != "$(os-apply-config --key image_id)" ]; then
> >     # download_new_image and mount_image are placeholders, not real commands
> >     download_new_image
> >     mount_image /tmp/new_image
> >     mount / -o remount,rw # Assuming we've achieved ro root
> >     # Copy the new image's contents over the running root filesystem
> >     rsync --one-file-system -a /tmp/new_image/ /
> >     mount / -o remount,ro # ditto
> > fi
> >
> > No reboot required. This would run early in configure.d, so that any
> > pre-configure.d scripts will have run to quiesce services that can't
> > handle having their binaries removed out from under them (read:
> > non-Unix services). Then configure.d runs as usual, configures things,
> > restarts services, and we are now running the new image.
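
The ordering described above (quiesce in pre-configure.d, sync the image
early in configure.d, then configure and restart) depends on scripts running
phase by phase and in sorted order within each phase. A rough sketch of such
a runner follows; it is not os-refresh-config's actual implementation, and
`run_phase`, `refresh` and the base-directory argument are made up for
illustration.

```shell
# run_phase BASE_DIR PHASE
# Runs every executable file in BASE_DIR/PHASE in glob (lexical) order and
# stops at the first failure, so later steps never run on a half-done state.
run_phase() {
    local dir="$1/$2" script
    [ -d "$dir" ] || return 0
    for script in "$dir"/*; do
        [ -f "$script" ] && [ -x "$script" ] || continue
        "$script" || { echo "failed: $script" >&2; return 1; }
    done
}

# refresh BASE_DIR
# Runs the two phases mentioned in this thread, in order.
refresh() {
    local base="$1" phase
    for phase in pre-configure.d configure.d; do
        run_phase "$base" "$phase" || return 1
    done
}
```

With this layout, an image-sync script named so that it sorts first in
configure.d would run after all quiesce scripts and before any script that
restarts services.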
>
> Cool. I like this a good bit better as it avoids the reboot. Still, this is a rather large amount of data to copy around if I'm only changing a single file in Nova.
>
> >
> > > I understand the whole read only images thing plays into this too... but
> > > I'm wondering if there is a middle ground where things might work
> > > better. Perhaps we have a mechanism where we can tar up individual venvs
> > > from /opt/stack/ or perhaps also this is an area where real OpenStack
> > > packages could shine. It seems like we could certainly come up with some
> > > simple mechanisms to deploy these sorts of changes with Heat such that
> > > compute host reboot can be avoided for each new deploy.
> >
> > Given the scenario above, that would be a further optimization. I don't
> > think it makes sense to specialize for venvs or openstack services
> > though, so just "ensure the root filesystems match" seems like a
> > workable, highly efficient system. Note that we've talked about having
> > highly efficient ways to widely distribute the new images as well.
>
> Yes. Optimization! In the big scheme of things I could see 3 approaches being useful:
>
> 1) Deploy a full image and reboot if you have a kernel update. (entire image is copied)
>
> 2) Deploy a full image if you change a bunch of things and/or you prefer to do that. (entire image is copied)
>
> 3) Deploy specific application level updates via packages or tarballs. (only selected applications/packages get deployed)
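
The three approaches above amount to a small decision rule, sketched here
under stated assumptions: `choose_update`, its output labels, and the
file-count threshold standing in for "a bunch of things" are all
hypothetical, and an operator preference could override the result.

```shell
# choose_update KERNEL_CHANGED CHANGED_FILES [THRESHOLD]
# KERNEL_CHANGED is "yes" or "no"; CHANGED_FILES is a count of changed files.
# Prints which of the three approaches applies.
choose_update() {
    local kernel_changed="$1" changed_files="$2" threshold="${3:-100}"
    if [ "$kernel_changed" = "yes" ]; then
        echo "image-reboot"   # 1) full image, reboot into the new kernel
    elif [ "$changed_files" -gt "$threshold" ]; then
        echo "image-sync"     # 2) full image, no reboot
    else
        echo "package"        # 3) application-level packages or tarballs
    fi
}
```

For example, `choose_update no 3` selects the package-level path, while any
kernel change selects the full image with a reboot.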
>
> >
> > I would call your e-mail a documentation/roadmap bug.
>
> Fair enough. Thanks for the info.
>
> > This plan may have been recorded somewhere, but for me it has just always been in my
> > head as the end goal (thanks to Robert Collins for drilling the hole
> > and pouring it in there btw ;).
> >
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
