[openstack-dev] [TripleO] our update story: can people live with it?

Clint Byrum clint at fewbar.com
Thu Jan 23 22:31:09 UTC 2014


If we're ready to head down the path of trying to isolate things and
have zero-downtime-on-one-box deploys, then CoreOS has basically solved
this with Docker.

https://coreos.com/using-coreos/

However, let's solve one thing at a time. Even with CoreOS, we still
need HA, and we still want to have rolling deploys that are driven by
tests passing and monitoring.

So once we get those things, sure, let's grind our downtime windows down
to next to nothing. :)
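
To make "driven by tests passing and monitoring" concrete, the control
loop I have in mind is roughly the Python sketch below. update_node,
health_ok and rollback_node are hand-waves for whatever Heat ends up
driving, not real APIs:

    # Rough sketch only: roll a new image across nodes one at a time,
    # gated on health checks, halting the rollout on failure.
    import time

    def rolling_deploy(nodes, image, timeout=300):
        for node in nodes:
            previous = update_node(node, image)   # returns the old image
            deadline = time.time() + timeout
            while time.time() < deadline:
                if health_ok(node):               # tests pass, monitoring green
                    break
                time.sleep(10)
            else:                                 # never went healthy
                rollback_node(node, previous)
                raise RuntimeError("update failed on %s, halting" % node)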

(Also, really, if we end up using Docker on top of TripleO, we'd want to
use the OpenStack driver, and that would mean we're using OpenStack on
OpenStack on OpenStack... which is _QUINT-O_ ;)

Excerpts from Fox, Kevin M's message of 2014-01-23 13:10:09 -0800:
> Would docker work for this?
> 
> Assume every service gets its own docker container. A deployed node is then a docker base image with a set of service containers. Updating an image could be:
> Check if the base part of the image (kernel, docker) has updated. If so, fully redeploy the node.
> Otherwise, sync each container image, restarting a container only if it has changed?
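
(Roughly that flow in Python, driving the docker CLI; the services
mapping and the deployed_base_id/redeploy_node helpers are invented
for illustration:)

    # Rough sketch of the per-service container update loop.
    import subprocess

    def image_id(image):
        return subprocess.check_output(
            ["docker", "inspect", "--format", "{{.Id}}", image]).strip()

    def update_node(base_image, services):
        subprocess.check_call(["docker", "pull", base_image])
        if image_id(base_image) != deployed_base_id():
            redeploy_node()                  # kernel/docker changed: full redeploy
            return
        for name, image in services.items():
            before = image_id(image)
            subprocess.check_call(["docker", "pull", image])
            if image_id(image) != before:    # restart only if the image changed
                subprocess.check_call(["docker", "restart", name])
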
> 
> Maybe the same trick could be done with python virtualenvs instead...
> 
> Kevin
> ________________________________________
> From: Chris Jones [cmsj at tenshu.net]
> Sent: Thursday, January 23, 2014 7:17 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [TripleO] our update story: can people live with it?
> 
> Hi
> 
> Not a tarball. The node would notice from Heat metadata that it should update to a new image, fetch that image, and sync the contents to its /. This would leave it bit-for-bit identical to a fresh deployment of the new image, at least on disk. The running state would differ, and that still requires some design and implementation to figure out.
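
(Something like this rough sketch, say; the paths and the exclude list
for node-local and virtual filesystems are illustrative only:)

    # Rough sketch of the sync step: mount the new image read-only and
    # rsync it over /, leaving the disk bit-for-bit identical to a fresh
    # deploy of that image.
    import subprocess

    def sync_image(image_path, mnt="/mnt/new-image"):
        subprocess.check_call(["mount", "-o", "loop,ro", image_path, mnt])
        try:
            subprocess.check_call(
                ["rsync", "-aHAX", "--delete"]
                + ["--exclude=" + p for p in
                   ("/proc", "/sys", "/dev", "/run", "/mnt", "/etc/fstab")]
                + [mnt + "/", "/"])
        finally:
            subprocess.check_call(["umount", mnt])
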
> 
> Cheers,
> --
> Chris Jones
> 
> > On 23 Jan 2014, at 12:57, Angus Thomas <athomas at redhat.com> wrote:
> >
> > On 22/01/14 20:54, Clint Byrum wrote:
> >>> >
> >>> >I don't understand the aversion to using existing, well-known tools to handle this?
> >>> >
> >> These tools are of course available to users and nobody is stopping them
> >> from using them. We are optimizing for not needing them. They are there
> >> and we're not going to explode if you use them. You just lose one aspect
> >> of what we're aiming at. I believe that having image-based deploys will
> >> be well received as long as it is simple to understand.
> >>
> >>> >A hybrid model (blending 2 and 3, above) would, I think, work best here:
> >>> >TripleO lays down a baseline image and the cloud operator employs a
> >>> >well-known and supported configuration tool for any small diffs.
> >>> >
> >> These tools are popular because they control entropy and make it at
> >> least more likely that what you tested ends up on the boxes.
> >>
> >> A read-only root partition is a much stronger control on entropy.
> >>
> >>> >The operator would then be empowered to make the call for any major
> >>> >upgrades that would adversely impact the infrastructure (and ultimately
> >>> >the users/apps). He/she could say: this is a major release, let's deploy
> >>> >the image.
> >>> >
> >>> >Something logically like this seems reasonable:
> >>> >
> >>> >     if (system_change > 10%) {
> >>> >         use TripleO;
> >>> >     } else {
> >>> >         use Existing_Config_Management;
> >>> >     }
> >>> >
> >> I think we can make deploying minor updates minimally invasive.
> >>
> >> We've kept it simple enough that this should be a fairly straightforward
> >> optimization cycle. And the win there is that we also improve things
> >> for the 11% change.
> >
> > Hi Clint,
> >
> > For deploying minimally-invasive minor updates, the idea, if I've understood it correctly, would be to deploy a tarball which replaces selected files on the (usually read-only) root filesystem. That would allow for selectively restarting only the services which are directly affected. The alternative, pushing out a complete root filesystem image, would necessitate the same amount of disruption in all cases.
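
(For illustration, the mechanics might look like the rough sketch
below; the hand-maintained file-to-service map is exactly the extra
preparation work being described:)

    # Rough sketch: unpack the update tarball over /, then restart only
    # the services whose files it touched. Paths and service names are
    # invented.
    import subprocess
    import tarfile

    FILE_TO_SERVICE = {
        "usr/lib/python2.7/dist-packages/nova/": "openstack-nova-compute",
        "usr/lib/python2.7/dist-packages/glance/": "openstack-glance-api",
    }

    def apply_update(tarball):
        with tarfile.open(tarball) as tar:
            touched = tar.getnames()
            tar.extractall("/")
        restart = set(svc for prefix, svc in FILE_TO_SERVICE.items()
                      if any(name.startswith(prefix) for name in touched))
        for service in restart:
            subprocess.check_call(["service", service, "restart"])
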
> >
> > There are a handful of costs with that approach which concern me: it simplifies the deployment itself, but increases the complexity of preparing the deployment. The administrator is going to have to identify the services which need to be restarted, based on the particular set of libraries touched by their partial update, and put together the service restart scripts accordingly.
> >
> > We're also making the administrator responsible for managing the sequence in which incremental updates are deployed. Since each incremental update re-writes a particular set of files, a machine which gets updates 1, 2 and 3, misses update 4 through an oversight, and then has update 5 deployed would end up in an odd state, which would require additional tooling to detect. Package-based updates, with versioning and dependency tracking on each package, mitigate that risk.
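
(The sequencing guard that package versioning provides for free would
otherwise have to be hand-rolled, roughly like this; the state file is
invented:)

    # Rough sketch: refuse to apply update N unless update N-1 has
    # already been applied, so gaps are caught instead of silently
    # producing an odd state.
    STATE = "/var/lib/partial-updates/last-applied"

    def can_apply(n):
        try:
            with open(STATE) as f:
                last = int(f.read())
        except IOError:
            last = 0                    # fresh machine: nothing applied yet
        return n == last + 1

    def record(n):
        with open(STATE, "w") as f:
            f.write(str(n))
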
> >
> > Then there's the relationship between the state of running machines, with partial updates applied, and the images which Ironic puts onto new machines. We would need to apply the partial updates to the images which Ironic writes, or have tooling to ensure that newly deployed machines immediately apply the set of applicable partial updates, in sequence.
> >
> > Solving these issues feels like it'll require quite a lot of additional tooling.
> >
> >
> > Angus
> >
> >
> >
> >


