[openstack-dev] [TripleO] our update story: can people live with it?

Angus Thomas athomas at redhat.com
Thu Jan 23 12:57:20 UTC 2014


On 22/01/14 20:54, Clint Byrum wrote:
>> >
>> >I don't understand the aversion to using existing, well-known tools to handle this?
>> >
> These tools are of course available to users and nobody is stopping them
> from using them. We are optimizing for not needing them. They are there
> and we're not going to explode if you use them. You just lose one aspect
> of what we're aiming at. I believe that having image based deploys will
> be well received as long as it is simple to understand.
>
>> >A hybrid model (blending 2 and 3, above) here I think would work best where
>> >TripleO lays down a baseline image and the cloud operator would employ a well-known
>> >and supported configuration tool for any small diffs.
>> >
> These tools are popular because they control entropy and make it at
> least more likely that what you tested ends up on the boxes.
>
> A read-only root partition is a much stronger control on entropy.
>
>> >The operator would then be empowered to make the call for any major upgrades that
>> >would adversely impact the infrastructure (and ultimately the users/apps).  He/She
>> >could say, this is a major release, let's deploy the image.
>> >
>> >Something logically like this, seems reasonable:
>> >
>> >     if (system_change > 10%) {
>> >       use TripleO;
>> >       } else {
>> >       use Existing_Config_Management;
>> >     }
>> >
> I think we can make deploying minor updates minimally invasive.
>
> We've kept it simple enough that this should be a fairly straightforward
> optimization cycle. And the win there is that we also improve things
> for the 11% change.
>

Hi Clint,

For deploying minimally-invasive minor updates, the idea, if I've 
understood it correctly, would be to deploy a tarball which replaced 
selected files on the (usually read-only) root filesystem. That would 
allow for selective restarting of only the services which are directly 
affected. The alternative, pushing out a complete root filesystem image, 
would necessitate the same amount of disruption in all cases.

A handful of costs with that approach concern me. It simplifies the 
deployment itself, but increases the complexity of preparing the 
deployment. The administrator will have to identify the services which 
need to be restarted, based on the particular set of libraries touched 
by the partial update, and put together the service restart scripts 
accordingly.
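
To make that preparation step concrete, here is a rough sketch of the 
lookup the administrator would effectively be doing by hand. The 
function name, the SERVICE_FILES map and the paths in it are all made up 
for illustration; nothing like this exists in TripleO as far as I know, 
and building an accurate map is the hard part.

```python
def services_to_restart(changed_files, service_files):
    """Return the sorted names of services that use any changed file."""
    changed = set(changed_files)
    return sorted(
        name
        for name, files in service_files.items()
        if changed & set(files)
    )


# Hypothetical dependency map. In practice it would have to be derived
# from the image contents, e.g. by inspecting which libraries each
# service binary links against.
SERVICE_FILES = {
    "nova-compute": {"/usr/lib/libvirt.so.0", "/usr/bin/nova-compute"},
    "neutron-agent": {"/usr/lib/libssl.so.1.0.0", "/usr/bin/neutron-agent"},
    "sshd": {"/usr/lib/libssl.so.1.0.0", "/usr/sbin/sshd"},
}

# A partial update that only touches the SSL library.
print(services_to_restart(["/usr/lib/libssl.so.1.0.0"], SERVICE_FILES))
# -> ['neutron-agent', 'sshd']
```

Even in this toy form, the result is only as good as the dependency map 
behind it.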

We're also making the administrator responsible for managing the 
sequence in which incremental updates are deployed. Each incremental 
update rewrites a particular set of files, so a machine which gets 
updates 1, 2 and 3, then misses update 4 through an oversight, and then 
has update 5 deployed would end up in an odd state, which would require 
additional tooling to detect. Package-based updates, with versioning and 
dependency tracking on each package, mitigate that risk.
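
The detection tooling could start out as something like the check 
below. It assumes updates are numbered consecutively from 1 and that 
each machine keeps a record of the numbers it has applied; both are my 
assumptions, not anything that has been proposed.

```python
def missing_updates(applied, target):
    """Return the update numbers in 1..target that were never applied."""
    return sorted(set(range(1, target + 1)) - set(applied))


# The scenario above: updates 1, 2, 3 and 5 applied, 4 skipped.
print(missing_updates([1, 2, 3, 5], target=5))
# -> [4]
```

This only spots the gap. It does not say whether applying update 4 
after update 5 would be safe.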

Then there's the relationship between the state of running machines, 
with applied partial updates, and the images which are put onto new 
machines by Ironic. We would need either to apply the partial updates to 
the images which Ironic writes, or to have tooling which ensures that 
newly deployed machines immediately apply the set of applicable partial 
updates, in sequence.
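
For the second option, a freshly deployed machine would need to work 
out something like the following on first boot. Here image_level (the 
last update already baked into the image) and the list of published 
update numbers are hypothetical inputs that some other piece of tooling 
would have to supply.

```python
def pending_updates(image_level, published):
    """Return published update numbers newer than the image, in order."""
    return sorted(n for n in set(published) if n > image_level)


# Image built at update level 2, with updates 1..5 published since.
print(pending_updates(2, [5, 3, 4, 1, 2]))
# -> [3, 4, 5]
```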

Solving these issues feels like it'll require quite a lot of additional 
tooling.


Angus





