[openstack-dev] [TripleO] a need to assert user ownership in preserved state
Clint Byrum
clint at fewbar.com
Tue Oct 7 17:49:59 UTC 2014
Excerpts from Chris Jones's message of 2014-10-07 04:35:33 -0700:
> Hi
>
> > On 6 Oct 2014, at 17:41, Clint Byrum <clint at fewbar.com> wrote:
> > We have to be _extremely_ careful in how we manage this. I actually think
> > it has potential to really blow up in our faces.
>
> Yes, anything we do here has the potential to be extremely ruinous for operators, but the reality is that any existing TripleO deployment is at pretty severe risk of blowing up because of UIDs/GIDs changing when they update.
>
> > We need to give people
> > a way to move forward without us merging a patch, and at the same time
> > we need to make sure we provide a consistent set of UIDs for anything
> > people may want to deploy with diskimage-builder.
>
> IMO the only desirable option *has* to be that we statically define UIDs and GIDs in the elements, because:
> 1: Requires no data fragments to be kept safe and fed to subsequent build processes
> 2: Doesn't do anything dynamic on first boot that could take hours/days
> 3: Can be thoroughly audited at build time to ensure correctness
>
> As you rightly point out though, any existing deployments will definitely be disrupted by this, but as I said above, all we'd be doing there is moving the needle from "possible/probable" to "definite".
>
> Since the only leftovers we have from their previous image builds are the images themselves, we could add the ability for a DIB run to extract IDs from a previous image, but this couldn't be required as a default build option, so we'd still risk breaking existing deployments whose operators don't notice this feature.
>
> We could create a script that would spider an existing cloud and extract its ID mappings, to produce a fragment to feed into future builds, but again we're relying on operators to know that they need to do this.
>
Welllll... they'd know they need to do _something_ because their UIDs
and GIDs are all horked up (technical term).

> Instead, I agree with Greg's view that this is our fault and we should fix it. We didn't think of this sooner, and as a result, our users are at risk. If we don't entirely fix this ourselves, we will be both expecting them to become aware of this issue and expecting them to do additional work to mitigate it.
>
> To that end, I think we should audit all of our elements for use of /mnt/state/ and use the specific knowledge we have of the software they relate to, to build one-time ID migration scripts, which would:
> 1: Execute before any related services start
> 2: Compare the now-static ID mappings against known files in /mnt/state
> 3: chown/chgrp any files/directories that need migrating
> 4: store a flag file in /mnt/state indicating that this process doesn't need to run again
>
> It does mean they have a potentially painfully long update process once, but the result will be a completely stable, static arrangement that will not require them to preserve precious build fragments for the rest of time. Nor does it require some odd run-time remapping, or any additional mechanisms to centralise user management (e.g. LDAP. Please, no LDAP!)
>
> I think that tying ourselves and our operators into knots because we're afraid of the hit of a one-time data migration is crazy.
>
> AFAICS, the only risk left at that point is elements that other people are maintaining. If we consider that a sufficient risk, we can still build the mechanism for injecting ID values from a previous build (essentially just seeding the static values that we'd be setting anyway) and apologise to the users who need that, or who don't discover its existence and break their clouds.
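A minimal Python sketch of the one-time, per-element migration described in the quoted list (run before services start, compare ownership against the now-static IDs, chown, then drop a flag file in /mnt/state). The flag-file name, the `expected` dict of relative paths to (uid, gid), and the function names are hypothetical; a real element would fill in the paths from its knowledge of the service and would run as root.

```python
import os
from typing import Dict, Iterator, List, Tuple

# Hypothetical marker name; each real element would choose its own.
FLAG_NAME = ".uid-gid-migrated"


def _tree(path: str) -> Iterator[str]:
    """Yield path itself, then everything below it (symlinks are not followed)."""
    yield path
    if os.path.isdir(path) and not os.path.islink(path):
        for dirpath, dirnames, filenames in os.walk(path):
            for name in dirnames + filenames:
                yield os.path.join(dirpath, name)


def migrate_known_paths(state_dir: str,
                        expected: Dict[str, Tuple[int, int]],
                        dry_run: bool = False) -> List[str]:
    """One-time chown of known paths under state_dir to their static (uid, gid).

    expected maps a path relative to state_dir (e.g. "mysql") to the IDs the
    static element now assigns. Returns the relative paths whose ownership
    differed. Intended to run before the related services start.
    """
    flag = os.path.join(state_dir, FLAG_NAME)
    if os.path.exists(flag):
        return []  # already migrated on an earlier boot
    changed: List[str] = []
    for rel, (uid, gid) in sorted(expected.items()):
        top = os.path.join(state_dir, rel)
        if not os.path.lexists(top):
            continue  # this service's state is not present on this node
        for path in _tree(top):
            st = os.lstat(path)
            if (st.st_uid, st.st_gid) != (uid, gid):
                changed.append(os.path.relpath(path, state_dir))
                if not dry_run:
                    os.lchown(path, uid, gid)  # needs root for foreign IDs
    if not dry_run:
        # Record completion so the (possibly slow) walk never repeats.
        with open(flag, "w") as f:
            f.write("done\n")
    return changed
```

The flag check comes first, so every boot after the migration costs a single stat rather than another walk of the state tree.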

I'm not afraid of running migrations once. I want to make sure we never
_plan_ to run migrations as part of regular operation.

I agree with most of what you've written, but first I'd start with this:
* Create an element which exports /etc/passwd and /etc/group from the
build process.
* Create an element which imports /etc/passwd and /etc/group from local
disk into image. This will have an element-provides of uid-gid-map
* Create a separate element called 'static-users' which also provides
uid-gid-map. It contains a map of UIDs and GIDs and creates users early on
with static UIDs/GIDs only. It disables the usual commands used to add
users and groups (the error message should explain that the user can
either add their own element that provides uid-gid-map or switch to
importing/exporting).
* Make use-ephemeral depend on uid-gid-map.
* Make tripleo-ci build with static-users, and recommend it in TripleO
documentation.
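A rough Python sketch of what the proposed static-users element might do at build time. It also illustrates the "thoroughly audited at build time" point from earlier in the thread: duplicate names or IDs fail the build instead of shipping. The 'name uid gid' map format, the one-group-per-user layout, and the function names are assumptions for illustration; the `groupadd`/`useradd` argument lists are only built here, not executed.

```python
from typing import Dict, List, NamedTuple, Set


class StaticUser(NamedTuple):
    name: str
    uid: int
    gid: int


def parse_static_map(text: str) -> List[StaticUser]:
    """Parse a hypothetical 'name uid gid' map and reject duplicates.

    Raising here is the point: a clash is caught before any image ships.
    Assumes one primary group per user, named after the user.
    """
    entries: List[StaticUser] = []
    seen_name: Set[str] = set()
    seen_uid: Dict[int, str] = {}
    seen_gid: Dict[int, str] = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 'name uid gid', got {raw!r}")
        name, uid, gid = fields[0], int(fields[1]), int(fields[2])
        if name in seen_name:
            raise ValueError(f"line {lineno}: {name} listed twice")
        if uid in seen_uid:
            raise ValueError(f"line {lineno}: uid {uid} already used by {seen_uid[uid]}")
        if gid in seen_gid:
            raise ValueError(f"line {lineno}: gid {gid} already used by {seen_gid[gid]}")
        seen_name.add(name)
        seen_uid[uid] = name
        seen_gid[gid] = name
        entries.append(StaticUser(name, uid, gid))
    return entries


def creation_commands(entries: List[StaticUser]) -> List[List[str]]:
    """argv lists that create each group and user with an explicit, pinned ID."""
    cmds: List[List[str]] = []
    for e in entries:
        cmds.append(["groupadd", "-g", str(e.gid), e.name])
        cmds.append(["useradd", "-u", str(e.uid), "-g", str(e.gid), "-M", e.name])
    return cmds
```

Passing explicit `-u`/`-g` values is what removes the dependence on package install order, which is how dynamically allocated IDs drift between builds.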

Once that is done, we will be producing builds with static users. If you
want to create a user for base TripleO, you'll need to add it by hand to
the static-users element. If you are downstream and want to do things
differently, that should be easy: just provide your own uid-gid-map
element.

As for migrations, those are fairly simple and can be done generically;
I've already written a script that does it fairly reliably. The only
worry, of course, is that large collections of files will take a long
time. I'll submit that as a separate element called 'fix-state-uid-gid'
or something like that. We might as well include it in the default build,
so that our images start fixing this problem now. :-P
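The script mentioned above isn't included in this message. What follows is only an independent Python sketch of one generic approach, assuming both the old and the new /etc/passwd (or /etc/group) contents are available, for example via the export element proposed earlier. IDs are matched by user or group name, and the function names are hypothetical.

```python
import os
from typing import Dict, Iterator, List, Tuple


def id_map_from(text: str) -> Dict[str, int]:
    """name -> numeric ID from /etc/passwd or /etc/group text (3rd field)."""
    out: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split(":")
        # Skip comments, short lines, and NIS-style '+' entries with no ID.
        if line.startswith("#") or len(fields) < 3 or not fields[2].isdigit():
            continue
        out[fields[0]] = int(fields[2])
    return out


def remap_table(old: Dict[str, int], new: Dict[str, int]) -> Dict[int, int]:
    """old ID -> new ID for every name present in both maps whose ID changed."""
    return {old[n]: new[n] for n in old.keys() & new.keys() if old[n] != new[n]}


def _walk_all(root: str) -> Iterator[str]:
    yield root
    for dirpath, dirnames, filenames in os.walk(root):  # does not follow symlinks
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def plan_fix(root: str, uid_map: Dict[int, int],
             gid_map: Dict[int, int]) -> List[Tuple[str, int, int]]:
    """Snapshot all ownership before changing anything, so swaps (A<->B) stay correct."""
    plan: List[Tuple[str, int, int]] = []
    for path in _walk_all(root):
        st = os.lstat(path)
        uid = uid_map.get(st.st_uid, st.st_uid)
        gid = gid_map.get(st.st_gid, st.st_gid)
        if (uid, gid) != (st.st_uid, st.st_gid):
            plan.append((path, uid, gid))
    return plan


def apply_plan(plan: List[Tuple[str, int, int]]) -> None:
    for path, uid, gid in plan:
        os.lchown(path, uid, gid)  # needs root; lchown so symlinks aren't followed
```

Computing the whole plan before the first `lchown` matters when two services have traded IDs between builds (say nova 500 -> 501 and glance 501 -> 500). A naive "chown everything owned by 500, then everything owned by 501" pass would hand all of those files to one user.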