[openstack-dev] creating a default for oslo config variables within a project?

Sean Dague sean at dague.net
Wed Dec 4 12:07:40 UTC 2013


On 12/03/2013 11:21 PM, Clint Byrum wrote:
> Excerpts from Sean Dague's message of 2013-12-03 16:05:47 -0800:
>> On 12/03/2013 06:13 PM, Ben Nemec wrote:
>>> On 2013-12-03 17:09, Sean Dague wrote:
>>>> On 12/03/2013 05:50 PM, Mark McLoughlin wrote:
>>>>> On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote:
>>>>>> On 2013-12-03 15:56, Sean Dague wrote:
>>>>>>> This cinder patch - https://review.openstack.org/#/c/48935/
>>>>>>>
>>>>>>> Is blocked on failing upgrade because the updated oslo lockutils won't
>>>>>>> function until there is a specific configuration variable added to the
>>>>>>> cinder.conf.
>>>>>>>
>>>>>>> That work around is proposed here -
>>>>>>> https://review.openstack.org/#/c/52070/3
>>>>>>>
>>>>>>> However I think this is exactly the kind of forward breaks that we
>>>>>>> want
>>>>>>> to prevent with grenade, as cinder failing to function after a rolling
>>>>>>> upgrade because a config item wasn't added is exactly the kind of pain
>>>>>>> we are trying to prevent happening to ops.
>>>>>>>
>>>>>>> So the question is, how is this done correctly so that a default
>>>>>>> can be
>>>>>>> set in the cinder code for this value, and it not require a config
>>>>>>> change to work?
>>>>>
>>>>> You're absolutely correct, in principle - if the default value for
>>>>> lock_path worked for users before, we should absolutely continue to
>>>>> support it.
>>>>>
>>>>>> I don't know that I have a good answer on how to handle this, but for
>>>>>> context this change is the result of a nasty bug in lockutils that
>>>>>> meant
>>>>>> external locks were doing nothing if lock_path wasn't set.  Basically
>>>>>> it's something we should never have allowed in the first place.
>>>>>>
>>>>>> As far as setting this in code, it's important that all of the
>>>>>> processes
>>>>>> for a service are using the same value to avoid the same bad situation
>>>>>> we were in before.  For tests, we have a lockutils wrapper
>>>>>> (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282)
>>>>>> that sets an environment variable to address this, but that only
>>>>>> works if all of the processes are going to be spawned from within
>>>>>> the same wrapper, and I'm not sure how secure that is for production
>>>>>> deployments since it puts all of the lock files in a temporary
>>>>>> directory.
>>>>>
>>>>> Right, I don't think the previous default really "worked" - if you used
>>>>> the default, then external locking was broken.
>>>>>
>>>>> I suspect most distros do set a default - I see RDO has this in its
>>>>> default nova.conf:
>>>>>
>>>>>   lock_path = /var/lib/nova/tmp
>>>>>
>>>>> So, yes - this is all terrible.
>>>>>
>>>>> IMHO, rather than raise an exception we should log a big fat warning
>>>>> about relying on the default and perhaps just treat the lock as an
>>>>> in-process lock in that case ... since that's essentially what it was
>>>>> before, right?
>>>>
>>>> So a default of lock_path = /tmp will work (FHS says that path has to be
>>>> there), even if not optimal. Could we make it a default value like that
>>>> instead of the current default which is null (and hence the problem).
>>>
>>> IIRC, my initial fix was something similar to that, but it got shot down
>>> because putting the lock files in a known world writeable location was a
>>> security issue.
>>>
>>> Although maybe if we put them in a subdirectory of /tmp and ensured that
>>> the permissions were such that only the user running the service could
>>> use that directory, it might be acceptable?  We could still log a
>>> warning if we wanted.
>>>
>>> This seems like it would have implications for people running services
>>> on Windows too, but we can probably find a way to make that work if we
>>> decide on a solution.
>>
>> How is that a security issue? Are the lock files being written with some
>> sensitive data in them and have g or o permissions on? The sticky bit
>> (+t) on /tmp will prevent other users from deleting the file.
>>
> 
> Right, but it won't prevent users from creating a symlink with the same
> name.
> 
> ln -s /var/lib/nova/instances/xxxxx/image.raw /tmp/well.known.location
> 
> Now when you do
> 
> with open('/tmp/well.known.location', 'w') as lockfile:
>   lockfile.write('Stuff')
> 
> Nova has just truncated the image file and written 'Stuff' to it.

So that's the generic case (and the way people often write this). But
the oslo lockutils code doesn't work that way. While it does open the
file for write, it does not actually write, it's using fcntl to hold
locks. That's taking a data structure on the fd in kernel memory (IIRC),
so it correctly gives it up if the process crashes.

I'm not saying there isn't some other possible security vulnerability
here as well, but it's not jumping out at me. So I'd love to understand
that, because if we can close that exposure we can provide a working
default, plus a strong recommendation for how to do that *right*. I'd be
totally happy with printing WARNING level at startup if lock_path = /tmp
that this should be adjusted.

> The typical solution is to use a lock directory, /var/run/yourprogram,
> that has restrictive enough permissions setup for your program to have
> exclusive use of it, and is created by root at boot time. That is what
> the packages do now.
> 
> It would be good if everybody agreed on a default, %(nova_home)/locks
> or something, but root still must set it up with the right permissions
> before the program can use it. IMO that is fine, your upgrade should
> include a change to your init script/systemd unit/upstart job which
> ensures that it exists before starting.

That would be interesting... if we actually gated using service scripts
(which we don't). But it does bring up a point of trying to get there.

> We could abstract this away with a nova-manage subcommand that is intended
> to be run as root but can inspect the config file. That would allow for
> simpler documentation and the packagers would probably love you for it.

Also... we're actually talking about cinder in this case. ;) But I do
realize this could apply to any project.

	-Sean

-- 
Sean Dague
http://dague.net

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 482 bytes
Desc: OpenPGP digital signature
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20131204/301cd96a/attachment.pgp>


More information about the OpenStack-dev mailing list