[openstack-dev] creating a default for oslo config variables within a project?
Ben Nemec
openstack at nemebean.com
Wed Dec 4 16:56:54 UTC 2013
On 2013-12-04 06:07, Sean Dague wrote:
> On 12/03/2013 11:21 PM, Clint Byrum wrote:
>> Excerpts from Sean Dague's message of 2013-12-03 16:05:47 -0800:
>>> On 12/03/2013 06:13 PM, Ben Nemec wrote:
>>>> On 2013-12-03 17:09, Sean Dague wrote:
>>>>> On 12/03/2013 05:50 PM, Mark McLoughlin wrote:
>>>>>> On Tue, 2013-12-03 at 16:23 -0600, Ben Nemec wrote:
>>>>>>> On 2013-12-03 15:56, Sean Dague wrote:
>>>>>>>> This cinder patch - https://review.openstack.org/#/c/48935/
>>>>>>>>
>>>>>>>> Is blocked on failing upgrade because the updated oslo lockutils
>>>>>>>> won't
>>>>>>>> function until there is a specific configuration variable added
>>>>>>>> to the
>>>>>>>> cinder.conf.
>>>>>>>>
>>>>>>>> That work around is proposed here -
>>>>>>>> https://review.openstack.org/#/c/52070/3
>>>>>>>>
>>>>>>>> However I think this is exactly the kind of forward breaks that
>>>>>>>> we
>>>>>>>> want
>>>>>>>> to prevent with grenade, as cinder failing to function after a
>>>>>>>> rolling
>>>>>>>> upgrade because a config item wasn't added is exactly the kind
>>>>>>>> of pain
>>>>>>>> we are trying to prevent happening to ops.
>>>>>>>>
>>>>>>>> So the question is, how is this done correctly so that a default
>>>>>>>> can be
>>>>>>>> set in the cinder code for this value, and it not require a
>>>>>>>> config
>>>>>>>> change to work?
>>>>>>
>>>>>> You're absolutely correct, in principle - if the default value for
>>>>>> lock_path worked for users before, we should absolutely continue
>>>>>> to
>>>>>> support it.
>>>>>>
>>>>>>> I don't know that I have a good answer on how to handle this, but
>>>>>>> for
>>>>>>> context this change is the result of a nasty bug in lockutils
>>>>>>> that
>>>>>>> meant
>>>>>>> external locks were doing nothing if lock_path wasn't set.
>>>>>>> Basically
>>>>>>> it's something we should never have allowed in the first place.
>>>>>>>
>>>>>>> As far as setting this in code, it's important that all of the
>>>>>>> processes
>>>>>>> for a service are using the same value to avoid the same bad
>>>>>>> situation
>>>>>>> we were in before. For tests, we have a lockutils wrapper
>>>>>>> (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/lockutils.py#L282)
>>>>>>> that sets an environment variable to address this, but that only
>>>>>>> works if all of the processes are going to be spawned from within
>>>>>>> the same wrapper, and I'm not sure how secure that is for
>>>>>>> production
>>>>>>> deployments since it puts all of the lock files in a temporary
>>>>>>> directory.
>>>>>>
>>>>>> Right, I don't think the previous default really "worked" - if you
>>>>>> used
>>>>>> the default, then external locking was broken.
>>>>>>
>>>>>> I suspect most distros do set a default - I see RDO has this in
>>>>>> its
>>>>>> default nova.conf:
>>>>>>
>>>>>> lock_path = /var/lib/nova/tmp
>>>>>>
>>>>>> So, yes - this is all terrible.
>>>>>>
>>>>>> IMHO, rather than raise an exception we should log a big fat
>>>>>> warning
>>>>>> about relying on the default and perhaps just treat the lock as an
>>>>>> in-process lock in that case ... since that's essentially what it
>>>>>> was
>>>>>> before, right?
>>>>>
>>>>> So a default of lock_path = /tmp will work (FHS says that path has
>>>>> to be
>>>>> there), even if not optimal. Could we make it a default value like
>>>>> that
>>>>> instead of the current default which is null (and hence the
>>>>> problem).
>>>>
>>>> IIRC, my initial fix was something similar to that, but it got shot
>>>> down
>>>> because putting the lock files in a known world writeable location
>>>> was a
>>>> security issue.
>>>>
>>>> Although maybe if we put them in a subdirectory of /tmp and ensured
>>>> that
>>>> the permissions were such that only the user running the service
>>>> could
>>>> use that directory, it might be acceptable? We could still log a
>>>> warning if we wanted.
>>>>
>>>> This seems like it would have implications for people running
>>>> services
>>>> on Windows too, but we can probably find a way to make that work if
>>>> we
>>>> decide on a solution.
>>>
>>> How is that a security issue? Are the lock files being written with
>>> some
>>> sensitive data in them and have g or o permissions on? The sticky bit
>>> (+t) on /tmp will prevent other users from deleting the file.
>>>
>>
>> Right, but it won't prevent users from creating a symlink with the
>> same
>> name.
>>
>> ln -s /var/lib/nova/instances/xxxxx/image.raw /tmp/well.known.location
>>
>> Now when you do
>>
>> with open('/tmp/well.known.location', 'w') as lockfile:
>> lockfile.write('Stuff')
>>
>> Nova has just truncated the image file and written 'Stuff' to it.
>
> So that's the generic case (and the way people often write this). But
> the oslo lockutils code doesn't work that way. While it does open the
> file for write, it does not actually write, it's using fcntl to hold
> locks. That's taking a data structure on the fd in kernel memory
> (IIRC),
> so it correctly gives it up if the process crashes.
>
> I'm not saying there isn't some other possible security vulnerability
> here as well, but it's not jumping out at me. So I'd love to understand
> that, because if we can close that exposure we can provide a working
> default, plus a strong recommendation for how to do that *right*. I'd
> be
> totally happy with printing WARNING level at startup if lock_path =
> /tmp
> that this should be adjusted.
Full disclosure: I don't claim to be a security expert, so take my
thoughts on this with a grain of salt.
tldr: I still don't see a way to do this without breaking _something_.
Unfortunately, while we don't actually write to the file, just opening
it for write access truncates it. So there remains the issue Clint
raised if someone can make that file a link. I suppose we could check
for the bad things people might do, but I'm pretty sure that way lies
madness.
As a followup to my suggestion to have lockutils create something like
/tmp/oslo-lock-path and make it accessible only to the owner, I think
there's still a problem because a malicious user could create that
directory ahead of time and when OpenStack is run for the first time it
would happily use the existing directory and potentially overwrite any
linked files in it. There might be ways to avoid that, but I suspect
they would be racy (which I believe is why you're not supposed to use
tmp paths that are known ahead of time). Granted, I'm not sure you
should be installing to a system that already has malicious users on it,
but I suspect security people would be less than thrilled with that
answer.
So the workable options that I see right now are to have it break on
upgrade if lock_path is not set, or continue to break external locking
by having it fall back to process locks when lock_path is not set, as
Mark suggested. I'm not wild about either so I'd love to hear if anyone
has a third option.
-Ben
More information about the OpenStack-dev
mailing list