[openstack-dev] [keystone] Domain-namespaced user attributes in SAML assertions from Keystone IdPs

John Dennis jdennis at redhat.com
Mon Sep 24 18:40:17 UTC 2018


On 9/24/18 8:00 AM, Colleen Murphy wrote:
> This is in regard to https://launchpad.net/bugs/1641625 and the proposed patch https://review.openstack.org/588211 for it. Thanks Vishakha for getting the ball rolling.
> 
> tl;dr: Keystone as an IdP should support sending non-strings/lists-of-strings as user attribute values, specifically lists of keystone groups, here's how that might happen.
> 
> Problem statement:
> 
> When keystone is set up as a service provider with an external non-keystone identity provider, it is common to configure the mapping rules to accept a list of group names from the IdP and map them to some property of a local keystone user, usually also a keystone group name. When keystone acts as the IdP, it's not currently possible to send a group name as a user property in the assertion. There are a few problems:
>      
>      1. We haven't added any openstack_groups key in the creation of the SAML assertion (http://git.openstack.org/cgit/openstack/keystone/tree/keystone/federation/idp.py?h=14.0.0#n164).
>      2. If we did, this would not be enough. Unlike other IdPs, in keystone there can be multiple groups with the same name, namespaced by domain. So it's not enough for the SAML AttributeStatement to contain a semi-colon-separated list of group names, since a user could theoretically be a member of two or more groups with the same name.
>     * Why can't we just send group IDs, which are unique? Because two different keystones are not going to have independent groups with the same UUID, so we cannot possibly map an ID of a group from keystone A to the ID of a different group in keystone B. We could map the ID of the group in in A to the name of a group in B but then operators need to create groups with UUIDs as names which is a little awkward for both the operator and the user who now is a member of groups with nondescriptive names.
>      3. If we then were able to encode a complex type like a group dict in a SAML assertion, we'd have to deal with it on the service provider side by being able to parse such an environment variable from the Apache headers.
>      4. The current mapping rules engine uses basic python string formatting to translate remote key-value pairs to local rules. We would need to change the mapping API to work with values more complex than strings and lists of strings.
>      
> Possible solution:
> 
> Vishakha's patch (https://review.openstack.org/588211) starts to solve (1) but it doesn't go far enough to solve (2-4). What we talked about at the PTG was:
>      
>      2. Encode the group+domain as a string, for example by using the dict string repr or a string representation of some custom XML and maybe base64 encoding it.
>          * It's not totally clear whether the AttributeValue class of the pysaml2 library supports any data types outside of the xmlns:xs namespace or whether nested XML is an option, so encoding the whole thing as an xs:string seems like the simplest solution.
>      3. The SP will have to be aware that openstack_groups is a special key that needs the encoding reversed.
>          * I wrote down "MultiDict" in my notes but I don't recall exactly what format the environment variable would take that would make a MultiDict make sense here, in any case I think encoding the whole thing as a string eliminates the need for this.
>      4. We didn't talk about the mapping API, but here's what I think. If we were just talking about group names, the mapping API today would work like this (slight oversimplification for brevity):
>          
> Given a list of openstack_groups like ["A", "B", "C"], it would work like this:
>              
> [
>    {
>      "local":
>      [
>        {
>          "group":
>          {
>            "name": "{0}",
>            "domain":
>            {
>              "name": "federated_domain"
>            }
>          }
>        }
>      ], "remote":
>      [
>        {
>          "type": "openstack_groups"
>        }
>      ]
>    }
> ]
> (paste in case the spacing makes this unreadable: http://paste.openstack.org/show/730623/ )
> 
> But now, we no longer have a list of strings but something more like [{"name": "A", "domain_name": "Default"} {"name": "B", "domain_name": "Default", "name": "A", "domain_name": "domainB"}]. Since {0} isn't a string, this example doesn't really work. Instead, let's assume that in step (3) we converted the decoded AttributeValue text to an object. Then the mapping could look more like this:
>          
> [
>    {
>      "local":
>      [
>        {
>          "group":
>          {
>            "name": "{0.name}",
>            "domain":
>            {
>              "name": "{0.domain_name}"
>            }
>          }
>        }
>      ], "remote":
>      [
>        {
>          "type": "openstack_groups"
>        }
>      ]
>    }
> ]
> (paste: http://paste.openstack.org/show/730622/ )
> 
> Alternatively, we could forget about the namespacing problem and simply say we only pass group names in the assertion, and if you have ambiguous group names you're on your own. We could also try to support both, e.g. have an openstack_groups mean a list of group names for simpler use cases, and openstack_groups_unique mean the list of encoded group+domain strings for advanced use cases.
> 
> Finally, whatever we decide for groups we should also apply to openstack_roles which currently only supports global roles and not domain-specific roles.
> 
> (It's also worth noting, for clarity, that the samlize function does handle namespaced projects, but this is because it's retrieving the project from the token and therefore there is only ever one project and one project domain so there is no ambiguity.)
> 

A few thoughts to help focus the discussion:

* Namespacing is critical, no design should be permitted which allows 
for ambiguous names. Ambiguous names are a security issue and can be 
used by an attacker. The SAML designers recognized the importance to 
disambiguate names. In SAML names are conveyed inside a NameIdentifier 
element which (optionally) includes "name qualifier" attributes which in 
SAML lingo is a namespace name.

* SAML does not define the format of an attribute value. You can use 
anything you want as long as it can be expressed in valid XML as long as 
the cooperating parties know how to interpret the XML content. But 
herein lies the problem. Very few SAML implementations know how to 
consume an attribute value other than a string. In the real world, 
despite what the SAML spec says is permitted is the constraint attribute 
values is a string.

* I haven't looked at the pysaml implementation but I'd be surprised if 
it treated attribute values as anything other than a string. In theory 
it could take any Python object (or JSON) and serialize it into XML but 
you would still be stuck with the receiver being unable to parse the 
attribute value (see above point).

* You can encode complex data in an attribute value while only using a 
simple string. The only requirement is the relying party knowing how to 
interpret the string value. Note, this is distinctly different than 
using non-string attribute values because of who is responsible for 
parsing the value. If you use a non-string attribute value the SAML 
library need to know how to parse it, none or very few will know how to 
process that element. But if it's a string value the SAML library will 
happily pass that string back up to the application who can then 
interpret it. The easiest way to embed complex data in a string is with 
JSON, we do it all the time, all over the place in OpenStack. [1][2]

So my suggestion would be to give the attribute a meaningful name. 
Define a JSON schema for the data and then let the upper layers decode 
the JSON and operate on it. This is no different than any other SAML 
attribute passed as a string, the receive MUST know how to interpret the 
string value.

[1] We already pass complex data in a SAML attribute string value. We 
permit a comma separated list of group names to appear in the 'groups' 
mapping rule (although I don't think this feature is documented in our 
mapping rules documentation). The receiver (our mapping engine) has 
hard-coded logic to look for a list of names.

[2] We might want to prepend a format specifier to string containing 
complex data, e.g. "JSON:{json object}". Our parser could then look for 
a leading format tag and if if finds one strip it off and pass the rest 
of the string into the proper parser.

-- 
John Dennis



More information about the OpenStack-dev mailing list