[openstack-dev] [keystone] Case sensitivity & backend databases
Clint Byrum
clint at fewbar.com
Thu Sep 26 16:48:26 UTC 2013
Excerpts from Henry Nash's message of 2013-09-25 01:45:32 -0700:
> Hi
>
> Do we specify somewhere whether text field matching in the API is case sensitive or in-sensitive? I'm thinking about filters, as well as user and domain names in authentication. I think our current implementation will always be case sensitive for filters (since we do that in python and do not, yet, pass the filters to the backends), while authentication will reflect the "case sensitivity or lack thereof" of the underlying database. I believe that MySQL is case in-sensitive by default, while Postgres, sqllite and others are case-sensitive by default. If using an LDAP backend, then I think this is case-sensitive.
>
> The above seems to be inconsistent. It might become even more so when we pass the filters to the backend. Given that other projects already pass filters to the backend, we may also have inter-project inconsistencies that bleed through to the user experience. Should we make at least a recommendation that the backend should case-sensitive (you can configure MySQL to be so)? Insist on it? Ignore it and keep things as they are?
The collation controls case sensitivity. The default collations in MySQL
are all case in-sensitive, and this is a good thing for many reasons. I
don't want a user "spamaps" on the same domain where I have "SpamapS".
If you want to force it, you can do so at many levels, from the server
down to the column. utf8_bin would be the case-sensitive collation to
use. But that is not actually what you want.
The problem and the part where programmers get surprised by this is
that we are calling fields VARCHAR when we mean VARBINARY. Strings that
always need to be 100% identical and are not for human consumption,
should be _BINARY_.
In the keystone user table, id, extra, password, and domain_id are all
"varchar". This is a common mistake in SQL column design. Since user
is a utf8 table, the index on id varchar(64) is 64*3 bytes wide. This
is because we have to reserve enough space for 64 UTF-8 characters (and
MySQL only does up to 3-byte UTF8, really obscure UTF-8 can be as long
as 6 bytes!). We don't intend to interpret this as a human ever. So,
this should be VARBINARY(64) everywhere it is used.
The change would have several effects:
1) Indexes that mention the field would shrink by 128 bytes per key.
2) One could have two rows with identical hex values in id that varied
only by case. -- This is not an actual problem, but an effect of the
change.
3) Sorting by this field will now just use the binary value of each
character, not the language collation. When do you sort by a hex id?
Anyway, doing this on all ID fields and obviously-not-utf8-containing
fields will have a net effect of making the database leaner, so I think
it is worth a wide spread effort not just in keystone but in all of
OpenStack.
More information about the OpenStack-dev
mailing list