[openstack-dev] UTF-8 required charset/encoding for openstack database?

Doug Hellmann doug.hellmann at dreamhost.com
Tue Mar 18 22:08:36 UTC 2014


On Mon, Mar 10, 2014 at 4:02 PM, Ben Nemec <openstack at nemebean.com> wrote:

> On 2014-03-10 12:24, Chris Friesen wrote:
>
>> Hi,
>>
>> I'm using havana and recently we ran into an issue with heat related to
>> character sets.
>>
>> In heat/db/sqlalchemy/api.py in user_creds_get() we call
>> _decrypt() on an encrypted password stored in the database and then
>> try to convert the result to unicode.  Today we hit a case where this
>> errored out with the following message:
>>
>> UnicodeDecodeError: 'utf8' codec can't decode byte 0xf2 in position 0:
>> invalid continuation byte
>>
>> We're using postgres and currently all the databases are using
>> SQL_ASCII as the charset.
>>
>> I see that in icehouse heat will complain if you're using mysql and
>> not using UTF-8.  There don't seem to be any checks for other
>> databases though.
>>
>> It looks like devstack creates most databases as UTF-8 but uses latin1
>> for nova/nova_bm/nova_cell.  I assume this is because nova expects to
>> migrate the db to UTF-8 later.  Given that those migrations specify a
>> character set only for mysql, when using postgres should we explicitly
>> default to UTF-8 for everything?
>>
>> Thanks,
>> Chris
>>
>
> We just had a discussion about this in #openstack-oslo too.  See the
> discussion starting at 2014-03-10T16:32:26:
> http://eavesdrop.openstack.org/irclogs/%23openstack-oslo/%23openstack-oslo.2014-03-10.log
>
> While it seems Heat does require utf8 (or at least matching character
> sets) across all tables, I'm not sure the current solution is good.  It
> seems like we may want a migration to help with this for anyone who might
> already have mismatched tables.  There's a lot of overlap between that
> discussion and how to handle Postgres with this, I think.
>
> I don't have a definite answer for any of this yet but I think it is
> something we need to figure out, so hopefully we can get some input from
> people who know more about the encoding requirements of the Heat and other
> projects' databases.
>
> -Ben

Based on the discussion from the project meeting today [1], the Glance
team is going to write a migration to fix the database as the other
projects have (we have not seen issues with corrupted data, so we believe
this to be safe). However, there is one snag. In a follow-up conversation
with Ben in #openstack-oslo, he pointed out that no migrations will run
until the encoding is correct, so we do need to make some changes to the db
code in oslo.

Here's what I think we need to do:

1. In oslo, db_sync() needs a boolean to control whether
_db_schema_sanity_check() is called. This is an all-or-nothing flag (not
the "for some tables" implementation that was proposed).

2. Glance needs a migration to change the encoding of its tables.

3. In glance-manage, the code that calls upgrade migrations needs to look
at the current state and figure out if the requested state is before or
after the migration created in step 2. If it is before, it passes False to
disable the sanity check. If it is after, it passes True to enforce the
sanity check.
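
To make step 3 concrete, here is a minimal sketch of the decision
glance-manage would make. The function name, the version numbers, and the
handling of the boundary case (a requested version equal to the new
migration counts as "after") are my assumptions for illustration, not
existing glance or oslo code. It looks only at the requested version;
reading the current state from the database is left out.

```python
def sanity_check_flag(requested_version, encoding_fix_version):
    """Return the boolean to hand to db_sync() for the sanity check.

    requested_version: migration number the operator asked for, or None
        for "latest" (hypothetical convention for this sketch).
    encoding_fix_version: number of the migration from step 2 that
        converts the tables (hypothetical value supplied by the caller).
    """
    # No explicit version means "upgrade to the newest migration", which
    # is past the encoding fix, so the check should be enforced.
    if requested_version is None:
        return True
    # Before the fix: the tables may still have the old encoding, so the
    # check has to be disabled or no migration could run at all.
    if requested_version < encoding_fix_version:
        return False
    # At or after the fix (assumption: "at" counts as "after").
    return True


if __name__ == "__main__":
    # 30 is a made-up number standing in for the step 2 migration.
    for requested in (12, 30, 45, None):
        print(requested, sanity_check_flag(requested, 30))
```

The result would then be passed as the new boolean argument proposed in
step 1.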

Ben, did I miss any details?

Doug

[1]
http://eavesdrop.openstack.org/meetings/project/2014/project.2014-03-18-21.03.log.txt
