[openstack-dev] UTF-8 required charset/encoding for openstack database?

Ben Nemec openstack at nemebean.com
Mon Mar 10 20:02:47 UTC 2014

On 2014-03-10 12:24, Chris Friesen wrote:
> Hi,
> I'm using havana and recently we ran into an issue with heat related to
> character sets.
> In heat/db/sqlalchemy/api.py in user_creds_get() we call
> _decrypt() on an encrypted password stored in the database and then
> try to convert the result to unicode.  Today we hit a case where this
> errored out with the following message:
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xf2 in position 0:
> invalid continuation byte
> We're using postgres and currently all the databases are using
> SQL_ASCII as the charset.
> I see that in icehouse heat will complain if you're using mysql and
> not using UTF-8.  There don't seem to be any checks for other
> databases though.
> It looks like devstack creates most databases as UTF-8 but uses latin1
> for nova/nova_bm/nova_cell.  I assume this is because nova expects to
> migrate the db to UTF-8 later.  Given that those migrations specify a
> character set only for mysql, when using postgres should we explicitly
> default to UTF-8 for everything?
> Thanks,
> Chris
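
For anyone following along, the error message above is easy to reproduce
outside of Heat.  This is only a minimal sketch, assuming Python 3 and
made-up input bytes (the real bytes came out of _decrypt() and aren't shown
in this thread); try_decode is a hypothetical helper, not anything in Heat:

```python
def try_decode(data):
    """Decode bytes as UTF-8; return (text, None) or (None, reason)."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        # exc.reason holds the short explanation shown in the traceback.
        return None, exc.reason


# 0xf2 starts a 4-byte UTF-8 sequence, so the next byte must be a
# continuation byte (0x80-0xbf).  "A" (0x41) is not one.
# These two bytes are an illustrative assumption, not real Heat data.
bad_bytes = b"\xf2A"
text, reason = try_decode(bad_bytes)
print(text, reason)

# The same bytes are perfectly valid in a single-byte charset such as
# latin1, which is why mismatched charsets can go unnoticed for a while.
print(bad_bytes.decode("latin-1"))
```

The point is just that bytes which are fine in one charset aren't
necessarily valid UTF-8, so the conversion to unicode can fail depending on
what actually got stored.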

We just had a discussion about this in #openstack-oslo too.  See the 
discussion starting at 2014-03-10T16:32:26 

While it seems Heat does require utf8 (or at least matching character 
sets) across all tables, I'm not sure the current solution is good.  It 
seems like we may want a migration to help with this for anyone who 
might already have mismatched tables.  There's a lot of overlap between 
that discussion and how to handle Postgres with this, I think.
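
As a rough illustration of what a check for Postgres might look like, here
is a sketch that asks the server for the current database's encoding.  It
assumes a DB-API style cursor (execute/fetchone) already connected to the
database in question; database_is_utf8 is a hypothetical name and this is
not the check Heat currently ships:

```python
def database_is_utf8(cursor):
    """Return True if the connected Postgres database reports UTF8.

    `cursor` is assumed to be a DB-API cursor for a Postgres connection.
    """
    # SHOW server_encoding reports the encoding of the current database,
    # e.g. "UTF8" or "SQL_ASCII".
    cursor.execute("SHOW server_encoding")
    row = cursor.fetchone()
    encoding = row[0] if row else ""
    return encoding.upper() in ("UTF8", "UTF-8")
```

Something along these lines could at least warn at startup, though it
doesn't help anyone whose data is already stored in a different encoding,
which is where a migration would come in.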

I don't have a definite answer for any of this yet, but I think it is 
something we need to figure out, so hopefully we can get some input from 
people who know more about the encoding requirements of Heat's and 
other projects' databases.


