[openstack-dev] UTF-8 required charset/encoding for openstack database?

Clint Byrum clint at fewbar.com
Tue Mar 11 23:50:04 UTC 2014


Excerpts from Ben Nemec's message of 2014-03-10 13:02:47 -0700:
> On 2014-03-10 12:24, Chris Friesen wrote:
> > Hi,
> > 
> > I'm using havana and recently we ran into an issue with heat related to
> > character sets.
> > 
> > In heat/db/sqlalchemy/api.py in user_creds_get() we call
> > _decrypt() on an encrypted password stored in the database and then
> > try to convert the result to unicode.  Today we hit a case where this
> > errored out with the following message:
> > 
> > UnicodeDecodeError: 'utf8' codec can't decode byte 0xf2 in position 0:
> > invalid continuation byte
> > 
> > We're using postgres and currently all the databases are using
> > SQL_ASCII as the charset.
> > 
> > I see that in icehouse heat will complain if you're using mysql and
> > not using UTF-8.  There don't seem to be any checks for other
> > databases though.
> > 
> > It looks like devstack creates most databases as UTF-8 but uses latin1
> > for nova/nova_bm/nova_cell.  I assume this is because nova expects to
> > migrate the db to UTF-8 later.  Given that those migrations specify a
> > character set only for mysql, when using postgres should we explicitly
> > default to UTF-8 for everything?
> > 
> > Thanks,
> > Chris
> 
> We just had a discussion about this in #openstack-oslo too.  See the 
> discussion starting at 2014-03-10T16:32:26 
> http://eavesdrop.openstack.org/irclogs/%23openstack-oslo/%23openstack-oslo.2014-03-10.log
> 
> While it seems Heat does require utf8 (or at least matching character 
> sets) across all tables, I'm not sure the current solution is good.  It 
> seems like we may want a migration to help with this for anyone who 
> might already have mismatched tables.  There's a lot of overlap between 
> that discussion and how to handle Postgres with this, I think.
> 
> I don't have a definite answer for any of this yet but I think it is 
> something we need to figure out, so hopefully we can get some input from 
> people who know more about the encoding requirements of the Heat and 
> other projects' databases.

Doing a migration for this would be haphazard. MySQL has _four_ places
that govern the character set of any operation:

server charset
client charset
db charset
table charset

There are also per-column charsets but those basically trump all the
others.

But MySQL can't possibly know what you _meant_ when you were inserting
data. So, if you _assumed_ that the database was UTF-8, and inserted
UTF-8 with all of those things accidentally set for latin1, then you
will have UTF-8 in your db, but MySQL will think it is latin1. So if you
now try to alter the table to UTF-8, all of your high-byte strings will
be double-encoded.
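
To make the double-encoding concrete, here is a minimal sketch in plain
Python (no database involved). It assumes Python's "latin-1" codec as a
stand-in for MySQL's latin1 (which is really closer to cp1252, though the
two agree for the bytes used here), and the string "café" is just an
illustrative value:

```python
# Text the application meant to store, sent to the server as UTF-8 bytes.
original = "café"
utf8_bytes = original.encode("utf-8")            # b'caf\xc3\xa9'

# The column is declared latin1, so when it is converted to UTF-8 each
# stored byte is treated as one latin1 character and encoded again.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")

# Reading the converted column as UTF-8 now yields mojibake.
garbled = double_encoded.decode("utf-8")

# The damage is reversible only if you know this is what happened:
# undo the extra encoding step by going back through latin1.
repaired = garbled.encode("latin-1").decode("utf-8")

print(repr(double_encoded))   # b'caf\xc3\x83\xc2\xa9'
print(garbled)                # cafÃ©
print(repaired)               # café
```

The repair step is only correct for rows that were actually stored this
way, which is why the right fix can't be decided without looking at the
data.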

It unfortunately takes analysis to determine what the course of action
is. That is why we added the check to Heat, so that it would complain
very early if your tables and/or server configuration were going to
disagree with the assumptions.

It would likely be best for there to be a more generally available
solution for stopping and complaining loudly when a badly configured
database is encountered.
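
As a rough sketch of what "stop and complain loudly" could look like,
the following is a hypothetical helper, not the check that exists in
Heat. The names check_charsets and CharsetMismatchError are made up for
illustration, and the settings dict is assumed to have been filled in
beforehand from the server (for MySQL, something like
SHOW VARIABLES LIKE 'character_set_%' plus information_schema):

```python
REQUIRED_CHARSET = "utf8"  # assumption: the charset the application expects


class CharsetMismatchError(RuntimeError):
    """Raised when any charset setting disagrees with the required one."""


def check_charsets(settings, required=REQUIRED_CHARSET):
    """Raise if any entry in `settings` is not the required charset.

    `settings` maps a human-readable name (server, client, db, table)
    to the charset reported for it.
    """
    mismatched = {
        name: charset
        for name, charset in settings.items()
        if charset.lower() != required
    }
    if mismatched:
        details = ", ".join(
            "%s=%s" % (name, charset)
            for name, charset in sorted(mismatched.items())
        )
        raise CharsetMismatchError(
            "expected %s everywhere, but found: %s" % (required, details)
        )


# Illustrative values only; a real caller would query the database.
settings = {
    "server": "utf8",
    "client": "utf8",
    "database": "latin1",
    "table example_table": "utf8",
}

try:
    check_charsets(settings)
    error_message = None
except CharsetMismatchError as exc:
    error_message = str(exc)
    print("refusing to start:", error_message)
```

The point of failing at startup rather than migrating automatically is
that the operator gets a chance to do the analysis first. A real check
would probably also want to accept utf8mb4 on MySQL, which this sketch
would reject.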


