[Openstack] Nova string encoding

Andrew Bogott abogott at wikimedia.org
Tue Feb 14 02:16:32 UTC 2012


On 2/13/12 7:00 PM, Joshua Harlow wrote:
> Isn't the command line interface just a setting on the "terminal" app 
> you are using?
I'm sorry if I wasn't clear before.  What's happening is: I am using a 
UTF-8 shell (which is, I believe, normal).  Nova-manage receives an 
argument and stores it as an 8-bit 'string'.  That is already wrong, 
because we've now lost track of which encoding those bytes are in.  Some 
parts of the code probably interpret it as UTF-8, but the code in the bug 
I'm encountering interprets it as ASCII.  The 'string' type in 
Python 2 is known to be ambiguous in this way.  Because UTF-8 and ASCII 
are byte-for-byte identical for the 128 ASCII characters, this ambiguity 
is seldom encountered by Americans.
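
To make the ambiguity concrete, here is a minimal sketch.  It is written 
for Python 3, where 'bytes' plays the role of Python 2's 8-bit 'string'; 
the helper name try_decode and the sample values are made up for 
illustration and are not taken from Nova.

```python
# Python 3 sketch: `bytes` stands in for Python 2's 8-bit `str`.

def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# Bytes as a UTF-8 shell would hand them to the program.
plain = "cafe".encode("utf-8")          # ASCII-only text
accented = "caf\u00e9".encode("utf-8")  # "cafe" with an accented final e

# ASCII-only input decodes identically either way, so the bug stays hidden.
same_result = try_decode(plain, "ascii") == try_decode(plain, "utf-8")

# Non-ASCII input is fine as UTF-8 but cannot be decoded as ASCII.
as_utf8 = try_decode(accented, "utf-8")
as_ascii = try_decode(accented, "ascii")   # None: decoding failed
```

The same bytes are valid under one assumption and an error under the 
other, and nothing in the value itself records which assumption is right.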

One solution to this is to just declare "All strings in Nova will 
henceforth be treated as UTF8."  That may be the current intent, but it 
is not actually the case.  It's also not a great policy because it would 
have to be enforced 'by hand' due to Python 2's ongoing ignorance about 
encodings.

A more correct design which allows for future flexibility would look 
like this:

1)  Adopt a standard for what encoding is used for all 
implicitly-encoded IO.  (I would propose that that standard be UTF8 
rather than ASCII.)

2)  At all points where strings enter Python (e.g. commandline args) 
immediately decode them into unicode (which can unambiguously represent 
the characters of any 8-bit encoding).

3) At all points where 'unicodes' exit Python (being written to stdout 
or a log file or a database) explicitly encode them as appropriate 
(generally UTF8, again, especially if we're ever going to read them back 
in.)

That approach is the one I'm most familiar with, and the one advocated 
for here:  http://farmdev.com/talks/unicode/.
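
The three steps above could be sketched roughly as follows.  This is 
Python 3 for the sake of being runnable, with 'bytes' standing in for 
Python 2's 8-bit 'string'.  The names ENCODING, decode_input, 
encode_output and greet are hypothetical and do not exist in Nova; the 
BytesIO object is only a stand-in for stdout, a log file or a database.

```python
import io

# Step 1: one agreed-upon encoding for all implicitly-encoded IO (assumed UTF-8).
ENCODING = "utf-8"

def decode_input(raw, encoding=ENCODING):
    """Step 2: decode bytes into text as soon as they enter the program."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw  # already text, nothing to do

def encode_output(text, encoding=ENCODING):
    """Step 3: encode text into bytes only at the point where it leaves."""
    return text.encode(encoding)

def greet(name):
    # Internal code only ever sees text, never raw bytes.
    return "Hello, " + name.upper()

# Bytes as a UTF-8 shell would pass the argument "zurich" with a u-umlaut.
raw_arg = b"z\xc3\xbcrich"
name = decode_input(raw_arg)

# Stand-in for a binary output sink such as a log file.
sink = io.BytesIO()
sink.write(encode_output(greet(name)))
```

The point of the design is that the encoding is named in exactly two 
places, on the way in and on the way out, rather than being guessed by 
whatever code happens to touch the value in between.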

So...back to my original question about what the policy is:  Can I 
assume that the answer is "There is no policy regarding string encoding 
but we've been lucky so far"?

-Andrew

> At least on a mac there is a terminal->preferences->advanced which 
> specifies which encoding to use (mine is UTF-8).
>
> Was that tried/verified?
>
> On 2/13/12 3:52 PM, "Andrew Bogott" <abogott at wikimedia.org> wrote:
>
>     On 2/13/12 5:04 PM, Naveed Massjouni wrote:
>     > Very recently, a change got in that converts all tables (except 1) to
>     > utf8 encoding, for the mysql engine. I manually tested creating
>     > servers with unicode names and with unicode metadata, and it worked
>     > fine. Make sure you are running against the latest code. -Naveed
>
>     That's a step in the right direction, but doesn't completely address
>     what I'm asking, unless by 'all tables' you meant 'all tables and also
>     all internal variables and also all REST and Commandline interfaces.'
>     Fixing my particular issue is straightforward, but the fact that I'm
>     seeing the bug in the first place suggests that there's no standard
>     encoding currently enforced.  Which seems bad.
