[openstack-dev] [oslo] i18n Message improvements

John Dennis jdennis at redhat.com
Thu Oct 17 18:24:19 UTC 2013


On 10/17/2013 12:22 PM,  Luis A. Garcia wrote:
> On 10/16/2013 1:11 PM, Doug Hellmann wrote:
>>
>> [snip]
>> Option 3 is closer to the new plan for Icehouse, which is to have _()
>> return a Message, allow Message to work in a few contexts like a string
>> (so that, for example, log calls and exceptions can be left alone, even
>> if they use % to combine a translated string with arguments), but then
>> have the logging and API code explicitly handle the translation of
>> Message instances so we can always pass unicode objects outside of
>> OpenStack code (to logging or to web frameworks). Since the logging code
>> is part of Oslo and the API code can be, this seemed to provide
>> isolation while removing most of the magic.
>>
> 
> I think this is exactly what we have right now inherited from Havana.
> The _() returns a Message that is then translated on-demand by the API 
> or in a special Translation log handler.
> 
> We just did not make Message look and feel enough like a str() and some
> outside components (jsonifier in Glance and log Formatter all over) did
> not know how to handle non-text types correctly when non-ASCII
> characters were present.
>
> I think extending from unicode and removing all the implementations in
> place such that the unicode implementation kicks in for all magic methods
> will solve the problems we saw at the end of Havana.

I'm relatively new to OpenStack, so I can't comment on prior OpenStack
implementations, but I'm a long-standing veteran of Python i18n issues.

What you're describing sounds a lot like the problems that result from
Python's default encoding being ASCII rather than the more sensible
UTF-8. I have a long write-up on this issue from a few years ago, but
I'll cut to the chase. Python will attempt to automatically encode
Unicode objects into ASCII during output, which fails if there are
non-ASCII code points in the Unicode. Python handles this in two
distinct ways depending on whether the destination of the output is a
file or a terminal. If it's a terminal, it attempts to use the encoding
associated with the TTY; otherwise it falls back to the default
encoding. Hence you can get different results depending on whether you
output to a TTY or to a file handle.
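
As a small sketch of the failure described above (the sample string is
made up for illustration, not taken from any OpenStack code), encoding a
Unicode object with the ASCII codec raises as soon as it meets a
non-ASCII code point, while UTF-8 handles the same text:

```python
# Illustrative only: the sample text is an assumption for this example.
text = u"caf\xe9"  # contains U+00E9, which has no ASCII representation

try:
    # The 'ascii' codec is what Python 2 applies implicitly by default.
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = text.encode("utf-8")

print(ascii_ok)
print(repr(utf8_bytes))
```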

The simple solution to many of the encoding exceptions that Python will
throw is to override the default encoding and change it to UTF-8. But
the default encoding is locked by site.py because internal Python string
optimizations cache the default-encoded version of a string so that
the encoding happens only once. Changing the default encoding would
invalidate the cached strings, and there is no mechanism to deal with
that; that's why the default encoding is locked. You can still change
the default encoding with this trick if you do it early enough during
the module loading process:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

This works because site.py deletes setdefaultencoding from the sys
module, but after reloading sys it's available again. One can also use
a tiny CPython extension module to set the default encoding without
having to use the sys reload trick. The following illustrates the reload
trick:

$ python
Python 2.7.3 (default, Aug  9 2012, 17:23:57)
[GCC 4.7.1 20120720 (Red Hat 4.7.1-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.setdefaultencoding('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'setdefaultencoding'
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('utf-8')
>>> sys.getdefaultencoding()
'utf-8'
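
The same steps can be wrapped in a function. This is only a sketch: the
helper name ensure_utf8_default_encoding is hypothetical (it is not part
of Oslo), and the Python 3 branch is an added assumption so the code
also runs where the default encoding is already UTF-8 and
sys.setdefaultencoding does not exist.

```python
import sys


def ensure_utf8_default_encoding():
    """Hypothetical helper: apply the reload trick only where needed."""
    if sys.version_info[0] >= 3:
        # Python 3 fixes the default encoding at UTF-8, so there is
        # nothing to change.
        return sys.getdefaultencoding()
    if sys.getdefaultencoding() != 'utf-8':
        # reload is a Python 2 builtin; reloading sys restores the
        # setdefaultencoding function that site.py deleted.
        reload(sys)
        sys.setdefaultencoding('utf-8')
    return sys.getdefaultencoding()


print(ensure_utf8_default_encoding())
```

Such a call would need to run before other modules encode any text,
for the caching reason given earlier.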


Not fully understanding the role of Python's default encoding, and how
its application differs between terminal and non-terminal output, can
cause a lot of confusion, which sometimes leads to false conclusions
about what is going wrong.
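
A minimal sketch of that terminal versus non-terminal difference,
assuming the Python 2 behaviour where a stream's encoding attribute is
None when output is redirected to a file or pipe. The helper
encode_for_stream and the two stand-in stream classes are hypothetical
names used only for illustration:

```python
def encode_for_stream(text, stream, fallback="utf-8"):
    """Encode text with the stream's encoding, or fallback if it has none."""
    encoding = getattr(stream, "encoding", None) or fallback
    return text.encode(encoding)


class FakePipe(object):
    # Stand-in for a redirected stdout, which reports no encoding.
    encoding = None


class FakeTty(object):
    # Stand-in for a terminal whose locale uses Latin-1.
    encoding = "latin-1"


message = u"caf\xe9"
pipe_bytes = encode_for_stream(message, FakePipe())  # falls back to UTF-8
tty_bytes = encode_for_stream(message, FakeTty())    # uses the TTY encoding

print(repr(pipe_bytes))
print(repr(tty_bytes))
```

The same Unicode text produces different bytes for the two
destinations, which is why a program can work on a terminal and fail,
or emit different output, once it is redirected.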

If I get a chance I'll try to publicly post my write-up on Python i18n
issues.


-- 
John


