[openstack-dev] [oslo] i18n Message improvements

John Dennis jdennis at redhat.com
Fri Oct 18 18:21:52 UTC 2013


On 10/18/2013 12:57 PM, Doug Hellmann wrote:
> 
> 
> 
> On Thu, Oct 17, 2013 at 2:24 PM, John Dennis <jdennis at redhat.com>
> wrote:
> 
>     On 10/17/2013 12:22 PM,  Luis A. Garcia wrote:
>     > On 10/16/2013 1:11 PM, Doug Hellmann wrote:
>     >>
>     >> [snip]
>     >> Option 3 is closer to the new plan for Icehouse, which is to have _()
>     >> return a Message, allow Message to work in a few contexts like a string
>     >> (so that, for example, log calls and exceptions can be left alone, even
>     >> if they use % to combine a translated string with arguments), but then
>     >> have the logging and API code explicitly handle the translation of
>     >> Message instances so we can always pass unicode objects outside of
>     >> OpenStack code (to logging or to web frameworks). Since the logging code
>     >> is part of Oslo and the API code can be, this seemed to provide
>     >> isolation while removing most of the magic.
>     >>
>     >
>     > I think this is exactly what we have right now inherited from Havana.
>     > The _() returns a Message that is then translated on-demand by the API
>     > or in a special Translation log handler.
>     >
>     > We just did not make Message look and feel enough like a str(), and some
>     > outside components (jsonifier in Glance and log Formatter all over) did
>     > not know how to handle non-text types correctly when non-ASCII
>     > characters were present.
>     >
>     > I think extending from unicode and removing all the implementations in
>     > place such that the unicode implementation kicks in for all magic methods
>     > will solve the problems we saw at the end of Havana.
> 
>     I'm relatively new to OpenStack so I can't comment on prior OpenStack
>     implementations but I'm a long standing veteran of Python i18n issues.
> 
>     What you're describing sounds a lot like problems that result from the
>     fact Python's default encoding is ASCII as opposed to the more sensible
>     UTF-8. I have a long write up on this issue from a few years ago but
>     I'll cut to the chase. Python will attempt to automatically encode
>     Unicode objects into ASCII during output which will fail if there are
>     non-ASCII code points in the Unicode. Python does this in two
>     distinct contexts depending on whether the destination of the output is a
>     file or a terminal. If it's a terminal it attempts to use the encoding
>     associated with the TTY. Hence you can get different results if you output
>     to a TTY or a file handle.
> 
> 
> That was related to the problem we had with logging and Message instances.
>  
> 
> 
>     The simple solution to many of the encoding exceptions that Python will
>     throw is to override the default encoding and change it to UTF-8. But
>     the default encoding is locked by site.py due to internal Python string
>     optimizations which cache the default encoded version of the string so
>     the encoding happens only once. Changing the default encoding would
>     invalidate cached strings and there is no mechanism to deal with that,
>     that's why the default encoding is locked. But you can change the
>     default encoding using this trick if you do it early enough during the
>     module loading process:
> 
> 
> I don't think we want to force the encoding at startup. Setting the
> locale properly through the environment and then using unicode objects
> also solves the issue without any startup timing issues, and allows
> deployers to choose the encoding for output.


Setting the locale only solves some of the problems, because the locale
is only respected some of the time. The discrepancies and
inconsistencies in how Unicode conversion occurs in Python2 are
maddening and one of the worst aspects of Python2. It was never
carefully thought out; Unicode in Python2 is basically a bolted-on hack
that only works if every piece of code plays by the exact same rules,
which of course they don't and never will. I can almost guarantee that
unless you attack this problem at the core you'll continue to get
bitten. Either code is encoding-aware and explicitly forces a codec
(presumably UTF-8), or the code is encoding-naive and allows the default
encoding to be applied, except when the locale is respected, which
overrides the default encoding for the naive case.
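
To illustrate the aware versus naive distinction, here is a minimal
sketch. The helper names encode_aware and encode_naive_ascii are
hypothetical, and the naive helper only imitates Python2's implicit
ASCII default by naming the ascii codec explicitly, so the snippet
behaves the same on Python2 and Python3.

```python
# Hypothetical helpers for illustration; not part of Oslo or OpenStack.
text = u"caf\u00e9"  # one non-ASCII code point (e with acute accent)


def encode_aware(value):
    """Encoding-aware: the codec is stated explicitly."""
    return value.encode("utf-8")


def encode_naive_ascii(value):
    """Imitates what an implicit ASCII default does to the same text.

    Returns None instead of raising so the failure is easy to inspect.
    """
    try:
        return value.encode("ascii")
    except UnicodeEncodeError:
        return None


aware_result = encode_aware(text)
naive_result = encode_naive_ascii(text)
```

The aware path succeeds for any text, while the naive path only works
as long as nobody ever passes in a non-ASCII character.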

When Python3 was being worked on, one of the major objectives was to
clean up the horrible state of strings and unicode in Python2. Python3,
to the best of my knowledge, has gotten it right. What's the default
encoding in Python3? UTF-8. Can you change the default encoding in
Python3? No. It's hardwired to UTF-8, period. You can override the
encoding at obvious points (e.g. when opening IO streams) or allow
things like TextIOWrapper to default to what
locale.getpreferredencoding() returns, but the main point is that it's
consistently applied. It's not the haphazard mess in Python2 where
you're never quite sure how a Unicode string is going to be encoded (in
part because it depends on the destination of the IO).
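
A short sketch of those Python3 behaviors, assuming a Python3
interpreter. The in-memory BytesIO buffer is only a stand-in for a real
file or socket, and the variable names are illustrative.

```python
import io
import locale
import sys

# On Python3 the default encoding is 'utf-8' and the setter is gone.
default_encoding = sys.getdefaultencoding()
has_setter = hasattr(sys, "setdefaultencoding")

# Overriding the encoding at an obvious point: the stream wrapper.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="utf-8")
stream.write(u"caf\u00e9")
stream.flush()
encoded = raw.getvalue()

# What TextIOWrapper would fall back to when no encoding is given.
# The value depends on the locale of the machine running this.
preferred = locale.getpreferredencoding(False)
```

The point of the sketch is that the only locale-dependent value is
preferred, and it only matters when the caller declines to name an
encoding.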

Given that UTF-8 is Python3's default, that UTF-8 is the default in
virtually every network protocol, and that UTF-8 is the default in
virtually every Linux library, making UTF-8 the default in Python2
applications makes sense to me. So many problems in Python2 will go away
if the default encoding is UTF-8, but I realize this is not an opinion
shared by everyone. [1]

For those who say forcing the default encoding to be UTF-8 early in the
module load sounds like a terrible hack, I would have to agree 100%. But
things aren't always pretty due to unfortunate history that can't be
undone; the best you can do is adapt to something sensible given the
constraints.
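
As a rough sketch of how such a startup hook might be wrapped, assuming
it is called before any other module has had a chance to cache encoded
strings: the function name force_utf8_default_encoding is hypothetical,
and the version guard is an addition for illustration so that the
function is a harmless no-op on Python3.

```python
import sys


def force_utf8_default_encoding():
    """Hypothetical helper: apply the reload trick on Python2 only.

    Returns the default encoding in effect after the call.
    """
    if sys.version_info[0] >= 3:
        # Python3 is already UTF-8 and offers no setter; nothing to do.
        return sys.getdefaultencoding()
    # Python2 only: reload() is a builtin there, and reloading sys
    # restores the setdefaultencoding attribute that site.py deleted.
    reload(sys)
    sys.setdefaultencoding("utf-8")
    return sys.getdefaultencoding()


effective = force_utf8_default_encoding()
```

Keeping the hack in one clearly named function at least makes it easy
to find and to remove once Python2 is no longer a constraint.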

[1] Many of the objections centered around making UTF-8 the default for
the system-supplied Python, because every piece of Python code ever
executed on the platform might be subject to some unexpected behavior if
the default changed. But we're not in that situation. We're running a
constrained set of code; we're not trying to support every possible
piece of Python code written. Rather, we need to ensure the code that
executes in OpenStack behaves as we expect it to, and that expectation
is that the encoding is UTF-8.

>  
> 
> 
>     import sys
>     reload(sys)
>     sys.setdefaultencoding('utf-8')
> 
>     The reason this works is because site.py deletes the setdefaultencoding
>     from the sys module, but after reloading sys it's available again. One
>     can also use a tiny CPython module to set the default encoding without
>     having to use the sys reload trick. The following illustrates the reload
>     trick:
> 
>     $ python
>     Python 2.7.3 (default, Aug  9 2012, 17:23:57)
>     [GCC 4.7.1 20120720 (Red Hat 4.7.1-5)] on linux2
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> import sys
>     >>> sys.getdefaultencoding()
>     'ascii'
>     >>> sys.setdefaultencoding('utf-8')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     AttributeError: 'module' object has no attribute 'setdefaultencoding'
>     >>> reload(sys)
>     <module 'sys' (built-in)>
>     >>> sys.setdefaultencoding('utf-8')
>     >>> sys.getdefaultencoding()
>     'utf-8'
> 
> 
>     Not fully understanding the role of Python's default encoding and how
>     its application differs between terminal and non-terminal output can
>     cause a lot of confusion and misunderstanding, which can sometimes lead
>     to false conclusions as to what is going wrong.
> 
>     If I get a chance I'll try to publicly post my write-up on Python i18n
>     issues. 
> 
> 
> 
>     --
>     John
> 
>     _______________________________________________
>     OpenStack-dev mailing list
>     OpenStack-dev at lists.openstack.org
>     <mailto:OpenStack-dev at lists.openstack.org>
>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> 
> 


-- 
John


