<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Oct 17, 2013 at 2:24 PM, John Dennis <span dir="ltr">&lt;<a href="mailto:jdennis@redhat.com" target="_blank">jdennis@redhat.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On 10/17/2013 12:22 PM, Luis A. Garcia wrote:<br>
> On 10/16/2013 1:11 PM, Doug Hellmann wrote:<br>
>><br>
>> [snip]<br>
>> Option 3 is closer to the new plan for Icehouse, which is to have _()<br>
>> return a Message, allow Message to work in a few contexts like a string<br>
>> (so that, for example, log calls and exceptions can be left alone, even<br>
>> if they use % to combine a translated string with arguments), but then<br>
>> have the logging and API code explicitly handle the translation of<br>
>> Message instances so we can always pass unicode objects outside of<br>
>> OpenStack code (to logging or to web frameworks). Since the logging code<br>
>> is part of Oslo and the API code can be, this seemed to provide<br>
>> isolation while removing most of the magic.<br>
>><br>
><br>
> I think this is exactly what we have right now, inherited from Havana.<br>
> The _() returns a Message that is then translated on-demand by the API<br>
> or in a special Translation log handler.<br>
><br>
> We just did not make Message look and feel enough like a str() and some<br>
> outside components (jsonifier in Glance and log Formatter all over) did<br>
> not know how to handle non text types correctly when non-ascii<br>
> characters were present.<br>
><br>
> I think extending from unicode and removing all the implementations in<br>
> place such that the unicode implementation kicks in for all magic methods<br>
> will solve the problems we saw at the end of Havana.<br>
<br>
</div>I'm relatively new to OpenStack so I can't comment on prior OpenStack<br>
implementations but I'm a long standing veteran of Python i18n issues.<br>
<br>
What you're describing sounds a lot like problems that result from the<br>
fact Python's default encoding is ASCII as opposed to the more sensible<br>
UTF-8. I have a long write up on this issue from a few years ago but<br>
I'll cut to the chase. Python will attempt to automatically encode<br>
Unicode objects into ASCII during output which will fail if there are<br>
non-ASCII code points in the Unicode. Python does this in two<br>
distinct contexts depending on whether the destination of the output is a<br>
file or terminal. If it's a terminal it attempts to use the encoding<br>
associated with the TTY. Hence you can get different results if you output<br>
to a TTY or a file handle.<br></blockquote><div><br></div><div>That was related to the problem we had with logging and Message instances.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
The simple solution to many of the encoding exceptions that Python will<br>
throw is to override the default encoding and change it to UTF-8. But<br>
the default encoding is locked by site.py due to internal Python string<br>
optimizations which cache the default encoded version of the string so<br>
the encoding happens only once. Changing the default encoding would<br>
invalidate cached strings and there is no mechanism to deal with that,<br>
that's why the default encoding is locked. But you can change the<br>
default encoding using this trick if you do it early enough during the<br>
module loading process:<br></blockquote><div><br></div><div>I don't think we want to force the encoding at startup. Setting the locale properly through the environment and then using unicode objects also solves the issue without any startup timing issues, and allows deployers to choose the encoding for output.</div>
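<div>A rough sketch of that approach, assuming Python 3 and an in-memory buffer in place of a real log file: keep text as unicode inside the program and encode once at the output boundary, taking the encoding from the locale unless the caller overrides it. The helper name <code>open_text_stream</code> and the choice of <code>errors="replace"</code> are illustrative assumptions.</div>
<pre>
```python
import io
import locale


def open_text_stream(raw, encoding=None):
    """Wrap a binary stream so that text is encoded explicitly on write.

    When `encoding` is None, fall back to the locale's preferred encoding,
    which a deployer controls through LANG / LC_ALL in the environment.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    # errors="replace" is an assumption: substitute "?" instead of raising.
    return io.TextIOWrapper(raw, encoding=encoding, errors="replace",
                            newline="\n")


# Demonstration with an in-memory buffer standing in for a log file.
buf = io.BytesIO()
stream = open_text_stream(buf, encoding="utf-8")
stream.write(u"caf\u00e9\n")
stream.flush()
encoded = buf.getvalue()
```
</pre>
<div>Nothing here touches the interpreter-wide default encoding, so there is no dependency on how early the code runs during startup.</div>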
<div><br></div><div>Doug</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
import sys<br>
reload(sys)<br>
sys.setdefaultencoding('utf-8')<br>
<br>
The reason this works is because site.py deletes the setdefaultencoding<br>
from the sys module, but after reloading sys it's available again. One<br>
can also use a tiny CPython module to set the default encoding without<br>
having to use the sys reload trick. The following illustrates the reload<br>
trick:<br>
<br>
$ python<br>
Python 2.7.3 (default, Aug 9 2012, 17:23:57)<br>
[GCC 4.7.1 20120720 (Red Hat 4.7.1-5)] on linux2<br>
Type "help", "copyright", "credits" or "license" for more information.<br>
>>> import sys<br>
>>> sys.getdefaultencoding()<br>
'ascii'<br>
>>> sys.setdefaultencoding('utf-8')<br>
Traceback (most recent call last):<br>
File "&lt;stdin&gt;", line 1, in &lt;module&gt;<br>
AttributeError: 'module' object has no attribute 'setdefaultencoding'<br>
>>> reload(sys)<br>
&lt;module 'sys' (built-in)&gt;<br>
>>> sys.setdefaultencoding('utf-8')<br>
>>> sys.getdefaultencoding()<br>
'utf-8'<br>
<br>
<br>
Not fully understanding the role of Python's default encoding and how<br>
its application differs between terminal and non-terminal output can<br>
cause a lot of confusion and misunderstanding which can sometimes lead<br>
to false conclusions as to what is going wrong.<br>
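<div>One way to reduce that confusion is to inspect the output stream before blaming the data. The following is a small diagnostic sketch, written for Python 3; the helper names <code>describe_stream</code> and <code>can_encode</code> are made up for illustration.</div>
<pre>
```python
import io
import sys


def describe_stream(stream):
    """Return (is_tty, encoding) for a stream, tolerating missing attributes."""
    is_tty = bool(getattr(stream, "isatty", lambda: False)())
    encoding = getattr(stream, "encoding", None)
    return is_tty, encoding


def can_encode(text, encoding):
    """Report whether `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


text = u"caf\u00e9"
ascii_ok = can_encode(text, "ascii")
utf8_ok = can_encode(text, "utf-8")

# The result for sys.stdout depends on whether output goes to a terminal,
# a pipe, or a file, and on the environment's locale settings.
print(describe_stream(sys.stdout))
print(describe_stream(io.StringIO()))
```
</pre>
<div>Running the same script once on a terminal and once with output redirected to a file shows whether the two cases see different encodings.</div>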
<br>
If I get a chance I'll try to publicly post my write-up on Python i18n<br>
issues. </blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<span class="HOEnZb"><font color="#888888"><br>
<br>
--<br>
John<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
</div></div></blockquote></div><br></div></div>