[openstack-dev] [oslo] strutils: enhance safe_decode() and safe_encode()

Doug Hellmann doug.hellmann at dreamhost.com
Wed May 21 21:50:42 UTC 2014


On Wed, May 21, 2014 at 12:30 PM, John Dennis <jdennis at redhat.com> wrote:
> On 05/15/2014 11:41 AM, Victor Stinner wrote:
>> Hi,
>>
>> The functions safe_decode() and safe_encode() have been ported to Python 3,
>> and changed more than once. IMO we can still improve these functions to make
>> them more reliable and easier to use.
>>
>>
>> (1) My first concern is that these functions try to guess user expectation
>> about encodings. They use "sys.stdin.encoding or sys.getdefaultencoding()" as
>> the default encoding to decode, but this encoding depends on the locale
>> encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and on
>> the Python major version.
>>
>> IMO the default encoding should be UTF-8 because most OpenStack components
>> expect this encoding.
>>
>> Or maybe users want to display data to the terminal, and so the locale
>> encoding should be used? In this case, locale.getpreferredencoding() would be
>> more reliable than sys.stdin.encoding.
>
> The problem is you can't know the correct encoding to use until you know
> the encoding of the IO stream, therefore I don't think you can correctly
> write generic encode/decode functions. What if you're trying to send
> the output to multiple IO streams potentially with different encodings?
> Think that's far fetched? Nope, it's one of the nastiest and most common
> problems in Python2. The default encoding differs depending on whether
> the IO target is a tty or not. Therefore code that works fine when
> written to the terminal blows up with encoding errors when redirected to
> a file (because the TTY probably has UTF-8 and all other encodings
> default to ASCII due to sys.getdefaultencoding()).
>
> Another problem is that the Python2 default encoding is ASCII but in
> Python3 it's UTF-8 (IMHO the default encoding in Python2 should have been
> UTF-8; the fact that it was set to ASCII is the cause of 99% of the
> encoding exceptions in Python2).
>
> Given that you don't know what the encoding of the IO stream is, I don't
> think you should base it on the locale or on sys.stdin. Rather I think we
> should just agree everything is UTF-8. If that messes up someone's
> terminal output I think it's fair to say if you're running OpenStack
> you'll need to switch to UTF-8. Anything else requires way more
> knowledge than we have available in a generic function. Solving this so
> the encodings match for each and every IO stream is very complicated;
> note that Python3 still punts on this.
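
The redirect failure described above is easy to reproduce without a
real terminal. Here is a minimal sketch, assuming Python 3 and using an
in-memory stream in place of stdout; write_encoded() is a hypothetical
name used only for illustration, not part of strutils:

```python
import io

# Non-ASCII sample text: "cafe" with an acute accent on the final letter.
text = u"caf\u00e9"


def write_encoded(text, encoding):
    """Write text through a stream that uses the given encoding.

    Hypothetical helper: it stands in for stdout attached to a TTY
    (often UTF-8) or redirected to a file (ASCII in the Python 2 case
    discussed above).
    """
    buf = io.BytesIO()
    writer = io.TextIOWrapper(buf, encoding=encoding)
    writer.write(text)
    writer.flush()
    return buf.getvalue()


# A UTF-8 stream accepts the text.
utf8_bytes = write_encoded(text, "utf-8")

# An ASCII stream rejects the very same text.
try:
    write_encoded(text, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

The same string succeeds or fails depending only on the stream it is
written to, which is why no single default can be right for every
caller.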

Unfortunately we can't just agree to a single encoding in all cases.
Lots of people use encodings other than UTF-8 for terminals, and
that's where these functions are most frequently used.
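
One way to keep both use cases is to let the caller pass the encoding
explicitly and only fall back to a guess when none is given. A minimal
sketch, assuming Python 3; decode_text() and its parameter names are
hypothetical and are not the current strutils API. The fallback uses
locale.getpreferredencoding(), as Victor suggested, rather than
sys.stdin.encoding:

```python
import locale


def decode_text(data, incoming=None, errors="strict"):
    """Decode bytes to text; text is returned unchanged.

    Hypothetical helper for illustration only. 'incoming' is the
    encoding of the bytes. When it is omitted, the locale's preferred
    encoding is used, with UTF-8 as a last resort.
    """
    if isinstance(data, str):
        return data
    if incoming is None:
        # Assumption: callers that omit the encoding are dealing with
        # terminal data, so the locale is the least surprising guess.
        incoming = locale.getpreferredencoding(False) or "utf-8"
    return data.decode(incoming, errors)


# Callers that know their encoding state it and get the same result
# on every machine.
from_utf8 = decode_text(b"caf\xc3\xa9", incoming="utf-8")
from_latin1 = decode_text(b"caf\xe9", incoming="latin-1")
```

Code that talks to an API or a database would always pass "utf-8";
only command-line code would rely on the locale fallback.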

Doug

>
>
> --
> John
>


