[openstack-dev] [oslo] strutils: enhance safe_decode() and safe_encode()

John Dennis jdennis at redhat.com
Wed May 21 16:30:51 UTC 2014


On 05/15/2014 11:41 AM, Victor Stinner wrote:
> Hi,
> 
> The functions safe_decode() and safe_encode() have been ported to Python 3, 
> and changed more than once. IMO we can still improve these functions to make 
> them more reliable and easier to use.
> 
> 
> (1) My first concern is that these functions try to guess user expectation 
> about encodings. They use "sys.stdin.encoding or sys.getdefaultencoding()" as 
> the default encoding to decode, but this encoding depends on the locale 
> encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and on 
> the Python major version.
> 
> IMO the default encoding should be UTF-8 because most OpenStack components 
> expect this encoding.
> 
> Or maybe users want to display data to the terminal, and so the locale 
> encoding should be used? In this case, locale.getpreferredencoding() would be 
> more reliable than sys.stdin.encoding.

The problem is that you can't know the correct encoding to use until you
know the encoding of the IO stream, so I don't think you can correctly
write generic encode/decode functions. What if you're trying to send
the output to multiple IO streams, potentially with different encodings?
Think that's far-fetched? Nope, it's one of the nastiest and most common
problems in Python2. The default encoding differs depending on whether
the IO target is a tty or not. Therefore code that works fine when
written to the terminal blows up with encoding errors when redirected to
a file (because the TTY probably uses UTF-8, while every other stream
falls back to ASCII via sys.getdefaultencoding()).
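
A minimal sketch of that failure mode, assuming Python 3 and using
in-memory streams to stand in for "a UTF-8 terminal" and "an ASCII
redirected file" (write_text is a hypothetical helper, not part of
strutils):

```python
import io


def write_text(text, encoding):
    """Write text to an in-memory byte stream using the given encoding."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


text = u"caf\u00e9"

# Stand-in for a terminal whose encoding is UTF-8: the write succeeds.
utf8_bytes = write_text(text, "utf-8")

# Stand-in for a redirected stream that fell back to ASCII: the same
# write raises UnicodeEncodeError.
try:
    write_text(text, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

The same text and the same code path behave differently purely because
of the encoding attached to the target stream.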

Another problem is that Python2's default encoding is ASCII but in Python3
it's UTF-8 (IMHO the default encoding in Python2 should have been UTF-8;
the fact that it was set to ASCII is the cause of 99% of the encoding
exceptions in Python2).
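
As a rough illustration, assuming Python 3: decoding UTF-8 bytes
explicitly as ASCII reproduces the kind of exception that Python2 raised
when it applied its ASCII default implicitly.

```python
import sys

# UTF-8 encoded bytes containing one non-ASCII character.
data = u"caf\u00e9".encode("utf-8")

# Decoding as ASCII fails on the first byte >= 0x80.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding as UTF-8 works.
utf8_text = data.decode("utf-8")

# On Python 3 this reports 'utf-8'; on Python2 it reported 'ascii'.
default_encoding = sys.getdefaultencoding()
```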

Given that you don't know what the encoding of the IO stream is, I don't
think you should base it on the locale or on sys.stdin. Rather, I think we
should just agree everything is UTF-8. If that messes up someone's
terminal output, I think it's fair to say that if you're running OpenStack
you'll need to switch to UTF-8. Anything else requires way more
knowledge than we have available in a generic function. Solving this so
the encodings match for each and every IO stream is very complicated;
note that Python3 still punts on this.
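
A minimal sketch of what "just agree everything is UTF-8" could look
like, assuming Python 3. The names to_text and to_bytes are hypothetical
and this is not the actual strutils code; the only point is that the
default never depends on the locale, sys.stdin, or whether a TTY is
attached.

```python
def to_text(value, encoding="utf-8", errors="strict"):
    """Return value as str, decoding bytes with a fixed UTF-8 default."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


def to_bytes(value, encoding="utf-8", errors="strict"):
    """Return value as bytes, encoding str with a fixed UTF-8 default."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding, errors)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


decoded = to_text(b"caf\xc3\xa9")
encoded = to_bytes(u"caf\u00e9")
```

A caller that really does know the encoding of a particular stream can
still pass it explicitly; the generic default just stops guessing.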


-- 
John


