[openstack-dev] [oslo] strutils: enhance safe_decode() and safe_encode()

Victor Stinner victor.stinner at enovance.com
Thu May 15 15:41:42 UTC 2014


Hi,

The functions safe_decode() and safe_encode() have been ported to Python 3, 
and changed more than once. IMO we can still improve these functions to make 
them more reliable and easier to use.


(1) My first concern is that these functions try to guess which encoding the 
user expects. They use "sys.stdin.encoding or sys.getdefaultencoding()" as 
the default encoding to decode, but the result depends on the locale 
encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and on 
the Python major version.

IMO the default encoding should be UTF-8 because most OpenStack components 
expect this encoding.

Or maybe users want to display data to the terminal, and so the locale 
encoding should be used? In this case, locale.getpreferredencoding() would be 
more reliable than sys.stdin.encoding.
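
To compare the three candidates on a given machine, here is a small sketch. 
The helper name candidate_encodings() and the dictionary keys are made up for 
this mail; the calls themselves are plain stdlib. The printed values will 
differ between hosts, which is exactly the problem with the current default.

```python
import locale
import sys


def candidate_encodings():
    """Return the default encoding each strategy discussed above would pick.

    Hypothetical helper for illustration only, not part of strutils.
    """
    # sys.stdin can be None or a mock object without an "encoding" attribute.
    stdin_encoding = getattr(sys.stdin, "encoding", None)
    return {
        # What the functions use today, as described above.
        "current": stdin_encoding or sys.getdefaultencoding(),
        # Fixed default, independent of the environment.
        "utf8": "utf-8",
        # Locale encoding, without calling setlocale() as a side effect.
        "locale": locale.getpreferredencoding(False),
    }


for name, value in sorted(candidate_encodings().items()):
    print("%s: %s" % (name, value))
```

Only the "utf8" entry is guaranteed to be the same everywhere.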


(2) My second concern is that safe_encode(bytes, incoming, encoding) 
transcodes the bytes string from incoming to encoding if these two encodings 
are different.

When I port code to Python 3, I'm looking for a function to replace this 
common pattern:

    if isinstance(data, six.text_type):
        data = data.encode(encoding)

I don't want to modify data encoding if it is already a bytes string. So I 
would prefer to have:

    def safe_encode(data, encoding='utf-8'):
        if isinstance(data, six.text_type):
            data = data.encode(encoding)
        return data
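
To make the difference concrete, here is a rough comparison. 
transcoding_encode() is only a simplified model of the behaviour described in 
(2), written for this mail; it is not the actual strutils code. Both function 
names are hypothetical, and plain str stands in for six.text_type, so the 
sketch assumes Python 3.

```python
def transcoding_encode(data, incoming, encoding='utf-8'):
    """Simplified model of the transcoding behaviour described in (2)."""
    if isinstance(data, str):
        return data.encode(encoding)
    # Bytes input: re-encode when the two encodings differ.
    if incoming.lower() != encoding.lower():
        return data.decode(incoming).encode(encoding)
    return data


def proposed_safe_encode(data, encoding='utf-8'):
    """The proposal: encode text, leave bytes untouched."""
    if isinstance(data, str):
        data = data.encode(encoding)
    return data


latin1_bytes = u'caf\xe9'.encode('latin-1')

# The model rewrites the bytes as UTF-8; the proposal returns them as-is.
transcoded = transcoding_encode(latin1_bytes, incoming='latin-1')
untouched = proposed_safe_encode(latin1_bytes)
```

With the proposal, a caller that already holds bytes always gets the same 
object back, whatever its encoding is.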

Changing safe_encode() like this will break applications relying on the 
"transcode" feature (incoming => encoding). If such usage exists, I suggest 
adding a new function (e.g. "transcode"?) with an API fitting this use case. 
For example, the incoming encoding would be mandatory.
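
Such a function could look roughly like the sketch below. The name 
transcode(), the errors parameter and the TypeError on text input are all 
assumptions for the sake of discussion, not an existing oslo API.

```python
def transcode(data, incoming, encoding='utf-8', errors='strict'):
    """Re-encode a bytes string from `incoming` to `encoding`.

    Hypothetical API: `incoming` has no default, so the caller must state
    which encoding the input bytes use instead of letting us guess.
    """
    if not isinstance(data, bytes):
        raise TypeError("transcode() expects bytes, got %s"
                        % type(data).__name__)
    return data.decode(incoming, errors).encode(encoding, errors)
```

For example, transcode(b'\xe9', 'latin-1') gives the UTF-8 bytes b'\xc3\xa9', 
and calling it without the incoming argument is an error.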

Are there really applications using the incoming parameter?


So, what do you think about that?

Victor


