[openstack-dev] [oslo] strutils: enhance safe_decode() and safe_encode()
Victor Stinner
victor.stinner at enovance.com
Thu May 15 15:41:42 UTC 2014
Hi,
The functions safe_decode() and safe_encode() have been ported to Python 3,
and changed more than once. IMO we can still improve these functions to make
them more reliable and easier to use.
(1) My first concern is that these functions try to guess which encoding the
user expects. They use "sys.stdin.encoding or sys.getdefaultencoding()" as
the default encoding to decode, but the result depends on the locale
encoding (stdin encoding), on stdin itself (is stdin a TTY? is stdin
mocked?), and on the Python major version.
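To illustrate the stdin dependency, here is a minimal Python 3 sketch. The helper name guess_default_encoding is hypothetical (it is not part of strutils); it only applies the expression quoted above to a stdin object passed in explicitly:

```python
import io
import sys


def guess_default_encoding(stdin):
    # Hypothetical helper: the same expression as the current default,
    # with stdin passed in so it can be swapped for the demonstration.
    return getattr(stdin, "encoding", None) or sys.getdefaultencoding()


# A text stream wrapping a byte buffer reports the encoding it was given.
latin1_stdin = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
# A mocked stdin, as often used in tests, reports no encoding at all.
mocked_stdin = io.StringIO()

print(guess_default_encoding(latin1_stdin))
print(guess_default_encoding(mocked_stdin))
```

The same call yields two different answers depending only on what stdin happens to be, so the same byte string can be decoded differently in a test run and in production.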
IMO the default encoding should be UTF-8 because most OpenStack components
expect this encoding.
Or maybe users want to display data to the terminal, and so the locale
encoding should be used? In this case, locale.getpreferredencoding() would be
more reliable than sys.stdin.encoding.
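The two options could look roughly like this. This is only a sketch for Python 3: decode_text is a hypothetical name, not the real safe_decode() signature, and the errors="replace" choice for terminal output is an assumption:

```python
import locale


def decode_text(data, encoding="utf-8", errors="strict"):
    # Hypothetical helper: decode bytes with an explicit, predictable
    # default; text strings are returned unchanged.
    if isinstance(data, bytes):
        return data.decode(encoding, errors)
    return data


raw = "caf\u00e9".encode("utf-8")

# Default: UTF-8, independent of the locale and of stdin.
text = decode_text(raw)

# Opt-in for terminal-oriented code: ask the locale explicitly.
terminal_encoding = locale.getpreferredencoding()
terminal_text = decode_text(raw, terminal_encoding, errors="replace")
```

With a UTF-8 default the caller gets the same result on every host, and code that really wants the locale encoding has to say so.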
(2) My second concern is that safe_encode(bytes, incoming, encoding)
transcodes the bytes string from incoming to encoding if these two encodings
are different.
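A simplified Python 3 sketch of the transcoding behaviour described above (this is not the actual oslo code; the function name and the "utf-8" default for incoming are assumptions made for the example):

```python
def transcoding_safe_encode(data, incoming="utf-8", encoding="utf-8"):
    # Sketch of the described behaviour, not the real strutils function.
    if isinstance(data, str):
        return data.encode(encoding)
    if incoming.lower() != encoding.lower():
        # Bytes are decoded from `incoming` and re-encoded to `encoding`.
        return data.decode(incoming).encode(encoding)
    return data


latin1_bytes = "\u00e9".encode("latin-1")
result = transcoding_safe_encode(latin1_bytes, incoming="latin-1")
print(latin1_bytes, result)
```

The bytes that come out are not the bytes that went in, which is surprising for a function whose name suggests it only encodes text.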
When I port code to Python 3, I'm looking for a function to replace this
common pattern:
    if isinstance(data, six.text_type):
        data = data.encode(encoding)
I don't want to modify data encoding if it is already a bytes string. So I
would prefer to have:
    def safe_encode(data, encoding='utf-8'):
        if isinstance(data, six.text_type):
            data = data.encode(encoding)
        return data
Changing safe_encode() like this will break applications relying on the
"transcode" feature (incoming => encoding). If such usage exists, I suggest
adding a new function (e.g. "transcode"?) with an API fitting this use case.
For example, the incoming encoding would be mandatory.
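Such a function might look like the following Python 3 sketch. Only the name "transcode" and the mandatory incoming parameter come from the suggestion above; the rest of the signature, including the errors parameter and the TypeError for non-bytes input, is an assumption:

```python
def transcode(data, incoming, encoding="utf-8", errors="strict"):
    # Hypothetical function: `incoming` has no default, so the caller
    # must state which encoding the input bytes are in.
    if not isinstance(data, bytes):
        raise TypeError("transcode() expects bytes, got %s"
                        % type(data).__name__)
    return data.decode(incoming, errors).encode(encoding, errors)


utf8_bytes = transcode("\u00e9".encode("latin-1"), "latin-1")
```

Making incoming mandatory removes the guessing: omitting it is an error at the call site instead of a silent fallback to a platform-dependent default.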
Are there really applications using the incoming parameter?
So, what do you think about that?
Victor