[openstack-dev] [oslo] strutils: enhance safe_decode() and safe_encode()
Doug Hellmann
doug.hellmann at dreamhost.com
Wed May 21 15:32:56 UTC 2014
On Thu, May 15, 2014 at 11:41 AM, Victor Stinner
<victor.stinner at enovance.com> wrote:
> Hi,
>
> The functions safe_decode() and safe_encode() have been ported to Python 3,
> and changed more than once. IMO we can still improve these functions to make
> them more reliable and easier to use.
>
>
> (1) My first concern is that these functions try to guess user expectation
> about encodings. They use "sys.stdin.encoding or sys.getdefaultencoding()" as
> the default encoding to decode, but this encoding depends on the locale
> encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and on
> the Python major version.
>
> IMO the default encoding should be UTF-8 because most OpenStack components
> expect this encoding.
>
> Or maybe users want to display data to the terminal, and so the locale
> encoding should be used? In this case, locale.getpreferredencoding() would be
> more reliable than sys.stdin.encoding.
>From what I can see, most uses of the module are in the client
programs. If using locale to find a default encoding is the best
approach, perhaps we should go ahead and make the change you propose.
One place I see safe_decode() used in a questionable way is in heat in
heat/engine/parser.py where validation errors are being re-raised as
StackValidationFailed (line 376 in my version). It's not clear why the
message is processed the way it is, so I would want to understand the
history before proposing a change there.
>
>
> (2) My second concern is that safe_encode(bytes, incoming, encoding)
> transcodes the bytes string from incoming to encoding if these two encodings
> are different.
>
> When I port code to Python 3, I'm looking for a function to replace this
> common pattern:
>
> if isinstance(data, six.text_type):
> data = data.encode(encoding)
>
> I don't want to modify data encoding if it is already a bytes string. So I
> would prefer to have:
>
> def safe_encode(data, encoding='utf-8'):
> if isinstance(data, six.text_type):
> data = data.encode(encoding)
> return data
>
> Changing safe_encode() like this will break applications relying on the
> "transcode" feature (incoming => encoding). If such usage exists, I suggest to
> add a new function (ex: "transcode" ?) with an API fitting this use case. For
> example, the incoming encoding would be mandatory.
>
> Is there really applications using the incoming parameter?
The only place I see that parameter used in integrated projects is in
the tests for the module. I didn't check the non-integrated projects.
Given its symmetry with safe_decode(), I don't really see a problem
with the current name. Something like the shortcut you propose is
present in safe_encode(), so I'm not sure what benefit a new function
brings?
Doug
>
>
> So, what do you think about that?
>
> Victor
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
More information about the OpenStack-dev
mailing list