[openstack-dev] [oslo] strutils: enhance safe_decode() and safe_encode()

Flavio Percoco flavio at redhat.com
Thu May 22 12:34:31 UTC 2014


On 21/05/14 11:32 -0400, Doug Hellmann wrote:
>On Thu, May 15, 2014 at 11:41 AM, Victor Stinner
><victor.stinner at enovance.com> wrote:
>> Hi,
>>
>> The functions safe_decode() and safe_encode() have been ported to Python 3,
>> and changed more than once. IMO we can still improve these functions to make
>> them more reliable and easier to use.
>>
>>
>> (1) My first concern is that these functions try to guess user expectation
>> about encodings. They use "sys.stdin.encoding or sys.getdefaultencoding()" as
>> the default encoding to decode, but this encoding depends on the locale
>> encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and on
>> the Python major version.
>>
>> IMO the default encoding should be UTF-8 because most OpenStack components
>> expect this encoding.
>>
>> Or maybe users want to display data to the terminal, and so the locale
>> encoding should be used? In this case, locale.getpreferredencoding() would be
>> more reliable than sys.stdin.encoding.
>
>From what I can see, most uses of the module are in the client
>programs. If using locale to find a default encoding is the best
>approach, perhaps we should go ahead and make the change you propose.
>
>One place I see safe_decode() used in a questionable way is in heat in
>heat/engine/parser.py where validation errors are being re-raised as
>StackValidationFailed (line 376 in my version). It's not clear why the
>message is processed the way it is, so I would want to understand the
>history before proposing a change there.

The original intent for these 2 functions was to provide a reliable
way to encode/decode the input. As already mentioned in this thread,
it's not good to assume what the best encoding for every case is and I
would also prefer to keep these functions generci - as in, not thought
just for client libraries. We use this module in Glance as well,
unfortunately, not as much as I'd like.

I would prefer the improved-encoding guess to happen outside these
functions, if it's meant for client library. For example, glanceclient
could use `getpreferredencoding` and pass that to
safe_(encode|decode).

Flavio

>
>>
>>
>> (2) My second concern is that safe_encode(bytes, incoming, encoding)
>> transcodes the bytes string from incoming to encoding if these two encodings
>> are different.
>>
>> When I port code to Python 3, I'm looking for a function to replace this
>> common pattern:
>>
>>     if isinstance(data, six.text_type):
>>         data = data.encode(encoding)
>>
>> I don't want to modify data encoding if it is already a bytes string. So I
>> would prefer to have:
>>
>>     def safe_encode(data, encoding='utf-8'):
>>         if isinstance(data, six.text_type):
>>             data = data.encode(encoding)
>>         return data
>>
>> Changing safe_encode() like this will break applications relying on the
>> "transcode" feature (incoming => encoding). If such usage exists, I suggest to
>> add a new function (ex: "transcode" ?) with an API fitting this use case. For
>> example, the incoming encoding would be mandatory.
>>
>> Is there really applications using the incoming parameter?
>
>The only place I see that parameter used in integrated projects is in
>the tests for the module. I didn't check the non-integrated projects.
>Given its symmetry with safe_decode(), I don't really see a problem
>with the current name. Something like the shortcut you propose is
>present in safe_encode(), so I'm not sure what benefit a new function
>brings?

+1

Flavio

P.S: I'm working on graduating strutils from the incubator. I'm glad
you brought this up. I'm almost done with the graduation thing.

-- 
@flaper87
Flavio Percoco
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140522/23813c29/attachment.pgp>


More information about the OpenStack-dev mailing list