[openstack-dev] [all][python3] use of six.iteritems()

Clint Byrum clint at fewbar.com
Thu Jun 11 18:50:28 UTC 2015


Top posting as this is more a response to the whole thread.

My takeaways from the most excellent discussion:

* There is some benefit to iteritems in python2 when the dict is large
  enough that the temporary list built by .items() actually matters.
* OpenStack does not seem to need it.
  - Except in places that are operating on tens of thousands of large
    objects concurrently such as the nova scheduler.
* six.anything is more code, and more code is more burden in general.

From this I believe we should distill some clear developer
and reviewer recommendations which should go in our developer docs:

* Do not use six.iteritems in new patches without a clear reason
  stated and attached.
  - Reasons should clearly state why .items() would be a large enough
    burden, such as "this list will be large and stay resident in
    memory for the duration of the program. Each concurrent request
    will have similar lists."
* -1 patches using six.iteritems in flight now with "Please remove or
  justify six.iteritems usage."
* Patches touching code sections that use six.iteritems should be
  allowed to remove its usage without justification.
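
To make the first recommendation concrete, here is a rough sketch of
what "just use .items()" looks like. It is not taken from any OpenStack
project; the dict contents and the function name are made up purely for
illustration.

```python
# Hypothetical example data: host name -> free RAM in MB.
hosts = {"compute1": 2048, "compute2": 512, "compute3": 8192}


def hosts_with_free_ram(host_ram, minimum_mb):
    """Return the sorted names of hosts with at least minimum_mb free."""
    # Plain .items() runs unchanged on Python 2 and Python 3. On Python 2
    # it builds a temporary list of pairs, which is negligible for a dict
    # of this size, so six.iteritems(host_ram) buys nothing here.
    return sorted(name for name, ram in host_ram.items() if ram >= minimum_mb)


print(hosts_with_free_ram(hosts, 1024))
```

A patch that wanted six.iteritems here instead would need to say why
the temporary list is a real burden.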

I've gone ahead and added this suggestion in a patch to the
infra-manual:

https://review.openstack.org/190757

This looks quite a bit like a hacking rule definition. How strongly do
we feel about this? Do we want to require a tag of some kind on lines
that use six.iteritems(), or are we comfortable with this just being in
our python3 porting documentation?
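
If we did go the tag route, the check might look roughly like the
following. This is only a sketch to make the question concrete: the
"# iteritems-ok:" tag, the X999 code and the function name are all
invented for illustration, and it is written as a plain function on one
line of source rather than against the real hacking/flake8 plugin
interface.

```python
import re

# Hypothetical marker a developer would add to justify the usage.
JUSTIFY_TAG = "# iteritems-ok:"
_ITERITEMS_RE = re.compile(r"\bsix\.iteritems\(")


def check_six_iteritems(physical_line):
    """Return (offset, message) for an unjustified six.iteritems call.

    Returns None when the line does not call six.iteritems, or when it
    carries the (hypothetical) tag followed by a non-empty reason.
    """
    match = _ITERITEMS_RE.search(physical_line)
    if match is None:
        return None
    tag_pos = physical_line.find(JUSTIFY_TAG)
    if tag_pos != -1:
        reason = physical_line[tag_pos + len(JUSTIFY_TAG):].strip()
        if reason:
            return None
    # X999 is a placeholder code, not a real hacking check number.
    return (match.start(),
            "X999: Please remove or justify six.iteritems usage.")
```

With this, a line such as
"for k, v in six.iteritems(d):  # iteritems-ok: huge resident dict"
would pass, while the same line without a reason would be flagged.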

Excerpts from Robert Collins's message of 2015-06-09 17:15:33 -0700:
> I'm very glad folk are working on Python3 ports.
> 
> I'd like to call attention to one little wart in that process: I get
> the feeling that folk are applying a massive regex to find things like
> d.iteritems() and convert that to six.iteritems(d).
> 
> I'd very much prefer that such a regex approach move things to
> d.items(), which is much easier to read.
> 
> Here's why. Firstly, very very very few of our dict iterations are
> going to be performance sensitive in the way that iteritems() matters.
> Secondly, no really - unless you're doing HUGE dicts, it doesn't
> matter. Thirdly. Really, it doesn't.
> 
> At 1 million items the overhead is 54ms[1]. If we're doing inner loops
> on million item dictionaries anywhere in OpenStack today, we have a
> problem. We might want to in e.g. the scheduler... if it held
> in-memory state on a million hypervisors at once, because I don't
> really want to imagine it pulling a million rows from a DB on every
> action. But then, we'd be looking at a whole 54ms. I think we could
> survive, if we did that (which we don't).
> 
> So - please, no six.iteritems().
> 
> Thanks,
> Rob
> 
> 
> [1]
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 10 loops, best of 3: 76.6 msec per loop
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.iteritems(): pass'
> 100 loops, best of 3: 22.6 msec per loop
> python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 10 loops, best of 3: 18.9 msec per loop
> pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 10 loops, best of 3: 65.8 msec per loop
> # and out of interest, assuming that that hadn't triggered the JIT.... but it had.
> pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 1000 loops, best of 3: 64.3 msec per loop
> 


