[openstack-dev] [all][python3] use of six.iteritems()
Clint Byrum
clint at fewbar.com
Thu Jun 11 18:50:28 UTC 2015
Top-posting, as this is more a response to the whole thread.
My takeaways from the most excellent discussion:
* There is some benefit to iteritems in python2 when you need it (a short
  sketch of the difference follows this list).
* OpenStack does not seem to need it
- Except in places that are operating on tens of thousands of large
objects concurrently, such as the nova scheduler.
* six.anything is more code, and more code is more burden in general.
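For reference, here is the difference the thread keeps coming back to. This
is a minimal, generic sketch (nothing here is OpenStack code; the dict is
purely illustrative):

    import six

    d = {'a': 1, 'b': 2}

    # Python 2: d.items() builds a full list of (key, value) pairs up front,
    # while d.iteritems() returns a lazy iterator over them.
    # Python 3: d.items() returns a cheap view object and iteritems() is gone.
    # six.iteritems(d) picks the lazy form on both interpreters.
    for key, value in six.iteritems(d):
        print(key, value)

    # The plain spelling works on both interpreters and is easier to read;
    # on Python 2 it simply pays the cost of building the list.
    for key, value in d.items():
        print(key, value)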
From this I believe we should distill some clear developer
and reviewer recommendations which should go in our developer docs:
* Do not use six.iteritems in new patches without a clear reason
stated and attached.
- Reasons should clearly state why .items() would be a large enough
burden, such as "this list will be large and stay resident in
memory for the duration of the program. Each concurrent request
will have similar lists." (an illustrative example follows this list)
* -1 patches using six.iteritems in flight now with "Please remove or
justify six.iteritems usage."
* Patches touching code sections which use six.iteritems should be
allowed to remove its usage without justification.
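To make the justification bullet concrete, a use that clears that bar might
look like the following; host_states, its size, and the comment are made up
for illustration:

    import six

    # Stand-in for a large, long-lived mapping, e.g. one entry per hypervisor
    # that stays resident for the whole scheduler run.
    host_states = dict(('host-%d' % i, {'free_ram_mb': 1024})
                       for i in range(100000))

    # Justified: the dict is large and stays in memory for the duration of
    # the program, so on Python 2 we avoid copying it into a list every pass.
    for host, state in six.iteritems(host_states):
        pass  # scheduling logic would go here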
I've gone ahead and added this suggestion in a patch to the
infra-manual:
https://review.openstack.org/190757
This looks quite a bit like a hacking rule definition. How strongly do
we feel about this? Do we want to require a tag of some kind on lines
that use six.iteritems(), or are we comfortable with this just being in
our python3 porting documentation?
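For what it's worth, a local hacking check along those lines could be as
small as the sketch below. The function name, the N999 code, and the message
wording are placeholders rather than part of the proposal, and wiring it up
through a project's local-check factory is left out:

    def check_six_iteritems(logical_line):
        """Flag six.iteritems() so authors either justify it or drop it.

        N999: for k, v in six.iteritems(d):
        Okay: for k, v in d.items():
        """
        pos = logical_line.find('six.iteritems(')
        if pos != -1:
            yield (pos, 'N999: use dict.items() instead of six.iteritems(), '
                        'or state why the lazy iterator is needed')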
Excerpts from Robert Collins's message of 2015-06-09 17:15:33 -0700:
> I'm very glad folk are working on Python3 ports.
>
> I'd like to call attention to one little wart in that process: I get
> the feeling that folk are applying a massive regex to find things like
> d.iteritems() and convert that to six.iteritems(d).
>
> I'd very much prefer that such a regex approach move things to
> d.items(), which is much easier to read.
>
> Here's why. Firstly, very very very few of our dict iterations are
> going to be performance-sensitive in the way that makes iteritems() matter.
> Secondly, no really - unless you're doing HUGE dicts, it doesn't
> matter. Thirdly. Really, it doesn't.
>
> At 1 million items the overhead is 54ms[1]. If we're doing inner loops
> on million item dictionaries anywhere in OpenStack today, we have a
> problem. We might want to in e.g. the scheduler... if it held
> in-memory state on a million hypervisors at once, because I don't
> really want to imagine it pulling a million rows from a DB on every
> action. But then, we'd be looking at a whole 54ms. I think we could
> survive, if we did that (which we don't).
>
> So - please, no six.iteritems().
>
> Thanks,
> Rob
>
>
> [1]
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 10 loops, best of 3: 76.6 msec per loop
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.iteritems(): pass'
> 100 loops, best of 3: 22.6 msec per loop
> python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 10 loops, best of 3: 18.9 msec per loop
> pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 10 loops, best of 3: 65.8 msec per loop
> # and out of interest, assuming that that hadn't triggered the JIT.... but it had.
> pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 1000 loops, best of 3: 64.3 msec per loop
>