[openstack-dev] [all][python3] use of six.iteritems()

gordon chung gord at live.ca
Wed Jun 10 05:22:25 UTC 2015


maybe the suggestion should be "don't blindly apply six.iteritems or items" rather than "don't apply iteritems at all". admittedly, it's a massive eyesore, but some projects really do deal with large data results, and enforcing the latter policy can have negative effects[1]. one "million item dictionary" might be negligible, but in a multi-user, multi-* environment that can have a significant impact on the amount of memory required to store everything.

[1] disclaimer: i have no real world results but i assume memory management was the reason for the switch in logic from py2 to py3
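
a rough sketch of the memory point, not a measurement from any openstack project: on py2, d.items() builds a full list of (key, value) tuples, while d.iteritems() (and py3's d.items()) hands back a lightweight iterator/view. the snippet below runs on py3 and uses list(d.items()) to stand in for the py2 list-building behaviour. the helper name snapshot_cost and the dict size are made up for illustration, and sys.getsizeof only gives an approximate, implementation-specific byte count.

```python
import sys


def snapshot_cost(d):
    """Return (list_bytes, view_bytes) for iterating over d's items.

    list_bytes approximates what a py2-style d.items() allocates:
    the list object itself plus one tuple per entry.
    view_bytes is the size of the py3 items view, which copies nothing.
    """
    snapshot = list(d.items())  # stand-in for py2's list-returning items()
    list_bytes = sys.getsizeof(snapshot) + sum(sys.getsizeof(t) for t in snapshot)
    view_bytes = sys.getsizeof(d.items())
    return list_bytes, view_bytes


# hypothetical size; the thread talks about a million items
d = dict(enumerate(range(100000)))
list_bytes, view_bytes = snapshot_cost(d)
print("list copy: %d bytes, view: %d bytes" % (list_bytes, view_bytes))
```

the extra copy is short-lived for a single loop, so whether it matters depends on how many of these run concurrently and how big the dicts really are.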

cheers,
gord


----------------------------------------
> Date: Wed, 10 Jun 2015 12:15:33 +1200
> From: robertc at robertcollins.net
> To: openstack-dev at lists.openstack.org
> Subject: [openstack-dev] [all][python3] use of six.iteritems()
>
> I'm very glad folk are working on Python3 ports.
>
> I'd like to call attention to one little wart in that process: I get
> the feeling that folk are applying a massive regex to find things like
> d.iteritems() and convert that to six.iteritems(d).
>
> I'd very much prefer that such a regex approach move things to
> d.items(), which is much easier to read.
>
> Here's why. Firstly, very very very few of our dict iterations are
> going to be performance sensitive in the way that iteritems() matters.
> Secondly, no really - unless you're doing HUGE dicts, it doesn't
> matter. Thirdly. Really, it doesn't.
>
> At 1 million items the overhead is 54ms[1]. If we're doing inner loops
> on million item dictionaries anywhere in OpenStack today, we have a
> problem. We might want to in e.g. the scheduler... if it held
> in-memory state on a million hypervisors at once, because I don't
> really want to imagine it pulling a million rows from a DB on every
> action. But then, we'd be looking at a whole 54ms. I think we could
> survive, if we did that (which we don't).
>
> So - please, no six.iteritems().
>
> Thanks,
> Rob
>
>
> [1]
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 10 loops, best of 3: 76.6 msec per loop
> python2.7 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.iteritems(): pass'
> 100 loops, best of 3: 22.6 msec per loop
> python3.4 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 10 loops, best of 3: 18.9 msec per loop
> pypy2.3 -m timeit -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 10 loops, best of 3: 65.8 msec per loop
> # and out of interest, assuming that that hadn't triggered the JIT.... but it had.
> pypy -m timeit -n 1000 -s 'd=dict(enumerate(range(1000000)))' 'for i in d.items(): pass'
> 1000 loops, best of 3: 64.3 msec per loop
>
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 		 	   		  

