[openstack-dev] [all][python3] use of six.iteritems()

Mike Bayer mbayer at redhat.com
Thu Jun 11 14:12:10 UTC 2015



On 6/10/15 11:48 PM, Dolph Mathews wrote:
> tl;dr *.iteritems() is faster and more memory efficient than .items() 
> in python2*
>
>
> Using xrange() in python2 instead of range() because it's more memory 
> efficient and consistent between python 2 and 3...
>
> # xrange() + .items()
> python -m timeit -n 20 for\ i\ in\ 
> dict(enumerate(xrange(1000000))).items():\ pass
> 20 loops, best of 3: 729 msec per loop
> peak memory usage: 203 megabytes
>
> # xrange() + .iteritems()
> python -m timeit -n 20 for\ i\ in\ 
> dict(enumerate(xrange(1000000))).iteritems():\ pass
> 20 loops, best of 3: 644 msec per loop
> peak memory usage: 176 megabytes
>
> # python 3
> python3 -m timeit -n 20 for\ i\ in\ 
> dict(enumerate(range(1000000))).items():\ pass
> 20 loops, best of 3: 826 msec per loop
> peak memory usage: 198 megabytes
Is it just me, or are these differences pretty negligible, considering
this is the "1 million item dictionary", which is itself a unicorn in
OpenStack code, or really in most code anywhere?

As was stated before, if we have million-item dictionaries floating
around, that code has problems.  I already have to wait full seconds
for responses to come back when I play around with Neutron + Horizon in
a devstack VM, and that's with no data at all.  An extra 100ms for a
hypothetical million-item structure would only matter long after the
whole app had fallen over from having just ten thousand of anything,
much less a million.

My only concern with items() is that it is semantically different in
Py2k and Py3k.  Code that mutates a dict mid-iteration, and would hit a
"dictionary changed size" error under iteritems() or Py3k's items(),
succeeds silently under Py2k's items(), because that returns a copy.
If such a coding mistake is not covered by tests (this is a
data-dependent error condition), it would manifest as a sudden error
condition on Py3k only.
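
To make the Py2k/Py3k difference concrete, here is a minimal sketch of
the trap with a toy dict (nothing OpenStack-specific):

    d = {'a': 1, 'b': 2}

    # Under py2, d.items() returns a fresh list, so mutating the dict
    # inside the loop quietly works:
    #
    #     for k, v in d.items():
    #         d.pop(k)        # no error on py2
    #
    # Under py3 (or py2 iteritems()), the same loop iterates a live
    # view and blows up on the next step:
    try:
        for k, v in d.items():      # run this under python3
            d.pop(k)
    except RuntimeError as exc:
        print(exc)  # "dictionary changed size during iteration"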

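If a loop really does need to mutate the dict, the portable spelling is
to snapshot the items explicitly; six.iteritems() covers the lazy case.
Again, just a sketch:

    import six

    d = {'a': 1, 'b': 2}

    # Lazy iteration on both interpreters (py2 iteritems / py3 items):
    for k, v in six.iteritems(d):
        pass

    # Mutating loop: list() reproduces py2's items() copy semantics
    # explicitly on both interpreters, so we iterate over a snapshot:
    for k, v in list(d.items()):
        d.pop(k)    # safe; the live dict is not what we're iterating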


>
>
> And if you really want to see the results with range() in python2...
>
> # range() + .items()
> python -m timeit -n 20 for\ i\ in\ 
> dict(enumerate(range(1000000))).items():\ pass
> 20 loops, best of 3: 851 msec per loop
> peak memory usage: 254 megabytes
>
> # range() + .iteritems()
> python -m timeit -n 20 for\ i\ in\ 
> dict(enumerate(range(1000000))).iteritems():\ pass
> 20 loops, best of 3: 919 msec per loop
> peak memory usage: 184 megabytes
>
>
> To benchmark memory consumption, I used the following on bare metal:
>
> $ valgrind --tool=massif --pages-as-heap=yes 
> --massif-out-file=massif.out $COMMAND_FROM_ABOVE
> $ cat massif.out | grep mem_heap_B | sort -u
>
> $ python2 --version
> Python 2.7.9
>
> $ python3 --version
> Python 3.4.3
>
>
> On Wed, Jun 10, 2015 at 8:36 PM, gordon chung <gord at live.ca> wrote:
>
>     > Date: Wed, 10 Jun 2015 21:33:44 +1200
>     > From: robertc at robertcollins.net
>     > To: openstack-dev at lists.openstack.org
>     > Subject: Re: [openstack-dev] [all][python3] use of six.iteritems()
>     >
>     > On 10 June 2015 at 17:22, gordon chung <gord at live.ca> wrote:
>     > > maybe the suggestion should be "don't blindly apply
>     six.iteritems or items" rather than don't apply iteritems at all.
>     admittedly, it's a massive eyesore, but some projects really do
>     deal with large data results, and enforcing the latter policy can
>     have negative effects[1]. one "million item dictionary" might be
>     negligible, but in a multi-user, multi-* environment that can have
>     a significant impact on the amount of memory required to store
>     everything.
>     >
>     > > [1] disclaimer: i have no real world results but i assume
>     memory management was the reason for the switch in logic from py2
>     to py3
>     >
>     > I wouldn't make that assumption.
>     >
>     > And no, memory isn't an issue. If you have a million-item dict,
>     > ignoring the internal overheads, the dict needs 1 million object
>     > pointers. The size of a list with those pointers in it is 1M *
>     > (pointer size in bytes), e.g. 4MB or 8MB. Nothing to worry about
>     > given the footprint of such a program :)
>
>     iiuc, items() (in py2) will create a copy of the dictionary in
>     memory to be processed. this is useful for cases such as
>     concurrency where you want to ensure consistency, but doing a quick
>     test i noticed a massive spike in memory usage between items() and
>     iteritems().
>
>     'for i in dict(enumerate(range(1000000))).items(): pass' consumes
>     significantly more memory than 'for i in
>     dict(enumerate(range(1000000))).iteritems(): pass'. on my system,
>     memory consumption was double when using items() vs iteritems(),
>     and cpu util was significantly higher as well... let me know if
>     there's anything that stands out as inaccurate.
>
>     unless there's something wrong with my ignorant testing above, i
>     think it's something projects should consider when mass-applying
>     any iteritems/items patch.
>
>     cheers,
>     gord
>
