[openstack-dev] [all][python3] use of six.iteritems()
Mike Bayer
mbayer at redhat.com
Thu Jun 11 14:12:10 UTC 2015
On 6/10/15 11:48 PM, Dolph Mathews wrote:
> tl;dr *.iteritems() is faster and more memory efficient than .items()
> in python2*
>
>
> Using xrange() in python2 instead of range() because it's more memory
> efficient and consistent between python 2 and 3...
>
> # xrange() + .items()
> python -m timeit -n 20 for\ i\ in\
> dict(enumerate(xrange(1000000))).items():\ pass
> 20 loops, best of 3: 729 msec per loop
> peak memory usage: 203 megabytes
>
> # xrange() + .iteritems()
> python -m timeit -n 20 for\ i\ in\
> dict(enumerate(xrange(1000000))).iteritems():\ pass
> 20 loops, best of 3: 644 msec per loop
> peak memory usage: 176 megabytes
>
> # python 3
> python3 -m timeit -n 20 for\ i\ in\
> dict(enumerate(range(1000000))).items():\ pass
> 20 loops, best of 3: 826 msec per loop
> peak memory usage: 198 megabytes
Is it just me, or are these differences pretty negligible considering
this is the "1 million item dictionary", which in itself is a unicorn in
openstack code, or really most code anywhere?

As was stated before, if we have million-item dictionaries floating
around, that code has problems. I already have to wait full seconds
for responses to come back when I play around with Neutron + Horizon in
a devstack VM, and that's with no data at all. An extra 100ms for a
hypothetical million-item structure would only matter long after the
whole app had fallen over from having just ten thousand of anything,
much less a million.
My only concern with items() is that it is semantically different in
Py2k / Py3k. Code that would otherwise have a "dictionary changed size"
issue under iteritems() / py3k items() would succeed under py2k
items(). If such a coding mistake is not covered by tests (as this is
a data-dependent error condition), it would manifest as a sudden error
condition on Py3k only.
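
A contrived sketch of that failure mode (assumption: py3 semantics
shown; under py2 items() the first loop silently succeeds, since it
iterates over a list copy):

d = dict(enumerate(range(5)))
try:
    for k, v in d.items():
        if v % 2:
            del d[k]   # mutating the dict mid-iteration
except RuntimeError as e:
    print(e)   # py3: "dictionary changed size during iteration"

# portable fix: snapshot the pairs before mutating
d = dict(enumerate(range(5)))
for k, v in list(d.items()):
    if v % 2:
        del d[k]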
>
>
> And if you really want to see the results with range() in python2...
>
> # range() + .items()
> python -m timeit -n 20 for\ i\ in\
> dict(enumerate(range(1000000))).items():\ pass
> 20 loops, best of 3: 851 msec per loop
> peak memory usage: 254 megabytes
>
> # range() + .iteritems()
> python -m timeit -n 20 for\ i\ in\
> dict(enumerate(range(1000000))).iteritems():\ pass
> 20 loops, best of 3: 919 msec per loop
> peak memory usage: 184 megabytes
>
>
> To benchmark memory consumption, I used the following on bare metal:
>
> $ valgrind --tool=massif --pages-as-heap=yes
> --massif-out-file=massif.out $COMMAND_FROM_ABOVE
> $ cat massif.out | grep mem_heap_B | sort -u
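>
> A lighter-weight cross-check, as a sketch (assuming Linux, where
> ru_maxrss is reported in kilobytes):
>
> import resource
>
> for i in dict(enumerate(range(1000000))).items():
>     pass
> # peak RSS of the process so far
> peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
> print("peak memory: %d MB" % (peak_kb // 1024))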
>
> $ python2 --version
> Python 2.7.9
>
> $ python3 --version
> Python 3.4.3
>
>
> On Wed, Jun 10, 2015 at 8:36 PM, gordon chung <gord at live.ca> wrote:
>
> > Date: Wed, 10 Jun 2015 21:33:44 +1200
> > From: robertc at robertcollins.net
> > To: openstack-dev at lists.openstack.org
> > Subject: Re: [openstack-dev] [all][python3] use of six.iteritems()
> >
> > On 10 June 2015 at 17:22, gordon chung <gord at live.ca> wrote:
> > > maybe the suggestion should be "don't blindly apply
> > > six.iteritems or items" rather than "don't apply iteritems at
> > > all". admittedly, it's a massive eyesore, but it's a very real
> > > use case: some projects deal with large data results, and
> > > enforcing the latter policy can have negative effects[1]. one
> > > "million item dictionary" might be negligible, but in a
> > > multi-user, multi-* environment that can have a significant
> > > impact on the amount of memory required to store everything.
> >
> > > [1] disclaimer: i have no real-world results but i assume
> > > memory management was the reason for the switch in logic from
> > > py2 to py3
> >
> > I wouldn't make that assumption.
> >
> > And no, memory isn't an issue. If you have a million-item dict,
> > ignoring the internal overheads, the dict needs 1 million object
> > pointers. The size of a list with those pointers in it is 1M *
> > (pointer size in bytes), e.g. 4 MB or 8 MB. Nothing to worry about
> > given the footprint of such a program :)
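> >
> > A quick sanity check of that arithmetic (assuming a 64-bit
> > CPython):
> >
> > import sys
> > # one million pointers at 8 bytes each, plus a small list header
> > print(sys.getsizeof([None] * 1000000))   # ~8 MB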
>
> iiuc, items() (in py2) will create a copy of the dict's (key,
> value) pairs as a list in memory to be processed. this is useful
> for cases such as concurrency where you want to ensure consistency,
> but doing a quick test i noticed a massive spike in memory usage
> between items() and iteritems().
>
> 'for i in dict(enumerate(range(1000000))).items(): pass' consumes
> significantly more memory than 'for i in
> dict(enumerate(range(1000000))).iteritems(): pass'. on my system,
> the difference in memory consumption was double when using items()
> vs iteritems(), and the cpu util was significantly higher as well...
> let me know if there's anything that stands out as inaccurate.
>
> unless there's something wrong with my ignorant testing above, i
> think it's something projects should consider when mass-applying
> any iteritems/items patch.
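>
> fwiw, the kind of per-case shim in question is a sketch like:
>
> import six
>
> d = dict(enumerate(range(10)))
> # resolves to d.iteritems() on py2 and d.items() on py3
> for key, value in six.iteritems(d):
>     pass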
>
> cheers,
> gord
>