[openstack-dev] [all][python3] use of six.iteritems()
Robert Collins
robertc at robertcollins.net
Thu Jun 11 05:16:54 UTC 2015
On 11 June 2015 at 15:48, Dolph Mathews <dolph.mathews at gmail.com> wrote:
> tl;dr .iteritems() is faster and more memory efficient than .items() in
> python2
>
>
> Using xrange() in python2 instead of range() because it's more memory
> efficient and consistent between python 2 and 3...
>
> # xrange() + .items()
> python -m timeit -n 20 for\ i\ in\
> dict(enumerate(xrange(1000000))).items():\ pass
> 20 loops, best of 3: 729 msec per loop
> peak memory usage: 203 megabytes
This test conflates setup and execution. Better to structure it like my
example; otherwise you're not testing iteritems vs items, you're testing
dictionary creation time, and likewise for memory pressure. As it stands,
the times are meaningless.
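For example (an illustrative invocation along those lines, not the numbers
from my earlier mail), put the dict construction in timeit's -s setup
argument so that only the iteration is measured:

# setup runs outside the timed statement, so only the loop is measured
python -m timeit -s "d = dict(enumerate(xrange(1000000)))" "for i in d.items(): pass"
python -m timeit -s "d = dict(enumerate(xrange(1000000)))" "for i in d.iteritems(): pass"

That way the comparison is items() vs iteritems(), not dict(enumerate(...)).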
To test memory pressure, don't use timeit. Just use the interpreter.
$ python
Python 2.7.8 (default, Oct 20 2014, 15:05:19)
[GCC 4.9.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> d = dict(enumerate(range(1000000)))
>>> import os
>>> os.getpid()
28345
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
28345 robertc   20   0  127260 104568   4744 S   0.0  0.6   0:00.17 python
>>> i = d.items()
28345 robertc   20   0  206524 183560   4744 S   0.0  1.1   0:00.59 python
183560-104568 = ~80M to hold references to all 1 million items, which
indeed is not as efficient as python3. So *IF* we had a million-item
dict, and absolutely nothing else around, we should care.
But again - where in OpenStack does this matter the slightest?
No one has disputed that they are different. The assertion that it
matters is what is out of line with our reality.
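To be concrete about what is being compared, here is a minimal sketch of
the two idioms (nothing OpenStack-specific, just the shape of the code in
question):

import six

d = {'a': 1, 'b': 2}

for k, v in d.items():          # python2: materialises a list of (key, value) tuples
    pass

for k, v in six.iteritems(d):   # python2: d.iteritems(); python3: d.items(); no list on python2
    pass

Both loops do the same work per item; the only difference is that
transient list.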
10000 items:
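(the same interpreter exercise as above with a smaller dict, roughly:

>>> d = dict(enumerate(range(10000)))
>>> i = d.items()

with top sampled before and after the second statement)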
28399 robertc   20   0   31404   8480   4612 S   0.0  0.1   0:00.01 python
28399 robertc   20   0   32172   9268   4612 S   0.0  0.1   0:00.01 python
9268-8480 = 0.8M, which is indeed two orders of magnitude less. And I'd
STILL challenge anyone to find a place where 10000 items are being
passed around within OpenStack's components without it being a bug
today.
Optimising away less than a MB of data, when we shouldn't have that many
rows/items/whatever in memory in the first place, is entirely missing
the point of programming in Python.
-Rob
--
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud