[openstack-dev] [all] warning about __init__ importing modules - fast CLIs
Robert Collins
robertc at robertcollins.net
Mon Mar 16 23:23:41 UTC 2015
So, one of the things that we sometimes do in an __init__.py is this:
    __all__ = ["submodule"]
    from . import submodule
This means users can do
    import mymodule
    mymodule.submodule
and it works.
This is actually a bit of an anti-pattern in the Python space, because
to import mymodule.othersubmodule we'll always pay the import cost of
mymodule.submodule whether or not any code from it is used.
And the import cost can be substantial.
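To make the effect concrete, here is a minimal sketch that builds a throwaway package on disk and records which modules get loaded when only mymodule.othersubmodule is requested. The helper names (build_demo_package, modules_loaded_by) are made up for illustration, and the demo package is an assumption standing in for a real library.

```python
import importlib
import os
import sys
import tempfile


def build_demo_package(root):
    """Create a throwaway package 'mymodule' whose __init__ eagerly imports submodule."""
    pkg = os.path.join(root, "mymodule")
    os.mkdir(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        # Explicit relative import so the demo also runs on Python 3.
        f.write("__all__ = ['submodule']\nfrom . import submodule\n")
    for name in ("submodule.py", "othersubmodule.py"):
        with open(os.path.join(pkg, name), "w") as f:
            f.write("")


def modules_loaded_by(name, root):
    """Import `name` with `root` on sys.path; return the names newly added to sys.modules."""
    before = set(sys.modules)
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        importlib.import_module(name)
    finally:
        sys.path.remove(root)
    return sorted(set(sys.modules) - before)


with tempfile.TemporaryDirectory() as root:
    build_demo_package(root)
    loaded = modules_loaded_by("mymodule.othersubmodule", root)

# mymodule.submodule shows up even though nothing asked for it.
print(loaded)
```

Here submodule is empty, so the extra import is cheap; in a real library it could pull in a large dependency tree.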
Take for instance http://pad.lv/1431649 which is about osc being slow,
and some of the slowness is likely due to the cost of importing unused
modules from python-keystoneclient.
In general, snappy short-lived processes need to import only the code
they actually use, and that puts some constraints on the libraries
they consume. CLIs are the most prevalent example of such
short-lived processes (still including rootwrap's CLI thunk).
https://files.bemusement.org/talks/OSDC2008-FastPython/ is a nice
summary of this btw by one of the other bzr cores back in the day -
and not much has changed since then. We'll likely want to port the
profile-imports facility over to our tooling to really track things
down, since the default Python tools don't give us timestamps (hey,
someone want to add that to python -v ?).
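As a rough idea of what such a facility could look like (this is not bzr's profile-imports code, just a sketch of the same general approach), one can temporarily wrap builtins.__import__ and record wall-clock time per import statement. The names profile_imports and import_target are hypothetical, and json.tool is only a stand-in for the library being measured.

```python
import builtins
import time


def profile_imports(func):
    """Run func() and return (seconds, module_name) pairs, slowest first.

    Times are cumulative: a module's time includes the imports it triggers.
    """
    timings = []
    real_import = builtins.__import__

    def timed_import(name, globals=None, locals=None, fromlist=(), level=0):
        start = time.perf_counter()
        try:
            return real_import(name, globals, locals, fromlist, level)
        finally:
            timings.append((time.perf_counter() - start, name))

    builtins.__import__ = timed_import
    try:
        func()
    finally:
        # Always restore the real hook, even if the import fails.
        builtins.__import__ = real_import
    return sorted(timings, reverse=True)


def import_target():
    # Stand-in for the library under test (assumption for the demo).
    import json.tool  # noqa: F401


report = profile_imports(import_target)
for seconds, name in report[:10]:
    print("%8.3f ms  %s" % (seconds * 1000.0, name))
```

Modules already in sys.modules will show near-zero times, so run this in a fresh interpreter for meaningful numbers. On Python 3.7 and later, python -X importtime reports similar per-module timings without any extra code.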
So - the constraints I'd propose for libraries we use from CLIs,
including our python-*client libraries:
- import libraryname should be fast - no more than a ms or so. Timing
with .pyc files is ok.
To time it (hot cache), something like the following:
    python -m timeit -s 'import sys; o=dict(sys.modules)' 'import keystoneclient; sys.modules.clear(); sys.modules.update(o)'
Right now keystoneclient is somewhat slow: 10 loops, best of 3: 220
msec per loop
Timing cold cache is harder, something like:
    import datetime
    import subprocess
    # Drop the kernel page cache so the import reads from disk (Linux only, needs sudo).
    subprocess.call('echo 3 | sudo tee /proc/sys/vm/drop_caches', shell=True)
    start = datetime.datetime.now()
    import keystoneclient
    stop = datetime.datetime.now()
    print(stop - start)
should get a decent approximation. Right now I see 0:00:00.506059 -
or 500ms. On an SSD. Try it on spinning rust and I think you'll cry.
- as a corollary, __init__ should not import things unless *every use
of the library ever* will need it.
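One way to keep mymodule.submodule working for existing users without paying for it up front is a module-level __getattr__ (PEP 562, Python 3.7 and later, so newer than the interpreters discussed here). The sketch below is an assumption about how a library might do this, not something any of our clients currently ship; the package name lazypkg is made up for the demo.

```python
import importlib
import os
import sys
import tempfile

# Contents of the demo package's __init__.py: nothing is imported eagerly.
LAZY_INIT = '''
import importlib

__all__ = ["submodule"]


def __getattr__(name):
    # Called only when normal attribute lookup on the package fails (PEP 562).
    if name in __all__:
        return importlib.import_module("." + name, __name__)
    raise AttributeError("module %r has no attribute %r" % (__name__, name))
'''

with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "lazypkg")
    os.mkdir(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(LAZY_INIT)
    with open(os.path.join(pkg, "submodule.py"), "w") as f:
        f.write("VALUE = 42\n")

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        import lazypkg

        # Importing the package alone does not load the submodule...
        loaded_eagerly = "lazypkg.submodule" in sys.modules
        # ...but attribute access still works and triggers the import.
        value = lazypkg.submodule.VALUE
        loaded_on_access = "lazypkg.submodule" in sys.modules
    finally:
        sys.path.remove(root)

print(loaded_eagerly, loaded_on_access, value)
```

After the first access the import machinery binds submodule as a real attribute of the package, so __getattr__ is not consulted again for it.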
-Rob
--
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud