[openstack-dev] [all] warning about __init__ importing modules - fast CLIs

Robert Collins robertc at robertcollins.net
Mon Mar 16 23:23:41 UTC 2015


So, one of the things that we sometimes do in an __init__.py is this:

__all__ = ["submodule"]
from . import submodule

This means users can do

import mymodule
mymodule.submodule

and it works.

This is actually a bit of an anti-pattern in the Python space, because
to import mymodule.othersubmodule we'll always pay the import cost of
mymodule.submodule whether or not any code from it is used.

And the import cost can be substantial.
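
To see the effect in isolation, here is a minimal sketch that builds a
throwaway package on disk and imports only its second submodule. The
package and module names (demo_eager_pkg, submodule, othersubmodule) are
made up for the illustration and do not come from any real library.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package on disk; every name here is hypothetical.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_eager_pkg")
os.mkdir(pkg)
files = {
    "__init__.py": "from . import submodule\n",  # the eager import
    "submodule.py": "VALUE = 1\n",
    "othersubmodule.py": "VALUE = 2\n",
}
for filename, body in files.items():
    with open(os.path.join(pkg, filename), "w") as f:
        f.write(body)

sys.path.insert(0, root)
importlib.invalidate_caches()

# Ask only for othersubmodule ...
importlib.import_module("demo_eager_pkg.othersubmodule")

# ... and submodule has been loaded anyway, because __init__ imported it.
eager_loaded = "demo_eager_pkg.submodule" in sys.modules
print(eager_loaded)  # True
```

Here submodule is trivial, so the cost is negligible; the point is only
that it is paid on every import of anything under the package.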

Take for instance http://pad.lv/1431649 which is about osc being slow,
and some of the slowness is likely due to the cost of importing unused
modules from python-keystoneclient.

In general, it is important for snappy short-lived processes that only
the needed code is imported, and that implies a few things for the
library code they consume. CLIs are the most prevalent example of such
short-lived processes (including rootwrap's CLI thunk still).
https://files.bemusement.org/talks/OSDC2008-FastPython/ is a nice
summary of this btw by one of the other bzr cores back in the day -
and not much has changed since then. We'll likely want to port the
profile-imports facility over to our tooling to really track things
down, since the default Python tools don't give us timestamps (hey,
someone want to add that to python -v ?).

So - the constraints I'd propose for libraries we use from CLIs,
including our python-*client libraries:
 - import libraryname should be fast - no more than a ms or so. Timing
   with .pyc files is ok.
   To time it (hot cache) - something like the following
     python -m timeit -s 'import sys; o=dict(sys.modules)' 'import keystoneclient; sys.modules.clear();sys.modules.update(o)'
   right now keystoneclient is somewhat slow:
     10 loops, best of 3: 220 msec per loop
   Timing cold cache is harder, something like:
    import datetime
    import subprocess
    subprocess.call('echo 3 | sudo tee /proc/sys/vm/drop_caches', shell=True)
    start = datetime.datetime.now()
    import keystoneclient
    stop = datetime.datetime.now()
    print(stop - start)

   should get a decent approximation. Right now I see 0:00:00.506059 -
or 500ms. On an SSD. Try it on spinning rust and I think you'll cry.

 - as a corollary, __init__ should not import things unless *every use
of the library ever* will need it.
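
One way to keep "import mymodule; mymodule.submodule" working without
the eager import is a module-level __getattr__ (PEP 562, which needs
Python 3.7 or later). The following is only a sketch under that
assumption: demo_lazy_pkg and its contents are hypothetical names, and
the package is written to a temporary directory so the example is
self-contained.

```python
import importlib
import os
import sys
import tempfile

# Contents of the hypothetical package's __init__.py.
LAZY_INIT = '''
import importlib

__all__ = ["submodule"]


def __getattr__(name):
    # Only called when normal attribute lookup on the package fails,
    # i.e. the first time a not-yet-imported submodule is accessed.
    if name in __all__:
        return importlib.import_module("." + name, __name__)
    raise AttributeError("module %r has no attribute %r" % (__name__, name))
'''

root = tempfile.mkdtemp()
pkg = os.path.join(root, "demo_lazy_pkg")
os.mkdir(pkg)
for filename, body in [("__init__.py", LAZY_INIT),
                       ("submodule.py", "VALUE = 1\n")]:
    with open(os.path.join(pkg, filename), "w") as f:
        f.write(body)

sys.path.insert(0, root)
importlib.invalidate_caches()

import demo_lazy_pkg

# Importing the package alone does not load the submodule ...
loaded_before = "demo_lazy_pkg.submodule" in sys.modules

# ... the first attribute access does.
value = demo_lazy_pkg.submodule.VALUE
loaded_after = "demo_lazy_pkg.submodule" in sys.modules

print(loaded_before, value, loaded_after)  # False 1 True
```

After the first access the import machinery binds submodule on the
package, so later lookups do not go through __getattr__ again.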

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


