removing use of pkg_resources to improve command line app performance

Doug Hellmann doug at doughellmann.com
Mon Jul 6 19:21:06 UTC 2020



> On Jul 6, 2020, at 2:37 PM, Doug Hellmann <doug at doughellmann.com> wrote:
> 
> We have had a long-standing issue with the performance of the openstack command line tool. At least part of the startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command line apps).
> 
> Python 3.8 added a new library, importlib.metadata, which also has an entry points API. It is more efficient, and produces data in a format that can be cached to make it even faster. I have started adding support for that caching to stevedore [0], which is the Oslo library for managing application plugins. For versions of Python earlier than 3.8, the same library is available on PyPI as “importlib_metadata”.
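
A minimal sketch of the entry points API mentioned above, assuming nothing beyond the standard library (or the importlib_metadata backport on older interpreters). The helper name entry_points_for_group is hypothetical and is not part of stevedore or of the patches referenced in this message. The return type of entry_points() has changed between Python releases, so the sketch handles both shapes.

```python
import sys

if sys.version_info >= (3, 8):
    from importlib import metadata
else:
    # Backport from PyPI for interpreters older than 3.8.
    import importlib_metadata as metadata


def entry_points_for_group(group):
    """Return a list of the entry points registered under ``group``.

    Hypothetical helper for illustration only.
    """
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Newer releases expose a selectable collection.
        return list(eps.select(group=group))
    # Python 3.8 and 3.9 return a plain dict keyed by group name.
    return list(eps.get(group, []))


if __name__ == "__main__":
    for ep in entry_points_for_group("console_scripts"):
        print(ep.name, "->", ep.value)
```

Unlike importing pkg_resources, nothing is scanned until entry_points() is actually called.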
> 
> A big part of the implementation work will actually be removing the use of pkg_resources in places other than stevedore. We have a couple of different use patterns to consider and replace in different ways.
> 
> First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of them to choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation for all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a stevedore.ExtensionManager directly, but the other managers are meant to implement common access patterns like selecting a subset (or just one) of the available plugins by name.
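
As a rough illustration of the first pattern, the sketch below replaces a loop over pkg_resources.iter_entry_points() with a stevedore ExtensionManager. The function name load_plugins and the namespace "example.plugins" are hypothetical, and stevedore must be installed for the call to work. The import is done inside the function only so that the cost is paid when plugins are actually needed; that is a choice made for this sketch, not something the original message prescribes.

```python
def load_plugins(namespace):
    """Return a dict mapping plugin name to the loaded plugin object.

    Roughly equivalent to the older pattern:

        for ep in pkg_resources.iter_entry_points(namespace):
            plugins[ep.name] = ep.load()
    """
    # Imported lazily (an assumption of this sketch, see the note above).
    from stevedore import ExtensionManager

    manager = ExtensionManager(
        namespace=namespace,
        invoke_on_load=False,  # load the plugin objects, do not call them
    )
    # Each item is a stevedore Extension with .name and .plugin attributes.
    return {ext.name: ext.plugin for ext in manager}


if __name__ == "__main__":
    # "example.plugins" is a made-up namespace used only for illustration.
    for name, plugin in sorted(load_plugins("example.plugins").items()):
        print(name, plugin)
```

When only one plugin, or a named subset, is wanted, stevedore's DriverManager or NamedExtensionManager is likely a better fit than filtering the result by hand.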
> 
> Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* faster because importlib goes directly to the metadata file for the named package instead of looking through all of the installed packages.
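
The second pattern might look something like the following. This is a sketch rather than the exact change in [2]; the wrapper name installed_version is hypothetical, and returning None for a missing package is a choice made here for illustration.

```python
try:
    from importlib import metadata
except ImportError:
    # Backport from PyPI for Python older than 3.8.
    import importlib_metadata as metadata


def installed_version(dist_name):
    """Return the installed version string for ``dist_name``, or None.

    Replaces: pkg_resources.get_distribution(dist_name).version
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The named distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    print(installed_version("python-openstackclient"))
```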
> 
> Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may need to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from pkg_resources. The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that in stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use the manager abstractions in stevedore instead of manipulating EntryPoint instances directly.
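
To illustrate the naming differences: pkg_resources exposes module_name and attrs on its EntryPoint, while the importlib.metadata EntryPoint keeps the whole target in a single value string (for example "json:dumps"). The sketch below shows one way such a value could be split when code really does need the pieces. The helper module_and_attr and the group name are hypothetical, and this is not the compatibility layer discussed above.

```python
import json

try:
    from importlib import metadata
except ImportError:
    # Backport from PyPI for Python older than 3.8.
    import importlib_metadata as metadata


def module_and_attr(ep):
    """Split an importlib.metadata EntryPoint value into (module, attr).

    Hypothetical helper. Any trailing "[extras]" marker is discarded.
    """
    target = ep.value.split("[", 1)[0].strip()
    module_name, _, attr = target.partition(":")
    return module_name.strip(), attr.strip()


# Built by hand so the example does not depend on installed packages.
ep = metadata.EntryPoint(
    name="dumps", value="json:dumps", group="hypothetical.group"
)

if __name__ == "__main__":
    print(ep.name, module_and_attr(ep))
    # name and load() behave the same way in both implementations.
    print(ep.load() is json.dumps)
```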
> 
> I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s likely to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches.
> 
> Doug
> 
> [0] https://review.opendev.org/#/c/739306/
> [1] https://docs.openstack.org/stevedore/latest/
> [2] https://review.opendev.org/#/c/739379/2
> [3] https://review.opendev.org/#/q/topic:osc-performance

I neglected to mention that there are uses of pkg_resources outside of OpenStack code, in libraries used by python-openstackclient. I found one use in dogpile and another in cmd2. I haven’t started working on patches to those yet. If someone wants to do a more extensive search, that would be very helpful. I started an etherpad to keep track of the work that’s in progress: https://etherpad.opendev.org/p/osc-performance



