removing use of pkg_resources to improve command line app performance

Doug Hellmann doug at doughellmann.com
Mon Jul 6 19:03:00 UTC 2020



> On Jul 6, 2020, at 2:54 PM, Sean Mooney <smooney at redhat.com> wrote:
> 
> On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote:
>> We have had a long-standing issue with the performance of the openstack command line tool. At least part of the
>> startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of
>> importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a
>> command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command
>> line apps).
>> 
>> Python 3.8 added a new library importlib.metadata, which also has an entry points API. It is more efficient, and
>> produces data in a format that can be cached to make it even faster. I have started adding support for that caching to
>> stevedore [0], which is the Oslo library for managing application plugins. For versions of Python earlier than 3.8, the
>> same library is available on PyPI as “importlib_metadata”.
> Based on https://opendev.org/openstack/governance/src/branch/master/reference/runtimes/victoria.rst we still need
> to support 3.6 for Victoria. Is there a backport lib, like mock, for this on older Python releases?

Yes, importlib_metadata is on PyPI and supports Python versions all the way back to 2.7. It is already in the requirements list. If applications switch to using stevedore instead of scanning for plugins themselves, the choice of which version of the library gets invoked becomes an implementation detail hidden inside stevedore.
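
For code that does need the library directly, a minimal sketch of the usual fallback import looks something like this. It assumes the importlib_metadata backport is installed on interpreters older than 3.8, and the alias and the lookup_version name are only illustrative:

```python
try:
    # Python 3.8 and later: part of the standard library.
    from importlib import metadata as importlib_metadata
except ImportError:
    # Older interpreters: the backport from PyPI, assumed to be installed.
    import importlib_metadata

# Either way the module exposes the same version() and entry_points()
# functions, so the rest of the code does not need to care which one it got.
lookup_version = importlib_metadata.version
```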

>> 
>> A big part of the implementation work will actually be removing the use of pkg_resources in places other than
>> stevedore. We have a couple of different use patterns to consider and replace in different ways.
>> 
>> First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of them to
>> choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation for
>> all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a stevedore.ExtensionManager
>> directly, but the other managers are meant to implement common access patterns like selecting a subset (or just one)
>> of the available plugins by name.
>> 
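As a rough illustration of that first pattern, here is a minimal sketch of swapping an iter_entry_points() loop for a stevedore.ExtensionManager. The load_plugins() helper and the namespace in the comment are hypothetical names made up for the example, not code from any of the patches, and the import is done inside the function only to keep the sketch cheap to import:

```python
# Before (importing pkg_resources scans every installed distribution):
#     import pkg_resources
#     plugins = {ep.name: ep.load()
#                for ep in pkg_resources.iter_entry_points(namespace)}


def load_plugins(namespace):
    """Return a dict mapping plugin name to the loaded plugin object."""
    # Imported lazily so that merely importing this module stays cheap.
    from stevedore import ExtensionManager

    # invoke_on_load=False loads each plugin but does not call it.
    mgr = ExtensionManager(namespace=namespace, invoke_on_load=False)

    # Each item is a stevedore Extension; .plugin is the loaded object.
    return {ext.name: ext.plugin for ext in mgr}


# Hypothetical usage: load_plugins("example.app.plugins")
```

When only one plugin, or a named subset, is wanted, DriverManager or NamedExtensionManager would probably be the closer fit than filtering the dict afterwards.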
>> Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s
>> installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* faster
>> because importlib goes directly to the metadata file for the named package instead of looking through all of the
>> installed packages.
>> 
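A small sketch of that second replacement, assuming Python 3.8 or later (on older interpreters the importlib_metadata backport provides the same function). The installed_version() helper is a hypothetical name used only for the example:

```python
from importlib import metadata

# Before:
#     import pkg_resources
#     version = pkg_resources.get_distribution(name).version


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        # Reads the metadata of the named distribution only.
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Returning None for a missing distribution is a choice made for the sketch; letting PackageNotFoundError propagate may suit some callers better.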
>> Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may need
>> to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from pkg_resources.
>> The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that in
>> stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use the
>> manager abstractions in stevedore instead of manipulating EntryPoint instances directly.
>> 
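To make the naming difference concrete, here is a small sketch that builds an importlib.metadata EntryPoint by hand. The entry point name and group are made up, and json:dumps merely stands in for a plugin target. Where pkg_resources exposes the target as module_name and attrs, importlib.metadata keeps it as a single value string:

```python
from importlib.metadata import EntryPoint

# A hand-built entry point; normally these come from package metadata.
ep = EntryPoint(name="example", value="json:dumps", group="example.plugins")

# "name" and "load()" are the parts that work the same way in both APIs.
plugin = ep.load()

# pkg_resources offered ep.module_name and ep.attrs; here the same data is
# packed into ep.value as "module:attribute" and can be split by hand.
module_name, _, attr = ep.value.partition(":")
```

Code that sticks to name and load(), or goes through a stevedore manager, would not need this kind of adjustment.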
>> I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s likely
>> to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the
>> work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches.
>> 
>> Doug
>> 
>> [0] https://review.opendev.org/#/c/739306/
>> [1] https://docs.openstack.org/stevedore/latest/
>> [2] https://review.opendev.org/#/c/739379/2
>> [3] https://review.opendev.org/#/q/topic:osc-performance



