removing use of pkg_resources to improve command line app performance
We have had a long-standing issue with the performance of the openstack command line tool. At least part of the startup cost is the time taken in scanning for all of the plugins that are installed, which is a side-effect of importing pkg_resources. To fix that, we need to eliminate all use of pkg_resources in code that would be used by a command line application (long-running services are candidates, too, but the benefit is bigger in short-lived command line apps).
Python 3.8 added a new library, importlib.metadata, which also has an entry points API. It is more efficient, and produces data in a format that can be cached to make it even faster. I have started adding support for that caching to stevedore [0], which is the Oslo library for managing application plugins. For versions of Python earlier than 3.8, the same library is available on PyPI as “importlib_metadata”.
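A minimal sketch of reading one entry point group through importlib.metadata, falling back to the importlib_metadata backport on interpreters older than 3.8. The helper name `list_entry_points` is hypothetical. The `hasattr(eps, "select")` branch is there because `entry_points()` returned a plain dict keyed by group name on Python 3.8 and 3.9, while later versions provide a `select()` method for filtering.

```python
try:
    from importlib import metadata as importlib_metadata  # Python 3.8+
except ImportError:
    import importlib_metadata  # backport from PyPI for older interpreters


def list_entry_points(group):
    """Return sorted (name, value) pairs for one entry point group.

    list_entry_points is a hypothetical helper used for illustration.
    """
    eps = importlib_metadata.entry_points()
    if hasattr(eps, "select"):
        # Newer Pythons (and newer backports) can filter by group directly.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a dict mapping group names to entry points.
        selected = eps.get(group, ())
    return sorted((ep.name, ep.value) for ep in selected)


if __name__ == "__main__":
    for name, value in list_entry_points("console_scripts"):
        print(f"{name} = {value}")
```

Application code that goes through stevedore should not need this directly; it only shows what the lower-level API looks like.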
On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: based on https://opendev.org/openstack/governance/src/branch/master/reference/runtime... we still need to support 3.6 for victoria. is there a backport lib like mock for this on older python releases?
A big part of the implementation work will actually be removing the use of pkg_resources in places other than stevedore. We have a couple of different use patterns to consider and replace in different ways.
First, anything using iter_entry_points() should use a stevedore extension manager instead. There are a few of them to choose from, based on how the plugins will be used. The stevedore docs [1] include a tutorial and documentation for all of the classes and their uses. Most calls to iter_entry_points() can be replaced with a stevedore.ExtensionManager directly, but the other managers are meant to implement common access patterns like selecting a subset (or just one) of the available plugins by name.
Second, we have a few places where pkg_resources.get_distribution(name).version is used to discover a package’s installed version. Those can be changed to use importlib.metadata.version() instead, as in [2]. This is *much* faster because importlib goes directly to the metadata file for the named package instead of looking through all of the installed packages.
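A minimal sketch of the version lookup, using the same import fallback for older Pythons. `installed_version` is a hypothetical helper name, and returning None for a missing distribution is a choice made here: pkg_resources raised `DistributionNotFound`, while importlib.metadata raises `PackageNotFoundError`.

```python
try:
    from importlib import metadata as importlib_metadata  # Python 3.8+
except ImportError:
    import importlib_metadata  # backport from PyPI for older interpreters


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None.

    Replaces: pkg_resources.get_distribution(dist_name).version
    """
    try:
        # Reads the metadata for this one distribution only.
        return importlib_metadata.version(dist_name)
    except importlib_metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version("stevedore"))
```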
Finally, any code using any properties of the EntryPoint returned by stevedore other than “name” and “load()” may need to be updated. The new EntryPoint class in importlib.metadata is not 100% compatible with the one from pkg_resources. The same data is there, but sometimes it is named differently. If we need a compatibility layer we could put that in stevedore, but it is unusual to need access to any of the internals of EntryPoint and it’s typically better to use the manager abstractions in stevedore instead of manipulating EntryPoint instances directly.
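As an illustration of the naming differences: both classes provide `name` and `load()`, but importlib.metadata keeps the target as a single `value` string (for example `"os.path:join"`) plus a `group`, where pkg_resources exposed `module_name` and an `attrs` tuple. A sketch, assuming code only needs those two pkg_resources attributes; `split_entry_point_value` is a hypothetical helper. Newer Python releases also offer `module` and `attr` properties, but parsing `value` works on 3.8 as well.

```python
try:
    from importlib import metadata as importlib_metadata  # Python 3.8+
except ImportError:
    import importlib_metadata  # backport from PyPI for older interpreters


def split_entry_point_value(value):
    """Split an entry point value such as 'pkg.mod:Outer.inner [extra]'.

    Returns (module_name, attrs), shaped like the pkg_resources
    EntryPoint.module_name and EntryPoint.attrs attributes.
    """
    target = value.split("[", 1)[0].strip()  # drop the optional [extras]
    module_name, _, attr_path = target.partition(":")
    attr_path = attr_path.strip()
    attrs = tuple(attr_path.split(".")) if attr_path else ()
    return module_name.strip(), attrs


# Build an EntryPoint by hand for the example; in practice it would come
# from stevedore's Extension.entry_point.
ep = importlib_metadata.EntryPoint(
    name="join", value="os.path:join", group="hypothetical.group"
)
module_name, attrs = split_entry_point_value(ep.value)  # ('os.path', ('join',))
join_func = ep.load()  # imports os.path and returns os.path.join
```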
I have started making some of the changes [3], but I’m doing this in my quarantine-induced spare time so it’s likely to take a while. If you want to pitch in, I would appreciate it. I am using the topic “osc-performance”, since the work is related to making python-openstackclient faster. Feel free to tag me for reviews on your patches.
Doug
[0] https://review.opendev.org/#/c/739306/
[1] https://docs.openstack.org/stevedore/latest/
[2] https://review.opendev.org/#/c/739379/2
[3] https://review.opendev.org/#/q/topic:osc-performance
On 2020-07-06 19:54:05 +0100 (+0100), Sean Mooney wrote:
On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: [...]
based on https://opendev.org/openstack/governance/src/branch/master/reference/runtime... we still need to support 3.6 for victoria. is there a backport lib like mock for this on older python releases? [...]
According to https://pypi.org/project/importlib-metadata/ the current version (1.7.0) supports Python 3.5 and later. Won't that work?
--
Jeremy Stanley
On Jul 6, 2020, at 2:54 PM, Sean Mooney <smooney@redhat.com> wrote:
On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: based on https://opendev.org/openstack/governance/src/branch/master/reference/runtime... we still need to support 3.6 for victoria. is there a backport lib like mock for this on older python releases?
Yes, importlib_metadata is on PyPI and available all the way back to 2.7. It is already in the requirements list, and if applications switch to using stevedore instead of scanning plugins themselves the implementation details of which version of the library is invoked will be hidden.
On Mon, 2020-07-06 at 15:03 -0400, Doug Hellmann wrote:
On Jul 6, 2020, at 2:54 PM, Sean Mooney <smooney@redhat.com> wrote:
On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote:
based on https://opendev.org/openstack/governance/src/branch/master/reference/runtime... we still need to support 3.6 for victoria. is there a backport lib like mock for this on older python releases?
Yes, importlib_metadata is on PyPI and available all the way back to 2.7. It is already in the requirements list, and if applications switch to using stevedore instead of scanning plugins themselves the implementation details of which version of the library is invoked will be hidden.
Cool. I will need to check os-vif more closely, but I think we do everything via the stevedore extension manager: https://github.com/openstack/os-vif/blob/master/os_vif/__init__.py#L38-L49
Maybe some plugins are doing some things they should not, but the intent was to rely only on stevedore and its APIs, so it sounds like this should just work for os-vif at least.
On Jul 6, 2020, at 3:30 PM, Sean Mooney <smooney@redhat.com> wrote:
On Mon, 2020-07-06 at 15:03 -0400, Doug Hellmann wrote:
On Jul 6, 2020, at 2:54 PM, Sean Mooney <smooney@redhat.com> wrote:
On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote:
based on https://opendev.org/openstack/governance/src/branch/master/reference/runtime... we still need to support 3.6 for victoria. is there a backport lib like mock for this on older python releases?
Yes, importlib_metadata is on PyPI and available all the way back to 2.7. It is already in the requirements list, and if applications switch to using stevedore instead of scanning plugins themselves the implementation details of which version of the library is invoked will be hidden.
Cool. I will need to check os-vif more closely, but I think we do everything via the stevedore extension manager: https://github.com/openstack/os-vif/blob/master/os_vif/__init__.py#L38-L49
Maybe some plugins are doing some things they should not, but the intent was to rely only on stevedore and its APIs, so it sounds like this should just work for os-vif at least.
That’s definitely the goal of putting the cache behind the stevedore API.
On Mon, Jul 6, 2020 at 9:00 PM Sean Mooney <smooney@redhat.com> wrote:
On Mon, 2020-07-06 at 14:37 -0400, Doug Hellmann wrote: based on https://opendev.org/openstack/governance/src/branch/master/reference/runtime... we still need to support 3.6 for victoria. is there a backport lib like mock for this on older python releases?
Isn't [1], which Doug mentioned, what you mean? It seems to support 3.5+.
As a general remark, I've already seen the WIP. Very excited to see this performance bottleneck eliminated.
[1] https://pypi.org/project/importlib-metadata/
-yoctozepto
On Jul 6, 2020, at 2:37 PM, Doug Hellmann <doug@doughellmann.com> wrote:
I neglected to mention that there are uses of pkg_resources outside of OpenStack code, in libraries used by python-openstackclient. I found one use in dogpile and another in cmd2. I haven’t started working on patches for those yet. If someone wants to do a more extensive search, that would be very helpful. I started an etherpad to keep track of the work that’s in progress: https://etherpad.opendev.org/p/osc-performance
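A rough sketch of such a search using only the standard library: it locates an installed module with `importlib.util.find_spec()` (without importing it) and reports the lines that import pkg_resources. `find_pkg_resources_imports` is a hypothetical helper, and the line-based regex is deliberately simple, so it will miss forms like `import os, pkg_resources` or dynamic imports.

```python
import importlib.util
import os
import re

# Matches "import pkg_resources..." and "from pkg_resources... import ..." lines.
_PKG_RESOURCES_RE = re.compile(r"^\s*(import|from)\s+pkg_resources\b")


def _python_files(spec):
    """Yield the .py files belonging to the module or package described by spec."""
    if spec.submodule_search_locations:  # a package: walk its directories
        for root in spec.submodule_search_locations:
            for dirpath, _dirnames, filenames in os.walk(root):
                for filename in filenames:
                    if filename.endswith(".py"):
                        yield os.path.join(dirpath, filename)
    elif spec.origin and spec.origin.endswith(".py"):  # a single-file module
        yield spec.origin


def find_pkg_resources_imports(module_name):
    """Return sorted (path, lineno) pairs where module_name imports pkg_resources."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return []
    hits = []
    for path in _python_files(spec):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if _PKG_RESOURCES_RE.match(line):
                    hits.append((path, lineno))
    return sorted(hits)


if __name__ == "__main__":
    # dogpile and cmd2 are the libraries mentioned above; extend as needed.
    for name in ("dogpile", "cmd2"):
        for path, lineno in find_pkg_resources_imports(name):
            print(f"{path}:{lineno}")
```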
On 2020-07-06 15:21:06 -0400 (-0400), Doug Hellmann wrote: [...]
Looking at some other uses of pkg_resources, it seems like this would be the new way to get the abbreviated Git commit ID stored by PBR:

    json.loads(
        importlib.metadata.distribution(packagename).read_text('pbr.json')
    )['git_version']

--
Jeremy Stanley
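A slightly more defensive version of that snippet, as a sketch: `Distribution.read_text()` returns None when the file is absent (for example, for a package not built with pbr), which would make `json.loads()` fail with a TypeError. `pbr_git_version` is a hypothetical helper name.

```python
import json

try:
    from importlib import metadata as importlib_metadata  # Python 3.8+
except ImportError:
    import importlib_metadata  # backport from PyPI for older interpreters


def pbr_git_version(dist_name):
    """Return the abbreviated git SHA recorded by pbr, or None if unavailable."""
    try:
        dist = importlib_metadata.distribution(dist_name)
    except importlib_metadata.PackageNotFoundError:
        return None
    raw = dist.read_text("pbr.json")  # None when the file is not present
    if raw is None:
        return None
    return json.loads(raw).get("git_version")
```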
Participants (4):
- Doug Hellmann
- Jeremy Stanley
- Radosław Piliszek
- Sean Mooney