[requirements][placement] orjson instead of stdlib json
Requirements people:
For the past few months we've been profiling placement, modeling large clouds where many results (7000 resource providers) are returned in response to common requests.
One of the last remaining chunks of slowness is serializing the very large dicts representing those providers (and associated allocation requests) to JSON (greater than 100k lines when pretty printed). Using the oslo jsonutils (which is a light wrapper over the stdlib 'json' module), approximately 25% of time is consumed by dumps().
Using orjson [1] instead, time consumption drops to 1%, so this is something it would be good to make real in placement (and perhaps other projects that have large result sets). There's a WIP that demonstrates its use at [2].
I'm posting about it because JSON serialization seems like it might be an area in which the requirements team has greater interest than in some other areas.
Also, orjson is only Python 3, so use needs to be adapted for Python 2, as shown in the WIP [2]. Otherwise it's fine: active, Apache2 or MIT license.
Thoughts?
[1] https://pypi.org/project/orjson/
[2] https://review.opendev.org/#/c/674661/
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent
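One possible shape for the Python 2 adaptation mentioned above is an import-time fallback. This is only a minimal sketch and is not taken from the WIP [2]: the helper name dumps_bytes and the sample payload are made up for illustration. It relies on orjson.dumps returning bytes, so the stdlib path encodes its str result to match.

```python
import json

try:
    # orjson is an optional third-party dependency (Python 3 only).
    import orjson
except ImportError:
    orjson = None


def dumps_bytes(obj):
    """Serialize obj to UTF-8 encoded JSON bytes.

    Hypothetical helper: uses orjson when it is importable and falls
    back to the stdlib json module otherwise.
    """
    if orjson is not None:
        # orjson.dumps returns bytes rather than str.
        return orjson.dumps(obj)
    return json.dumps(obj).encode("utf-8")


# Made-up payload, only loosely shaped like a provider summary.
payload = {"rp-1": {"resources": {"VCPU": {"capacity": 8, "used": 2}}}}
body = dumps_bytes(payload)
```

Callers get bytes either way, which suits a WSGI response body; anything that needs text would have to decode explicitly.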
On Wed, 2019-08-28 at 09:47 +0100, Chris Dent wrote:
[...]
Using orjson [1] instead, time consumption drops to 1%, so this is something it would be good to make real in placement (and perhaps other projects that have large result sets).
If the 25% to 1% numbers are real, i.e. the CPU time does not just move somewhere else instead, then I would be very interested to see this used in oslo.versionedobjects and/or the SDK.
[...]
Also, orjson is only Python 3, so use needs to be adapted for Python 2, as shown in the WIP [2].
Or wait until October/November, when we officially drop Python 2 support. That said, from a Red Hat point of view, all our future OpenStack releases from Stein onward will be on RHEL 8, which is Python 3.6 only by default (Python 2 is not installed on RHEL 8), so we will be shipping OpenStack on Python 3 starting with Stein.
[...]
On 19-08-28 10:13:01, Sean Mooney wrote:
[...]
One thing we seek to avoid is duplicating requirements we already track, though we have allowed C-based Python libs for speed.
Is it possible to try ujson, as that's approved and in global-reqs already (and supported by anyjson, also in global-reqs)? It hasn't been updated since 2016 and simple PRs/issues are unresolved, though monasca-common and x/kiloeyes use it. Anyjson itself seems to be in a similar situation.
If we'd move to another C-based JSON lib, I'd like to remove ujson (and possibly look at anyjson too, though it has slightly more usage). There may be other JSON libs in global-reqs that meet your needs, though. I'd recommend checking them out first.
https://github.com/openstack/requirements/blob/master/global-requirements.tx...
UJSON
+--------------------------+------------------+------+-------------------+
| Repository               | Filename         | Line | Text              |
+--------------------------+------------------+------+-------------------+
| openstack/monasca-common | requirements.txt | 11   | ujson>=1.35 # BSD |
| x/kiloeyes               | requirements.txt | 19   | ujson>=1.33       |
+--------------------------+------------------+------+-------------------+
ANYJSON
+-----------------------------+------------------------------------------------+------+----------------------+
| Repository                  | Filename                                       | Line | Text                 |
+-----------------------------+------------------------------------------------+------+----------------------+
| openstack/faafo             | requirements.txt                               | 5    | anyjson>=0.3.3       |
| openstack/fuel-qa           | fuelweb_test/requirements.txt                  | 5    | anyjson>=0.3.3 # BSD |
| openstack/fuel-web          | nailgun/requirements.txt                       | 8    | anyjson>=0.3.3       |
| openstack/murano-agent      | requirements.txt                               | 5    | anyjson>=0.3.3 # BSD |
| openstack/os-apply-config   | requirements.txt                               | 6    | anyjson>=0.3.3 # BSD |
| openstack/os-collect-config | requirements.txt                               | 6    | anyjson>=0.3.3 # BSD |
| openstack/os-net-config     | requirements.txt                               | 5    | anyjson>=0.3.3 # BSD |
| openstack/tacker            | requirements.txt                               | 9    | anyjson>=0.3.3 # BSD |
| starlingx/config            | sysinv/sysinv/sysinv/requirements.txt          | 4    | anyjson>=0.3.3       |
| starlingx/ha                | service-mgmt-client/sm-client/requirements.txt | 2    | anyjson>=0.3.3       |
| starlingx/metal             | inventory/inventory/requirements.txt           | 8    | anyjson>=0.3.3       |
| x/apmec                     | requirements.txt                               | 9    | anyjson>=0.3.3 # BSD |
| x/daisycloud-core           | code/daisy/requirements.txt                    | 9    | anyjson>=0.3.3       |
| x/novajoin                  | test-requirements.txt                          | 8    | anyjson>=0.3.3 # BSD |
| x/omni                      | creds_manager/test-requirements.txt            | 8    | anyjson>=0.3.3 # BSD |
+-----------------------------+------------------------------------------------+------+----------------------+
-- Matthew Thode
On 19-08-28 06:30:36, Matthew Thode wrote:
[...]
Looking through global-reqs some more, it looks like simplejson can be C-based and is in global-reqs as well.
Looking at the benchmarks orjson itself provides, it's hard to see how it can be 100x faster than even the native json.
-- Matthew Thode
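Before comparing candidates, it can help to see which of them are importable in a given environment and whether the stdlib encoder has its C accelerator. The following is a rough sketch using only the standard library; the candidate list and the helper name available_json_libs are chosen for illustration, and it does not check whether simplejson's optional C speedups were built.

```python
import importlib.util
import json.encoder

# Module names discussed in this thread; only "json" is guaranteed to exist.
CANDIDATES = ("json", "simplejson", "ujson", "orjson")


def available_json_libs(names=CANDIDATES):
    """Map each top-level module name to True if it can be imported here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# CPython sets json.encoder.c_make_encoder to None when the _json
# C extension could not be imported.
stdlib_has_c_encoder = json.encoder.c_make_encoder is not None

found = available_json_libs()
print(found, "stdlib C encoder:", stdlib_has_c_encoder)
```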
On Wed, 28 Aug 2019, Matthew Thode wrote:
Looking through global-reqs some more, it looks like simplejson can be C-based and is in global-reqs as well.
Looking at the benchmarks orjson itself provides, it's hard to see how it can be 100x faster than even the native json.
I've tried the other options and they don't provide as much improvement as orjson, nor are they as healthy with regard to project activity and JSON correctness.
I suspect the reason my tests are seeing such a huge improvement (different from orjson's benchmark) is because this is one single very large (2583330 bytes when JSON) Python structure being dumped just once. The benchmarks on the PyPI page that fit most are those with canada.json (which happens to be the one where orjson seems to have the biggest advantage).
It's not the end of the world if we don't switch, but I thought I would raise the topic to see if it was worth pursuing.
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent
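The "one very large structure dumped once" case can be approximated with synthetic data. The sketch below is an assumption-laden stand-in, not the placement profiling setup: the field names and values are invented, and only the provider count (7000) comes from the thread. It times a single call to a dumps callable, so another serializer could be passed in place of json.dumps for comparison.

```python
import json
import time


def build_fake_providers(count):
    """Build a synthetic dict with made-up fields, one entry per provider."""
    return {
        "provider_summaries": {
            "rp-%d" % i: {
                "resources": {
                    "VCPU": {"capacity": 64, "used": i % 64},
                    "MEMORY_MB": {"capacity": 262144, "used": 1024 * (i % 256)},
                },
                "traits": ["CUSTOM_TRAIT_%d" % (i % 10)],
            }
            for i in range(count)
        }
    }


def time_single_dump(dumps, obj):
    """Time one call to dumps(obj); return (seconds, length of the output)."""
    start = time.perf_counter()
    out = dumps(obj)
    elapsed = time.perf_counter() - start
    return elapsed, len(out)


data = build_fake_providers(7000)
elapsed, size = time_single_dump(json.dumps, data)
print("stdlib json: %.4f s for %d characters" % (elapsed, size))
```

A single run like this is noisy, and it measures only serialization, not the share of a whole request that serialization takes.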
On Aug 28, 2019, at 7:30 AM, Chris Dent <cdent+os@anticdent.org> wrote:
I suspect the reason my tests are seeing such a huge improvement (different from orjson's benchmark) is because this is one single very large (2583330 bytes when JSON) python structure being dumped just once.
While optimizing the performance of JSON-ifying this huge thing is always a good idea, it might be better to consider why such a huge object is necessary in the first place.
"Doctor, it hurts when I do this"
"Well, then stop doing that!"
-- Ed Leafe
On 2019-08-28 09:47:22 +0100 (+0100), Chris Dent wrote: [...]
Otherwise it's fine: active, Apache2 or MIT license.
Thoughts?
[1] https://pypi.org/project/orjson/ [...]
One thing worth noting is that they don't publish an sdist to PyPI, only wheels: https://pypi.org/project/orjson/#files
This means that when I try to install it into a venv created with my local build of Python v3.8.0b3 it fails to find any installable orjson because there's no fallback on PyPI beyond the Python releases and platforms for which they've explicitly produced wheels.
There's https://github.com/ijl/orjson/issues/18 open for over two months requesting wheels for Python 3.8, but it's going to be a treadmill if they're not also publishing an sdist. It's further a potential license concern, since they're not publishing the source code alongside the wheels (so if for example the upstream Git repository goes away...). The official Python Packaging Guide notes that sdist publication is strongly recommended, and that publishing wheels in addition to that is optional:
https://packaging.python.org/guides/distributing-packages-using-setuptools/#...
If this is a library you care about using, it may make sense to attempt to make these points to its maintainer(s).
-- Jeremy Stanley
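The fallback problem described above can be illustrated with a simplified selection check. This is a sketch only: the file names are hypothetical (they are not orjson's actual release files), the function name has_installable_file is invented, and a real installer also checks ABI and platform tags, which this ignores. It relies on the wheel file name layout, where the last three dash-separated fields are the Python, ABI and platform tags.

```python
def has_installable_file(filenames, python_tag):
    """Return True if any file is an sdist or a wheel matching python_tag.

    Simplified: ABI and platform tags are ignored here.
    """
    for name in filenames:
        if name.endswith((".tar.gz", ".zip")):
            # An sdist can be built locally for any interpreter.
            return True
        if name.endswith(".whl"):
            # Wheel names end in -<python tag>-<abi tag>-<platform tag>.whl
            py_tags = name[: -len(".whl")].split("-")[-3].split(".")
            # "py3" marks a wheel usable by any Python 3 interpreter.
            if python_tag in py_tags or "py3" in py_tags:
                return True
    return False


# Hypothetical release files for a project that publishes wheels only.
wheels_only = [
    "example_pkg-1.0.0-cp36-cp36m-manylinux1_x86_64.whl",
    "example_pkg-1.0.0-cp37-cp37m-manylinux1_x86_64.whl",
]
with_sdist = wheels_only + ["example_pkg-1.0.0.tar.gz"]

print(has_installable_file(wheels_only, "cp38"))
print(has_installable_file(with_sdist, "cp38"))
```

With wheels only, a new interpreter tag has nothing to fall back on until someone uploads a matching wheel; adding an sdist removes that dependency, at the cost of needing a build toolchain locally.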
On Wed, Aug 28, 2019 at 04:46:11PM +0000, Jeremy Stanley wrote:
[...]
One thing worth noting is that they don't publish an sdist to PyPI, only wheels: https://pypi.org/project/orjson/#files
[...]
I expect that it's because orjson is actually a Python API for Rust code. It looks like they chose to use pyo3-pack instead of setuptools-rust (which is normally what I use). pyo3-pack only recently added support for building sdists, but it's only been included in a beta release so far (it was a longstanding issue with pyo3-pack: https://github.com/PyO3/pyo3-pack/issues/2).
Also, even assuming orjson starts publishing sdists, you'll still need nightly Rust installed to compile it, since pyo3 only works with the nightly builds of Rust at this point. While that is not difficult to do, it is not something people typically have installed.
On 2019-08-28 13:19:17 -0400 (-0400), Matthew Treinish wrote: [...]
Also, even assuming orjson starts publishing sdists, you'll still need nightly Rust installed to compile it, since pyo3 only works with the nightly builds of Rust at this point. While that is not difficult to do, it is not something people typically have installed.
This points out yet another portability issue. After watching the Debian community struggle to backport security-supported versions of Firefox to their stable releases, which involved needing to backport a full Rust toolchain along with it, I'm unconvinced that's the sort of scenario we should be inflicting on our users either.
-- Jeremy Stanley
participants (7)
- Chris Dent
- Ed Leafe
- Jeremy Stanley
- Matt Riedemann
- Matthew Thode
- Matthew Treinish
- Sean Mooney