[nova] [cyborg] Impact of moving bind to compute
sundar.nadathur at intel.com
Thu May 23 12:00:26 UTC 2019
The feedback on the Nova-Cyborg interaction spec is to move the call for creating/binding accelerator requests (ARQs) from the conductor (just before the call to build_and_run_instance) to the compute manager (just before spawn, without holding the build semaphore). The point where the results of the bind are needed, in the virt driver, is not changing. The reason for the move is to let Cyborg notify Nova instead of having the Nova virt driver poll Cyborg, which makes the interaction similar to the one with other services such as Neutron.
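To make the polling-versus-notification contrast concrete, here is a minimal sketch using Python threads. Everything in it is hypothetical: `FakeCyborg`, `bind_async`, `subscribe`, `wait_by_polling` and `wait_by_notification` are illustrative names, not the real Cyborg or Nova APIs. It is only loosely in the spirit of how Nova waits for external events from Neutron.

```python
import threading
import time


class FakeCyborg:
    """Hypothetical stand-in for Cyborg's ARQ binding; not the real Cyborg API."""

    def __init__(self, prep_seconds):
        self.prep_seconds = prep_seconds
        self._bound = False
        self._listeners = []
        self._lock = threading.Lock()

    def subscribe(self, callback):
        """Register a callback to run when binding completes (the 'notification')."""
        with self._lock:
            self._listeners.append(callback)

    def is_bound(self):
        with self._lock:
            return self._bound

    def bind_async(self):
        """Start device preparation in the background and return immediately."""
        def work():
            time.sleep(self.prep_seconds)  # simulated device preparation
            with self._lock:
                self._bound = True
                listeners = list(self._listeners)
            for callback in listeners:
                callback()
        threading.Thread(target=work, daemon=True).start()


def wait_by_polling(cyborg, interval=0.02, timeout=5.0):
    """Polling style: Nova repeatedly asks Cyborg; returns the number of polls."""
    deadline = time.monotonic() + timeout
    polls = 0
    while time.monotonic() < deadline:
        polls += 1
        if cyborg.is_bound():
            return polls
        time.sleep(interval)
    raise TimeoutError("ARQ binding did not complete in time")


def wait_by_notification(done, timeout=5.0):
    """Notification style: block on an event that Cyborg's callback sets."""
    if not done.wait(timeout):
        raise TimeoutError("no bind notification received in time")


# Subscribe *before* starting the bind so an early notification is not lost.
done = threading.Event()
notifying = FakeCyborg(prep_seconds=0.1)
notifying.subscribe(done.set)
notifying.bind_async()
wait_by_notification(done)

# Polling: the waiter spends work (and API calls) until the bind finishes.
polled = FakeCyborg(prep_seconds=0.1)
polled.bind_async()
polls = wait_by_polling(polled)
```

The design point the sketch illustrates is ordering. The waiter registers for the event before triggering the bind, so it does not matter whether the notification arrives before or after it starts waiting. With polling, by contrast, the cost grows with bind time divided by the poll interval.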
The binding involves device preparation by Cyborg, which may take some time (ballpark: from milliseconds to a few seconds, perhaps tens of seconds; of course, devices vary a lot). We want to overlap as much of this as possible with other tasks by starting the binding as early as possible and making it asynchronous, so that bulk VM creation rate and similar metrics are not affected. These considerations are probably specific to Cyborg, so making the interaction uniform with other projects deserves a closer look before we commit to it.
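The benefit of starting the bind early and asynchronously can be sketched with a toy timing model. The function names and durations here are hypothetical, and sleeps stand in for real work. When the bind overlaps the other build steps, the time until the virt driver has the result is roughly the maximum of the two durations rather than their sum.

```python
import threading
import time


def prepare_device(seconds, done):
    """Hypothetical stand-in for Cyborg device preparation during bind."""
    time.sleep(seconds)
    done.set()


def build_instance(bind_seconds, other_work_seconds, start_bind_early):
    """Return wall-clock seconds until the virt driver has the bind result."""
    done = threading.Event()
    start = time.monotonic()
    if start_bind_early:
        # Kick off binding asynchronously, then do the other build steps meanwhile.
        threading.Thread(
            target=prepare_device, args=(bind_seconds, done), daemon=True
        ).start()
        time.sleep(other_work_seconds)
    else:
        # Serial: other build steps first, then bind only when the result is needed.
        time.sleep(other_work_seconds)
        prepare_device(bind_seconds, done)
    done.wait()  # the virt driver blocks here until binding has finished
    return time.monotonic() - start


overlapped = build_instance(0.2, 0.15, start_bind_early=True)
serial = build_instance(0.2, 0.15, start_bind_early=False)
print(f"overlapped: {overlapped:.2f}s, serial: {serial:.2f}s")
```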
Moving the binding from the conductor to the compute manager reduces this overlap. I measured the time window between those two points: it was consistently between 20 and 50 milliseconds, whether I launched 1 VM at a time, 2 at a time, and so on. This seems acceptable.
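One way to see why a 20 to 50 ms window seems acceptable is a simple model, which is an assumption of this sketch and not a measured result. Suppose the bind takes `bind_s` seconds, and `overlap_s` seconds pass between starting it and the virt driver needing the result. The driver then waits `max(0, bind_s - overlap_s)`. Starting the bind `window_s` seconds later can add at most `window_s` to that wait. Only the 50 ms figure comes from the measurement above; the bind and overlap values below are hypothetical.

```python
def driver_wait(bind_s, overlap_s):
    """Seconds the virt driver blocks waiting for the bind (simplified model)."""
    return max(0.0, bind_s - overlap_s)


def added_delay(bind_s, overlap_s, window_s):
    """Extra virt-driver wait if the bind starts window_s seconds later."""
    later_overlap = max(0.0, overlap_s - window_s)
    return driver_wait(bind_s, later_overlap) - driver_wait(bind_s, overlap_s)


window = 0.05  # upper end of the measured 20-50 ms window
slow = added_delay(bind_s=10.0, overlap_s=3.0, window_s=window)  # slow device
fast = added_delay(bind_s=0.01, overlap_s=3.0, window_s=window)  # fast device
```

In this model, a slow device pays the full window as extra latency, which is small next to a multi-second bind. A device that finishes within the remaining overlap pays nothing. The open question in the next paragraph is whether the window itself can grow.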
But this was only in a two-node deployment. Are there situations where this window could get much larger (thus reducing the overlap), for example in larger deployments or when RabbitMQ messaging is degraded? Are there broader performance or scaling considerations for this approach?
Thanks in advance.
More information about the openstack-discuss