[nova] [cyborg] Impact of moving bind to compute
Hi,
The feedback in the Nova - Cyborg interaction spec [1] is to move the call for creating/binding accelerator requests (ARQs) from the conductor (just before the call to build_and_run_instance, [2]) to the compute manager (just before spawn, without holding the build semaphore [3]). The point where the results of the bind are needed is in the virt driver [4] - that is not changing. The reason for the move is to enable Cyborg to notify Nova [5] instead of the Nova virt driver polling Cyborg, thus making the interaction similar to that with other services like Neutron.
The binding involves device preparation by Cyborg, which may take some time (ballpark: milliseconds to a few seconds, perhaps tens of seconds - devices vary a lot). We want to overlap as much of this as possible with other tasks, by starting the binding as early as possible and making it asynchronous, so that bulk VM creation rates etc. are not affected. These considerations are probably specific to Cyborg, so trying to make it uniform with other projects deserves a closer look before we commit to it.
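To make the intended overlap concrete, here is a minimal, self-contained sketch (hypothetical function names, not the actual Nova/Cyborg code): the ARQ binding is kicked off early and asynchronously, the rest of the build work proceeds in parallel, and the build only blocks on the bind result just before spawn.

import concurrent.futures
import time

def create_and_bind_arqs(instance_uuid):
    # Stand-in for the Cyborg call that prepares the device; this is the
    # part that may take milliseconds up to tens of seconds.
    time.sleep(2)
    return [{'instance': instance_uuid, 'state': 'Bound'}]

def other_build_steps(instance_uuid):
    # Stand-in for claiming resources, building networks, volumes, etc.
    time.sleep(1)

def build_and_run_instance(instance_uuid):
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        # Start the binding as early as possible...
        bind_future = pool.submit(create_and_bind_arqs, instance_uuid)
        # ...so that it overlaps with the other build work...
        other_build_steps(instance_uuid)
        # ...and only wait for the result just before the virt driver needs it.
        arqs = bind_future.result(timeout=300)
        print('spawning with', arqs)

build_and_run_instance('fake-uuid')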
Moving the binding from [2] to [3] reduces this overlap. I did some measurements of the time window from [2] to [3]: it was consistently between 20 and 50 milliseconds, whether I launched 1 VM at a time, 2 at a time, etc. This seems acceptable.
But this was just in a two-node deployment. Are there situations where this window could get much larger (thus reducing the overlap)? Such as in larger deployments, or issues with RabbitMQ messaging, etc. Are there larger considerations of performance or scaling for this approach?
Thanks in advance.
[1] https://review.opendev.org/#/c/603955/
[2] https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L150...
[3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1882
[4] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L3...
[5] https://wiki.openstack.org/wiki/Nova/ExternalEventAPI
Regards, Sundar
On 5/23/2019 7:00 AM, Nadathur, Sundar wrote:
The feedback in the Nova - Cyborg interaction spec [1] is to move the call for creating/binding accelerator requests (ARQs) from the conductor (just before the call to build_and_run_instance, [2]) to the compute manager (just before spawn, without holding the build semaphore [3]). [....]
I'm OK with binding in the compute since that's where we trigger the callback event and want to set up something to wait for it before proceeding, like we do with port binding.
What I've talked about in detail in the spec is doing the ARQ *creation* in conductor rather than compute. I realize that doing the creation in the compute service means fewer (if any) RPC API changes to get phase 1 of this code going, but I can't imagine any RPC API changes for that would be very big (it's a new parameter to the compute service methods, or something we lump into the RequestSpec).
The bigger concern I have is that we've long talked about moving port (and at times volume) creation from the compute service to conductor because it's less expensive to manage external resources there if something fails, e.g. going over-quota creating volumes. The problem with failing late in the compute is we have to cleanup other things (ports and volumes) and then reschedule, which may also fail on the next alternate host. Failing fast in conductor is more efficient and also helps take some of the guesswork out of which service is managing the resources (we've had countless bugs over the years about ports and volumes being leaked because we didn't clean them up properly on failure). Take a look at any of the error handling in the server create flow in the ComputeManager and you'll see what I'm talking about.
Anyway, if we're voting I vote that ARQ creation happens in conductor and binding happens in compute.
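For what it's worth, here is a rough sketch of the wait-for-callback pattern described above, loosely modeled on how the compute manager waits for network-vif-plugged from Neutron. The event name and the helper class are hypothetical stand-ins, not the actual ComputeManager/ComputeVirtAPI code.

import threading

class ExternalEventWaiter:
    # Stand-in for the compute manager's external-event plumbing.

    def __init__(self):
        self._events = {}

    def prepare(self, name):
        self._events[name] = threading.Event()

    def notify(self, name):
        # Called when the external service (Cyborg here) posts the event
        # to the external events API and it is routed to the compute host.
        self._events[name].set()

    def wait(self, name, timeout):
        if not self._events[name].wait(timeout):
            raise TimeoutError('timed out waiting for %s' % name)

waiter = ExternalEventWaiter()
waiter.prepare('accelerator-request-bound')   # hypothetical event name
# ... send the bind request to Cyborg, do other build work ...
# Simulate Cyborg's notification arriving half a second later:
threading.Timer(0.5, waiter.notify, args=['accelerator-request-bound']).start()
waiter.wait('accelerator-request-bound', timeout=300)
print('ARQ bound; proceeding to spawn')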
-----Original Message-----
From: Matt Riedemann <mriedemos@gmail.com>
Sent: Thursday, June 6, 2019 1:33 PM
To: openstack-discuss@lists.openstack.org
Subject: Re: [nova] [cyborg] Impact of moving bind to compute
On 5/23/2019 7:00 AM, Nadathur, Sundar wrote:
[....] Moving the binding from [2] to [3] reduces this overlap. I did some measurements of the time window from [2] to [3]: it was consistently between 20 and 50 milliseconds, whether I launched 1 VM at a time, 2 at a time, etc. This seems acceptable.
[2] https://github.com/openstack/nova/blob/master/nova/conductor/manager.py#L150...
[3] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1882
Regards,
Sundar
I'm OK with binding in the compute since that's where we trigger the callback event and want to set up something to wait for it before proceeding, like we do with port binding.
What I've talked about in detail in the spec is doing the ARQ *creation* in conductor rather than compute. I realize that doing the creation in the compute service means fewer (if any) RPC API changes to get phase 1 of this code going, but I can't imagine any RPC API changes for that would be very big (it's a new parameter to the compute service methods, or something we lump into the RequestSpec).
The bigger concern I have is that we've long talked about moving port (and at times volume) creation from the compute service to conductor because it's less expensive to manage external resources there if something fails, e.g. going over-quota creating volumes. The problem with failing late in the compute is we have to cleanup other things (ports and volumes) and then reschedule, which may also fail on the next alternate host.
The ARQ creation could be done at [1], followed by the binding, before acquiring the semaphore or creating other resources. Why is that not a good option?
[1] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1898
Failing fast in conductor is more efficient and also helps take some of the guesswork out of which service is managing the resources (we've had countless bugs over the years about ports and volumes being leaked because we didn't clean them up properly on failure). Take a look at any of the error handling in the server create flow in the ComputeManager and you'll see what I'm talking about.
Anyway, if we're voting I vote that ARQ creation happens in conductor and binding happens in compute.
--
Thanks,
Matt
Regards, Sundar
On 6/7/2019 12:17 AM, Nadathur, Sundar wrote:
The ARQ creation could be done at [1], followed by the binding, before acquiring the semaphore or creating other resources. Why is that not a good option?
[1] https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L1898
If we created the ARQs in compute I think we'd do it in the ComputeManager._build_resources method to be consistent with where we create volumes and ports. My bigger point is if the ARQ creation fails in compute for whatever reason, then we have to rollback any other resources we create (ports and volumes) which gets messy.
Doing the ARQ creation before _build_resources in ComputeManager (what you're suggesting) would side-step that bit but then we've got inconsistencies in where the server create flow creates external resources within the compute service, which I don't love.
So I think if we're going to do the ARQ creation early then we should do it in the conductor so we can fail fast and avoid a reschedule from the compute.
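A conceptual sketch of that split (hypothetical names and stubbed calls, not the actual conductor/compute code): ARQ creation happens in the conductor, where a failure aborts the build before any host-side resources exist, and the binding plus the wait for Cyborg's callback happen in the compute manager.

def cyborg_create_arqs(spec):
    # Stub for the Cyborg ARQ creation call.
    return [{'uuid': 'arq-1', 'state': 'Initial'}]

def cyborg_bind_arqs(arqs):
    # Stub; the real call is asynchronous and Cyborg notifies Nova when done.
    for arq in arqs:
        arq['state'] = 'Bound'

def conductor_build_instances(spec):
    # Fail fast: if ARQ creation fails here, there are no ports or volumes
    # to clean up and no reschedule from the compute is needed.
    arqs = cyborg_create_arqs(spec)
    compute_build_and_run(spec, arqs)          # stands in for the RPC cast

def compute_build_and_run(spec, arqs):
    cyborg_bind_arqs(arqs)                     # kick off device preparation
    # ... build networks/volumes here, overlapping with the binding ...
    assert all(a['state'] == 'Bound' for a in arqs)  # stands in for waiting on the event
    print('spawning', spec, 'with', arqs)

conductor_build_instances({'flavor': 'acc-flavor'})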