Re: [nova] [cyborg] Impact of moving bind to compute
Hi,

We had a thread [1] on this subject from May of this year. The preference there was that "ARQ creation happens in conductor and binding happens in compute" [2].

ARQ binding involves device preparation and FPGA programming, which may take a while, so it is done asynchronously. It is desirable to kick off the binding as early as possible, to maximize its overlap with the other tasks needed for VM creation.

We wound up doing all of the binding in the compute for the following reason: if we call Cyborg to initiate ARQ binding and then wait for the notification event, we may miss the event if it arrives in the window between those two steps. So we had to call wait_for_instance_event() and, within its scope, call Cyborg for binding. That logic moved everything to compute.

But now we are close to having an improved wait_for_instance_event() [3]. So I propose to:

A. Start the binding in the conductor. This gets maximum concurrency between binding and other tasks.

B. Wait for the binding notification in the compute manager (without losing the event). In fact, we can wait inside _build_resources, which is where Neutron/Cinder resources are gathered as well. That will allow for doing the cleanup in a consistent manner, as today.

C. Call Cyborg to get the ARQs in the virt driver, like today.

Please LMK if you have any objections.

[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006541.html
[2] http://lists.openstack.org/pipermail/openstack-discuss/2019-June/006979.html
[3] https://review.opendev.org/#/c/695985/

Regards,
Sundar
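[For readers following along, the workaround described above takes roughly this shape. This is a minimal sketch, not Nova's actual code: the cyborg_client object, its bind_arqs() signature, and the 'accelerator-request-bound' event name are stand-ins for the real interfaces. The essential point is that the waiter is registered before binding is initiated, so a notification that fires while binding is still in flight is queued rather than dropped.

    # Minimal sketch of the current workaround (helper names are
    # stand-ins, not Nova's real interfaces). wait_for_instance_event()
    # is a context manager that registers interest in the events on
    # entry and blocks on exit, so an event that fires while binding is
    # still in progress is queued instead of lost.

    def bind_and_wait(virtapi, cyborg_client, context, instance, arqs):
        events = [('accelerator-request-bound', arq['uuid'])
                  for arq in arqs]
        with virtapi.wait_for_instance_event(instance, events,
                                             deadline=300):
            # Initiate binding only after the waiter is registered.
            # Calling Cyborg first and registering afterwards opens the
            # window in which a fast notification would be missed.
            cyborg_client.bind_arqs(context, arqs)
        # Leaving the with-block means every event has arrived (or the
        # deadline expired and an exception was raised).

Because the Cyborg call has to sit inside the scope of the waiter, the whole sequence currently lives on the compute node, which is the constraint the proposal above removes.]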
> A. Start the binding in the conductor. This gets maximum concurrency between binding and other tasks.
>
> B. Wait for the binding notification in the compute manager (without losing the event). In fact, we can wait inside _build_resources, which is where Neutron/Cinder resources are gathered as well. That will allow for doing the cleanup in a consistent manner, as today.
+many
> But now we are close to having an improved wait_for_instance_event() [3]. So I propose to:
>
> A. Start the binding in the conductor. This gets maximum concurrency between binding and other tasks.
>
> B. Wait for the binding notification in the compute manager (without losing the event). In fact, we can wait inside _build_resources, which is where Neutron/Cinder resources are gathered as well. That will allow for doing the cleanup in a consistent manner, as today.
>
> C. Call Cyborg to get the ARQs in the virt driver, like today.
We actually collect the neutron event in the virt driver. We kick off some of the early stuff in _build_resources(), but those are things that we want to be able to do from conductor.

I'd ideally like to move the wait further down into the stack, purely so we overlap with the image fetch. That's the thing that will take the longest on the compute node. If the system is unloaded, the conductor->compute->virt stuff could happen pretty quickly, and if we wait a minute (for example) for programming to finish before we start spawn(), that's enough time that we could have potentially already finished the image fetch. This is also time where we're holding a spot in the parallel build limit queue, but not doing anything useful.

That said, things can move around inside the compute manager and virt driver without affecting upgrades, so if it's easier to do it in _build_resources() now, we can see about optimizing later. It should, however, happen as the last step in _build_resources(), so that we overlap with all the network and block stuff that happens there already.

--Dan
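[To make the ordering concrete: the suggestion is roughly the shape below. This is an illustration, not actual Nova code (in Nova, _build_resources() is a contextmanager, and the helper names here are invented); what matters is that the ARQ wait is the last step, so device programming overlaps with the network and block device work already in flight.

    # Illustrative ordering only. The accelerator wait happens last, so
    # the conductor-initiated binding runs concurrently with everything
    # started above it.

    def _build_resources(self, context, instance, requested_networks,
                         block_device_mapping):
        # Kick off asynchronous network allocation first.
        network_info = self._allocate_network(context, instance,
                                              requested_networks)
        # Prepare block devices next.
        block_device_info = self._prep_block_device(
            context, instance, block_device_mapping)
        # Only now block on the ARQ binding notifications; by this
        # point the binding has had the whole network/volume setup time
        # (and ideally the image fetch too) to make progress.
        accel_info = self._wait_for_arq_binding(context, instance)
        return network_info, block_device_info, accel_info
]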
> But now we are close to having an improved wait_for_instance_event() [3]. So I propose to:
>
> A. Start the binding in the conductor. This gets maximum concurrency between binding and other tasks.
>
> B. Wait for the binding notification in the compute manager (without losing the event). In fact, we can wait inside _build_resources, which is where Neutron/Cinder resources are gathered as well. That will allow for doing the cleanup in a consistent manner, as today.
>
> C. Call Cyborg to get the ARQs in the virt driver, like today.
Sorry, I missed this. No, I don't think this is reasonable. I'm -5 on where you have it today. However, there is zero point in calling out to Cyborg in _build_resources() and then calling it again in the virt driver just a couple of stack frames away.

The point of _build_resources() is to collect resources that we need to clean up if we fail, and yield them to the build process. Store your ARQs there, pass them to the virt driver, and roll them back if you fail.

--Dan
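[The contract being described is the contextmanager shape of _build_resources(): acquire each resource once, yield the whole set to the build process, and unwind everything in the error path. A rough sketch, with made-up helper names (_get_bound_arqs() and the _cyborg_client attribute are assumptions, not Nova's real identifiers):

    import contextlib

    @contextlib.contextmanager
    def _build_resources(self, context, instance):
        arqs = []
        try:
            # ... Neutron/Cinder resources are gathered here, as today ...
            # Resolve the ARQs exactly once, after the bound
            # notifications arrive, and hand them to the virt driver
            # via the yielded resources dict; no second Cyborg call
            # from inside the driver.
            arqs = self._get_bound_arqs(context, instance)
            yield {'accel_info': arqs}
        except Exception:
            # Roll back everything acquired above, ARQs included.
            self._cyborg_client.delete_arqs_for_instance(instance.uuid)
            raise

An exception raised by the build process inside the with-block propagates back into the generator at the yield, which is what lets the except clause roll the ARQs back alongside the other resources.]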
From: Dan Smith <dms@danplanet.com>
Sent: Tuesday, November 26, 2019 3:17 PM
Subject: Re: [nova] [cyborg] Impact of moving bind to compute
>> But now we are close to having an improved wait_for_instance_event() [3]. So I propose to:
>>
>> A. Start the binding in the conductor. This gets maximum concurrency between binding and other tasks.
>>
>> B. Wait for the binding notification in the compute manager (without losing the event). In fact, we can wait inside _build_resources, which is where Neutron/Cinder resources are gathered as well. That will allow for doing the cleanup in a consistent manner, as today.
>>
>> C. Call Cyborg to get the ARQs in the virt driver, like today.
>
> Sorry, I missed this. No, I don't think this is reasonable. I'm -5 on where you have it today. However, there is zero point in calling out to Cyborg in _build_resources() and then calling it again in the virt driver just a couple of stack frames away.
>
> The point of _build_resources() is to collect resources that we need to clean up if we fail, and yield them to the build process. Store your ARQs there, pass them to the virt driver, and roll them back if you fail.

Agreed, thanks.

> --Dan
Regards,
Sundar