Open Stack

Mon Aug 6 22:16:31 UTC 2018

Hi,
    The Cyborg agent in a compute node collects information about 
devices from the Cyborg drivers on that node. It then needs to push that 
information to the Cyborg conductor in the controller, which then needs 
to persist it in the Cyborg db and update Placement. Further, the agent 
needs to collect and update this information periodically (or possibly 
in response to notifications) to handle hot add/delete of devices, 
reprogramming (for FPGAs), health failure of devices, etc.
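
As a rough illustration of the agent's side, here is a minimal Python
sketch. It assumes the agent simply polls on a fixed interval. `collect`
and `push` are hypothetical placeholders for querying the node's Cyborg
drivers and calling the conductor; they are not Cyborg APIs. A real agent
would more likely use its service framework's periodic-task support than
a sleep loop, and could call the same collect-and-push step from a
notification handler.

```python
import time
from typing import Callable, Dict, List

# Hypothetical: one device's fields, possibly nested (tree-structured).
Device = Dict[str, object]


def agent_loop(collect: Callable[[], List[Device]],
               push: Callable[[List[Device]], None],
               interval_s: float,
               iterations: int) -> None:
    """Collect the node's full device list and push it upstream, periodically.

    `collect` stands in for asking the node's Cyborg drivers for their
    devices; `push` stands in for the call to the conductor. Both are
    placeholders, not real Cyborg functions.
    """
    for i in range(iterations):
        snapshot = collect()        # complete current config, not a delta
        push(snapshot)              # the conductor diffs it against the db
        if i + 1 < iterations:
            time.sleep(interval_s)  # wait for the next refresh period
```

For example, `sent = []; agent_loop(lambda: [{"id": "fpga0"}], sent.append,
0.0, 3)` leaves three snapshots in `sent`. Pushing the full snapshot every
time means the agent need not remember what it sent previously.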

In this morning's call, we discussed how to do this periodic update [1].
In particular, we talked about how to compute the difference between the
previous device configuration in a compute node and the current one, and
whether the agent or the controller should do that diff, etc. Since there
are many fields per device, and they are tree-structured, doing the diff
seemed complex.

On taking a closer look, however, the amount of computation needed to do 
the update is not huge. Say, for discussion's sake, that the controller 
has a snapshot of the entire device config for a specific compute node, 
i.e. an array of device structures NewConfig[]. It reads the current 
list of devices for that node from the db, CurrentConfig[]. Then the 
controller's logic is like this:

  * Determine the list of devices in NewConfig[] but not in
    CurrentConfig[] (this is a set difference in Python [2]): they are
    the newly added ones. For each newly added device, do a single
    transaction to add all the fields to the db together.
  * Determine the list of devices in CurrentConfig[] but not in
    NewConfig[]: they are the deleted devices. For each such device, do a
    single transaction to delete that entry.
  * For each device present in both lists (the set intersection), compute
    which fields have changed, and update only those fields. This is the
    per-field diff.
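
The three steps can be sketched in Python as follows. This is a minimal
sketch under stated assumptions, not Cyborg code: each device is assumed
to be a dict with a unique string "id", and tree-structured fields are
assumed to be nested dicts, which the hypothetical helper `flatten` turns
into dotted paths so that leaf fields can be compared one by one. The
email links Python's old `sets` module [2]; this sketch uses the built-in
set operations on dict key views, which superseded that module.

```python
from typing import Dict, List, Tuple

# Assumption: a device is a dict with a unique string "id"; tree-structured
# fields are nested dicts, flattened to dotted paths before comparing.
Device = Dict[str, object]


def flatten(tree: Dict[str, object], prefix: str = "") -> Dict[str, object]:
    """Turn {"a": {"b": 1}} into {"a.b": 1} so leaf fields compare directly."""
    flat: Dict[str, object] = {}
    for key, value in tree.items():
        path = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def diff_configs(current: List[Device], new: List[Device]
                 ) -> Tuple[List[str], List[str], Dict[str, Dict[str, object]]]:
    """Return (added ids, deleted ids, {id: {field: new value}}).

    A field that disappeared from a device is reported with value None.
    """
    cur = {d["id"]: flatten(d) for d in current}
    nxt = {d["id"]: flatten(d) for d in new}

    added = nxt.keys() - cur.keys()    # in NewConfig[] but not CurrentConfig[]
    deleted = cur.keys() - nxt.keys()  # in CurrentConfig[] but not NewConfig[]

    modified: Dict[str, Dict[str, object]] = {}
    for dev_id in cur.keys() & nxt.keys():  # devices present in both
        old, upd = cur[dev_id], nxt[dev_id]
        changed = {f: upd.get(f) for f in old.keys() | upd.keys()
                   if old.get(f) != upd.get(f)}
        if changed:
            modified[dev_id] = changed  # per-field diff: only changed leaves

    return sorted(added), sorted(deleted), modified
```

In the conductor, each id in the added and deleted lists would map to one
db transaction, and each entry in the modified dict to an update of just
those fields; the db layer is deliberately left out of the sketch.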

Say each field in the device structure is a string of 100 characters,
and it takes 1 nanosecond to add, delete or modify a character. So each
field takes 100 ns to update (add/delete/modify). Say 20 fields per
device: so 2 µs to add, delete or modify a device. Say 10 devices per
compute node: so 20 µs per node. 500 nodes will take 10 milliseconds.
So, if each node sends a refresh every second, the controller will spend
about 10 ms of every second (roughly 1%) updating the db; even including
transaction costs, set difference computation, etc., that should remain
a very small fraction.

This back-of-the-envelope calculation shows that we need not try to 
optimize too early: the agent should send the entire device config over 
to the controller, and let it update the db per-device and per-field.

[1] https://etherpad.openstack.org/p/cyborg-rocky-development
[2] https://docs.python.org/2/library/sets.html

Regards,
Sundar

[openstack-dev] [Cyborg] Agent - Conductor update
