Hi all,

I was looking over the NovaClusterDataModelCollector code today, trying to learn more about how watcher builds the nova CDM (and when), and got digging into this change from Stein [1], where I noted what appear to be several issues. I'd like to enumerate a few of those issues here and then figure out how to proceed.

1. In general, a lot of this code for building the compute node model assumes at least the 2.53 microversion (Pike) in nova, where hypervisor.id is a UUID - this is actually necessary for a multi-cell environment like CERN's. The nova_client.api_version config option already defaults to 2.56, which was in Queens. I'm not sure what the compatibility matrix looks like for Watcher, but would it be possible for us to say that Watcher requires nova at least at the Queens-level API (so nova_client.api_version >= 2.60), add a release note, and add a "watcher-status upgrade check" if necessary? This might make the nova CDM code a bit cleaner, since we would know we can rely on a given minimum version.

[licanwei]: We set the default nova api version to 2.56, but it's better to add a release note.

2. I had a question about when the nova CDM gets built now [2]. It looks like the nova CDM only gets built when there is an audit? But I thought the CDM was supposed to get built on start of the decision-engine service and then refreshed every hour (by default) on a periodic task, or as notifications are processed that change the model. Does this mean the nova CDM is rebuilt fresh whenever there is an audit, even if the audit is not scoped? If so, isn't that potentially inefficient (and an unnecessary load on the compute API every time an audit runs)?

[licanwei]: Yes, the CDM is built when the first audit is created, and it is not rebuilt if the next new audit has the same scope.

3. The host_aggregates and availability_zone compute audit scopes don't appear to be documented in the docs or the API reference, just the spec [3].
Should I open a docs bug about what the supported audit scopes are and how they work? (It looks like the host_aggregates scope works with aggregate ids or names, and the availability_zone scope works with AZ names.)

[licanwei]: There is an example in the CLI command 'watcher help create audittemplate', and it's a good idea to document these.

4. There are a couple of issues with how the unscoped compute nodes are retrieved from nova [4].

a) With microversion 2.33 there is a server-side configurable limit applied when listing hypervisors (it defaults to 1000). In a large cloud this could be a problem, since the watcher client-side code is not paging.

b) The code is listing hypervisors with details, but then throwing away those details to just get the hypervisor_hostname, and then iterating over each of those node names and getting the details per hypervisor again. I see why this is done, because of the scoped vs unscoped cases, but I think we could still optimize this (we might need some changes to python-novaclient for this, though, which should be easy enough to add).

[licanwei]: Yes, if novaclient can make some changes, we can optimize the code.

5. For each server on a node, we get the details of the server in separate API calls to nova [5]. Why can't we just do a GET /servers/detail and filter on "host" or "node" so it's a single API call to nova per hypervisor?

[licanwei]: This also depends on novaclient.

I'm happy to work on any of this, but if there are any reasons things need to be done this way, please let me know before I get started. Also, how would the core team like these kinds of improvements tracked? With bugs?

[licanwei]: Welcome to improve Watcher; bug or another kind of tracking is not important.

[1] https://review.opendev.org/#/c/640585/
[2] https://review.opendev.org/#/c/640585/10/watcher/decision_engine/model/colle...
[3] https://specs.openstack.org/openstack/watcher-specs/specs/stein/implemented/...
[4] https://review.opendev.org/#/c/640585/10/watcher/decision_engine/model/colle...
[5] https://review.opendev.org/#/c/640585/10/watcher/decision_engine/model/colle...

--
Thanks,
Matt
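P.S. To make 4a concrete, here's a rough sketch of the client-side paging I have in mind. This is pure illustration: `fetch_page` is a stand-in for whatever python-novaclient ends up exposing, but the marker/limit semantics mirror the 2.33 microversion behavior described above (pages capped at a server-side limit, with the last id used as the marker for the next page).

```python
# Hypothetical paging loop for listing all hypervisors (point 4a).
# `fetch_page` is an assumed callable wrapping the novaclient call;
# it takes marker/limit and returns a list of dicts with an 'id' key
# (a UUID at microversion >= 2.53).

def list_all(fetch_page, limit=1000):
    """Follow `marker` until a short (or empty) page signals the end."""
    results = []
    marker = None
    while True:
        page = fetch_page(marker=marker, limit=limit)
        results.extend(page)
        if len(page) < limit:
            # A short page means the server had nothing more to return.
            break
        marker = page[-1]['id']
    return results
```

The same loop shape would work for any of the marker-paged nova collections, so it could live in a small shared helper rather than being repeated per collector.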
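P.P.S. And for point 5, something like the following should already be possible with existing novaclient, since servers.list() passes search_opts through as query parameters on GET /servers/detail ('host' and 'all_tenants' are existing nova filters). `nc` and `node_name` here are just illustrative names; `nc` is assumed to be an admin-scoped novaclient Client.

```python
# Sketch for point 5: one GET /servers/detail per hypervisor instead
# of one GET /servers/{id} per server. Assumes `nc` was built with
# admin credentials, e.g.:
#
#   from novaclient import client
#   nc = client.Client('2.56', session=keystone_session)

def servers_on_node(nc, node_name):
    """List all servers on one hypervisor in a single API call."""
    return nc.servers.list(search_opts={'all_tenants': True,
                                        'host': node_name})
```

Filtering on "node" instead of "host" would matter for ironic-backed computes, but for the common case 'host' should cut the per-hypervisor call count to one.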