Hi,

I think it's a nice update, and I agree with the change.

> - first it creates unnecessary load on the neutron api and neutron
> db/network backend, increasing latency and reducing overall performance.

The periodic task causes a huge API workload not only for the Neutron API but for the Keystone API too, because of token issuance in nova-compute and token validation in neutron-server.

best regards,

Masahito

-----Original Message-----
From: "Sean Mooney"<smooney@redhat.com>
To: "openstack-discuss@lists.openstack.org"<openstack-discuss@lists.openstack.org>;
Cc:
Sent: 2025/01/22 (Wed) 22:24 (GMT+09:00)
Subject: [nova][neutron][ops] disabling the heal instance info cache periodic

hi folks o/

This is a short PSA and request for comments

The nova and neutron teams have discussed the existence and, importantly,
the default behavior of nova's "heal instance info cache periodic" on and
off for the better part of the last 4+ years, but we never had time to
take any action on it until now.

Context for those who are not aware of it: nova has an instance info cache
which we use to cache networking information, such as IP addresses, related
to the ports associated with an instance. This was introduced in the Essex
release when nova-network was being replaced with quantum/neutron. It's
rather fitting, but overdue, that in the Epoxy release we are now turning
this off.

For more context, I have captured some of the important points in the
release note of the proposed change:

https://urldefense.com/v3/__https://review.opendev.org/c/openstack/nova/*/939476/2/releasenotes/notes/disable_heal_instance_info_cache_interval-0d9ae7c12793bf7b.yaml__;Kw!!AEH8rfA!wqVM2KmqC5zD9juAXcpnlU5fm0kHaZOMSIiAxw71rY_uyVOWVaDlxck_iuUXR8WV92NenZ8ZfdxfH0XC5dlg7U13QA$

But I'll provide a TL;DR here.

Since Essex, [compute]heal_instance_info_cache_interval has defaulted to
60 (seconds). It has been possible to disable the periodic by setting the
value to <=0 (-1 is recommended by convention).
When the periodic is enabled, every nova compute agent loops over the
instances on the current host, retrieves all their ports from neutron,
and fully rebuilds the cache (as of Stein). On a small cloud (fewer than
50 computes) this has minimal impact on the performance of the neutron
server; on moderate to large clouds it creates 10s of MB of constant DB
read load.
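
As a rough back-of-the-envelope sketch of how this scales with cloud size
(the function name and the one-heal-cycle-per-compute-per-interval model
are assumptions for illustration only, not nova code):

```python
def heal_cycles_per_second(num_computes: int, interval_seconds: int) -> float:
    """Estimate fleet-wide heal cycles started per second.

    Assumption: each compute service starts one heal cycle per interval,
    and an interval <= 0 disables the periodic, as described above.
    """
    if interval_seconds <= 0:
        return 0.0  # periodic disabled
    return num_computes / interval_seconds


if __name__ == "__main__":
    # Hypothetical cloud sizes, all using the old default of 60 seconds.
    for computes in (50, 500, 5000):
        rate = heal_cycles_per_second(computes, 60)
        print(f"{computes} computes -> {rate:.2f} heal cycles/s")
```

The real cost of each cycle depends on how many ports are fetched, so
this only shows that the load grows linearly with the number of computes.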

This has several side-effects:

- first, it creates unnecessary load on the neutron api and neutron
db/network backend, increasing latency and reducing overall performance.

- second, it creates unnecessary load on rabbitmq, as each compute service
needs to do an RPC to update the info cache (every 60 seconds).

- third, both of the first two waste power doing extra work that is not
needed.

- finally, as of Icehouse this periodic has been optional, because we
added the os-server-external-events api so that neutron can tell
nova if there has been a change and the cache can be refreshed as needed.
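
For readers unfamiliar with the event-driven path in the last point, here
is a minimal sketch of what a "network-changed" request body for the
os-server-external-events api looks like, as far as I understand the API.
The helper name and the UUIDs are hypothetical placeholders, and the
sketch only builds the JSON; it does not talk to nova.

```python
import json


def network_changed_event(server_uuid: str, port_id: str) -> dict:
    """Build a request body announcing that a server's networking changed.

    Hypothetical helper for illustration; the port ID is passed as the
    event tag so nova knows which port to refresh.
    """
    return {
        "events": [
            {
                "name": "network-changed",
                "server_uuid": server_uuid,
                "tag": port_id,
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder IDs, not real resources.
    body = network_changed_event(
        "11111111-2222-3333-4444-555555555555",
        "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    )
    print(json.dumps(body, indent=2))
```

The point is that one small event per actual change replaces a full
rebuild of the cache every interval on every compute.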

With that in mind, the nova team is proposing that we disable this going
forward. For this release I have not deprecated the periodic, so operators
can re-enable it if they find it's required for any reason.
In the 2026.1 release we will likely revisit this based on operator
feedback/bug reports.

In this week's team meeting we agreed to wait 2 more weeks to get feedback
from operators in case they have any concerns with this change, hence
this email.

If anyone knows of a reason why we should not proceed with this default
change, feel free to let us know via this email thread or the gerrit review:
https://urldefense.com/v3/__https://review.opendev.org/c/openstack/nova/*/939476__;Kw!!AEH8rfA!wqVM2KmqC5zD9juAXcpnlU5fm0kHaZOMSIiAxw71rY_uyVOWVaDlxck_iuUXR8WV92NenZ8ZfdxfH0XC5dmLzxiKDg$

regards

sean