[Openstack-operators] [nova] nova-compute automatically disabling itself?
Chris Apsey
bitskrieg at bitskrieg.net
Wed Jan 31 21:16:16 UTC 2018
All,
Running into a strange issue I haven't seen before.
Randomly, the nova-compute services on compute nodes are disabling
themselves (as if someone ran 'openstack compute service set --disable
hostX nova-compute'). When this happens, the node continues to report
itself as 'up' - the service is just disabled. As a result, if enough
of these occur, we get scheduling errors due to lack of available
resources (which makes sense). Re-enabling them works just fine and
they continue on as if nothing happened. I looked through the logs and
I can find the API calls where we re-enable the services (PUT
/v2.1/os-services/enable), but I do not see any API calls where the
services are getting disabled initially.
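
For anyone wanting to check for the same state, here is a minimal sketch
of how the affected services could be listed: nova-compute services that
are disabled while still reporting 'up', together with whatever disabled
reason is recorded. It assumes the openstack CLI is installed and
authenticated, and that the JSON output of 'openstack compute service
list --long' uses the column names "Host", "Status", "State" and
"Disabled Reason". The function names are hypothetical, not part of any
library.

```python
import json
import subprocess


def load_services():
    """Fetch nova-compute service rows as a list of dicts via the openstack CLI.

    Assumes the CLI is on PATH and credentials are already sourced.
    """
    out = subprocess.check_output(
        ["openstack", "compute", "service", "list",
         "--service", "nova-compute", "--long", "-f", "json"]
    )
    return json.loads(out)


def disabled_but_up(rows):
    """Return (host, reason) pairs for services that are disabled yet still up.

    The column names used here are assumptions about the CLI's JSON output.
    """
    hits = []
    for row in rows:
        if row.get("Status") == "disabled" and row.get("State") == "up":
            hits.append((row.get("Host"), row.get("Disabled Reason")))
    return hits
```

Calling disabled_but_up(load_services()) on a controller would give the
list of affected hosts. A populated reason field may help tell an
operator-initiated disable apart from one made by the service itself.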
Is anyone aware of any cases where compute nodes will automatically
disable their nova-compute service on their own, or has anyone seen this
before and might know a root cause? We have plenty of spare vCPUs and
RAM on each node: utilization is below 25%, both in absolute terms and
in terms of applied ratios.
We're seeing follow-on errors regarding RabbitMQ messages getting lost
and vif-plug failures, but we think those are a symptom, not a cause.
Currently running Pike on Xenial.
---
v/r
Chris Apsey
bitskrieg at bitskrieg.net
https://www.bitskrieg.net