[Openstack-operators] [nova] nova-compute automatically disabling itself?
Matt Riedemann
mriedemos at gmail.com
Wed Jan 31 21:40:36 UTC 2018
On 1/31/2018 3:16 PM, Chris Apsey wrote:
> All,
>
> Running into a strange issue I haven't seen before.
>
> Randomly, the nova-compute services on compute nodes are disabling
> themselves (as if someone ran openstack compute service set --disable
> hostX nova-compute). When this happens, the node continues to report
> itself as 'up' - the service is just disabled. As a result, if enough
> of these occur, we get scheduling errors due to lack of available
> resources (which makes sense). Re-enabling them works just fine and
> they continue on as if nothing happened. I looked through the logs and
> I can find the API calls where we re-enable the services (PUT
> /v2.1/os-services/enable), but I do not see any API calls where the
> services are getting disabled initially.
>
> Is anyone aware of any cases where compute nodes will automatically
> disable their nova-compute service on their own, or has anyone seen this
> before and might know a root cause? We have plenty of spare vcpus and
> RAM on each node - like less than 25% utilization (both in absolute
> terms and in terms of applied ratios).
>
> We're seeing follow-on errors regarding rmq messages getting lost and
> vif-plug failures, but we think those are a symptom, not a cause.
>
> Currently running pike on Xenial.
>
> ---
> v/r
>
> Chris Apsey
> bitskrieg at bitskrieg.net
> https://www.bitskrieg.net
>
This is actually a feature added in Pike:
https://review.openstack.org/#/c/463597/
This came up in discussion with operators at the Forum in Boston.
The vif-plug failures are likely the reason those computes are getting
disabled.
There is a config option, "consecutive_build_service_disable_threshold",
which you can set to 0 to turn off the auto-disable behavior, as some
have experienced issues with it:
https://bugs.launchpad.net/nova/+bug/1742102
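To make the behavior concrete, here is a rough sketch of how a consecutive-failure counter like this can work. It is illustrative only and is not the actual nova code: the class and method names are made up, and the default of 10 and the "0 means never auto-disable" semantics are my understanding of the option, so check the Pike configuration reference before relying on them.

```python
class BuildFailureTracker:
    """Hypothetical sketch of a consecutive-build-failure counter."""

    def __init__(self, threshold=10):
        # Mirrors consecutive_build_service_disable_threshold.
        # Assumption: 0 means "never auto-disable".
        self.threshold = threshold
        self.consecutive_failures = 0
        self.disabled = False

    def record_build(self, succeeded):
        """Record one build result and disable the service if needed."""
        if succeeded:
            # Any successful build resets the streak.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.threshold > 0 and self.consecutive_failures >= self.threshold:
            # Roughly equivalent to an operator disabling the service by hand.
            self.disabled = True
```

Since only consecutive failures count, a single successful build resets the streak. A run of vif-plug failures on one host, on the other hand, would be enough to trip the threshold. The sketch never re-enables the service on its own, which lines up with the report above that the services had to be re-enabled by hand.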
--
Thanks,
Matt