[Openstack-operators] Slow DNS resolver causing problems for OSLO
george.shuklin at gmail.com
Mon Aug 4 23:29:18 UTC 2014
In our installation all endpoints are defined by DNS names. The zone is
hosted by two DNS servers, and both of them are listed in
/etc/resolv.conf on each node.
Today one of the two DNS servers failed badly due to a hardware
malfunction and stopped replying to any network traffic.
Theoretically, this should cause only small delays in operations: the
second DNS server is alive and replies normally.
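
To put a rough number on "small delays", here is a minimal sketch of how
long a single lookup might wait before reaching a live server. It
assumes the stock glibc resolver behaviour described in resolv.conf(5):
servers are tried in the order listed, with a default timeout of 5
seconds per server. The function name and the simplified model are mine,
for illustration only; I have not checked which of our two servers is
listed first.

```python
def first_pass_delay(servers_alive, timeout=5.0):
    """Estimate seconds spent waiting before a live server answers.

    servers_alive: booleans in /etc/resolv.conf order (True = server replies).
    timeout: per-server wait in seconds (5 is the documented glibc default).
    Returns None if no server replies during the first pass.

    Simplified model (an assumption): only the first pass over the
    server list is considered, and a live server replies instantly.
    """
    delay = 0.0
    for alive in servers_alive:
        if alive:
            return delay
        # A dead server costs one full timeout before the next one is tried.
        delay += timeout
    return None


# Dead server listed first: every uncached lookup waits one full timeout.
print(first_pass_delay([False, True]))
# Dead server listed second: no extra wait in this model.
print(first_pass_delay([True, False]))
```

If this model holds, a dead first server would add several seconds to
every uncached lookup, which is far more than a "small delay" for a
service making many calls per request.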
But in practice I found that every OpenStack component started to
degrade, up to the point of 500 errors from the APIs (nova, neutron,
etc.). I haven't finished reading all the logs, but it seems that slow
DNS resolution is causing congestion in the components' connection
pools, probably those to keystone for token validation.
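
One way to check the slow-resolution theory on a node is to time a
lookup directly. This is a minimal sketch, not something taken from the
OpenStack code: the helper name time_lookup and the injectable resolver
argument are my own, and the hostname in the usage line is a
placeholder for a real endpoint name.

```python
import socket
import time


def time_lookup(host, port=None, resolver=socket.getaddrinfo):
    """Return (succeeded, seconds) for a single name lookup.

    resolver defaults to socket.getaddrinfo, which goes through the
    system resolver and therefore /etc/resolv.conf on a typical node.
    """
    start = time.monotonic()
    try:
        resolver(host, port)
        succeeded = True
    except socket.gaierror:
        # Resolution failed; the elapsed time is still worth reporting.
        succeeded = False
    return succeeded, time.monotonic() - start


if __name__ == "__main__":
    # "keystone.example.com" is a placeholder, not our real endpoint.
    ok, seconds = time_lookup("keystone.example.com")
    print("resolved=%s in %.3f s" % (ok, seconds))
```

Running it a few times on an affected node should show whether lookups
take seconds rather than milliseconds while one server is down.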
It was not just 'slow'; it was outright '500 errors' in the APIs,
problems with nova/neutron/ceilometer interoperation, and so on.
I'll continue reading the logs tomorrow, but it is really strange.