[Openstack-operators] Slow DNS resolver causing problems for OSLO

George Shuklin george.shuklin at gmail.com
Mon Aug 4 23:29:18 UTC 2014


In our installation all endpoints are defined by DNS names. The zone is 
hosted by two DNS servers, and both of them are listed in 
/etc/resolv.conf on each node.

Today one of the two DNS servers failed badly due to a hardware 
malfunction and stopped replying to any network traffic.

In theory, this should cause only small delays in operations: the second 
DNS server is alive and replies normally.
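
One way to check how small the delay really is would be to time a lookup 
from an affected node. The following is only a minimal sketch: the 
function name time_lookup, the host name keystone.example.com and port 
5000 are hypothetical placeholders, not values from our installation. It 
uses the system resolver through socket.getaddrinfo, so it follows 
whatever /etc/resolv.conf says on that node.

```python
import socket
import time


def time_lookup(hostname, port=5000, resolver=socket.getaddrinfo):
    """Time one name lookup and return (elapsed_seconds, addresses).

    `resolver` defaults to the system resolver; it is a parameter only
    so that the function can be exercised without network access.
    """
    start = time.monotonic()
    try:
        infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the resolved IP address.
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        # Resolution failed outright (for example, no server answered).
        addresses = []
    elapsed = time.monotonic() - start
    return elapsed, addresses
```

Calling time_lookup("keystone.example.com") a few times on a node where 
the dead server is listed first should show whether each lookup waits 
for the resolver timeout before the second server is tried. As far as I 
know, resolv.conf(5) documents that timeout as 5 seconds by default for 
glibc, tunable with "options timeout:n".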

But in practice I found that every OpenStack component started to 
degrade, up to the point of 500 errors from the APIs (nova, neutron, 
etc.). I haven't finished reading all the logs, but it seems that slow 
DNS resolution is causing congestion in the components' connection 
pools, probably those to keystone for token validation.
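
To illustrate why I suspect the pools, here is a rough back-of-the-envelope 
sketch using Little's law (requests in flight = arrival rate x time each 
request is held). All numbers are made up for illustration and the 
function name is hypothetical; I have not confirmed this is what actually 
happens inside the services.

```python
def required_workers(requests_per_second, service_seconds, dns_delay_seconds):
    """Average number of requests in flight, by Little's law.

    Assumes every request pays the full DNS delay before it is served,
    which is a simplification (no caching, no connection reuse).
    """
    return requests_per_second * (service_seconds + dns_delay_seconds)


# Hypothetical load: 20 token validations per second, 50 ms each.
healthy = required_workers(20, 0.05, 0.0)
# Same load when every lookup first waits out a 5 second resolver timeout.
degraded = required_workers(20, 0.05, 5.0)
print(f"in flight, healthy DNS: {healthy:.0f}")
print(f"in flight, dead server listed first: {degraded:.0f}")
```

Under these assumed numbers the demand goes from about 1 concurrent 
request to about 101, so a pool sized for normal operation would be 
exhausted almost at once and further requests would queue or fail.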

It was not just 'slow': it was outright 500 errors in the APIs, problems 
with nova/neutron/ceilometer interoperation, and so on.

I'll continue reading the logs tomorrow, but it is really strange.

Any ideas/comments?
