[Openstack-operators] OT?: problems with IRQ-TLB on computes (network interruptions)
gustavo panizzo (gfa)
gfa at zumbi.com.ar
Thu Jul 2 02:40:28 UTC 2015
Hello
we are having a problem where our compute nodes, and the VMs running on
them, suddenly lose network connectivity for a few seconds.
the root cause appears to be a spike in IRQ-TLB from low values
(less than 20) to more than 100k. the spike only lasts a few seconds,
then everything goes back to normal
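
for anyone who wants to catch such a spike as it happens, here is a minimal sketch in Python. it assumes the IRQ-TLB metric corresponds to the "TLB" row (TLB shootdowns) of /proc/interrupts on a Linux x86 host, which may not match what your monitoring actually graphs. the function names, the 1 second interval and the 100k threshold are hypothetical choices for illustration, the threshold being taken loosely from the figures above.

```python
import time


def tlb_total(interrupts_text):
    """Sum the per-CPU counts on the TLB row of /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "TLB:":
            # Per-CPU counters are the numeric fields after the label;
            # the trailing description ("TLB shootdowns") is skipped.
            return sum(int(f) for f in fields[1:] if f.isdigit())
    return None  # no TLB row found (assumption: row is labelled "TLB:")


def rate_per_second(before, after, elapsed_seconds):
    """Counter increase per second between two samples."""
    return (after - before) / elapsed_seconds


def watch(threshold=100_000, interval=1.0, path="/proc/interrupts"):
    """Print a timestamped line whenever the TLB rate exceeds the threshold."""
    with open(path) as f:
        previous = tlb_total(f.read())
    if previous is None:
        raise RuntimeError("no TLB row found in " + path)
    while True:
        time.sleep(interval)
        with open(path) as f:
            current = tlb_total(f.read())
        rate = rate_per_second(previous, current, interval)
        if rate > threshold:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "TLB/s:", int(rate))
        previous = current
```

calling `watch()` on a hypervisor and leaving it running would log a timestamp for each spike, which can then be lined up against the network interruptions.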
we have computes running precise (qemu 1.5, ovs 2.0.2, libvirt 1.2.2 and
kernel 3.13) where the issue is frequent. we also have a small % of our
fleet running trusty (qemu 2.0.0, ovs 2.0.2, libvirt 1.2.2 and kernel
3.16) where the problem seemed to be nonexistent until today :(
the issue seems to be isolated to < 10% of our hypervisors. some hypervisors
have this problem every few days, others have had it only once or twice. our
VMs are a black box to us; we don't know what users run on them, but it is
mostly cpu- and network-bound workloads
has anyone seen this before? has anyone fixed it?
PS: we run libvirt+kvm hypervisors, neutron ovs agent, icehouse (but i
don't think it is a control plane issue)
thanks!
--
1AE0 322E B8F7 4717 BDEA BF1D 44BB 1BA7 9F6C 6333
keybase: http://keybase.io/gfa