[Openstack-operators] OT?: problems with IRQ-TLB on computes (network interruptions)

gustavo panizzo (gfa) gfa at zumbi.com.ar
Thu Jul 2 02:40:28 UTC 2015


Hello
	we are having a problem where our compute nodes, and the VMs running on 
them, suddenly lose network connectivity for some seconds.
the root cause appears to be an increase of irq-tlb from low values 
(less than 20) to more than 100k; that spike only lasts for some seconds, 
then everything goes back to normal
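
A minimal sketch of how such a spike could be caught on a host. It assumes 
that irq-tlb corresponds to the "TLB" (TLB shootdowns) row of 
/proc/interrupts; the function names, the 1 second interval and the 100k 
threshold are illustrative choices based on the figures above, not part of 
any existing tool.

```python
import time


def tlb_shootdowns(text):
    """Sum the per-CPU counters on the 'TLB:' row of /proc/interrupts text."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "TLB:":
            # Per-CPU counters are numeric; the trailing description is not.
            return sum(int(f) for f in fields[1:] if f.isdigit())
    raise ValueError("no TLB row found")


def is_spike(prev, curr, threshold=100_000):
    """True if the counter grew by more than `threshold` between samples.

    The threshold is an assumption taken from the '>100k' figure above.
    """
    return (curr - prev) > threshold


def watch(path="/proc/interrupts", interval=1.0, threshold=100_000):
    """Poll the counter forever and print a line whenever it spikes."""
    with open(path) as f:
        prev = tlb_shootdowns(f.read())
    while True:
        time.sleep(interval)
        with open(path) as f:
            curr = tlb_shootdowns(f.read())
        if is_spike(prev, curr, threshold):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} TLB shootdowns jumped by {curr - prev}")
        prev = curr
```

Calling watch() on a hypervisor would log a timestamp for each spike, which 
could then be compared against the times of the network interruptions.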

we have computes running precise (qemu 1.5, ovs 2.0.2, libvirt 1.2.2 and 
kernel 3.13) where the issue is frequent. we also have a small % of our 
fleet running trusty (qemu 2.0.0, ovs 2.0.2, libvirt 1.2.2 and kernel 
3.16) where the problem seemed to be nonexistent until today :(

the issue seems to be isolated to < 10% of our hypervisors; some hypervisors 
have had this problem every few days, others only once or twice. our VMs are 
a black box to us: we don't know what users run on them, but it is mostly 
cpu and network bound workloads

has anyone seen this before? has anyone fixed it?


PS: we run libvirt+kvm hypervisors, neutron ovs agent, icehouse (but i 
don't think it is a control plane issue)

thanks!

-- 
1AE0 322E B8F7 4717 BDEA BF1D 44BB 1BA7 9F6C 6333

keybase: http://keybase.io/gfa


