[Openstack-operators] OT?: problems with IRQ-TLB on computes (network interruptions)

gustavo panizzo (gfa) gfa at zumbi.com.ar
Thu Jul 2 02:44:29 UTC 2015


a screenshot of collectd from the affected hypervisor

http://zumbi.com.ar/tmp/irq-tlb.png
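
for anyone who wants to watch the same counter without collectd, here is a
minimal sketch. it assumes the irq-tlb series in the graph corresponds to the
"TLB:" row (TLB shootdowns) of /proc/interrupts on x86. the function names,
the 1 second interval and the 100k threshold are illustrative choices for
this sketch, not something taken from our setup.

```python
import time


def tlb_total(text):
    """Sum the per-CPU counters on the TLB row of /proc/interrupts-style text."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "TLB:":
            # Numeric fields are the per-CPU counters; the trailing words
            # ("TLB shootdowns") are the row description and are skipped.
            return sum(int(f) for f in fields[1:] if f.isdigit())
    raise ValueError("no TLB row found")


def read_tlb_total(path="/proc/interrupts"):
    """Read the current total of TLB shootdowns across all CPUs."""
    with open(path) as fh:
        return tlb_total(fh.read())


def watch(interval=1.0, threshold=100000):
    """Print a line whenever the shootdown rate exceeds threshold per second.

    Both defaults are assumptions; tune them for the host being watched.
    """
    prev = read_tlb_total()
    while True:
        time.sleep(interval)
        cur = read_tlb_total()
        rate = (cur - prev) / interval
        if rate > threshold:
            print("%s TLB shootdowns/s: %.0f" % (time.strftime("%H:%M:%S"), rate))
        prev = cur
```

calling watch() on a hypervisor loops until interrupted and prints only
during a spike, which makes it easier to line the timestamps up with the
network drops.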

On 2015-07-02 10:40, gustavo panizzo (gfa) wrote:
> Hello
>      we are having a problem where our compute nodes, and the VMs running
> on them, suddenly lose network connectivity for some seconds.
> the root cause appears to be an increase of irq-tlb from low values
> (less than 20) to more than 100k. that spike only lasts for some seconds,
> then everything goes back to normal
>
> we have computes running precise (qemu 1.5, ovs 2.0.2, libvirt 1.2.2 and
> kernel 3.13) where the issue is frequent. we also have a small % of our
> fleet running trusty (qemu 2.0.0, ovs 2.0.2, libvirt 1.2.2 and kernel
> 3.16) where the problem seemed to be nonexistent until today :(
>
> the issue seems to be isolated to < 10% of our hypervisors. some hypervisors
> have had this problem every few days, others only once or twice. our VMs are
> a black box to us, we don't know what users run on them, but it is mostly
> cpu and network bound workloads
>
> has anyone seen this before? has anyone fixed it?
>
>
> PS: we run libvirt+kvm hypervisors, neutron ovs agent, icehouse (but i
> don't think it is a control plane issue)
>
> thanks!
>

-- 
1AE0 322E B8F7 4717 BDEA BF1D 44BB 1BA7 9F6C 6333

keybase: http://keybase.io/gfa


