[Openstack] High Latency to VMs

André Aranha andre.f.aranha at gmail.com
Mon Dec 15 17:02:36 UTC 2014


Our kernel version on the controller is 3.13.0-37-generic, on the
ComputeNode it is 3.13.0-24-generic, and on the NetworkNode it is
3.13.0-35-generic.
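
For reference, these are just the values reported by uname -r on each node.
Assuming SSH access, a one-liner like the following collects them in one
pass (the hostnames are placeholders for our actual nodes):

  # hostnames are examples; substitute the real controller/compute/network nodes
  for h in controller compute1 network1; do echo -n "$h: "; ssh "$h" uname -r; done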

On 13 December 2014 at 04:39, Min Pae <sputnik13 at gmail.com> wrote:
>
> What kernel version are you running on the host?
>
> On Fri, Dec 12, 2014 at 12:09 PM, André Aranha <andre.f.aranha at gmail.com>
> wrote:
> > Our compute nodes are using vhost_net; we haven't made any changes to
> > our NIC buffers.
> > The system is not overloaded, and CPU usage isn't higher than 30%.
> >
> > On 12 December 2014 at 02:35, mad Engineer <themadengin33r at gmail.com>
> > wrote:
> >>
> >> So it looks like it's not an issue with openvswitch; "missed" is quite
> >> normal and is not the reason for packet loss.
> >> Are your guests using vhost_net? Check with:
> >> ps aux | grep vhost
> >> Also, have you made any changes to the buffer size of your NIC?
> >> Is the system overloaded? What is the CPU usage?
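
A side note on the NIC buffer question: current and hardware-maximum ring
sizes can be inspected with ethtool. A minimal sketch, assuming the physical
interface is em1 (substitute yours):

  # show current vs. hardware-maximum RX/TX ring sizes
  ethtool -g em1
  # if the RX ring is small, it can be raised toward its maximum, e.g.:
  sudo ethtool -G em1 rx 4096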
> >>
> >> On Thu, Dec 11, 2014 at 6:20 PM, André Aranha
> >> <andre.f.aranha at gmail.com> wrote:
> >> > Thanks for the advice. I've run the command on the NetworkNode and on
> >> > a ComputeNode; lost is 0, but missed is a high value.
> >> >
> >> > NetworkNode
> >> > system at ovs-system:
> >> > lookups: hit:425667155 missed:2962922 lost:0
> >> > flows: 27
> >> > port 0: ovs-system (internal)
> >> > port 1: br-ex (internal)
> >> > port 2: br-tun (internal)
> >> > port 3: eth1
> >> > port 4: br-int (internal)
> >> > port 5: tapbdc3d959-d8 (internal)
> >> > port 6: gre_system (gre: df_default=false, ttl=0)
> >> > port 7: qr-4063db49-6b (internal)
> >> > port 8: qg-e427e527-92 (internal)
> >> >
> >> >
> >> > ComputeNode
> >> > system at ovs-system:
> >> > lookups: hit:28660666 missed:200922 lost:0
> >> > flows: 19
> >> > port 0: ovs-system (internal)
> >> > port 1: br-int (internal)
> >> > port 2: br-tun (internal)
> >> > port 3: gre_system (gre: df_default=false, ttl=0)
> >> > port 4: em1
> >> > port 5: br-private (internal)
> >> > port 6: qvo9a959049-a0
> >> > port 7: qvodd0ef077-e1
> >> > port 8: qvoac2b566b-65
> >> > port 9: qvo9e4ab149-5c
> >> > port 10: qvoc2d2625c-0c
> >> > port 11: qvo3069daeb-4a
> >> > port 12: qvo7f82a3cf-0c
> >> > port 13: qvo83b77d2d-1a
> >> > port 14: qvobbadd8c2-30
> >> > port 15: qvocfd0b8e8-ad
> >> > port 16: qvo714fab88-60
> >> > port 17: qvob9ddde49-86
> >> > port 18: qvo42ef9f3b-ac
> >> > port 19: qvof4ae7868-41
> >> > port 20: qvoa4408a18-03
> >> > port 22: qvo36c64d52-9b
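
A note on reading these counters: "missed" counts packets that found no
match in the kernel flow table and were punted to userspace for a flow
lookup, while "lost" counts upcalls that were dropped outright. As a
fraction of total lookups, the misses above are actually modest:

  NetworkNode: 2962922 / (425667155 + 2962922) ~= 0.69% of lookups missed
  ComputeNode:  200922 / (28660666 + 200922)   ~= 0.70% of lookups missed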
> >> >
> >> > On 11 December 2014 at 06:17, mad Engineer <themadengin33r at gmail.com>
> >> > wrote:
> >> >>
> >> >> Sorry, it's 2.3.0, not 2.1.3.
> >> >>
> >> >> On Thu, Dec 11, 2014 at 2:43 PM, mad Engineer
> >> >> <themadengin33r at gmail.com>
> >> >> wrote:
> >> >> > Not in OpenStack, but I had a performance issue with OVS and
> >> >> > bursty traffic; upgrading to a later version improved the
> >> >> > performance. A lot of performance features have been added in
> >> >> > 2.1.3.
> >> >> >
> >> >> > Do you see a large lost: value in
> >> >> > ovs-dpctl show
> >> >> >
> >> >> >
> >> >> > On Thu, Dec 11, 2014 at 2:33 AM, André Aranha
> >> >> > <andre.f.aranha at gmail.com>
> >> >> > wrote:
> >> >> >> Yes, we are using version 2.0.2.
> >> >> >> The process uses only about 0.3% CPU on the network node and compute nodes.
> >> >> >> Did you have the same issue?
> >> >> >>
> >> >> >> On 10 December 2014 at 14:31, mad Engineer
> >> >> >> <themadengin33r at gmail.com>
> >> >> >> wrote:
> >> >> >>>
> >> >> >>> Are you using openvswitch? Which version?
> >> >> >>> If yes, is it consuming a lot of CPU?
> >> >> >>>
> >> >> >>> On Wed, Dec 10, 2014 at 7:45 PM, André Aranha
> >> >> >>> <andre.f.aranha at gmail.com>
> >> >> >>> wrote:
> >> >> >>> > Well, here we are using Icehouse with Ubuntu 14.04 LTS.
> >> >> >>> >
> >> >> >>> > We found this thread in the community and applied the changes
> >> >> >>> > on the compute nodes (changing VHOST_NET_ENABLED to 1 in
> >> >> >>> > /etc/default/qemu-kvm). After doing this, the problem no longer
> >> >> >>> > exists on a few instances. This link shows an investigation to
> >> >> >>> > find the problem.
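
Concretely, the change amounts to the following (service and module names
as shipped with Ubuntu 14.04's qemu-kvm package; treat this as a sketch,
and note that already-running guests only pick up vhost after a restart
or migration):

  # /etc/default/qemu-kvm
  VHOST_NET_ENABLED=1

  # reload, then confirm the module and per-guest vhost threads are present:
  sudo service qemu-kvm restart
  lsmod | grep vhost_net
  ps aux | grep vhost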
> >> >> >>> >
> >> >> >>> > About the MTU in our cloud (measured using iperf):
> >> >> >>> >
> >> >> >>> > 1 - from any Desktop to the Network Node:
> >> >> >>> > MSS size 1448 bytes (MTU 1500 bytes, ethernet)
> >> >> >>> >
> >> >> >>> > 2 - from any Desktop to the instance:
> >> >> >>> > MSS size 1348 bytes (MTU 1388 bytes, unknown interface)
> >> >> >>> >
> >> >> >>> > 3 - from any instance to the Network Node:
> >> >> >>> > MSS size 1348 bytes (MTU 1388 bytes, unknown interface)
> >> >> >>> >
> >> >> >>> > 4 - from any instance to the Desktop:
> >> >> >>> > MSS size 1348 bytes (MTU 1388 bytes, unknown interface)
> >> >> >>> >
> >> >> >>> > 5 - from the Network Node to any ComputeNode:
> >> >> >>> > MSS size 1448 bytes (MTU 1500 bytes, ethernet)
> >> >> >>> >
> >> >> >>> > 6 - from any ComputeNode to the NetworkNode:
> >> >> >>> > MSS size 1448 bytes (MTU 1500 bytes, ethernet)
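
The 1388-byte MTU on every path involving an instance is below the physical
1500, presumably because tunnel encapsulation (GRE here) consumes part of
the frame. One way to confirm the usable payload on a path is a
don't-fragment ping sized just under the limit (the target IP is a
placeholder):

  # 1360 bytes of ICMP payload + 28 bytes of ICMP/IP headers = 1388 total
  ping -M do -s 1360 <instance-ip>
  # payloads of 1361+ should fail with "Message too long" on a 1388-MTU path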
> >> >> >>> >
> >> >> >>> > On 10 December 2014 at 10:31, somshekar kadam
> >> >> >>> > <som_kadam at yahoo.co.in>
> >> >> >>> > wrote:
> >> >> >>> >>
> >> >> >>> >> Sorry for posting to the wrong mail chain.
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> Regards
> >> >> >>> >> Neelu
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> On Wednesday, 10 December 2014 6:59 PM, somshekar kadam
> >> >> >>> >> <som_kadam at yahoo.co.in> wrote:
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> Hi All,
> >> >> >>> >>
> >> >> >>> >> Please recommend which stable host OS to use for the
> >> >> >>> >> Controller and Compute nodes.
> >> >> >>> >> I have tried Fedora 20, and it seems a lot of tweaking is
> >> >> >>> >> required; correct me if I am wrong.
> >> >> >>> >> I see that most of it is tested on Ubuntu and CentOS.
> >> >> >>> >> I am planning to use the Juno stable version.
> >> >> >>> >> Please help with this.
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> Regards
> >> >> >>> >> Neelu
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> On Wednesday, 10 December 2014 5:42 PM, Hannah Fordham
> >> >> >>> >> <hfordham at radiantworlds.com> wrote:
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> I'm afraid we didn't; we're still struggling with this
> >> >> >>> >> problem on some VMs. Sorry!
> >> >> >>> >>
> >> >> >>> >> On 9 December 2014 14:09:32 GMT+00:00, "André Aranha"
> >> >> >>> >> <andre.f.aranha at gmail.com> wrote:
> >> >> >>> >>
> >> >> >>> >> Hi,
> >> >> >>> >>
> >> >> >>> >> We have the same issue here, and we have already tried some
> >> >> >>> >> solutions that didn't work at all. Did you solve this problem?
> >> >> >>> >>
> >> >> >>> >> Thank you,
> >> >> >>> >> Andre Aranha
> >> >> >>> >>
> >> >> >>> >> On 27 August 2014 at 08:17, Hannah Fordham
> >> >> >>> >> <hfordham at radiantworlds.com>
> >> >> >>> >> wrote:
> >> >> >>> >>
> >> >> >>> >> I've been trying to figure this one out for a while, so I'll
> >> >> >>> >> try to be as thorough as possible in this post, but apologies
> >> >> >>> >> if I miss anything pertinent.
> >> >> >>> >>
> >> >> >>> >> First off, I'm running a setup with one control node and 5
> >> >> >>> >> compute nodes, all created using the Stackgeek scripts -
> >> >> >>> >> http://www.stackgeek.com/guides/gettingstarted.html. The first
> >> >> >>> >> two (compute1 and compute2) were created at the same time;
> >> >> >>> >> compute3, 4 and 5 were added as needed later. My VMs are
> >> >> >>> >> predominantly CentOS, while my OpenStack nodes are Ubuntu
> >> >> >>> >> 14.04.1.
> >> >> >>> >>
> >> >> >>> >> The symptom: irregular high latency/packet loss to VMs on all
> >> >> >>> >> compute boxes except compute3. It's mostly a pain when trying
> >> >> >>> >> to do anything via ssh on a VM, because the lag makes it hard
> >> >> >>> >> to get anything done, but it shows itself quite nicely through
> >> >> >>> >> pings as well:
> >> >> >>> >> --- 10.0.102.47 ping statistics ---
> >> >> >>> >> 111 packets transmitted, 103 received, 7% packet loss, time 110024ms
> >> >> >>> >> rtt min/avg/max/mdev = 0.096/367.220/5593.100/1146.920 ms, pipe 6
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> I have tested these pings:
> >> >> >>> >> VM to itself (via its external IP) seems fine
> >> >> >>> >> VM to another VM is not fine
> >> >> >>> >> Hosting compute node to VM is not fine
> >> >> >>> >> My PC to VM is not fine (however, the other way round works fine)
> >> >> >>> >>
> >> >> >>> >>
> >> >> >>> >> Top on a (32 core) compute node with laggy VMs:
> >> >> >>> >> top - 12:09:20 up 33 days, 21:35,  1 user,  load average: 2.37, 4.95, 6.23
> >> >> >>> >> Tasks: 431 total,   2 running, 429 sleeping,   0 stopped,   0 zombie
> >> >> >>> >> %Cpu(s):  0.6 us,  3.4 sy,  0.0 ni, 96.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> >> >> >>> >> KiB Mem:  65928256 total, 44210348 used, 21717908 free,   341172 buffers
> >> >> >>> >> KiB Swap:  7812092 total,  1887864 used,  5924228 free.  7134740 cached Mem
> >> >> >>> >>
> >> >> >>> >> And for comparison, on the one compute node that doesn't seem
> >> >> >>> >> to be suffering from this:
> >> >> >>> >> top - 12:12:20 up 33 days, 21:38,  1 user,  load average: 0.28, 0.18, 0.15
> >> >> >>> >> Tasks: 399 total,   3 running, 396 sleeping,   0 stopped,   0 zombie
> >> >> >>> >> %Cpu(s):  0.3 us,  0.1 sy,  0.0 ni, 98.9 id,  0.6 wa,  0.0 hi,  0.0 si,  0.0 st
> >> >> >>> >> KiB Mem:  65928256 total, 49986064 used, 15942192 free,   335788 buffers
> >> >> >>> >> KiB Swap:  7812092 total,   919392 used,  6892700 free. 39272312 cached Mem
> >> >> >>> >>
> >> >> >>> >> Top on a laggy VM:
> >> >> >>> >> top - 11:02:53 up 27 days, 33 min,  3 users,  load average: 0.00, 0.00, 0.00
> >> >> >>> >> Tasks:  91 total,   1 running,  90 sleeping,   0 stopped,   0 zombie
> >> >> >>> >> Cpu(s):  0.2%us,  0.1%sy,  0.0%ni, 99.5%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
> >> >> >>> >> Mem:   1020400k total,   881004k used,   139396k free,   162632k buffers
> >> >> >>> >> Swap:  1835000k total,    14984k used,  1820016k free,   220644k cached
> >> >> >>> >>
> >> >> >>> >> http://imgur.com/blULjDa shows the hypervisor panel of
> >> >> >>> >> Horizon. As you can see, Compute 3 has fewer resources used,
> >> >> >>> >> but none of the compute nodes should be anywhere near
> >> >> >>> >> overloaded from what I can tell.
> >> >> >>> >>
> >> >> >>> >> Any ideas? Let me know if I'm missing anything obvious that
> >> >> >>> >> would help with figuring this out!
> >> >> >>> >>
> >> >> >>> >> Hannah
> >> >> >>> >>
> >> >> >>> >
> >> >> >>
> >> >> >>
> >> >
> >> >
> >
> >
> > _______________________________________________
> > Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> > Post to     : openstack at lists.openstack.org
> > Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>