[Openstack] High Latency to VMs

Adrián Norte Fernández adrian at bashlines.com
Mon Dec 15 17:23:02 UTC 2014


Have you tried disabling offloading on the network cards?
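For example, offloads can be inspected and switched off per NIC with ethtool (a sketch; eth1 is just an assumed interface name, and which offloads to disable depends on what the card supports):

  ethtool -k eth1                           # list current offload settings
  ethtool -K eth1 gro off gso off tso off   # disable GRO/GSO/TSO on eth1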
On 15/12/2014 18:21, "André Aranha" <andre.f.aranha at gmail.com> wrote:

> Our kernel version on the controller is 3.13.0-37-generic, on the ComputeNode
> it is 3.13.0-24-generic, and on the NetworkNode it is 3.13.0-35-generic.
>
> On 13 December 2014 at 04:39, Min Pae <sputnik13 at gmail.com> wrote:
>>
>> What kernel version are you running on the host?
>>
>> On Fri, Dec 12, 2014 at 12:09 PM, André Aranha <andre.f.aranha at gmail.com>
>> wrote:
>> > Our compute nodes are using vhost_net; we haven't made any changes to the
>> > buffer of our NIC.
>> > The system is not overloaded; CPU usage isn't higher than 30%.
>> >
>> > On 12 December 2014 at 02:35, mad Engineer <themadengin33r at gmail.com>
>> wrote:
>> >>
>> >> So it looks like it's not an issue with Open vSwitch; "missed" is quite
>> >> normal and is not the reason for packet loss.
>> >> Are your guests using vhost_net?
>> >> Run:
>> >> ps aux | grep vhost
>> >> Also, have you made any changes to the buffer size of your NIC?
>> >> Is the system overloaded? What is the CPU usage?
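>> >> For example (a sketch; em1 is just an assumed interface name), ring buffer
>> >> sizes can be checked and raised with ethtool:
>> >> ethtool -g em1                   # show current and maximum RX/TX ring sizes
>> >> ethtool -G em1 rx 4096 tx 4096   # raise the rings, up to the reported maximums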
>> >>
>> >> > On Thu, Dec 11, 2014 at 6:20 PM, André Aranha <andre.f.aranha at gmail.com>
>> >> > wrote:
>> >> > Thanks for the advice. I've run the command on the NetworkNode and on a
>> >> > ComputeNode; lost is 0, but missed is a high value.
>> >> >
>> >> > NetworkNode
>> >> > system at ovs-system:
>> >> > lookups: hit:425667155 missed:2962922 lost:0
>> >> > flows: 27
>> >> > port 0: ovs-system (internal)
>> >> > port 1: br-ex (internal)
>> >> > port 2: br-tun (internal)
>> >> > port 3: eth1
>> >> > port 4: br-int (internal)
>> >> > port 5: tapbdc3d959-d8 (internal)
>> >> > port 6: gre_system (gre: df_default=false, ttl=0)
>> >> > port 7: qr-4063db49-6b (internal)
>> >> > port 8: qg-e427e527-92 (internal)
>> >> >
>> >> >
>> >> > ComputeNode
>> >> > system at ovs-system:
>> >> > lookups: hit:28660666 missed:200922 lost:0
>> >> > flows: 19
>> >> > port 0: ovs-system (internal)
>> >> > port 1: br-int (internal)
>> >> > port 2: br-tun (internal)
>> >> > port 3: gre_system (gre: df_default=false, ttl=0)
>> >> > port 4: em1
>> >> > port 5: br-private (internal)
>> >> > port 6: qvo9a959049-a0
>> >> > port 7: qvodd0ef077-e1
>> >> > port 8: qvoac2b566b-65
>> >> > port 9: qvo9e4ab149-5c
>> >> > port 10: qvoc2d2625c-0c
>> >> > port 11: qvo3069daeb-4a
>> >> > port 12: qvo7f82a3cf-0c
>> >> > port 13: qvo83b77d2d-1a
>> >> > port 14: qvobbadd8c2-30
>> >> > port 15: qvocfd0b8e8-ad
>> >> > port 16: qvo714fab88-60
>> >> > port 17: qvob9ddde49-86
>> >> > port 18: qvo42ef9f3b-ac
>> >> > port 19: qvof4ae7868-41
>> >> > port 20: qvoa4408a18-03
>> >> > port 22: qvo36c64d52-9b
>> >> >
>> >> > On 11 December 2014 at 06:17, mad Engineer <themadengin33r at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Sorry, it's 2.3.0, not 2.1.3.
>> >> >>
>> >> >> On Thu, Dec 11, 2014 at 2:43 PM, mad Engineer
>> >> >> <themadengin33r at gmail.com>
>> >> >> wrote:
>> >> >> > Not in OpenStack, but I had a performance issue with OVS and bursty
>> >> >> > traffic; upgrading to a later version improved the performance. A lot of
>> >> >> > performance features have been added in 2.1.3.
>> >> >> >
>> >> >> > Do you see a large lost: value in the output of ovs-dpctl show?
>> >> >> >
>> >> >> >
>> >> >> > On Thu, Dec 11, 2014 at 2:33 AM, André Aranha
>> >> >> > <andre.f.aranha at gmail.com>
>> >> >> > wrote:
>> >> >> >> Yes, we are using version 2.0.2.
>> >> >> >> The process uses only about 0.3% CPU on the network node and compute node.
>> >> >> >> Did you have the same issue?
>> >> >> >>
>> >> >> >> On 10 December 2014 at 14:31, mad Engineer
>> >> >> >> <themadengin33r at gmail.com>
>> >> >> >> wrote:
>> >> >> >>>
>> >> >> >>> Are you using Open vSwitch? Which version?
>> >> >> >>> If yes, is it consuming a lot of CPU?
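>> >> >> >>> For example (a sketch):
>> >> >> >>> ovs-vsctl --version              # reports the installed Open vSwitch version
>> >> >> >>> top -p $(pidof ovs-vswitchd)     # CPU usage of the ovs-vswitchd process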
>> >> >> >>>
>> >> >> >>> On Wed, Dec 10, 2014 at 7:45 PM, André Aranha
>> >> >> >>> <andre.f.aranha at gmail.com>
>> >> >> >>> wrote:
>> >> >> >>> > Well, here we are using Icehouse with Ubuntu 14.04 LTS.
>> >> >> >>> >
>> >> >> >>> > We found this thread in the community and applied the changes on the
>> >> >> >>> > compute nodes (changing VHOST_NET_ENABLED to 1 in
>> >> >> >>> > /etc/default/qemu-kvm).
>> >> >> >>> > After doing this, for a few instances the problem doesn't exist
>> >> >> >>> > anymore. This link shows an investigation to find the problem.
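>> >> >> >>> > For reference, a minimal sketch of that change (assuming the stock Ubuntu
>> >> >> >>> > qemu-kvm packaging; running instances typically need a hard reboot before
>> >> >> >>> > they pick up vhost_net):
>> >> >> >>> > # in /etc/default/qemu-kvm
>> >> >> >>> > VHOST_NET_ENABLED=1
>> >> >> >>> > modprobe vhost_net       # load the module now
>> >> >> >>> > lsmod | grep vhost_net   # verify it is loaded
>> >> >> >>> > ps aux | grep vhost-     # one vhost kernel thread should appear per vhost-net vNIC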
>> >> >> >>> >
>> >> >> >>> > About the MTU in our cloud (using iperf),
>> >> >> >>> >
>> >> >> >>> > 1-from any Desktop to the Network Node
>> >> >> >>> > MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>> >> >> >>> >
>> >> >> >>> > 2-from any Desktop to the instance
>> >> >> >>> > MSS size 1348 bytes (MTU 1388 bytes, unknown interface)
>> >> >> >>> >
>> >> >> >>> > 3- from any instance to the Network Node
>> >> >> >>> > MSS size 1348 bytes (MTU 1388 bytes, unknown interface)
>> >> >> >>> >
>> >> >> >>> > 4- from any instance to the Desktop
>> >> >> >>> > MSS size 1348 bytes (MTU 1388 bytes, unknown interface)
>> >> >> >>> >
>> >> >> >>> > 5-from Network Node to any ComputeNode
>> >> >> >>> > MSS size 1448 bytes (MTU 1500 bytes, ethernet)
>> >> >> >>> >
>> >> >> >>> > 6-from any ComputeNode to NetworkNode
>> >> >> >>> > MSS size 1448 bytes (MTU 1500 bytes, ethernet)
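>> >> >> >>> > As a side note, the 1388-byte MTU inside the instances is presumably
>> >> >> >>> > related to the tunnel encapsulation on the GRE overlay. A common
>> >> >> >>> > workaround on GRE-based Icehouse setups (a sketch; file paths assume the
>> >> >> >>> > stock Neutron packaging, and the exact value must fit your tunnel
>> >> >> >>> > overhead) is to have dnsmasq push a smaller MTU to the guests over DHCP:
>> >> >> >>> > # /etc/neutron/dnsmasq-neutron.conf
>> >> >> >>> > dhcp-option-force=26,1400
>> >> >> >>> > # /etc/neutron/dhcp_agent.ini
>> >> >> >>> > dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf
>> >> >> >>> > and then restart the neutron-dhcp-agent service, so the guests no longer
>> >> >> >>> > depend on path MTU discovery across the tunnel.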
>> >> >> >>> >
>> >> >> >>> > On 10 December 2014 at 10:31, somshekar kadam
>> >> >> >>> > <som_kadam at yahoo.co.in>
>> >> >> >>> > wrote:
>> >> >> >>> >>
>> >> >> >>> >> Sorry for posting to the wrong mail chain.
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> Regards
>> >> >> >>> >> Neelu
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> On Wednesday, 10 December 2014 6:59 PM, somshekar kadam
>> >> >> >>> >> <som_kadam at yahoo.co.in> wrote:
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> Hi All,
>> >> >> >>> >>
>> >> >> >>> >> Please recommend which stable host OS to use for the Controller and
>> >> >> >>> >> Compute nodes.
>> >> >> >>> >> I have tried Fedora 20, but it seems a lot of tweaking is required;
>> >> >> >>> >> correct me if I am wrong.
>> >> >> >>> >> I see that most of it is tested on Ubuntu and CentOS.
>> >> >> >>> >> I am planning to use the Juno stable version.
>> >> >> >>> >> Please help with this.
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> Regards
>> >> >> >>> >> Neelu
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> On Wednesday, 10 December 2014 5:42 PM, Hannah Fordham
>> >> >> >>> >> <hfordham at radiantworlds.com> wrote:
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> I'm afraid we didn't; we're still struggling with this problem on some
>> >> >> >>> >> VMs. Sorry!
>> >> >> >>> >>
>> >> >> >>> >> On 9 December 2014 14:09:32 GMT+00:00, "André Aranha"
>> >> >> >>> >> <andre.f.aranha at gmail.com> wrote:
>> >> >> >>> >>
>> >> >> >>> >> Hi,
>> >> >> >>> >>
>> >> >> >>> >> We have the same issue here, and have already tried some solutions
>> >> >> >>> >> that didn't work at all. Did you solve this problem?
>> >> >> >>> >>
>> >> >> >>> >> Thank you,
>> >> >> >>> >> Andre Aranha
>> >> >> >>> >>
>> >> >> >>> >> On 27 August 2014 at 08:17, Hannah Fordham
>> >> >> >>> >> <hfordham at radiantworlds.com>
>> >> >> >>> >> wrote:
>> >> >> >>> >>
>> >> >> >>> >> I’ve been trying to figure this one out for a while, so I’ll try to be
>> >> >> >>> >> as thorough as possible in this post, but apologies if I leave anything
>> >> >> >>> >> pertinent out.
>> >> >> >>> >>
>> >> >> >>> >> First off, I’m running a setup with one control node and 5 compute
>> >> >> >>> >> nodes, all created using the Stackgeek scripts -
>> >> >> >>> >> http://www.stackgeek.com/guides/gettingstarted.html. The first two
>> >> >> >>> >> (compute1 and compute2) were created at the same time; compute3, 4 and 5
>> >> >> >>> >> were added as needed later. My VMs are predominantly CentOS, while my
>> >> >> >>> >> OpenStack nodes are Ubuntu 14.04.1.
>> >> >> >>> >>
>> >> >> >>> >> The symptom: irregular high latency/packet loss to VMs on all compute
>> >> >> >>> >> boxes except compute3. Mostly a pain when trying to do anything via ssh
>> >> >> >>> >> on a VM because the lag makes it difficult to do anything, but it shows
>> >> >> >>> >> itself quite nicely through pings as well:
>> >> >> >>> >>
>> >> >> >>> >> --- 10.0.102.47 ping statistics ---
>> >> >> >>> >> 111 packets transmitted, 103 received, 7% packet loss, time 110024ms
>> >> >> >>> >> rtt min/avg/max/mdev = 0.096/367.220/5593.100/1146.920 ms, pipe 6
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> I have tested these pings:
>> >> >> >>> >> VM to itself (via its external IP) seems fine
>> >> >> >>> >> VM to another VM is not fine
>> >> >> >>> >> Hosting compute node to VM is not fine
>> >> >> >>> >> My PC to VM is not fine (however the other way round works fine)
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >> Top on a (32 core) compute node with laggy VMs:
>> >> >> >>> >> top - 12:09:20 up 33 days, 21:35,  1 user,  load average: 2.37, 4.95, 6.23
>> >> >> >>> >> Tasks: 431 total,   2 running, 429 sleeping,   0 stopped,   0 zombie
>> >> >> >>> >> %Cpu(s):  0.6 us,  3.4 sy,  0.0 ni, 96.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>> >> >> >>> >> KiB Mem:  65928256 total, 44210348 used, 21717908 free,   341172 buffers
>> >> >> >>> >> KiB Swap:  7812092 total,  1887864 used,  5924228 free.  7134740 cached Mem
>> >> >> >>> >>
>> >> >> >>> >> And for comparison, on the one compute node that doesn’t seem to be
>> >> >> >>> >> suffering from this:
>> >> >> >>> >> top - 12:12:20 up 33 days, 21:38,  1 user,  load average: 0.28, 0.18, 0.15
>> >> >> >>> >> Tasks: 399 total,   3 running, 396 sleeping,   0 stopped,   0 zombie
>> >> >> >>> >> %Cpu(s):  0.3 us,  0.1 sy,  0.0 ni, 98.9 id,  0.6 wa,  0.0 hi,  0.0 si,  0.0 st
>> >> >> >>> >> KiB Mem:  65928256 total, 49986064 used, 15942192 free,   335788 buffers
>> >> >> >>> >> KiB Swap:  7812092 total,   919392 used,  6892700 free. 39272312 cached Mem
>> >> >> >>> >>
>> >> >> >>> >> Top on a laggy VM:
>> >> >> >>> >> top - 11:02:53 up 27 days, 33 min,  3 users,  load average: 0.00, 0.00, 0.00
>> >> >> >>> >> Tasks:  91 total,   1 running,  90 sleeping,   0 stopped,   0 zombie
>> >> >> >>> >> Cpu(s):  0.2%us,  0.1%sy,  0.0%ni, 99.5%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
>> >> >> >>> >> Mem:   1020400k total,   881004k used,   139396k free,   162632k buffers
>> >> >> >>> >> Swap:  1835000k total,    14984k used,  1820016k free,   220644k cached
>> >> >> >>> >>
>> >> >> >>> >> http://imgur.com/blULjDa shows the hypervisor panel of Horizon. As you
>> >> >> >>> >> can see, Compute 3 has fewer resources used, but none of the compute
>> >> >> >>> >> nodes should be anywhere near overloaded from what I can tell.
>> >> >> >>> >>
>> >> >> >>> >> Any ideas? Let me know if I’m missing anything obvious that would help
>> >> >> >>> >> with figuring this out!
>> >> >> >>> >>
>> >> >> >>> >> Hannah
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >>
>> >> >> >>> >
>> >> >> >>> >
>> >> >> >>
>> >> >> >>
>> >> >
>> >> >
>> >
>> >
>>
>
>