[Openstack] nova-api-os-compute slowdown

Robert Varjasi robert at componentsoft.io
Wed Oct 17 15:24:57 UTC 2018


Hi,

Finally, the issue is solved. It was caused by the low connection limit
configured on the memcached servers: 1024. I noticed "too many open
files" and "too many sockets" errors in my logs. Raising the memcached
maximum allowed connections to 2048 or higher solved my problem.

You can check your memcached server with:
echo stats | nc localhost 11211 | grep listen_disabled
If listen_disabled_num is greater than zero, memcached has been refusing
connections and you need to increase the max connections for your
memcached server.
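
Here is a minimal Python sketch of the same check, in case nc is not at
hand (listen_disabled_num and curr_connections are standard memcached
stats; localhost:11211 is just an assumption for a locally running
memcached):

    import socket

    def memcached_stats(host="localhost", port=11211):
        """Send the plain-text 'stats' command and parse the reply."""
        sock = socket.create_connection((host, port), timeout=5)
        try:
            sock.sendall(b"stats\r\n")
            data = b""
            while not data.endswith(b"END\r\n"):
                chunk = sock.recv(4096)
                if not chunk:
                    break
                data += chunk
        finally:
            sock.close()
        stats = {}
        for line in data.decode().splitlines():
            # Lines look like: "STAT listen_disabled_num 0"
            if line.startswith("STAT "):
                _, key, value = line.split(" ", 2)
                stats[key] = value
        return stats

    stats = memcached_stats()
    print("curr_connections    : %s" % stats.get("curr_connections"))
    print("listen_disabled_num : %s" % stats.get("listen_disabled_num"))
    if int(stats.get("listen_disabled_num", 0)) > 0:
        print("memcached has hit its connection limit; raise its max connections")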

For OSA you need to set this variable: memcached_connections: 4096

Regards,
Robert Varjasi
consultant at Component Soft Ltd.
Tel: +36/30-259-9221

On 10/12/2018 04:35 PM, Robert Varjasi wrote:
> Hi,
>
> I found that my controller nodes were a bit overloaded with 16 uwsgi
> nova-api-os-compute processes. I reduced the nova-api-os-compute uwsgi
> processes to 10, and the timeouts and slowdowns were eliminated. My cloud
> became stable and the response times went down. I have 20 vCPUs on a
> Xeon(R) CPU E5-2630 v4 @ 2.20GHz.
>
> For openstack-ansible I needed to change this variable from 16 to 10:
> nova_wsgi_processes_max: 10. It seems I need to set it roughly to the
> number of my physical CPU cores (a rough sketch follows below).
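>
> Just to illustrate that rule of thumb (a rough sketch only, not OSA's
> actual default formula; the halving assumes hyper-threading is enabled,
> and the cap of 16 mirrors the old value above):
>
>     import multiprocessing
>
>     def wsgi_process_count(cap=16):
>         """Size uwsgi workers to the physical core count, bounded by a cap."""
>         logical = multiprocessing.cpu_count()   # 20 logical CPUs on this node
>         physical = max(logical // 2, 1)         # ~10 physical cores with HT
>         return min(physical, cap)
>
>     print(wsgi_process_count())                 # -> 10 on this controller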
>
> Regards,
> Robert Varjasi
> consultant at Component Soft Ltd.
> Tel: +36/30-259-9221
>
> On 10/08/2018 06:33 PM, Robert Varjasi wrote:
>> Hi,
>>
>> After a few tempest runs I noticed slowdowns in the nova-api-os-compute
>> uwsgi processes. I checked the processes with py-spy and found that a lot
>> of them were blocked on read(). Here is the py-spy output from one of my
>> nova-api-os-compute uwsgi processes: http://paste.openstack.org/show/731677/
>>
>> And the stack trace:
>>
>> thread Thread-2 (paths are relative to
>> /openstack/venvs/nova-17.0.4/lib/python2.7/site-packages/ except for the
>> stdlib frames):
>>
>> /usr/lib/python2.7/threading.py:774  __bootstrap  self.__bootstrap_inner()
>> /usr/lib/python2.7/threading.py:801  __bootstrap_inner  self.run()
>> /usr/lib/python2.7/threading.py:754  run  self.__target(*self.__args, **self.__kwargs)
>> oslo_messaging/_drivers/amqpdriver.py:382  poll  self.conn.consume(timeout=current_timeout)
>> oslo_messaging/_drivers/impl_rabbit.py:1083  consume  error_callback=_error_callback)
>> oslo_messaging/_drivers/impl_rabbit.py:807  ensure  ret, channel = autoretry_method()
>> kombu/connection.py:494  _ensured  return fun(*args, **kwargs)
>> kombu/connection.py:570  __call__  return fun(*args, channel=channels[0], **kwargs), channels[0]
>> oslo_messaging/_drivers/impl_rabbit.py:796  execute_method  method()
>> oslo_messaging/_drivers/impl_rabbit.py:1068  _consume  self.connection.drain_events(timeout=poll_timeout)
>> kombu/connection.py:301  drain_events  return self.transport.drain_events(self.connection, **kwargs)
>> kombu/transport/pyamqp.py:103  drain_events  return connection.drain_events(**kwargs)
>> amqp/connection.py:471  drain_events  while not self.blocking_read(timeout):
>> amqp/connection.py:476  blocking_read  frame = self.transport.read_frame()
>> amqp/transport.py:226  read_frame  frame_header = read(7, True)
>> amqp/transport.py:346  _read  s = recv(n - len(rbuf))  # see note above
>> /usr/lib/python2.7/ssl.py:643  read  v = self._sslobj.read(len)
>>
>> I am using nova 17.0.4.dev1, amqp (2.2.2), oslo.messaging (5.35.0), and
>> kombu (4.1.0). I have 3 controller nodes. The cloud was deployed with
>> OSA 17.0.4.
>>
>> I can reproduce the read() block if I click on "Log" in Horizon to see
>> the console output from one of my VMs, or if I run a tempest test:
>> tempest.api.compute.admin.test_hypervisor.HypervisorAdminTestJSON.test_get_hypervisor_uptime.
>>
>> The nova-api response time increases as more and more nova-api
>> processes get blocked on this read. Is this normal behavior?
>>
>> --- 
>> Regards,
>> Robert Varjasi
>> consultant at Component Soft Ltd.
>>
