Re: [neutron][OpenStack-ansible] Performance issues with trunk ports

4 Apr 2023

Yes, totally, it would be great to sort out why using uWSGI makes
such a difference in performance. I was just trying to provide you with
a workaround in the meantime, in case it's needed and this is affecting
your production deployment.

Also, if neutron_use_uwsgi is set to false, neutron-rpc-server should
be stopped, as this service is not used at all in the scenario without
uWSGI. Though I can imagine a bug in OpenStack-Ansible where we leave
neutron-rpc-server running when switching from uWSGI to eventlet,
despite it not being needed for that scenario.

In addition to that, I'm not sure about the current state, but during
Zed uWSGI was known not to work at all with the OVN driver, for
instance. It could be fixed now, though.

On Tue, 4 Apr 2023 at 16:47, John Bartelme <bartelme@gmail.com> wrote:
...
Dmitriy wrote:
...
Have you tried out of interest to set "neutron_use_uwsgi: false" in your user_variables.yml?
Thank you for that suggestion.  I did think about changing that
option but reading up on some of the change logs it looked like
everything is trying to be migrated over to uWSGI.  When I set that
option things are indeed much better.  The update_subport_bindings RPC
call is still not being handled by the RPC worker threads but the
neutron-server parent thread is able to handle the calls and much more
quickly than the uWSGI threads were, i.e. in that 1-2 second
timeframe.   What are the ramifications of not using uWSGI?  Is this
an ok configuration for a production deployment?  Are there any
thoughts as to why the uWSGI threads are having such performance
issues?  Thanks so much for all of the help.
I’ll continue to write up a bug for RPC threads not handling
update_subport_bindings calls and for uWSGI handling them which may be
unexpected.
Thanks, john
On 4/4/23, Lajos Katona <katonalala@gmail.com> wrote:
...
Hi,
Perfect, please do that.
Lajos
John Bartelme <bartelme@gmail.com> wrote (on Tue, 4 Apr 2023, 15:12):
...
When you say trunk issue, do you mean the RPC calls going to uWSGI
threads or the general issue with long times? For the long times I'm
not sure I have enough detail to write a bug, but I could for the RPC
calls.
Also I'm using LinuxBridge on the backend.
Thanks, john
...
On 4/4/23, Lajos Katona <katonalala@gmail.com> wrote:
Hi,
could you open a bug report on https://bugs.launchpad.net/neutron/ for
the trunk issue with reproduction steps?
It is also important to know which backend you use: OVS or something
else?
Thanks in advance
Lajos Katona (lajoskatona)
John Bartelme <bartelme@gmail.com> wrote (on Tue, 4 Apr 2023, 14:15):
...
Hello,
I'm currently experiencing some pretty severe performance issues with
my openstack-ansible deployed cluster (yoga) while deploying trunk
ports, and I'm looking for some help determining what might be the
cause of this poor performance.
In my simplest case I'm deploying 2 servers, each with one trunk port.
The first trunk has 2 subports and the second 6 subports. Both servers
also have 3 other regular ports. When deploying, the first trunk port's
subports are often provisioned quickly, while the second trunk port
takes anywhere from 30 seconds to 18 minutes. This happens even when I
isolate neutron-server to a single physical machine with 44 cores (88
threads) and 256GB of RAM. Further diagnosis has shown me some things I
didn't quite understand.
My deployment with OpenStack-ansible deploys neutron-server with 16
uWSGI processes and neutron-rpc-server with 16 rpc workers. However,
the way the trunk RPC server is implemented, it runs only on the parent
RPC thread rather than in the RPC workers, and it also runs in all of
the uWSGI processes. This means that most of my trunk RPC calls are
being handled by the uWSGI processes instead of the RPC workers. When
the parent RPC thread handles the trunk port creation calls I
consistently see creation times of 1-1.5 seconds. I've isolated it so
that this thread does all of the trunk RPC calls and this works quite
well, but this doesn't seem ideal. What could be causing such poor
performance on the uWSGI side of the house? I'm having a really hard
time getting a good feeling for what might be slowing it down so much.
I'm wondering if it could be green thread preemption but I really don't
know. I've tried setting 'enable-threads' false for uWSGI but I don't
think that is improving performance. Putting the profiled decorator on
update_subport_bindings shows different places taking longer every
time, but in general a lot of time (tottime, i.e. not subfunction time)
is spent in webob/dec.py(__call__), paste/urlmap.py(__call__),
webob/request.py(call_application), webob/request.py(send). What else
can I do to try and look for why this is taking so long?
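The profiled decorator mentioned above is not shown in the thread. For
anyone trying to reproduce the measurement, here is a minimal sketch of
what such a decorator could look like, assuming it is built on Python's
standard cProfile and pstats modules. The names profiled and
handle_rpc_stub are hypothetical and are not taken from neutron.

```python
import cProfile
import functools
import io
import pstats
import sys


def profiled(limit=10, stream=None):
    """Hypothetical decorator: profile every call and report by tottime."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            try:
                # runcall() profiles just this invocation and returns its result.
                return profiler.runcall(func, *args, **kwargs)
            finally:
                report = io.StringIO()
                stats = pstats.Stats(profiler, stream=report)
                # Sort by tottime: time spent inside a function itself,
                # excluding the functions it calls.
                stats.sort_stats("tottime").print_stats(limit)
                out = sys.stderr if stream is None else stream
                out.write(report.getvalue())

        return wrapper

    return decorator


@profiled(limit=5)
def handle_rpc_stub(subports):
    """Stand-in for an RPC handler such as update_subport_bindings."""
    return {port_id: "bound" for port_id in subports}
```

Because a fresh profiler is created per call, each report covers one
invocation only, which makes it easier to compare a slow call against a
fast one than a single aggregated profile would.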
As a side question, it seems counterintuitive that uWSGI handles most
of the trunk RPC calls and not the RPC workers?
A couple of other notes about my environment that could be related to
my challenges:
I had to disable rabbitmq heartbeats for neutron as they kept not
getting sent reliably and connections were terminated. I tried with
heartbeat_in_pthread both true and false but still had issues. It looks
like nova also sometimes experiences this but not nearly as often.
I was overzealous with my vxlan ranges in my first configuration and
gave it a range of 10,000,000, not realizing that would create that
many rows in the database. Looking into that, I saw that pymysql in my
cluster takes 3.5 minutes to retrieve those rows. The mysql CLI only
takes 4 seconds. Perhaps that is just the overhead of pymysql? I've
greatly scaled down the vxlan range now.
I'm provisioning the 2 servers with a heat template that contains
around 200 custom resources. 198 of the resources are set to
conditionally not create any OpenStack native resources. Deploying this
template of mostly no-op resources still takes about 3 minutes.
Horizon works but almost every page load takes a few seconds. I'm not
sure if that is normal or not.
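On the vxlan note above: the row count follows directly from the range
size if, as the message describes, one allocation row is kept per VNI
in the configured range. The sketch below estimates that count from a
range string in the "min:max" comma-separated form used by the ML2
vni_ranges option. The helper count_vnis is hypothetical and is not
part of neutron.

```python
def count_vnis(vni_ranges):
    """Count distinct VNIs covered by a 'min:max[,min:max...]' string."""
    spans = []
    for item in vni_ranges.split(","):
        low, high = (int(part) for part in item.strip().split(":"))
        if low > high:
            raise ValueError(f"invalid range {item!r}: min exceeds max")
        spans.append((low, high))

    total = 0
    covered_up_to = None  # highest VNI already counted
    for low, high in sorted(spans):
        if covered_up_to is not None and low <= covered_up_to:
            low = covered_up_to + 1  # skip the part that overlaps
        if low <= high:
            total += high - low + 1
            covered_up_to = high
    return total
```

For a range such as "1:10000000" this gives 10,000,000 rows, while
"1:1000" gives 1,000. Taking the timings reported above at face value,
3.5 minutes for 10,000,000 rows is about 21 microseconds per row
through pymysql, against about 0.4 microseconds per row for the mysql
CLI.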
Thanks for any help anyone can provide.
john

Dmitriy Rabotyagov