<div dir="ltr">Hi,<div>Perfect, please do that.</div><div><br></div><div>Lajos</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">John Bartelme <<a href="mailto:bartelme@gmail.com">bartelme@gmail.com</a>> ezt írta (időpont: 2023. ápr. 4., K, 15:12):<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">When you say trunk issue do you mean about the RPC calls going to<br>
uWSGI threads or this general issue with long times. For the long<br>
times I'm not sure I have enough detail to write a bug but I could for<br>
the RPC calls.<br>
<br>
Also I'm using LinuxBridge on the backend.<br>
<br>
Thanks, john<br>
<br>
On 4/4/23, Lajos Katona <katonalala@gmail.com> wrote:
> Hi,
> could you open a bug report on https://bugs.launchpad.net/neutron/ for the
> trunk issue, with reproduction steps?
> It is also important which backend you use: OVS or something else?
>
> Thanks in advance
> Lajos Katona (lajoskatona)
>
> John Bartelme <bartelme@gmail.com> wrote (on Tue, 4 Apr 2023 at 14:15):
>
>> Hello,
>>
>> I'm currently experiencing some pretty severe performance issues with my
>> openstack-ansible deployed cluster (Yoga) while deploying trunk ports,
>> and I'm looking for some help determining what might be the cause of
>> this poor performance.
>>
>> In my simplest case I'm deploying 2 servers, each with one trunk port.
>> The first trunk has 2 subports and the second 6 subports. Both servers
>> also have 3 other regular ports. When deploying, the first trunk port's
>> subports are often provisioned quickly, while the second trunk port
>> takes anywhere from 30 seconds to 18 minutes. This happens even when I
>> isolate neutron-server to a single physical machine with 44 cores (88
>> threads) and 256GB of RAM.
>>
>> Further diagnosis has shown me some things I didn't quite understand.
>> My deployment with openstack-ansible deploys neutron-server with 16
>> uWSGI processes and neutron-rpc-server with 16 RPC workers. However,
>> the way the trunk RPC server is implemented, it only runs on the parent
>> RPC thread (not on the 16 RPC workers) and it also runs in all of the
>> uWSGI processes. This means that most of my trunk RPC calls are being
>> handled by the uWSGI processes instead of the RPC workers. When the
>> parent RPC thread handles the trunk port creation calls I consistently
>> see creation times of 1-1.5 seconds. I've isolated things so that this
>> thread handles all of the trunk RPC calls, and that works quite well,
>> but it doesn't seem ideal.
>>
>> What could be causing such poor performance on the uWSGI side of the
>> house? I'm having a really hard time getting a feel for what might be
>> slowing it down so much. I'm wondering if it could be green thread
>> preemption, but I really don't know. I've tried setting
>> 'enable-threads' to false for uWSGI, but I don't think that improves
>> performance. Putting the profiled decorator on update_subport_bindings
>> shows different places taking longer every time, but in general a lot
>> of time (tottime, i.e. not subfunction time) is spent in
>> webob/dec.py(__call__), paste/urlmap.py(__call__),
>> webob/request.py(call_application), and webob/request.py(send). What
>> else can I do to try and find out why this is taking so long?
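>>
>> (For what it's worth, "the profiled decorator" is just my own little
>> cProfile wrapper, roughly the sketch below; the name and the stats
>> depth are my choices, nothing Neutron-specific.)
>>
>> import cProfile
>> import functools
>> import io
>> import pstats
>>
>> def profiled(func):
>>     """Profile a single call and print the top entries by tottime."""
>>     @functools.wraps(func)
>>     def wrapper(*args, **kwargs):
>>         prof = cProfile.Profile()
>>         prof.enable()
>>         try:
>>             return func(*args, **kwargs)
>>         finally:
>>             prof.disable()
>>             out = io.StringIO()
>>             pstats.Stats(prof, stream=out).sort_stats('tottime').print_stats(25)
>>             print(out.getvalue())
>>     return wrapper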
>>
>> As a side question, it seems counterintuitive that the uWSGI processes
>> handle most of the trunk RPC calls and not the RPC workers. Is that
>> expected?
>>
>> A couple of other notes about my environment that may hint at my
>> challenges:
>>
>> I had to disable rabbitmq heartbeats for neutron, as they kept not
>> being sent reliably and connections were terminated. I tried with
>> heartbeat_in_pthread both true and false but still had issues. It looks
>> like nova also sometimes experiences this, but not nearly as often.
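>>
>> (For concreteness, disabling the heartbeats amounted to roughly this in
>> neutron.conf; setting the threshold to 0 turns the heartbeat off
>> entirely:)
>>
>> [oslo_messaging_rabbit]
>> heartbeat_timeout_threshold = 0
>> # tried both values here before giving up on heartbeats
>> heartbeat_in_pthread = false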
>>
>> I was overzealous with my vxlan ranges in my first configuration and
>> gave it a range of 10,000,000, not realizing that would create that
>> many rows in the database. Looking into that, I saw that pymysql in my
>> cluster takes 3.5 minutes to retrieve those rows, while the mysql CLI
>> takes only 4 seconds. Perhaps that is just the overhead of pymysql?
>> I've greatly scaled down the vxlan range now.
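>>
>> (Roughly how I timed it, in case I'm measuring the wrong thing; the
>> host and credentials below are placeholders for my own:)
>>
>> import time
>> import pymysql
>>
>> conn = pymysql.connect(host='127.0.0.1', user='neutron',
>>                        password='SECRET', database='neutron')
>> start = time.monotonic()
>> with conn.cursor() as cur:
>>     # the table that the vxlan range setting populates
>>     cur.execute("SELECT * FROM ml2_vxlan_allocations")
>>     rows = cur.fetchall()
>> print("%d rows in %.1fs" % (len(rows), time.monotonic() - start))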
>>
>> I'm provisioning the 2 servers with a heat template that contains
>> around 200 custom resources. 198 of those resources are conditioned so
>> that they don't create any OpenStack native resources. Deploying this
>> template of mostly no-op resources still takes about 3 minutes.
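>>
>> (The pattern in each custom resource is roughly the standard HOT
>> condition shown below; the parameter and resource names are made up:)
>>
>> heat_template_version: 2018-08-31
>>
>> parameters:
>>   create_real_resources:
>>     type: boolean
>>     default: false
>>
>> conditions:
>>   do_create: {get_param: create_real_resources}
>>
>> resources:
>>   extra_port:
>>     type: OS::Neutron::Port
>>     condition: do_create
>>     properties:
>>       network: my-network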
>>
>> Horizon works, but almost every page takes a few seconds to load. I'm
>> not sure whether that is normal or not.
>>
>> Thanks for any help anyone can provide.
>>
>> john
>>
>