Hi,
Perfect, please do that.

Lajos

John Bartelme <bartelme@gmail.com> wrote (on 2023. Apr. 4., Tue, 15:12):
When you say the trunk issue, do you mean the RPC calls going to the
uWSGI threads, or the general issue with long times? For the long
times I'm not sure I have enough detail to write a bug report, but I
could for the RPC calls.

Also I'm using LinuxBridge on the backend.

Thanks, john

On 4/4/23, Lajos Katona <katonalala@gmail.com> wrote:
> Hi,
> could you open a bug report on https://bugs.launchpad.net/neutron/ for the
> trunk issue, with reproduction steps?
> It is also important to know which backend you use: OVS or something else?
>
> Thanks in advance
> Lajos Katona (lajoskatona)
>
> John Bartelme <bartelme@gmail.com> wrote (on 2023. Apr. 4., Tue,
> 14:15):
>
>> Hello,
>>
>> I'm currently experiencing some pretty severe performance issues with my
>> openstack-ansible-deployed cluster (Yoga) while deploying trunk ports,
>> and I'm looking for help determining the cause of this poor
>> performance.
>>
>> In my simplest case I'm deploying 2 servers, each with one trunk
>> port. The first trunk has 2 subports and the second has 6. Both
>> servers also have 3 other regular ports. When deploying, the first
>> trunk port's subports are often provisioned quickly, while the second
>> trunk port takes anywhere from 30 seconds to 18 minutes. This happens
>> even when I isolate neutron-server to a single physical machine with
>> 44 cores (88 threads) and 256 GB of RAM.
>>
>> Further diagnosis has shown me some things I didn't quite understand.
>> My openstack-ansible deployment runs neutron-server with 16 uWSGI
>> processes and neutron-rpc-server with 16 RPC workers. However, the
>> way the trunk RPC server is implemented, it runs only in the parent
>> RPC thread and in all of the uWSGI processes, not in the RPC workers.
>> This means that most of my trunk RPC calls are handled by uWSGI
>> instead of by the RPC workers. When the parent RPC thread handles the
>> trunk port creation calls, I consistently see creation times of
>> 1-1.5 seconds. I've isolated things so that this thread handles all
>> of the trunk RPC calls, and that works quite well, but it doesn't
>> seem ideal.
>>
>> What could be causing such poor performance on the uWSGI side of the
>> house? I'm having a really hard time getting a feel for what might be
>> slowing it down so much. I wonder if it could be green-thread
>> preemption, but I really don't know. I've tried setting
>> 'enable-threads' to false for uWSGI, but I don't think that improves
>> performance. Putting the profiled decorator on update_subport_bindings
>> shows different places taking longer every time, but in general a lot
>> of time (tottime, i.e. not sub-function time) is spent in
>> webob/dec.py(__call__), paste/urlmap.py(__call__),
>> webob/request.py(call_application), and webob/request.py(send). What
>> else can I do to track down why this is taking so long?
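[Editor's note: for anyone reproducing this, a minimal profiling decorator in the spirit of the `profiled` one mentioned above can be built on stdlib cProfile. This is an illustrative sketch, not Neutron's own decorator, and `update_subport_bindings_stub` is a hypothetical stand-in for the real RPC handler:]

```python
import cProfile
import functools
import io
import pstats

def profiled(func):
    """Print per-call profile stats sorted by tottime (time spent in
    the function body itself, excluding callees -- the metric quoted
    above). Illustrative sketch only."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(func, *args, **kwargs)
        finally:
            stream = io.StringIO()
            stats = pstats.Stats(profiler, stream=stream)
            stats.sort_stats("tottime").print_stats(10)
            print(stream.getvalue())
    return wrapper

@profiled
def update_subport_bindings_stub():
    # Hypothetical stand-in for the real RPC handler's work.
    return sum(i * i for i in range(100000))
```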
>>
>> As a side question: it seems counterintuitive that uWSGI handles most
>> of the trunk RPC calls rather than the RPC workers?
>>
>> A couple of other notes about my environment that might be related to
>> my challenges:
>>
>> I had to disable rabbitmq heartbeats for neutron, as they kept not
>> being sent reliably and connections were terminated. I tried with
>> heartbeat_in_pthread set to both true and false but still had issues.
>> It looks like nova sometimes experiences this too, but not nearly as
>> often.
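[Editor's note: for reference, oslo.messaging disables the AMQP heartbeat when its timeout threshold is set to zero. A sketch of the relevant neutron.conf fragment; option names are oslo.messaging's, the values shown are illustrative:]

```ini
[oslo_messaging_rabbit]
# 0 disables the AMQP heartbeat entirely.
heartbeat_timeout_threshold = 0
# Run the heartbeat in a native pthread instead of a green thread;
# the author tried both settings with no improvement.
heartbeat_in_pthread = false
```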
>>
>> I was overzealous with my vxlan ranges in my first configuration and
>> gave it a range of 10,000,000, not realizing that would create that
>> many rows in the database. Looking into that, I saw that pymysql in my
>> cluster takes 3.5 minutes to retrieve those rows, while the mysql CLI
>> takes only 4 seconds. Perhaps that is just the overhead of pymysql?
>> I've greatly scaled down the vxlan range now.
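[Editor's note: a 3.5-minute fetch is consistent with pymysql's default client-side cursor, which materializes every row as a Python object before returning; pymysql also ships a server-side `SSCursor` that streams rows instead. Below is a runnable sketch of the buffered-versus-streamed pattern, using stdlib sqlite3 in place of MySQL so no server is needed; the table and column names are made up for illustration:]

```python
import sqlite3

# Build a toy allocations table standing in for ml2's vxlan rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vxlan_allocations (vxlan_vni INTEGER)")
conn.executemany(
    "INSERT INTO vxlan_allocations VALUES (?)",
    ((i,) for i in range(100000)),
)

# Buffered: fetchall() materializes every row in Python at once
# (what pymysql's default Cursor does).
buffered = conn.execute("SELECT vxlan_vni FROM vxlan_allocations").fetchall()

# Streamed: iterate the cursor, holding one row at a time
# (what pymysql's SSCursor enables).
streamed_count = 0
for _ in conn.execute("SELECT vxlan_vni FROM vxlan_allocations"):
    streamed_count += 1

print(len(buffered), streamed_count)
```

Both paths see the same 100,000 rows; the difference is peak memory and per-row object overhead, which is where a 10,000,000-row fetch tends to lose its time.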
>>
>> I'm provisioning the 2 servers with a heat template that contains
>> around 200 custom resources. 198 of those resources are set to
>> conditionally not create any OpenStack-native resources. Deploying
>> this template of mostly no-op resources still takes about 3 minutes.
>>
>> Horizon works, but almost every page takes a few seconds to load. I'm
>> not sure if that is normal.
>>
>> Thanks for any help anyone can provide.
>>
>> john
>>
>>
>