[Openstack-operators] [Large deployments] Neutron issues in Openstack Large deployment using DVR

Satyanarayana Patibandla satya.patibandla at gmail.com
Mon Feb 27 07:02:54 UTC 2017


Hi,

We increased api_workers, rpc_workers and metadata_workers based on the
number of cores on the controller node (each is set to half the number of
cores, i.e. with 24 cores we run 12 workers of each). We also increased
rpc_connect_timeout to 180 and rpc_response_timeout to 600. So far these
settings seem to be working fine.
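
As a rough illustration of the sizing rule above, here is a minimal Python
sketch that derives the worker count from a core count and prints the five
settings. The option names and the two timeout values are copied from this
message as written. The helper names (half_core_workers, render_settings)
and the floor of one worker are assumptions made for the example, and the
sketch does not say which config file or section each option belongs in.

```python
import os


def half_core_workers(cores: int) -> int:
    """Return half the core count, never less than one (the floor is an assumption)."""
    return max(1, cores // 2)


def render_settings(cores: int) -> str:
    """Render the tuned options as 'name = value' lines for a given core count."""
    workers = half_core_workers(cores)
    settings = {
        # Worker counts: half the controller's cores, as described in this message.
        "api_workers": workers,
        "rpc_workers": workers,
        "metadata_workers": workers,
        # Timeout values quoted from this message.
        "rpc_connect_timeout": 180,
        "rpc_response_timeout": 600,
    }
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single core.
    print(render_settings(os.cpu_count() or 1))
```

Calling render_settings(24) corresponds to the 24-core case mentioned above.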

Let me know if you have any comments or suggestions about increasing those
parameter values.

Thanks,
Satya.P

On Mon, Feb 27, 2017 at 11:16 AM, Kevin Benton <kevin at benton.pub> wrote:

> Thanks for following up. Would you mind sharing the parameters you had to
> tune (db pool limits, etc) just in case someone comes across this same
> thread in a google search?
>
> Thanks,
> Kevin Benton
>
> On Sun, Feb 26, 2017 at 8:48 PM, Satyanarayana Patibandla <
> satya.patibandla at gmail.com> wrote:
>
>> Hi Saverio,
>>
>> The issue seems to be related to neutron tuning. We observed the same
>> issue with stable/ocata branch code. After we tuned a few neutron
>> parameters, it is working fine.
>> Thanks for your suggestion.
>>
>> Thanks,
>> Satya.P
>>
>> On Wed, Feb 22, 2017 at 10:10 AM, Satyanarayana Patibandla <
>> satya.patibandla at gmail.com> wrote:
>>
>>> Hi Saverio,
>>>
>>> Thanks for your input. We will test with the stable/ocata branch code
>>> and share the result.
>>>
>>> Thanks,
>>> Satya.P
>>>
>>> On Wed, Feb 22, 2017 at 1:54 AM, Saverio Proto <zioproto at gmail.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> I would use at least the stable/ocata branch. Master is not supposed to
>>>> be stable, and I am also not sure whether you can file a bug against a
>>>> specific commit in master.
>>>>
>>>> Saverio
>>>>
>>>> 2017-02-21 21:12 GMT+01:00 Satyanarayana Patibandla
>>>> <satya.patibandla at gmail.com>:
>>>> > Hi Saverio,
>>>> >
>>>> > We tried creating 20 VMs at a time using a heat template, with a 1 sec
>>>> > gap between each VM creation request. When we reached 114 VMs we got
>>>> > the error mentioned in the mail below. The heat template boots each
>>>> > instance from a volume and assigns a floating IP to it.
>>>> >
>>>> > We restarted all the neutron agent containers on all network and
>>>> > compute nodes, but not the neutron-server container. We are using
>>>> > kolla to deploy openstack services.
>>>> >
>>>> > We are deploying our services from master branch openstack code that
>>>> > is 1 month old.
>>>> >
>>>> > Please find the error logs in the below link.
>>>> > http://paste.openstack.org/show/599892/
>>>> >
>>>> > Thanks,
>>>> > Satya.P
>>>> >
>>>> > On Wed, Feb 22, 2017 at 12:21 AM, Saverio Proto <zioproto at gmail.com> wrote:
>>>> >>
>>>> >> Hello Satya,
>>>> >>
>>>> >> I would file a bug on launchpad for this issue.
>>>> >> 114 VMs is not many. Can you identify how to trigger the issue so it
>>>> >> can be reproduced, or does it just happen randomly?
>>>> >>
>>>> >> When you say rebooting the network node, do you mean the server
>>>> >> running the neutron-server process?
>>>> >>
>>>> >> What version and distribution of openstack are you using?
>>>> >>
>>>> >> Thank you
>>>> >>
>>>> >> Saverio
>>>> >>
>>>> >>
>>>> >> 2017-02-21 13:54 GMT+01:00 Satyanarayana Patibandla
>>>> >> <satya.patibandla at gmail.com>:
>>>> >> > Hi All,
>>>> >> >
>>>> >> > We are trying to deploy Openstack in our production environment. For
>>>> >> > networking we are using DVR without L3 HA. We are able to create 114
>>>> >> > VMs without any issue. After creating 114 VMs we get the error below.
>>>> >> >
>>>> >> > Error: <html><body><h1>504 Gateway Time-out</h1> The server didn't
>>>> >> > respond in time. </body></html>
>>>> >> >
>>>> >> > Neutron services freeze up due to a persistent lock on the agents
>>>> >> > table. It seems one of the network nodes is holding the lock on the
>>>> >> > table. After rebooting the network node, the Neutron CLI was
>>>> >> > responsive again.
>>>> >> >
>>>> >> > The neutron agents and neutron server are throwing the errors below.
>>>> >> >
>>>> >> > Neutron-server errors:
>>>> >> > ERROR oslo_db.sqlalchemy.exc_filters     "Can't reconnect until invalid "
>>>> >> > ERROR oslo_db.sqlalchemy.exc_filters InvalidRequestError: Can't reconnect until invalid transaction is rolled back
>>>> >> > ERROR neutron.api.v2.resource [req-24fa6eaa-a9e0-4f55-97e0-59db203e72c6 3eb776587c9c40569731ebe5c3557bc7 f43e8699cd5a46e89ffe39e3cac75341 - - -] index failed: No details.
>>>> >> > ERROR neutron.api.v2.resource DBError: Can't reconnect until invalid transaction is rolled back
>>>> >> >
>>>> >> >
>>>> >> > Neutron agents errors:
>>>> >> > MessagingTimeout: Timed out waiting for a reply to message ID 40638b6bf12c44cd9a404ecaa14a9909
>>>> >> >
>>>> >> > Could you please share your input or suggestions on the errors
>>>> >> > above?
>>>> >> >
>>>> >> > Thanks,
>>>> >> > Satya.P
>>>> >> >
>>>> >> > _______________________________________________
>>>> >> > OpenStack-operators mailing list
>>>> >> > OpenStack-operators at lists.openstack.org
>>>> >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>> >> >
>>>> >
>>>> >
>>>>
>>>
>>>
>>
>