[Openstack-operators] [Large deployments] Neutron issues in Openstack Large deployment using DVR

Satyanarayana Patibandla satya.patibandla at gmail.com
Mon Feb 27 19:39:51 UTC 2017


Hi Kevin,

After increasing the parameter values mentioned in the mail below, we are
able to create a few hundred VMs properly, with no Neutron-related errors.
Our environment contains multiple regions. One of our team members
mistakenly ran the Tempest tests for all OpenStack services against the
site. After the Tempest run we again observed the "504 Gateway Time-out"
error, and this time the Neutron CLI was not responsive: we keep getting
the same gateway timeout error even after restarting all of the Neutron
agent containers.

We ran SHOW PROCESSLIST in MySQL and could see the query against the
agents table blocked waiting on a lock.
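
In case it is useful to anyone hitting the same problem, this is roughly
how we inspected the blocked transaction. A minimal sketch, assuming
MySQL/MariaDB 5.6/5.7 with InnoDB (the information_schema lock-wait
tables used here were replaced by performance_schema views in MySQL 8.0):

-- which statements are currently running or stuck
SHOW FULL PROCESSLIST;

-- which transaction is waiting and which one is blocking it
-- (MySQL/MariaDB 5.x; MySQL 8.0 uses performance_schema.data_lock_waits)
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query,
       b.trx_started         AS blocking_started
  FROM information_schema.innodb_lock_waits w
  JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id
  JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id;

Killing the blocking thread (KILL <blocking_thread>) might be a less
drastic way out than reimaging, but we have not verified that it leaves
Neutron in a clean state.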

In the logs we can see the below error:

2017-02-27 14:50:29.085 38 ERROR oslo_messaging.rpc.server DBDeadlock:
(pymysql.err.InternalError) (1205, u'Lock wait timeout exceeded; try
restarting transaction') [SQL: u'UPDATE agents SET
heartbeat_timestamp=%(heartbeat_timestamp)s WHERE agents.id =
%(agents_id)s'] [parameters: {'heartbeat_timestamp':
datetime.datetime(2017, 2, 27, 14, 46, 35, 229400), 'agents_id':
u'94535d12-4b04-42c2-8a74-f2358db41634'}]

We are using stable/ocata code in our environment. We had to reimage and
redeploy all the nodes to continue our testing. Could you please let us
know your thoughts on the above issue?

Thanks,
Satya.P

On Mon, Feb 27, 2017 at 12:32 PM, Satyanarayana Patibandla <
satya.patibandla at gmail.com> wrote:

> Hi,
>
> We increased api_workers, rpc_workers and metadata_workers based on the
> number of cores on the controller node (the workers are set to half the
> number of cores, i.e. with 24 cores we run 12 workers for each). We also
> increased rpc_connect_timeout to 180 and rpc_response_timeout to 600. As
> of now these values seem to be fine; a rough sketch of the resulting
> config is below.
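>
> A minimal sketch of these settings, assuming a stock Ocata layout where
> api_workers, rpc_workers and rpc_response_timeout live in the [DEFAULT]
> section of neutron.conf and metadata_workers in the metadata agent
> config (rpc_connect_timeout is written here exactly as we set it; its
> name and section may vary with the oslo.messaging driver):
>
> # /etc/neutron/neutron.conf (neutron-server)
> [DEFAULT]
> # half of the 24 cores on the controller node
> api_workers = 12
> rpc_workers = 12
> # seconds to wait for an RPC reply before raising MessagingTimeout
> rpc_response_timeout = 600
> # as set in our deployment; name/section may differ per driver
> rpc_connect_timeout = 180
>
> # /etc/neutron/metadata_agent.ini (neutron-metadata-agent)
> [DEFAULT]
> metadata_workers = 12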
>
> Let me know if you have any comments or suggestions about increasing those
> parameter values.
>
> Thanks,
> Satya.P
>
> On Mon, Feb 27, 2017 at 11:16 AM, Kevin Benton <kevin at benton.pub> wrote:
>
>> Thanks for following up. Would you mind sharing the parameters you had to
>> tune (db pool limits, etc) just in case someone comes across this same
>> thread in a google search?
>>
>> Thanks,
>> Kevin Benton
>>
>> On Sun, Feb 26, 2017 at 8:48 PM, Satyanarayana Patibandla <
>> satya.patibandla at gmail.com> wrote:
>>
>>> Hi Saverio,
>>>
>>> The issue seems to be related to Neutron tuning. We observed the same
>>> issue with the stable/ocata branch code. After tuning a few Neutron
>>> parameters it is working fine.
>>> Thanks for your suggestion.
>>>
>>> Thanks,
>>> Satya.P
>>>
>>> On Wed, Feb 22, 2017 at 10:10 AM, Satyanarayana Patibandla <
>>> satya.patibandla at gmail.com> wrote:
>>>
>>>> Hi Saverio,
>>>>
>>>> Thanks for your inputs. We will test with the stable/ocata branch code
>>>> and will share the result.
>>>>
>>>> Thanks,
>>>> Satya.P
>>>>
>>>> On Wed, Feb 22, 2017 at 1:54 AM, Saverio Proto <zioproto at gmail.com>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I would use at least the stable/ocata branch. If you just use master,
>>>>> that is not supposed to be stable, and I am also not sure whether you
>>>>> can file a bug against a specific commit in master.
>>>>>
>>>>> Saverio
>>>>>
>>>>> 2017-02-21 21:12 GMT+01:00 Satyanarayana Patibandla
>>>>> <satya.patibandla at gmail.com>:
>>>>> > Hi Saverio,
>>>>> >
>>>>> > We tried to create 20 VMs at a time using a Heat template, with a
>>>>> > 1-second gap between each VM creation request. When we reached 114
>>>>> > VMs we got the error mentioned in the mail below. The Heat template
>>>>> > boots the instance from a volume and assigns a floating IP to the
>>>>> > instance.
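>>>>> >
>>>>> > For reference, a rough sketch of such a boot-from-volume template
>>>>> > with a floating IP (the resource names, parameters and volume size
>>>>> > here are illustrative, not our actual template):
>>>>> >
>>>>> > heat_template_version: 2016-10-14
>>>>> > parameters:
>>>>> >   image: {type: string}
>>>>> >   flavor: {type: string}
>>>>> >   private_net: {type: string}
>>>>> >   public_net: {type: string}
>>>>> > resources:
>>>>> >   # port on the tenant network, shared by the server and the FIP
>>>>> >   port:
>>>>> >     type: OS::Neutron::Port
>>>>> >     properties:
>>>>> >       network: {get_param: private_net}
>>>>> >   # bootable volume created from the image
>>>>> >   boot_volume:
>>>>> >     type: OS::Cinder::Volume
>>>>> >     properties:
>>>>> >       image: {get_param: image}
>>>>> >       size: 20  # illustrative size in GB
>>>>> >   # server boots from the volume, no image property needed
>>>>> >   server:
>>>>> >     type: OS::Nova::Server
>>>>> >     properties:
>>>>> >       flavor: {get_param: flavor}
>>>>> >       networks:
>>>>> >         - port: {get_resource: port}
>>>>> >       block_device_mapping_v2:
>>>>> >         - volume_id: {get_resource: boot_volume}
>>>>> >           boot_index: 0
>>>>> >           delete_on_termination: true
>>>>> >   # floating IP on the external network, bound to the port
>>>>> >   fip:
>>>>> >     type: OS::Neutron::FloatingIP
>>>>> >     properties:
>>>>> >       floating_network: {get_param: public_net}
>>>>> >       port_id: {get_resource: port}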
>>>>> >
>>>>> > Except for the neutron-server container, we restarted all of the
>>>>> > Neutron agent containers present on the network and compute nodes.
>>>>> > We are using Kolla to deploy the OpenStack services.
>>>>> >
>>>>> > We are using one-month-old master branch OpenStack code to deploy
>>>>> > our services.
>>>>> >
>>>>> > Please find the error logs in the below link.
>>>>> > http://paste.openstack.org/show/599892/
>>>>> >
>>>>> > Thanks,
>>>>> > Satya.P
>>>>> >
>>>>> > On Wed, Feb 22, 2017 at 12:21 AM, Saverio Proto <zioproto at gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >> Hello Satya,
>>>>> >>
>>>>> >> I would file a bug on Launchpad for this issue.
>>>>> >> 114 VMs is not much. Can you identify how to trigger the issue to
>>>>> >> reproduce it, or does it just happen randomly?
>>>>> >>
>>>>> >> When you say rebooting the network node, do you mean the server
>>>>> >> running the neutron-server process?
>>>>> >>
>>>>> >> What version and distribution of OpenStack are you using?
>>>>> >>
>>>>> >> thank you
>>>>> >>
>>>>> >> Saverio
>>>>> >>
>>>>> >>
>>>>> >> 2017-02-21 13:54 GMT+01:00 Satyanarayana Patibandla
>>>>> >> <satya.patibandla at gmail.com>:
>>>>> >> > Hi All,
>>>>> >> >
>>>>> >> > We are trying to deploy OpenStack in our production environment.
>>>>> >> > For networking we are using DVR without L3 HA. We were able to
>>>>> >> > create 114 VMs without any issue. After creating 114 VMs we are
>>>>> >> > getting the below error:
>>>>> >> >
>>>>> >> > Error: <html><body><h1>504 Gateway Time-out</h1> The server didn't
>>>>> >> > respond in time. </body></html>
>>>>> >> >
>>>>> >> > Neutron services are freezing up due to a persistent lock on the
>>>>> >> > agents table. It seems one of the network nodes is holding the
>>>>> >> > lock on the table. After rebooting that network node, the Neutron
>>>>> >> > CLI was responsive again.
>>>>> >> >
>>>>> >> > The Neutron agents and neutron-server are throwing the errors
>>>>> >> > below.
>>>>> >> >
>>>>> >> > Neutron-server errors:
>>>>> >> > ERROR oslo_db.sqlalchemy.exc_filters "Can't reconnect until invalid "
>>>>> >> > ERROR oslo_db.sqlalchemy.exc_filters InvalidRequestError: Can't
>>>>> >> > reconnect until invalid transaction is rolled back
>>>>> >> > ERROR neutron.api.v2.resource [req-24fa6eaa-a9e0-4f55-97e0-59db203e72c6
>>>>> >> > 3eb776587c9c40569731ebe5c3557bc7 f43e8699cd5a46e89ffe39e3cac75341 - - -]
>>>>> >> > index failed: No details.
>>>>> >> > ERROR neutron.api.v2.resource DBError: Can't reconnect until invalid
>>>>> >> > transaction is rolled back
>>>>> >> >
>>>>> >> >
>>>>> >> > Neutron agent errors:
>>>>> >> > MessagingTimeout: Timed out waiting for a reply to message ID
>>>>> >> > 40638b6bf12c44cd9a404ecaa14a9909
>>>>> >> >
>>>>> >> > Could you please provide us your valuable inputs or suggestions
>>>>> >> > for the above errors?
>>>>> >> >
>>>>> >> > Thanks,
>>>>> >> > Satya.P
>>>>> >> >
>>>>> >> >
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>