[Openstack-operators] [oslo] RabbitMQ queue TTL issues moving to Liberty

Fox, Kevin M Kevin.Fox at pnnl.gov
Thu Jul 28 12:49:39 UTC 2016


Yeah, they work well, but that's not what I'm trying to get at. My point is that the submitted patch only works if the server process is plumbed up to stop properly on exit. Are we sure every RPC-listening service does that today? If not, how do we find and fix them?

Thanks,
Kevin
________________________________________
From: Davanum Srinivas [davanum at gmail.com]
Sent: Thursday, July 28, 2016 5:31 AM
To: Dmitry Mescheryakov
Cc: Fox, Kevin M; OpenStack Operators
Subject: Re: [Openstack-operators] [oslo] RabbitMQ queue TTL issues moving to Liberty

Dima, Kevin,

There are PreStop hooks that can be used to gracefully bring down
stuff running in containers:
http://kubernetes.io/docs/user-guide/container-environment/

-- Dims

On Thu, Jul 28, 2016 at 8:22 AM, Dmitry Mescheryakov
<dmescheryakov at mirantis.com> wrote:
>
> 2016-07-26 21:20 GMT+03:00 Fox, Kevin M <Kevin.Fox at pnnl.gov>:
>>
>> It only relates to Kubernetes in that Kubernetes can do automatic rolling
>> upgrades by destroying/replacing a service. If the services don't clean up
>> after themselves, then performing a rolling upgrade will break things.
>>
>> So, what do you think is the best approach to ensuring all the services
>> shut things down properly? It seems like a cross-project issue. Should a
>> spec be submitted?
>
>
> I think it would be fair if Kubernetes sent a SIGTERM to the OpenStack
> service in a container, waited for the service to shut down, and only then
> destroyed the container.
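A minimal sketch of the service side of this, assuming a Python service that owns an RPC server object with `start()`, `stop()` and `wait()` methods (the lifecycle oslo.messaging servers expose). `StubRPCServer` and `install_graceful_shutdown` are hypothetical names, and the stub stands in for a real server so the sketch runs without a broker. Kubernetes sends SIGTERM and waits for the pod's grace period before escalating to SIGKILL, so a handler like this gets a chance to run; SIGKILL itself cannot be caught, which is why the queue TTL remains the only backstop in that case.

```python
import signal
import threading
from types import FrameType
from typing import Callable, Optional


class StubRPCServer:
    """Hypothetical stand-in for an RPC server with a start/stop/wait lifecycle."""

    def __init__(self) -> None:
        self.running = False
        self.drained = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        # Stop consuming; a driver with the fix would also delete its fanout queue here.
        self.running = False

    def wait(self) -> None:
        # A real server blocks here until in-flight requests have finished.
        self.drained = True


def install_graceful_shutdown(server: StubRPCServer, done: threading.Event) -> Callable:
    """Register a SIGTERM handler that stops the RPC server before the process exits."""

    def _on_sigterm(signum: int, frame: Optional[FrameType]) -> None:
        server.stop()
        server.wait()
        done.set()  # tells the main loop it is now safe to exit

    signal.signal(signal.SIGTERM, _on_sigterm)
    return _on_sigterm
```

A service would call `install_graceful_shutdown(server, done)`, then `server.start()`, and block with something like `while not done.wait(1.0): pass` before exiting, so the container is only torn down after the RPC server has stopped.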
>
> It might not be very important for our case, though, if we agree to split
> the expiration time for fanout and reply queues. And I don't know of any other
> case where an OpenStack service needs to clean up on shutdown in some
> external place.
>
> Thanks,
>
> Dmitry
>
>>
>> Thanks,
>> Kevin
>> ________________________________
>> From: Dmitry Mescheryakov [dmescheryakov at mirantis.com]
>> Sent: Tuesday, July 26, 2016 11:01 AM
>> To: Fox, Kevin M
>> Cc: Sam Morrison; OpenStack Operators
>>
>> Subject: Re: [Openstack-operators] [oslo] RabbitMQ queue TTL issues moving
>> to Liberty
>>
>>
>>
>> 2016-07-25 18:47 GMT+03:00 Fox, Kevin M <Kevin.Fox at pnnl.gov>:
>>>
>>> Ah. Interesting.
>>>
>>> The graceful shutdown would really help the Kubernetes situation too.
>>> Kubernetes can do easy rolling upgrades and having the processes being able
>>> to clean up after themselves as they are upgraded is important. Is this
>>> something that needs to go into oslo.messaging or does it have to be added
>>> to all projects using it?
>>
>>
>> It needs to be fixed both on the oslo.messaging side (delete the fanout queue
>> on RPC server stop, which is what Kirill's CR does) and on the side of the
>> projects using it, as they need to actually stop the RPC server before
>> shutting down. As I wrote earlier, among Neutron processes right now only the
>> openvswitch and metadata agents do not stop the RPC server.
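A toy, in-memory model of the two halves described above. It assumes (as we understand the rabbit driver) that each process start declares a freshly named fanout queue, so a restarted agent never reuses its predecessor's queue. `FanoutExchange` and `ToyRPCServer` are illustrative names, not oslo.messaging classes. The old queue only disappears if the driver deletes it on `stop()` and the service actually calls `stop()`.

```python
import uuid
from typing import Dict, List


class FanoutExchange:
    """Toy fanout exchange: every bound queue receives a copy of each message."""

    def __init__(self) -> None:
        self.queues: Dict[str, List[str]] = {}

    def bind(self, queue: str) -> None:
        self.queues.setdefault(queue, [])

    def delete(self, queue: str) -> None:
        self.queues.pop(queue, None)

    def publish(self, message: str) -> None:
        for backlog in self.queues.values():
            backlog.append(message)


class ToyRPCServer:
    """Hypothetical RPC server owning one uniquely named fanout queue.

    delete_on_stop=True models the driver-side fix (drop the fanout queue
    on stop); False models the old behaviour, where the queue stays bound
    with no consumer until RabbitMQ expires it.
    """

    def __init__(self, exchange: FanoutExchange, topic: str, delete_on_stop: bool) -> None:
        self.exchange = exchange
        self.queue = f"{topic}_fanout_{uuid.uuid4().hex}"
        self.delete_on_stop = delete_on_stop

    def start(self) -> None:
        self.exchange.bind(self.queue)

    def stop(self) -> None:
        if self.delete_on_stop:
            self.exchange.delete(self.queue)
```

Restarting with `delete_on_stop=False` leaves the old queue bound and collecting a copy of every message; with `delete_on_stop=True` only the new queue remains, but a process that exits without calling `stop()` still strands its queue.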
>>
>> I am not sure how that relates to Kubernetes, as I am not very familiar
>> with it.
>>
>> Thanks,
>>
>> Dmitry
>>
>>>
>>>
>>> Thanks,
>>> Kevin
>>> ________________________________
>>> From: Dmitry Mescheryakov [dmescheryakov at mirantis.com]
>>> Sent: Monday, July 25, 2016 3:47 AM
>>> To: Sam Morrison
>>> Cc: OpenStack Operators
>>> Subject: Re: [Openstack-operators] [oslo] RabbitMQ queue TTL issues
>>> moving to Liberty
>>>
>>> Sam,
>>>
>>> For your case I would suggest lowering rabbit_transient_queues_ttl until
>>> you are comfortable with the volume of messages that arrives during that
>>> time. Setting the parameter to 1 will essentially replicate the behaviour of
>>> auto_delete queues. But I would suggest not setting it that low, as otherwise
>>> your OpenStack will suffer from the original bug. A value of around 20
>>> seconds should probably work in most cases.
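A minimal sketch of applying that advice, assuming the option lives in the `[oslo_messaging_rabbit]` section of the service's config file and takes a whole number of seconds. `render_ttl_config` is a hypothetical helper; it only emits the INI snippet an operator would add, here with the 20-second value suggested above.

```python
import configparser
import io


def render_ttl_config(ttl_seconds: int) -> str:
    """Render an [oslo_messaging_rabbit] snippet setting the transient queue TTL.

    Assumes the option takes a positive whole number of seconds; values
    near 1 approximate auto_delete and bring back the original bug.
    """
    if not isinstance(ttl_seconds, int) or ttl_seconds < 1:
        raise ValueError("rabbit_transient_queues_ttl must be a positive integer")
    parser = configparser.ConfigParser()
    parser["oslo_messaging_rabbit"] = {"rabbit_transient_queues_ttl": str(ttl_seconds)}
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()
```

For example, `render_ttl_config(20)` yields a `[oslo_messaging_rabbit]` section containing `rabbit_transient_queues_ttl = 20`.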
>>>
>>> I think there is room for improvement here: we can delete reply and
>>> fanout queues on graceful shutdown. But I am not sure it will be easy to
>>> implement, as it requires services (Nova, Neutron, etc.) to stop the RPC
>>> server on SIGINT, and I don't know if they do that right now.
>>>
>>> I don't think we can make the SIGKILL case any better. Other than that,
>>> the issue could be investigated on the Neutron side; maybe the number of
>>> messages could be reduced there.
>>>
>>> Thanks,
>>>
>>> Dmitry
>>>
>>> 2016-07-25 9:27 GMT+03:00 Sam Morrison <sorrison at gmail.com>:
>>>>
>>>> We recently upgraded to Liberty and have come across some issues with
>>>> queue build-ups.
>>>>
>>>> This is due to changes in rabbit to set queue expiries instead of
>>>> queue auto-delete.
>>>> See https://bugs.launchpad.net/oslo.messaging/+bug/1515278 for more
>>>> information.
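The difference between the two queue styles can be sketched as the keyword arguments a client would pass when declaring the queue; the names are chosen to match pika's `queue_declare(..., durable=..., auto_delete=..., arguments=...)`. This is not the oslo.messaging driver's code, only an illustration under that assumption, and `transient_queue_declare_kwargs` is a hypothetical name. In RabbitMQ, `x-expires` is given in milliseconds and deletes a queue once it has been unused (no consumers, not redeclared, no `basic.get`) for that long; publishes do not count as use, which is why an orphaned fanout queue keeps filling until it expires.

```python
from typing import Any, Dict, Optional


def transient_queue_declare_kwargs(ttl_seconds: Optional[int]) -> Dict[str, Any]:
    """Keyword arguments for declaring a transient (fanout/reply) queue.

    ttl_seconds=None: auto-delete style, the queue is removed as soon as its
    last consumer goes away.
    ttl_seconds=N: expiry style, RabbitMQ deletes the queue after it has been
    unused for N seconds (x-expires is expressed in milliseconds).
    """
    if ttl_seconds is None:
        return {"durable": False, "auto_delete": True, "arguments": {}}
    if ttl_seconds < 1:
        raise ValueError("ttl_seconds must be at least 1")
    return {
        "durable": False,
        "auto_delete": False,
        "arguments": {"x-expires": ttl_seconds * 1000},
    }
```

With the 10-minute expiry described here, `transient_queue_declare_kwargs(600)` carries `{"x-expires": 600000}` and leaves `auto_delete` off, so a queue whose consumer has gone away stays bound for the full TTL.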
>>>>
>>>> The fix for this bug is in Liberty, and it does fix an issue; however, it
>>>> causes another one.
>>>>
>>>> Every time you restart something that has a fanout queue (e.g.
>>>> cinder-scheduler or the neutron agents), you will have a queue in rabbit
>>>> that is still bound to the rabbitmq exchange (and so still getting
>>>> messages in) but has no consumers.
>>>>
>>>> The messages in these queues are basically rubbish and don’t need to
>>>> exist. Rabbit will delete these queues after 10 mins (although the default
>>>> in master has now changed to 30 mins).
>>>>
>>>> During this time the queue keeps growing. This sets off our nagios alerts,
>>>> and our ops guys have to deal with something that isn’t really an issue;
>>>> they basically delete the queue.
>>>>
>>>> A bad scenario is when you make a change to your cloud that means all
>>>> your 1000 neutron agents are restarted; this leaves a couple of dead queues
>>>> per agent hanging around (port updates and security group updates). We get
>>>> around 25 messages/second on these queues, so you can see that after 10
>>>> minutes we have a ton of messages in these queues.
>>>>
>>>> 1000 x 2 x 25 x 600 = 30,000,000 messages in 10 minutes to be precise.
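The same estimate as a small function, assuming all agents restart at once and each orphaned queue keeps receiving messages at a constant rate for the full TTL. `stale_backlog` is a hypothetical helper; it also shows what the 30-minute default in master and the 20-second TTL suggested in the reply above would give for the same cloud.

```python
def stale_backlog(agents: int, queues_per_agent: int, msgs_per_sec: int, ttl_seconds: int) -> int:
    """Messages stranded in orphaned fanout queues before RabbitMQ expires them.

    Assumes every agent restarts at once and each orphaned queue keeps receiving
    msgs_per_sec for the whole TTL (incoming publishes do not reset the expiry).
    """
    return agents * queues_per_agent * msgs_per_sec * ttl_seconds


print(f"{stale_backlog(1000, 2, 25, 600):,}")   # 10 min TTL -> 30,000,000
print(f"{stale_backlog(1000, 2, 25, 1800):,}")  # 30 min TTL -> 90,000,000
print(f"{stale_backlog(1000, 2, 25, 20):,}")    # 20 s TTL   -> 1,000,000
```

The backlog scales linearly with the TTL, so tripling the expiry to 30 minutes triples the pile-up, while 20 seconds cuts it to a thirtieth of the 10-minute figure.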
>>>>
>>>> Has anyone else been suffering from this before I raise a bug?
>>>>
>>>> Cheers,
>>>> Sam
>>>>
>>>>
>>>> _______________________________________________
>>>> OpenStack-operators mailing list
>>>> OpenStack-operators at lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>
>>>
>>
>
>
>



--
Davanum Srinivas :: https://twitter.com/dims


