[Openstack-operators] A couple of recent bugs that hit us in regions with cells and moderate (to heavy) build/delete activity
Matt Van Winkle
mvanwink at rackspace.com
Fri Feb 13 14:37:18 UTC 2015
Indeed - thanks, Michael. The last time we looked, the fix for the Oslo bug
was in master, but hadn't been packaged for other projects. That was a few days
ago, though. Anyway, if anyone else finds that their cells services
suddenly start acting like they have nothing to do, I'd recommend looking
at these patches.
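For anyone wanting a quick first check, here is a minimal sketch (not part of either patch) that reports whether the installed oslo.messaging matches the release named in this thread. It assumes only what the thread states, namely that 1.5.1 is the version reported to leak channels; the set name and both helper functions are hypothetical, and other releases may or may not be affected.

```python
from importlib import metadata

# Assumption: 1.5.1 is the only release named in this thread as leaking
# channels. Other releases may or may not be affected.
REPORTED_LEAKY_VERSIONS = {"1.5.1"}


def is_reported_leaky(version):
    """Return True if the version string matches a release named in this thread."""
    return version.strip() in REPORTED_LEAKY_VERSIONS


def installed_oslo_messaging_version():
    """Return the installed oslo.messaging version, or None if it is not installed."""
    try:
        return metadata.version("oslo.messaging")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_oslo_messaging_version()
    if found is None:
        print("oslo.messaging is not installed in this environment")
    elif is_reported_leaky(found):
        print(f"oslo.messaging {found}: matches the release reported to leak channels")
    else:
        print(f"oslo.messaging {found}: not the release named in this thread")
```

Note that a distro package carrying a backported fix would still report 1.5.1, so a match here only means the patches are worth a look, not that the leak is present.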
On 2/12/15 2:53 PM, "Michael Still" <mikal at stillhq.com> wrote:
>I just want to note that both of those fixes look to be approved now.
>On Fri, Feb 13, 2015 at 6:14 AM, Matt Van Winkle <mvanwink at rackspace.com> wrote:
>> Hey folks,
>> Apologies if any of this has been discussed on the list already. I've
>> to check everything ahead of time.
>> We recently had two bugs combine to hit us in some of our regions as we
>> rolled out some new code. The result was rabbit servers not accepting
>> connections and/or crashing with OOM errors. I wanted to pass them along,
>> as I know from the Large Deployments Team, there are more and more folks
>> using cells to manage larger regions. Here are the specific bugs:
>> Cells doesn't properly track RabbitMQ connection pools:
>> Oslo messaging bug in version 1.5.1 that leaks channels:
>> Upstream bug: https://bugs.launchpad.net/oslo.messaging/+bug/1406629
>> Upstream fix:
>> We are deploying patches for both in our problem areas now and the rest of
>> the fleet in the immediate future, but this gave us quite a run for our
>> money last week. I wanted to share in case anyone else is chasing these
>> issues and/or might after an upcoming code update.
>> OpenStack-operators mailing list
>> OpenStack-operators at lists.openstack.org