[Openstack] [Keystone] performance issues after havana upgrade
Adam Young
ayoung at redhat.com
Wed Jan 29 16:09:30 UTC 2014
On 01/29/2014 10:01 AM, Felix Lee wrote:
> Dear all,
> Just some experiences to share on this.
> After I upgraded from Grizzly to Havana, I ran keystone with token
> expiration = 14400 and the memcached backend, without the patch, for
> weeks with no problems.
>
> But since last week, it has been hitting the "Unable to add token user
> list" error. I reduced the token lifetime from 3 hours to 2 hours to
> 1 hour, and even less, but none of these settings solved the issue for
> good (keystone lasted ~10 minutes at most). Furthermore, after a
> couple of keystone restarts and memcached flushes, keystone suddenly
> could not start up properly and kept complaining with errors like
> this:
>
> 2014-01-24 13:25:52.081 91813 INFO keystone.common.environment.eventlet_server [-] Starting /usr/bin/keystone-all on 0.0.0.0:35357
> 2014-01-24 13:25:52.081 91813 CRITICAL keystone [-] [Errno 98] Address already in use
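Errno 98 is EADDRINUSE on Linux. A quick way to test whether the kernel still considers 0.0.0.0:35357 taken, independently of what netstat or lsof report, is simply to try the bind yourself. The sketch below is a minimal, hypothetical helper (`port_is_bindable` is not part of keystone); it sets SO_REUSEADDR before binding on the assumption that keystone's eventlet server does the same, so sockets lingering in TIME_WAIT are not counted as "in use" while an active listener still is.

```python
import errno
import socket


def port_is_bindable(host: str, port: int) -> bool:
    """Return True if a TCP socket could bind host:port right now.

    Hypothetical diagnostic helper. SO_REUSEADDR is set first (assumed to
    mirror the eventlet server), so only a real conflict such as an active
    listener makes the bind fail with EADDRINUSE.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:  # errno 98 on Linux
            return False
        raise  # e.g. EACCES on a privileged port is a different problem
    finally:
        sock.close()  # release the port again whether or not bind succeeded
```

For example, `port_is_bindable("0.0.0.0", 35357)` returning False while netstat and lsof show nothing on that port would reproduce the symptom described here without starting keystone.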
>
>
> So, I checked the system with:
> netstat -nap
> lsof -i :35357
>
>
> But I saw no process or connection occupying port 35357.
> I had never encountered this problem on Linux before. The only way I
> found to solve it (apart from rebooting the machine :) ) was to flush
> the caches by hand, like this:
>
> sync; echo 3 > /proc/sys/vm/drop_caches
>
>
> I suspect that the entry for the 35357 socket was removed from /proc
> when the process stopped, but for some unknown reason it still
> remained in the memory cache; perhaps it is an undiscovered bug in the
> eventlet server, I don't know. Anyway, after this incident I applied
> the patch and set expiration = 3600 for the token lifetime, and now
> everything is working perfectly again. I still have no idea why the
> problem suddenly escalated into such a terrible condition; it was as
> if keystone were suddenly under a token DDoS attack from the Neutron
> agents and other internal OpenStack service components, for no
> reason...
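Shortening the token lifetime helps because it bounds the size of each user's token index. In steady state, the number of unexpired tokens a user holds is roughly the issuance rate times the lifetime (Little's law), and, per Morgan's explanation further down the thread, before his fix keystone made one memcached round trip per token in that index. The sketch below makes the arithmetic concrete for the lifetimes tried here; the rate of one token per second for a busy service user is a hypothetical figure, not one reported in this thread.

```python
def live_tokens(tokens_per_second: float, lifetime_seconds: int) -> int:
    """Steady-state count of unexpired tokens in one user's index.

    Little's law: items in system = arrival rate * time in system.
    Assumes a constant issuance rate and that tokens leave the index
    only by expiring (no early revocation).
    """
    return round(tokens_per_second * lifetime_seconds)


# Hypothetical rate: one token per second for a busy service user.
RATE = 1.0
for hours in (4, 3, 2, 1):
    n = live_tokens(RATE, hours * 3600)
    print(f"expiration = {hours * 3600:>5}: ~{n} live tokens")
```

Under that assumption, going from expiration = 14400 to expiration = 3600 cuts the index, and therefore the pre-fix round trips per check, by a factor of four; it does not remove the linear growth, which is what the patch addresses.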
>
35357 falls within Linux's ephemeral port range (although not within the
IANA dynamic range, which starts at 49152), and I suspect that it is
getting "reclaimed" by one subsystem but not released by another: another
way of describing your hypothesis.
I read somewhere that there is a way to tell Linux to reserve port
35357 and not treat it as ephemeral.
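Adam's point can be checked directly on a host. On Linux, /proc/sys/net/ipv4/ip_local_port_range holds the low and high ends of the ephemeral range, and the net.ipv4.ip_local_reserved_ports sysctl takes a comma-separated list of ports or ranges that the kernel skips when choosing ephemeral ports. The sketch below parses those two values and reports whether a given port is exposed; the function names are hypothetical, only `check_live_system` touches the real /proc files, and whether reserving the port would actually have prevented Felix's error is not established in this thread.

```python
from pathlib import Path


def parse_port_range(text: str) -> tuple[int, int]:
    """Parse ip_local_port_range contents, e.g. '32768\t61000\n'."""
    low, high = (int(field) for field in text.split())
    return low, high


def parse_reserved_ports(text: str) -> set[int]:
    """Parse ip_local_reserved_ports contents, e.g. '35357' or '8000-8002,9000'."""
    ports: set[int] = set()
    for item in text.strip().split(","):
        item = item.strip()
        if not item:
            continue  # the file is empty when nothing is reserved
        if "-" in item:
            start, end = (int(p) for p in item.split("-"))
            ports.update(range(start, end + 1))
        else:
            ports.add(int(item))
    return ports


def may_be_used_as_ephemeral(port: int, range_text: str, reserved_text: str) -> bool:
    """True if the kernel may hand out `port` as an ephemeral source port."""
    low, high = parse_port_range(range_text)
    return low <= port <= high and port not in parse_reserved_ports(reserved_text)


def check_live_system(port: int = 35357) -> bool:
    """Same check against this machine's /proc settings (Linux only)."""
    base = Path("/proc/sys/net/ipv4")
    return may_be_used_as_ephemeral(
        port,
        (base / "ip_local_port_range").read_text(),
        (base / "ip_local_reserved_ports").read_text(),
    )
```

Reserving the port would mean setting net.ipv4.ip_local_reserved_ports = 35357 (for example with `sysctl -w`, or persistently in /etc/sysctl.conf); after that, the check above would return False for 35357.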
>
> Best regards,
> Felix Lee ~
>
>
> On 2014-01-13 17:25, Morgan Fainberg wrote:
>> Hi Tim,
>>
>> The change is being proposed directly to stable/havana. We have an
>> alternative implementation for Icehouse as we are refactoring the entire
>> key-value-store system and making memcache a version of that new
>> implementation.
>>
>> Cheers,
>> Morgan
>>
>> On January 12, 2014 at 10:14:49, Tim Bell (tim.bell at cern.ch) wrote:
>>
>>> Can we tag this patch for backporting to stable/havana?
>>>
>>> We're starting work on the CERN upgrade and this looks like a very
>>> useful patch to be part of the standard Havana offering.
>>>
>>> Tim
>>>
>>> > -----Original Message-----
>>> > From: Jonathan Proulx [mailto:jon at jonproulx.com]
>>> > Sent: 12 January 2014 18:32
>>> > To: Morgan Fainberg
>>> > Cc: openstack at lists.openstack.org
>>> > Subject: Re: [Openstack] [Keystone] performance issues after havana upgrade
>>> >
>>> > puzzling side effect?
>>> >
>>> > I just made a small change to neutron.conf (adjusted a default
>>> > quota) and restarted neutron-server, and now neutron (but not other
>>> > services) is spewing:
>>> >
>>> > Invalid user token - rejecting request
>>> >
>>> > (quite possibly only from dashboard requests; the CLI seems to work).
>>> > I've tried restarting keystone (in both WSGI and eventlet modes),
>>> > restarting neutron-server with the reverted config, and
>>> > flushing/restarting memcached in various combinations.
>>> >
>>> > I don't really see how restarting neutron-server could confuse
>>> > token validation...
>>> >
>>> >
>>> > On Sun, Jan 12, 2014 at 10:38 AM, Morgan Fainberg <morgan at metacloud.com> wrote:
>>> > > Thanks for confirming this! It also validates my new logic going into
>>> > > Icehouse (I might have had some ulterior motives here, or not so
>>> > > ulterior as the case may be). I'll make sure we resolve the test
>>> > > issues (unrelated to the patch) and get it into the Havana tree so you
>>> > > don't need to maintain it outside of the releases.
>>> > >
>>> > > Cheers,
>>> > > Morgan
>>> > >
>>> > > Sent from my tablet-like-device
>>> > >
>>> > >> On Jan 11, 2014, at 11:01 PM, Jonathan Proulx <jon at jonproulx.com> wrote:
>>> > >>
>>> > >>> On Sat, Jan 11, 2014 at 10:57 PM, Morgan Fainberg <m at metacloud.com> wrote:
>>> > >>> Sounds good! Just remember that prior to the fix I posted there,
>>> > >>> keystone incurred a round trip to memcached for each token in the
>>> > >>> user's index to validate that the token wasn't expired. This change
>>> > >>> makes significantly fewer trips from keystone to memcached.
>>> > >>>
>>> > >>> If this doesn't 100% solve the issue, we should start digging
>>> > >>> further into what is going on, but I am confident this will (at the
>>> > >>> very least) help a reasonable amount.
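Morgan's description of the fix can be illustrated with a toy model. Before the fix, checking the user's token index meant fetching every listed token from memcached just to read its expiry; if the index instead stores each token's expiry alongside its id, expired entries can be dropped with no extra round trips. The sketch below uses a hypothetical in-memory stand-in for memcached that counts `get()` calls; it is not keystone's memcache backend code, and the (token_id, expires_at) index layout is an assumption about how such a fix can work.

```python
from dataclasses import dataclass, field


@dataclass
class CountingCache:
    """Hypothetical stand-in for a memcached client that counts round trips."""
    data: dict = field(default_factory=dict)
    round_trips: int = 0

    def get(self, key):
        self.round_trips += 1  # every lookup is one network round trip
        return self.data.get(key)


def prune_by_fetching(cache: CountingCache, token_ids: list, now: float) -> list:
    """Pre-fix style: fetch each indexed token to learn whether it expired."""
    live = []
    for token_id in token_ids:
        token = cache.get(token_id)
        if token is not None and token["expires_at"] > now:
            live.append(token_id)
    return live


def prune_locally(index: list, now: float) -> list:
    """Post-fix style: the index holds (token_id, expires_at) pairs, no fetches."""
    return [(token_id, exp) for token_id, exp in index if exp > now]
```

With 100 indexed tokens, the first approach costs 100 round trips on every check while the second costs none; the saving grows linearly with the index size, which in turn grows with the token lifetime.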
>>> > >>
>>> > >> You, sir, are a miracle worker; my hat is off!
>>> > >>
>>> > >> The responsiveness of everything is better than it's ever been; my
>>> > >> users will think this is the best feature of the upgrade.
>>> > >>
>>> > >> For example, earlier today I managed to launch 10 VMs in parallel,
>>> > >> eventually; I'd guess it took on the order of 5-10 min. One of my usual
>>> > >> acceptance tests is being able to launch 100 VMs in that time. Just
>>> > >> now I launched 100 in <2 min from request until they'd all been
>>> > >> provisioned and were booting. There are too many moving pieces and
>>> > >> too few experimental samples to make any publishable claims, but your
>>> > >> patch is the only thing that changed.
>>> > >>
>>> > >> Thanks,
>>> > >> -Jon
>>> >
>>> > _______________________________________________
>>> > Mailing list:
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>> > Post to : openstack at lists.openstack.org
>>> > Unsubscribe :
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>
>