[Openstack] [Keystone] performance issues after havana upgrade

Adam Young ayoung at redhat.com
Wed Jan 29 16:09:30 UTC 2014


On 01/29/2014 10:01 AM, Felix Lee wrote:
> Dear all,
> Just some experiences to share on this.
> After I upgraded from Grizzly to Havana, I ran keystone with token 
> expiration = 14400 and the memcached backend, without the patch, and it 
> worked perfectly for weeks.
>
> But since last week, it started suffering from the "Unable to add token 
> user list" issue. So I reduced the token lifetime from 3 hours to 2 
> hours, then to 1 hour or even less, but none of these solved the issue 
> for good (keystone would last ~10 minutes at most). Furthermore, after 
> a couple of keystone restarts and memcached flushes, keystone suddenly 
> could not start up properly and kept logging an error like this:
>
> 2014-01-24 13:25:52.081 91813 INFO 
> keystone.common.environment.eventlet_server [-] Starting 
> /usr/bin/keystone-all
>  on 0.0.0.0:35357
> 2014-01-24 13:25:52.081 91813 CRITICAL keystone [-] [Errno 98] Address 
> already in use
>
>
> So, I checked system with:
> netstat -nap
> lsof -i :35357
>
>
> But I saw no process or connection occupying port 35357.
> I had never encountered this problem on Linux before. The only way to 
> solve it (other than rebooting the machine :) ) was to flush the cache 
> by hand, like this:
>
> sync; echo 3 > /proc/sys/vm/drop_caches
>
>
> I suspect that the 35357 socket entry was removed from /proc when the 
> process stopped but somehow remained in the memory cache for an unknown 
> reason. Perhaps it is some undiscovered bug in the eventlet server, I 
> don't know. Anyway, after this incident I applied the patch and set 
> expiration = 3600 for the token lifetime, and now everything is working 
> perfectly again. I still have no idea why the problem suddenly 
> escalated into such a terrible condition; it was as if keystone were 
> suddenly suffering a token DDoS attack from the Neutron agent and other 
> internal OpenStack service components for no reason...
>

35357 is considered an ephemeral port by Linux (although not by POSIX), 
and I suspect that it is getting "reclaimed" by one subsystem but not 
released in another: another way of describing your hypothesis.
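
To make the ephemeral-range point concrete, here is a minimal Python
sketch. It assumes the range is published in
/proc/sys/net/ipv4/ip_local_port_range as two integers, and that a
typical default at the time was 32768 to 61000 (check your own system,
the value differs between kernels). The function names are made up for
illustration.

```python
# Sketch only: helper names are hypothetical, not part of keystone.

def parse_port_range(text):
    """Parse the 'low high' pair used by ip_local_port_range."""
    low, high = (int(field) for field in text.split())
    return low, high


def is_ephemeral(port, port_range):
    """Return True if port lies inside the inclusive ephemeral range."""
    low, high = port_range
    return low <= port <= high


def read_local_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the live range from the kernel (Linux only)."""
    with open(path) as handle:
        return parse_port_range(handle.read())
```

On a Linux host, is_ephemeral(35357, read_local_port_range()) reports
whether the kernel may hand out 35357 as the source port of an outgoing
connection, which would then collide with keystone's bind.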

I read somewhere that there is a way to tell Linux to reserve port 
35357, and not treat it as ephemeral.
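
The setting that matches this description is probably the
net.ipv4.ip_local_reserved_ports sysctl, which takes a comma-separated
list of ports and port ranges. That is an assumption, not something
confirmed in this thread. A rough Python sketch for checking whether a
port is already covered by such a list (helper names are hypothetical):

```python
# Sketch only: parses the comma-separated "port" / "low-high" format
# that ip_local_reserved_ports is assumed to use.

def parse_reserved_ports(text):
    """Turn e.g. '5000,35357-35358' into a list of (low, high) tuples."""
    reserved = []
    for item in text.strip().split(","):
        item = item.strip()
        if not item:
            continue  # an empty file means nothing is reserved
        low, _, high = item.partition("-")
        reserved.append((int(low), int(high or low)))
    return reserved


def is_reserved(port, reserved):
    """Return True if port falls inside any reserved (low, high) range."""
    return any(low <= port <= high for low, high in reserved)
```

If the port is not listed, something like
"sysctl -w net.ipv4.ip_local_reserved_ports=35357" (plus an entry in
/etc/sysctl.conf to survive reboots) should keep the kernel from
assigning it automatically, while an explicit bind by keystone still
works.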

>
> Best regards,
> Felix Lee ~
>
>
> On 2014-01-13 17:25, Morgan Fainberg wrote:
>> Hi Tim,
>>
>> The change is being proposed directly to stable/havana.  We have an
>> alternative implementation for Icehouse as we are refactoring the entire
>> key-value-store system and making memcache a version of that new
>> implementation.
>>
>> Cheers,
>> Morgan
>>
>> On January 12, 2014 at 10:14:49, Tim Bell (tim.bell at cern.ch
>> <mailto://tim.bell@cern.ch>) wrote:
>>
>>> Can we tag this patch for backporting to Havana stable ?
>>>
>>> We're starting work for the CERN upgrade and this looks like a very
>>> useful patch to be part of the standard Havana offering.
>>>
>>> Tim
>>>
>>> > -----Original Message-----
>>> > From: Jonathan Proulx [mailto:jon at jonproulx.com]
>>> > Sent: 12 January 2014 18:32
>>> > To: Morgan Fainberg
>>> > Cc: openstack at lists.openstack.org
>>> > Subject: Re: [Openstack] [Keystone] performance issues after havana upgrade
>>> >
>>> > puzzling side effect?
>>> >
>>> > I just made a small change to neutron.conf (adjusted a default quota)
>>> > and restarted neutron-server; now neutron (but not other services) is
>>> > spewing:
>>> >
>>> > Invalid user token - rejecting request
>>> >
>>> > (quite possibly only from dashboard requests; the CLI seems to work).
>>> > I've tried restarting keystone (in both wsgi and eventlet modes),
>>> > restarting neutron-server w/ reverted config, and flushing/restarting
>>> > memcached in various combinations.
>>> >
>>> > I don't really see how restarting neutron-server could confuse
>>> > token validation...
>>> >
>>> >
>>> > On Sun, Jan 12, 2014 at 10:38 AM, Morgan Fainberg <morgan at metacloud.com> wrote:
>>> > > Thanks for confirming this!  It also validates my new logic going
>>> > > into icehouse (I might have had some ulterior motives here, or not
>>> > > so ulterior as the case may be).  I'll make sure we resolve the test
>>> > > issues (unrelated to the patch) and get it into the Havana tree so
>>> > > you don't need to maintain it outside of the releases.
>>> > >
>>> > > Cheers,
>>> > > Morgan
>>> > >
>>> > > Sent from my tablet-like-device
>>> > >
>>> > >> On Jan 11, 2014, at 11:01 PM, Jonathan Proulx <jon at jonproulx.com> wrote:
>>> > >>
>>> > >>> On Sat, Jan 11, 2014 at 10:57 PM, Morgan Fainberg <m at metacloud.com> wrote:
>>> > >>> Sounds good!  Just remember that prior to the fix I posted there,
>>> > >>> each token in the user's index incurred a round-trip to memcached
>>> > >>> to validate that the token wasn't expired.  This change makes it
>>> > >>> so that there are significantly fewer trips from keystone to
>>> > >>> memcached.
>>> > >>>
>>> > >>> If this doesn't 100% solve the issue, we should start digging
>>> > >>> further into what is going on, but I am confident this will (at
>>> > >>> the very least) help a reasonable amount.
>>> > >>
>>> > >> You sir are a miracle worker, my hat is off!
>>> > >>
>>> > >> The responsiveness of everything is better than it's ever been, my
>>> > >> users will think this is the best feature of the upgrade.
>>> > >>
>>> > >> For example, earlier today I managed to launch 10 VMs in parallel,
>>> > >> eventually, I'd guess on the order of 5-10min. One of my usual
>>> > >> acceptance tests is being able to launch 100 VMs in that time.
>>> > >> Just now I launched 100 in <2min from request until they'd all been
>>> > >> provisioned and were booting.  Now there are too many moving pieces
>>> > >> and too few experimental samples to make any publishable claims,
>>> > >> but your patch is the only thing that changed.
>>> > >>
>>> > >> Thanks,
>>> > >> -Jon
>>> >
>>> > _______________________________________________
>>> > Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>> > Post to     : openstack at lists.openstack.org
>>> > Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>
>




