[Openstack] [Keystone] performance issues after havana upgrade
Felix Lee
zaknafein.lee at gmail.com
Wed Jan 29 15:01:00 UTC 2014
Dear all,
Just some experiences to share on this.
After I upgraded from Grizzly to Havana, I ran keystone with token
expiration = 14400 and the memcached backend, without the patch, for weeks
with no trouble. But since last week it started hitting the "Unable to add
token user list" issue. I then reduced the token lifetime to 3 hours, 2
hours, 1 hour and even less, but none of these solved the issue for good
(keystone would last ~10 minutes at most). Furthermore, after a couple of
keystone restarts and memcached flushes, keystone suddenly could not start
up properly and kept complaining with an error like this:
2014-01-24 13:25:52.081 91813 INFO keystone.common.environment.eventlet_server [-] Starting /usr/bin/keystone-all on 0.0.0.0:35357
2014-01-24 13:25:52.081 91813 CRITICAL keystone [-] [Errno 98] Address already in use
So I checked the system with:
netstat -nap
lsof -i :35357
But I saw no process or connection occupying port 35357.
I have never encountered this problem on Linux before. The only way to solve
it (other than rebooting the machine :) ) was to flush the cache by hand,
like this:
sync; echo 3 > /proc/sys/vm/drop_caches
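For anyone who wants to repeat the port check from a script rather than
with netstat/lsof, here is a minimal Python sketch. It simply tries to
bind to the port and reports whether the kernel answers with EADDRINUSE
(errno 98 on Linux, the same error keystone logged). The function name
port_in_use is made up for illustration. The probe deliberately does not
set SO_REUSEADDR, so it may be stricter than the bind the real server does.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if bind() on (host, port) fails with EADDRINUSE.

    Hypothetical helper for illustration; not part of keystone or eventlet.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # any other failure (e.g. permission denied) is not handled here
    finally:
        sock.close()
    return False
```

Calling port_in_use(35357) on the keystone host would show whether the
kernel still considers the admin port taken, independent of what lsof
reports.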
I suspect that the 35357 socket file was removed from /proc when the
process was stopped but somehow remained in the memory cache for some
unknown reason. Perhaps it's an undiscovered bug in the eventlet server, I
don't know... Anyway, after this incident I applied the patch and now use
expiration = 3600 for the token lifetime, and everything is working
perfectly again. I still have no idea why the problem suddenly escalated
into such a terrible condition; it was as if keystone were suddenly
suffering a token DDoS attack from the Neutron agents and other internal
OpenStack service components for no reason...
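To illustrate why a long per-user token list hurts, here is a rough
Python sketch based on Morgan's description quoted below: before the fix,
each token in the user's index cost one round trip to memcached to check
that it had not expired. Everything in it is an assumption made for
illustration: CountingCache is an in-memory stand-in for memcached, the
"usertokens-" and "token-" key names are hypothetical, and
prune_by_expiry shows only one possible way to cut the round trips
(keeping the expiry time in the index itself), which is not necessarily
what the patch does.

```python
class CountingCache(dict):
    """In-memory stand-in for memcached that counts get() round trips."""

    def __init__(self):
        super().__init__()
        self.gets = 0

    def get(self, key, default=None):
        self.gets += 1
        return super().get(key, default)


def prune_by_lookup(cache, user_key):
    """Behaviour as described for the unpatched code: one get() per token."""
    token_ids = cache.get(user_key) or []
    # Expired tokens are assumed to have vanished from the cache already.
    live = [t for t in token_ids if cache.get("token-" + t) is not None]
    cache[user_key] = live
    return live


def prune_by_expiry(cache, user_key, now):
    """Illustrative alternative: the index stores (token_id, expires_at)."""
    entries = cache.get(user_key) or []
    live = [(t, exp) for t, exp in entries if exp > now]
    cache[user_key] = live
    return [t for t, _ in live]
```

With N tokens in the index, the first variant issues N + 1 reads per
validation and the second issues one, so services that request tokens
very frequently make the first variant slower and slower as the list
grows.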
Best regards,
Felix Lee ~
On 2014-01-13 17:25, Morgan Fainberg wrote:
> Hi Tim,
>
> The change is being proposed directly to stable/havana. We have an
> alternative implementation for Icehouse as we are refactoring the entire
> key-value-store system and making memcache a version of that new
> implementation.
>
> Cheers,
> Morgan
>
> On January 12, 2014 at 10:14:49, Tim Bell (tim.bell at cern.ch
> <mailto://tim.bell@cern.ch>) wrote:
>
>> Can we tag this patch for backporting to Havana stable ?
>>
>> We're starting work for the CERN upgrade and this looks like a very
>> useful patch to be part of the standard Havana offering.
>>
>> Tim
>>
>> > -----Original Message-----
>> > From: Jonathan Proulx [mailto:jon at jonproulx.com]
>> > Sent: 12 January 2014 18:32
>> > To: Morgan Fainberg
>> > Cc: openstack at lists.openstack.org
>> > Subject: Re: [Openstack] [Keystone] performance issues after havana upgrade
>> >
>> > puzzling side effect?
>> >
>> > I just made a small change to neutron.conf (adjusted a default quota) and restarted neutron-server, now neutron (but not other services)
>> > is spewing:
>> >
>> > Invalid user token - rejecting request
>> >
>> > (quite possibly only from dashboard requests; the CLI seems to work). I've tried restarting keystone (in both wsgi and eventlet modes),
>> > restarting neutron-server w/ reverted config and flushing/restarting memcached in various combinations.
>> >
>> > I don't really see how restarting neutron-server could confuse token validation...
>> >
>> >
>> > On Sun, Jan 12, 2014 at 10:38 AM, Morgan Fainberg <morgan at metacloud.com> wrote:
>> > > Thanks for confirming this! It also validates my new logic going into
>> > > icehouse (I might have had some ulterior motives here, or not so
>> > > ulterior as the case may be). I'll make sure we resolve the test
>> > > issues (unrelated to the patch) and get it into the Havana tree so you
>> > > don't need to maintain it outside of the releases.
>> > >
>> > > Cheers,
>> > > Morgan
>> > >
>> > > Sent from my tablet-like-device
>> > >
>> > >> On Jan 11, 2014, at 11:01 PM, Jonathan Proulx <jon at jonproulx.com> wrote:
>> > >>
>> > >>> On Sat, Jan 11, 2014 at 10:57 PM, Morgan Fainberg <m at metacloud.com> wrote:
>> > >>> Sounds good! Just remember that prior to the fix I posted there,
>> > >>> for each token in the user's index, it incurred a round-trip to
>> > >>> memcached to validate the token wasn't expired. This change makes
>> > >>> it so that there are significantly fewer trips from keystone to memcached.
>> > >>>
>> > >>> If this doesn't 100% solve the issue, we should start digging
>> > >>> further into what is going on, but I am confident this will (at the
>> > >>> very least) help a reasonable amount.
>> > >>
>> > >> You sir are a miracle worker, my hat is off!
>> > >>
>> > >> The responsiveness of everything is better than it's ever been, my
>> > >> users will think this is the best feature of the upgrade.
>> > >>
>> > >> For example earlier today I managed to launch 10 VMs in parallel,
>> > >> eventually, I'd guess on the order of 5-10min. One of my usual
>> > >> acceptance tests is being able to launch 100 VMs in that time. Just
>> > >> now I launched 100 in <2min from request until they'd all been
>> > >> provisioned and were booting. Now there's too many moving pieces and
>> > >> too few experimental samples to make any publishable claims, but your
>> > >> patch is the only thing that changed.
>> > >>
>> > >> Thanks,
>> > >> -Jon
>> >
>> > _______________________________________________
>> > Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>> > Post to : openstack at lists.openstack.org
>> > Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack