[Openstack] [Keystone] performance issues after havana upgrade

Jonathan Proulx jon at jonproulx.com
Sat Jan 11 18:34:26 UTC 2014


Hi All,

recently upgraded my 1 controller / 60 compute node Ubuntu 12.04 (+ cloud
archive) system from Grizzly to Havana.  Now, even before I let my
users back onto the API, I'm barely able to do anything due to
authentication timeouts.  I am using neutron, which likes to
authenticate *a lot*; I'm not entirely convinced the real problem
isn't neutron reauthenticating a bajillion times a second, but
right now it looks like Keystone...

I'm using UUID Tokens with memcached backend.
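
For what it's worth, the relevant bits of my keystone.conf look
roughly like this (trimmed, and the option names are from the Havana
sample config as I remember them, so double-check against your own;
the memcached address is just an example):

  [token]
  # UUID tokens stored in memcached rather than SQL
  provider = keystone.token.providers.uuid.Provider
  driver = keystone.token.backends.memcache.Token

  [memcache]
  servers = localhost:11211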

Under Grizzly I had been using Peter Feiner's multi-worker patches,
though those were only needed for peak loads (when starting hundreds
of instances).  Now, with just the background noise of the running
compute nodes (60) and instances (281), in the default single-worker
(eventlet) mode keystone runs at 100% CPU and many client requests
(from the dashboard or CLI) time out.  Nova-compute nodes also
frequently log timeouts when trying to authenticate to the neutron
service.

Verbose keystone logs show very little activity even while running at
100% CPU, just a few "INFO access" type entries every few *minutes*.
Debug logging shows many sqlalchemy actions per second for the neutron
user id and admin tenant id.
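
If neutron really is validating tokens against keystone on every call,
I wonder whether pointing the auth_token middleware on the
neutron-server side at memcached would take some of that load off.
Something like the following in neutron.conf is what I'd try
(untested here; the memcached_servers option is my reading of the
auth_token middleware docs, and the addresses are just examples):

  [keystone_authtoken]
  auth_host = 192.168.128.15
  auth_port = 35357
  auth_protocol = http
  admin_tenant_name = service
  admin_user = neutron
  admin_password = <password>
  # cache validated tokens instead of re-validating every request
  memcached_servers = 192.168.128.15:11211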

I took the next obvious step and put keystone behind apache.  While
that does get more server processes running, performance is, if
anything, even worse, and it uses virtually all of the 12-core
controller node's CPUs rather than just one of them.

The logs quickly fill with data read timeouts:

2014-01-11 12:31:26.606 3054 INFO access [-] 192.168.128.43 - - [11/Jan/2014:17:31:26 +0000] "POST http://192.168.128.15:35357/v2.0/tokens HTTP/1.1" 500 167
2014-01-11 12:31:26.621 3054 ERROR keystone.common.wsgi [-] request data read error
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi Traceback (most recent call last):
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/common/wsgi.py", line 371, in __call__
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi     response = self.process_request(request)
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/keystone/middleware/core.py", line 110, in process_request
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi     params_json = request.body
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/webob/request.py", line 677, in _body__get
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi     self.make_body_seekable() # we need this to have content_length
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/webob/request.py", line 922, in make_body_seekable
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi     self.copy_body()
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/webob/request.py", line 945, in copy_body
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi     self.body = self.body_file.read(self.content_length)
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi   File "/usr/lib/python2.7/dist-packages/webob/request.py", line 1521, in readinto
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi     data = self.file.read(sz0)
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi IOError: request data read error
2014-01-11 12:31:26.621 3054 TRACE keystone.common.wsgi

I've played around with different process and thread counts in the
WSGIDaemonProcess apache directive, but while some combinations seem
to make it worse, none have made it better.
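
For reference, the admin-port vhost I've been experimenting with looks
roughly like this (simplified; the WSGI script is the stock keystone
httpd wrapper copied to /var/www/keystone on my system, and the
processes/threads values are just what I happen to be trying at the
moment):

  <VirtualHost *:35357>
      # run keystone in its own daemon process group
      WSGIDaemonProcess keystone-admin user=keystone group=keystone processes=4 threads=4 display-name=%{GROUP}
      WSGIProcessGroup keystone-admin
      WSGIScriptAlias / /var/www/keystone/admin
      WSGIApplicationGroup %{GLOBAL}
      ErrorLog /var/log/apache2/keystone-admin-error.log
      CustomLog /var/log/apache2/keystone-admin-access.log combined
  </VirtualHost>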

Clearly I must be doing something wrong, since the single-process
eventlet mode has significantly better performance than the
multi-process wsgi mode.

I've also fiddled a bit with the dogpile cache settings.  When running
a single stand-alone process, the 'memory' backend seemed to make
things actually go.  After getting the pylibmc backend set up (or at
least I think it's set up; there could well be more I'm missing),
which didn't make a noticeable difference, I wasn't able to get back
to that earlier success with the memory backend, though for obvious
reasons I wouldn't have wanted to keep the memory backend in
production anyway.
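
In case it matters, the caching stanza I currently have in
keystone.conf is roughly this (again, taken from the sample config,
and it's entirely possible I've missed a step needed to make pylibmc
actually get used):

  [cache]
  enabled = True
  backend = dogpile.cache.pylibmc
  backend_argument = url:127.0.0.1

(The earlier 'memory' experiment was the same thing with
backend = dogpile.cache.memory.)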

How can I either make Keystone go faster or Neutron authenticate less?

-Jon



