[Openstack] Is there a reason Nova doesn't use scoped sessions in sqlalchemy ?

Day, Phil philip.day at hp.com
Tue Nov 1 14:20:13 UTC 2011


Hi Vish,

I probably wasn't careful enough with my wording - the API server may not be threaded as such, but the use of eventlets gives effectively the same concurrency issues that point towards needing to use scoped sessions.

Our basis for concluding that this is some form of concurrency issue is that we can easily reproduce the issue by running concurrent requests into an API server, and we have seen the problem disappear if we reduce the eventlet pool to 1 or change to scoped sessions.   Whilst the symptom is that the session has terminated by the time the lazy load is requested, as far as we can see the eventlet handing the query hasn't itself terminated the session - although it does seem likely that another eventlet using the same shared session could have. This seems to be specifically the type of issue that scoped sessions are intended to address.

http://www.sqlalchemy.org/docs/orm/session.html#contextual-thread-local-sessions

All of this is based on a limited understanding of how sqlalchemy is used in Nova - I'd be more than happy to be corrected by others with more experience, hence the question to the mailing list.

I fully understand the drive to clean up the database layer, and I'm not knocking the fix to 855660 - its clearly a good template for the way the DB needs to go in Essex.   My concern is that as shown by 855660 these changes have a pretty wide scope, and by the time that's been expanded to all of the current joinedloads it feels like it would be such a large set of changes that I'd be concerned about them coming back into Diablo.Stable.

Hence instead we were looking for a much smaller change that can address the whole class of problem of joinedloads in Diablo for now ahead of the DB refactoring in Essex - and from our testing scoped sessions seem to address that.  However as changing to scoped session breaks the migrate code in unit tests, and not really understanding why this is or the intricacies of the DB unit tests I wanted to see if we were heading down a path that had already been examined and discarded before we spend too much time on it.

I'd be really interested in hearing from anyone with experience of scoped_sessions, and/or willing to help us understand the issues we're seeing in the Unit Tests.

And of course I'd like to know what the communities feeling is towards a simpler approach to fixing the issue in Diablo.Final vs the backport of DB simplification changes from Essex - which I'm assuming will take some tiem yet to work through all of the joinedloads.

Phil

From: Vishvananda Ishaya [mailto:vishvananda at gmail.com]
Sent: 31 October 2011 19:50
To: Day, Phil
Cc: openstack at lists.launchpad.net (openstack at lists.launchpad.net); Johnson, Andrew Gordon (HP Cloud Services); Hassan, Ahmad; Haynes, David; nova-database at lists.launchpad.net
Subject: Re: [Openstack] Is there a reason Nova doesn't use scoped sessions in sqlalchemy ?

All of the workers are single-threaded, so I'm not sure that scoped sessions are really necessary.

We did however decide that objects from the db layer are supposed to be simple dictionaries.  We currently allow nested dictionaries to optimize joined objects. Unfortunately we never switched to sanitizing data from sqlalchemy, and instead we make the sqlalchemy objects provide a dictionary-like interface and pass the object itself.

The issue that you're seeing is because network wasn't properly 'joinedload'ed in the initial query, and because the data is not sanitized, sqlalchemy tries to joinedload, but the session has been terminated.  If we had sanitized data, we would get a more useful error like a key error when network is accessed. The current solution is to add the proper joinedload.

One of the goals of the nova-database team is to do the necessary data sanitization and to remove as many of the joinedloads as possible (hopefully all of them).

Vish

On Oct 31, 2011, at 12:25 PM, Day, Phil wrote:


Hi Folks,

We've been looking into a problem which looks a lot like:

https://bugs.launchpad.net/nova/+bug/855660



2011-10-21 14:13:31,035 ERROR nova.api [5bd52130-d46f-4702-b06b-9ca5045473d7 smokeuser smokeproject] Unexpected error raised: Parent instance <FixedIp at 0x4e74490> is not bound to a Session; lazy load operation of attribute 'network' cannot proceed
(nova.api): TRACE: Traceback (most recent call last):
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/nova/api/ec2/__init__.py", line 363, in __call__
(nova.api): TRACE: result = api_request.invoke(context)
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/nova/api/ec2/apirequest.py", line 90, in invoke
(nova.api): TRACE: result = method(context, **args)
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py", line 1195, in describe_instances
(nova.api): TRACE: instance_id=instance_id)
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py", line 1204, in _format_describe_instances
(nova.api): TRACE: return {'reservationSet': self._format_instances(context, **kwargs)}
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/nova/api/ec2/cloud.py", line 1309, in _format_instances
(nova.api): TRACE: if fixed['network'] and use_v6:
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/models.py", line 76, in __getitem__
(nova.api): TRACE: return getattr(self, key)
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 163, in __get__
(nova.api): TRACE: instance_dict(instance))
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/attributes.py", line 383, in get
(nova.api): TRACE: value = callable_(passive=passive)
(nova.api): TRACE: File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/strategies.py", line 595, in __call__
(nova.api): TRACE: (mapperutil.state_str(state), self.key)
(nova.api): TRACE: DetachedInstanceError: Parent instance <FixedIp at 0x4e74490> is not bound to a Session; lazy load operation of attribute 'network' cannot proceed
(nova.api): TRACE:


As far as we can see the problem seems to be related to some conflict between multiple threads in the same API server instance and lazy loading of some part of the object.

Looking at the sqlalchemy documentation it seems to strongly suggest that when used from multi-threaded WSGI applications that scoped_sessions should be used (I'm not clear on the details but it seems that this effectively makes lazy load operations thread safe).    However whilst this fixes the problem it has a bad effect on the unit tests - in particular it seems to upset all of the DB migration code used in the unit tests.

So does anyone know if there was an explicit decision / reason not to use scoped_sessions in Nova ?

Thanks,
Phil

PS:  The other possible fix we've found is to change sqlalchemy/models.py so that the associations are explicitly set to use eager load - which also seems to fix the problem but feels like a more clumsy way to go about it.   Any thoughts on that would also be appreciated ?



_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack at lists.launchpad.net<mailto:openstack at lists.launchpad.net>
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack/attachments/20111101/845e4887/attachment.html>


More information about the Openstack mailing list