[openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers

Kerr, Andrew Andrew.Kerr at netapp.com
Fri May 8 11:51:49 UTC 2015


The problem is in the version of taskflow that devstack downloads from PyPI. You will need to wait until a new version (>0.10.0) is available [1].

[1] https://pypi.python.org/pypi/taskflow/

Andrew Kerr
OpenStack QA
Cloud Solutions Group
NetApp

From: Bharat Kumar <bharat.kobagana at redhat.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Friday, May 8, 2015 at 7:37 AM
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers

The GlusterFS CI job is still failing with the same issue.

I gave a couple of "recheck"s on [1] after the https://review.openstack.org/#/c/181288/ patch got merged.

But the GlusterFS CI job is still failing with the error below [2]:
ObjectDereferencedError: Can't emit change event for attribute 'Volume.provider_location' - parent object of type <Volume> has been garbage collected.

I also found the same behaviour with the NetApp CI.


[1] https://review.openstack.org/#/c/165424/
[2] http://logs.openstack.org/24/165424/6/check/check-tempest-dsvm-full-glusterfs-nv/f386477/logs/screen-c-vol.txt.gz


On 05/08/2015 10:21 AM, Joshua Harlow wrote:
Alright, it was what I had a hunch it was: a small bug in the new algorithm that makes the storage layer do copy-original, mutate-copy, save-copy, update-original (vs. update-original, save-original) to be more reliable.

I opened https://bugs.launchpad.net/taskflow/+bug/1452978 and made a one-line fix at https://review.openstack.org/#/c/181288/ that stops trying to copy task results (the copying was activating logic that must have caused the reference to drop out of existence and therefore the issue noted below).

Will get that released in 0.10.1 once it flushes through the pipeline.

Thanks Alex for helping double-check; if others want to check too, that'd be nice, so we can make sure that's the root cause (overzealous usage of copy.copy, ha).
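
For anyone who wants to poke at this outside of cinder, here's a minimal sketch of the mechanism I suspect (toy model, not cinder's actual Volume, so treat it as an assumption rather than proof): a shallow copy of a SQLAlchemy instance shares the original's instance state, which only holds a weak reference to its parent object, so once the original is collected, mutating the copy fails with exactly this error.

    import copy
    import gc

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Volume(Base):  # toy stand-in, not cinder's real model
        __tablename__ = 'volumes'
        id = Column(Integer, primary_key=True)
        provider_location = Column(String)

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()

    original = Volume(id=1, provider_location='old')
    session.add(original)
    session.commit()

    # The shallow copy shares the original's _sa_instance_state, which only
    # weakly references its parent object...
    clone = copy.copy(original)
    del original
    gc.collect()

    # ...so once the original is collected, the state can no longer emit a
    # change event for the copy:
    clone.provider_location = 'new'  # raises ObjectDereferencedError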

Overall I'd still *highly* recommend that the following happen:

>> One way to get around whatever the issue is would be to change the
>> drivers to not update the object directly as it is not needed. But
>> this should not fail. Perhaps a more proper fix is for the volume
>> manager to not pass around sqlalchemy objects.

But that can be a later tweak on the cinder side; using any taskflow engine other than the greenthreaded/threaded/serial ones will require results to be serializable, and therefore copyable, so that those results can go across IPC or MQ/other boundaries. SQLAlchemy objects obviously won't fit either of those cases.
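
As a rough illustration of that direction (the task and field names below are made up for the example, not cinder's actual create_volume flow): keep only primitives in the flow's store and have tasks hand back plain dicts for the manager to persist.

    from taskflow import engines, task
    from taskflow.patterns import linear_flow

    class CreateVolumeTask(task.Task):
        default_provides = 'model_update'

        def execute(self, volume_ref):
            # volume_ref is a plain dict of primitives, so it can be copied,
            # serialized and shipped across IPC/MQ boundaries; return the
            # changes instead of mutating a detached ORM instance.
            return {'provider_location':
                    'nfs-host:/exports/%s' % volume_ref['id']}

    flow = linear_flow.Flow('create_volume').add(CreateVolumeTask())
    eng = engines.load(flow, store={'volume_ref': {'id': 'vol-1',
                                                   'provider_location': None}})
    eng.run()
    print(eng.storage.fetch('model_update'))

Since everything going through the engine here is plain JSON-friendly data, swapping the serial engine for a worker-based one later shouldn't require touching the tasks themselves.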

-Josh

Joshua Harlow wrote:
Are we sure this is taskflow? I'm wondering since those errors are more
from task code (which is in cinder) and the following seems to be a
general garbage collection issue (not connected to taskflow?):

'''Exception during message handling: Can't emit change event for
attribute 'Volume.provider_location' - parent object of type <Volume>
has been garbage collected.'''

Or:

'''2015-05-07 22:42:51.142 17040 TRACE oslo_messaging.rpc.dispatcher
ObjectDereferencedError: Can't emit change event for attribute
'Volume.provider_location' - parent object of type <Volume> has been
garbage collected.'''

Alex Meade wrote:
So it seems that this will break a number of drivers; I see that
glusterfs does the same thing.

On Thu, May 7, 2015 at 10:29 PM, Alex Meade <mr.alex.meade at gmail.com> wrote:

It appears that the release of taskflow 0.10.0 exposed an issue in
the NetApp NFS drivers. Something changed that caused the SQLAlchemy
Volume object to be garbage collected even though it is passed into
create_volume().

An example error can be found in the c-vol logs here:

http://dcf901611175aa43f968-c54047c910227e27e1d6f03bb1796fd7.r95.cf5.rackcdn.com/57/181157/1/check/cinder-cDOT-NFS/0473c54/


One way to get around whatever the issue is would be to change the
drivers to not update the object directly as it is not needed. But
this should not fail. Perhaps a more proper fix is for the volume
manager to not pass around sqlalchemy objects.

+1
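
A rough sketch of what that could look like at the driver level (the class, methods and fields below are simplified stand-ins, not the actual NetApp/gluster code): the driver stops writing provider_location back onto the object it was handed and just returns the change.

    class FakeNfsDriver(object):
        """Simplified, hypothetical driver used only to illustrate the idea."""

        def _find_share(self, volume_size):
            # Placeholder for the real share-selection logic.
            return 'nfs-host:/exports/pool0'

        def create_volume(self, volume):
            # Don't write back to 'volume' here (ORM object or dict, either
            # way); return the model update and let the volume manager decide
            # how and when to persist it.
            return {'provider_location': self._find_share(volume['size'])}

    driver = FakeNfsDriver()
    print(driver.create_volume({'id': 'vol-1', 'size': 1}))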


Something changed in taskflow, however, and we should just
understand whether that has other impacts.

I'd like to understand that also: the only commit that touched this
stuff is https://github.com/openstack/taskflow/commit/227cf52 (it
basically ensures that a copy of the storage object is modified, then
saved, and only then is the local object updated, vs. updating the
local object and then saving it, which has problems/inconsistencies if
the save fails).
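
In other words (a toy sketch with plain dicts, not the actual taskflow storage code), the difference between the two orderings is roughly:

    import copy

    def update_then_save(local, changes, save):
        # Old order: mutate the local object first, then persist it.  If
        # save() fails, 'local' has already diverged from what was stored.
        local.update(changes)
        save(local)

    def copy_save_then_update(local, changes, save):
        # New order: mutate and save a copy; only touch the local object
        # once the save has succeeded.
        candidate = copy.copy(local)
        candidate.update(changes)
        save(candidate)
        local.update(changes)

    backend = {}
    volume = {'id': 'vol-1', 'status': 'creating'}
    copy_save_then_update(volume, {'status': 'available'}, backend.update)
    print(volume, backend)

That extra copy step is presumably also where a copied ORM-backed result can lose its parent reference, which would line up with the error being seen.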


-Alex




--
Warm Regards,
Bharat Kumar Kobagana
Software Engineer
OpenStack Storage – RedHat India
Mobile - +91 9949278005

