[openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers

Bharat Kumar bharat.kobagana at redhat.com
Fri May 8 11:37:51 UTC 2015


The GlusterFS CI job is still failing with the same issue.

I ran a couple of rechecks on [1] after the patch
https://review.openstack.org/#/c/181288/ merged,
but the GlusterFS CI job still fails with the error below [2]:
ObjectDereferencedError: Can't emit change event for attribute 
'Volume.provider_location' - parent object of type <Volume> has been 
garbage collected.
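
For anyone who has not hit this error before, here is a rough stdlib-only
sketch of how it can arise. It assumes the per-instance bookkeeping keeps only
a weak reference to its object and that a shallow copy ends up sharing that
bookkeeping. The State and Volume classes and set_provider_location() are
hypothetical stand-ins, not SQLAlchemy, taskflow or cinder code.

```python
import copy
import gc
import weakref


class State:
    """Hypothetical stand-in for per-instance ORM bookkeeping."""

    def __init__(self, owner):
        # Only a weak reference is kept, so the state never keeps its owner alive.
        self.owner = weakref.ref(owner)


class Volume:
    """Hypothetical model class; not cinder's real Volume."""

    def __init__(self, provider_location=None):
        self._state = State(self)
        self.provider_location = provider_location

    def set_provider_location(self, value):
        # Refuse the change when the object the state points at is gone.
        if self._state.owner() is None:
            raise ReferenceError(
                "parent object of type <Volume> has been garbage collected")
        self.provider_location = value


original = Volume()
clone = copy.copy(original)   # shallow copy: clone._state is original._state
del original                  # drop the only strong reference to the original
gc.collect()

try:
    clone.set_provider_location("host:/export")
    outcome = "updated"
except ReferenceError as exc:
    outcome = "failed: %s" % exc
```

In this sketch the clone is still alive, but its shared state points at the
original, so the update is rejected once the original is collected.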

I see the same behaviour with the NetApp CI as well.
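
The suggestion in the quoted thread below (drivers not updating the object
directly, and results staying serializable) could look roughly like this.
This is only a sketch under my own assumptions: the create_volume() signature,
the dict keys and the round_trips() helper are hypothetical, and pickle is
used just as a convenient check that a value could cross a process boundary.

```python
import pickle


def create_volume(volume_id, size_gb):
    """Hypothetical driver call: takes plain values, returns plain values."""
    # Report changes as a dict instead of assigning to an ORM object.
    return {"provider_location": "nfs-host:/export/%s" % volume_id,
            "size_gb": size_gb}


def round_trips(value):
    """Return True if value survives a pickle round trip unchanged."""
    return pickle.loads(pickle.dumps(value)) == value


model_update = create_volume("vol-1", 10)
serializable = round_trips(model_update)
```

The caller would then apply model_update to the database itself, so no
session-bound object has to be copied or passed through the engine.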


[1] https://review.openstack.org/#/c/165424/
[2] 
http://logs.openstack.org/24/165424/6/check/check-tempest-dsvm-full-glusterfs-nv/f386477/logs/screen-c-vol.txt.gz
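
For context on the storage change discussed in the quoted thread below, this
is how I understand the difference between the two update orders. It is a
minimal sketch and not taskflow's actual code; FlakyBackend, update_in_place()
and update_via_copy() are hypothetical names.

```python
import copy


class FlakyBackend:
    """Hypothetical persistence backend that can be told to fail."""

    def __init__(self):
        self.saved = None
        self.fail = False

    def save(self, record):
        if self.fail:
            raise IOError("save failed")
        self.saved = dict(record)


def update_in_place(record, backend, **changes):
    # update-original, save-original: the local record changes even if save fails.
    record.update(changes)
    backend.save(record)


def update_via_copy(record, backend, **changes):
    # copy-original, mutate-copy, save-copy, update-original.
    candidate = copy.copy(record)
    candidate.update(changes)
    backend.save(candidate)      # may raise; record is untouched so far
    record.update(candidate)


backend = FlakyBackend()
backend.fail = True

in_place = {"state": "PENDING"}
via_copy = {"state": "PENDING"}
for update, record in ((update_in_place, in_place), (update_via_copy, via_copy)):
    try:
        update(record, backend, state="SUCCESS")
    except IOError:
        pass
```

With a failing save, the in-place record disagrees with what was persisted,
while the copy-based record keeps its old value. The cost is the extra copy,
which is where objects that do not copy cleanly can cause trouble.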


On 05/08/2015 10:21 AM, Joshua Harlow wrote:
> Alright, it was as I suspected: a small bug in the new algorithm that
> makes the storage layer do
> copy-original,mutate-copy,save-copy,update-original (vs
> update-original,save-original) so that it is more reliable.
>
> https://bugs.launchpad.net/taskflow/+bug/1452978 is opened and a one-line
> fix is up @ https://review.openstack.org/#/c/181288/ to stop trying to
> copy task results (which was activating logic that must have caused the
> reference to drop out of existence and therefore the issue noted below).
>
> Will get that released in 0.10.1 once it flushes through the pipeline.
>
> Thanks Alex for helping double check; if others want to check too,
> that'd be nice, so we can make sure that's the root cause (overzealous
> usage of copy.copy, ha).
>
> Overall I'd still *highly* recommend that the following happen:
>
> >> One way to get around whatever the issue is would be to change the
> >> drivers to not update the object directly as it is not needed. But
> >> this should not fail. Perhaps a more proper fix is for the volume
> >> manager to not pass around sqlalchemy objects.
>
> But that can be a later tweak that cinder does; using any taskflow 
> engine that isn't the greenthreaded/threaded/serial engine will 
> require results to be serializable, and therefore copyable, so that 
> those results can go across IPC or MQ/other boundaries. Sqlalchemy 
> objects won't fit either of these cases (obviously).
>
> -Josh
>
> Joshua Harlow wrote:
>> Are we sure this is taskflow? I'm wondering since those errors are more
>> from task code (which is in cinder) and the following seems to be a
>> general garbage collection issue (not connected to taskflow?):
>>
>> '''Exception during message handling: Can't emit change event for
>> attribute 'Volume.provider_location' - parent object of type <Volume>
>> has been garbage collected.'''
>>
>> Or:
>>
>> '''2015-05-07 22:42:51.142 17040 TRACE oslo_messaging.rpc.dispatcher
>> ObjectDereferencedError: Can't emit change event for attribute
>> 'Volume.provider_location' - parent object of type <Volume> has been
>> garbage collected.'''
>>
>> Alex Meade wrote:
>>> So it seems that this will break a number of drivers; I see that
>>> glusterfs does the same thing.
>>>
>>> On Thu, May 7, 2015 at 10:29 PM, Alex Meade
>>> <mr.alex.meade at gmail.com> wrote:
>>>
>>> It appears that the release of taskflow 0.10.0 exposed an issue in
>>> the NetApp NFS drivers. Something changed that caused the sqlalchemy
>>> Volume object to be garbage collected even though it is passed into
>>> create_volume().
>>>
>>> An example error can be found in the c-vol logs here:
>>>
>>> http://dcf901611175aa43f968-c54047c910227e27e1d6f03bb1796fd7.r95.cf5.rackcdn.com/57/181157/1/check/cinder-cDOT-NFS/0473c54/ 
>>>
>>>
>>>
>>> One way to get around whatever the issue is would be to change the
>>> drivers to not update the object directly as it is not needed. But
>>> this should not fail. Perhaps a more proper fix is for the volume
>>> manager to not pass around sqlalchemy objects.
>>
>> +1
>>
>>>
>>> Something changed in taskflow, however, and we should just
>>> understand if that has other impact.
>>
>> I'd like to understand that also: the only commit that touched this
>> stuff is https://github.com/openstack/taskflow/commit/227cf52 (which
>> basically ensured that a copy of the storage object is modified, then
>> saved, and then the local object is updated, vs updating the local object
>> and then saving, which has problems/inconsistencies if the save fails).
>>
>>>
>>> -Alex
>>>
>>>
>>> __________________________________________________________________________ 
>>>
>>>
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

-- 
Warm Regards,
Bharat Kumar Kobagana
Software Engineer
OpenStack Storage – RedHat India
Mobile - +91 9949278005
