[openstack-dev] [cinder] Taskflow 0.10.0 incompatible with NetApp NFS drivers
Bharat Kumar
bharat.kobagana at redhat.com
Fri May 8 11:37:51 UTC 2015
The GlusterFS CI job is still failing with the same issue.
I ran "recheck" a couple of times on [1] after
https://review.openstack.org/#/c/181288/ was merged,
but the GlusterFS CI job still fails with the error below [2]:
ObjectDereferencedError: Can't emit change event for attribute
'Volume.provider_location' - parent object of type <Volume> has been
garbage collected.
I see the same behaviour with the NetApp CI as well.
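
For anyone who wants to reproduce this outside CI, here is a minimal sketch
of how I understand the failure. It assumes that a shallow copy of the
Volume object shares the original's instance state, and that this state only
holds a weak reference back to the original. State, Volume and
DereferencedError below are hypothetical stand-ins written with the standard
library only; they are not sqlalchemy or cinder code.

```python
import copy
import gc
import weakref


class DereferencedError(Exception):
    """Hypothetical stand-in for sqlalchemy's ObjectDereferencedError."""


class State(object):
    """Per-instance bookkeeping that only weakly references its owner."""

    def __init__(self, owner):
        self.owner_ref = weakref.ref(owner)

    def emit_change(self, attr):
        # Refuse to record a change once the owning object is gone.
        if self.owner_ref() is None:
            raise DereferencedError(
                "Can't emit change event for attribute %r - parent object "
                "has been garbage collected." % attr)


class Volume(object):
    """Hypothetical model class; not cinder's real Volume."""

    def __init__(self):
        # Write to __dict__ directly to bypass __setattr__ during setup.
        self.__dict__['_state'] = State(self)
        self.__dict__['provider_location'] = None

    def __setattr__(self, name, value):
        self._state.emit_change(name)
        self.__dict__[name] = value


def update_copy_after_original_dropped():
    """Return the error message raised when updating an orphaned copy."""
    original = Volume()
    clone = copy.copy(original)  # shallow copy: clone shares _state
    del original                 # last strong reference to the owner
    gc.collect()
    try:
        clone.provider_location = 'host:/export'
    except DereferencedError as exc:
        return str(exc)
    return None
```

Updating the original while it is still alive works; only the copy that
outlives the original fails, which would match what the drivers see when
they set provider_location.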
[1] https://review.openstack.org/#/c/165424/
[2]
http://logs.openstack.org/24/165424/6/check/check-tempest-dsvm-full-glusterfs-nv/f386477/logs/screen-c-vol.txt.gz
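
On the suggestion quoted below that the volume manager should not pass
sqlalchemy objects around, a rough sketch of the idea might look like the
following. The helper name volume_to_primitive, the FakeVolume class and the
field list are assumptions made for illustration only and do not exist in
cinder.

```python
import copy
import json


class FakeVolume(object):
    """Hypothetical stand-in for a database model object."""

    def __init__(self, id, size, provider_location=None):
        self.id = id
        self.size = size
        self.provider_location = provider_location


def volume_to_primitive(volume, fields=('id', 'size', 'provider_location')):
    """Return a plain dict holding only the named fields of the volume."""
    return {name: getattr(volume, name, None) for name in fields}


def round_trip(primitive):
    """Serialize and deserialize, as a non-local engine would need to do."""
    return json.loads(json.dumps(primitive))


vol = FakeVolume('vol-1', 1)
model_update = volume_to_primitive(vol)
# The driver updates the plain dict instead of the model object.
model_update['provider_location'] = 'host:/export'
```

A plain dict like this can be copied and serialized freely, so it does not
depend on the lifetime of the original model object.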
On 05/08/2015 10:21 AM, Joshua Harlow wrote:
> Alright, it was as I suspected: a small bug in the new algorithm that
> makes the storage layer do
> copy-original,mutate-copy,save-copy,update-original (vs
> update-original,save-original) to be more reliable.
>
> https://bugs.launchpad.net/taskflow/+bug/1452978 opened and a one line
> fix made @ https://review.openstack.org/#/c/181288/ to stop trying to
> copy task results (which was activating logic that must have caused the
> reference to drop out of existence and therefore the issue noted below).
>
> Will get that released in 0.10.1 once it flushes through the pipeline.
>
> Thanks Alex for helping double check; if others want to check too
> that'd be nice, so we can make sure that's the root cause (overzealous
> usage of copy.copy, ha).
>
> Overall I'd still *highly* recommend that the following still happen:
>
> >> One way to get around whatever the issue is would be to change the
> >> drivers to not update the object directly as it is not needed. But
> >> this should not fail. Perhaps a more proper fix is for the volume
> >> manager to not pass around sqlalchemy objects.
>
> But that can be a later tweak that cinder does; using any taskflow
> engine that isn't the greenthreaded/threaded/serial engine will
> require results to be serializable, and therefore copyable, so that
> those results can go across IPC or MQ/other boundaries. Sqlalchemy
> objects won't fit either of these cases (obviously).
>
> -Josh
>
> Joshua Harlow wrote:
>> Are we sure this is taskflow? I'm wondering since those errors are more
>> from task code (which is in cinder) and the following seems to be a
>> general garbage collection issue (not connected to taskflow?):
>>
>> '''Exception during message handling: Can't emit change event for
>> attribute 'Volume.provider_location' - parent object of type <Volume>
>> has been garbage collected.'''
>>
>> Or:
>>
>> '''2015-05-07 22:42:51.142 17040 TRACE oslo_messaging.rpc.dispatcher
>> ObjectDereferencedError: Can't emit change event for attribute
>> 'Volume.provider_location' - parent object of type <Volume> has been
>> garbage collected.'''
>>
>> Alex Meade wrote:
>>> So it seems that this will break a number of drivers; I see that
>>> glusterfs does the same thing.
>>>
>>> On Thu, May 7, 2015 at 10:29 PM, Alex Meade <mr.alex.meade at gmail.com>
>>> wrote:
>>>
>>> It appears that the release of taskflow 0.10.0 exposed an issue in
>>> the NetApp NFS drivers. Something changed that caused the sqlalchemy
>>> Volume object to be garbage collected even though it is passed into
>>> create_volume().
>>>
>>> An example error can be found in the c-vol logs here:
>>>
>>> http://dcf901611175aa43f968-c54047c910227e27e1d6f03bb1796fd7.r95.cf5.rackcdn.com/57/181157/1/check/cinder-cDOT-NFS/0473c54/
>>>
>>>
>>>
>>> One way to get around whatever the issue is would be to change the
>>> drivers to not update the object directly as it is not needed. But
>>> this should not fail. Perhaps a more proper fix is for the volume
>>> manager to not pass around sqlalchemy objects.
>>
>> +1
>>
>>>
>>> Something changed in taskflow, however, and we should just
>>> understand if that has other impact.
>>
>> I'd like to understand that also: the only commit that touched this
>> stuff is https://github.com/openstack/taskflow/commit/227cf52 (which
>> basically ensured that a copy of the storage object is modified, then
>> saved, and then the local object is updated, vs updating the local object
>> and then saving, which has problems/inconsistencies if the save fails).
>>
>>>
>>> -Alex
>>>
>>>
>>> __________________________________________________________________________
>>>
>>>
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe:
>>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
--
Warm Regards,
Bharat Kumar Kobagana
Software Engineer
OpenStack Storage – RedHat India
Mobile - +91 9949278005