[openstack-dev] [Nova] nova-compute deadlock

Qin Zhao chaochin at gmail.com
Thu Jun 5 15:21:00 UTC 2014


Hi Yuriy,

Thanks for reading my bug!  You are right. Python 3.3 or 3.4 should not
have this issue, since they can secure the file descriptors. Until
OpenStack moves to Python 3, we still need a solution. Calling
libguestfs in a separate process seems to be one way. That way, Nova code
can close those fds by itself instead of depending on CLOEXEC. However, that
will be an expensive solution, since it requires a lot of code changes. At
the very least, we need to write code to pass return values and exceptions
between the two processes. That will make this solution very complex. Do you
agree?
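
To make the cost concrete, here is a minimal sketch of the plumbing needed to
pass a return value or an exception back from a child process. It is only an
illustration under assumptions: run_in_child is a hypothetical helper, not
existing Nova code, and it uses a plain os.fork() with pickle rather than a
long-lived worker.

```python
import os
import pickle


def run_in_child(func, *args):
    """Hypothetical helper: run func(*args) in a forked child process.

    Returns the child's result, or re-raises the exception it raised.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send the outcome through the pipe, then exit without
        # running any of the parent's cleanup handlers.
        exit_code = 1
        try:
            os.close(read_fd)
            try:
                payload = ("ok", func(*args))
            except Exception as exc:
                payload = ("error", exc)
            with os.fdopen(write_fd, "wb") as writer:
                pickle.dump(payload, writer)
            exit_code = 0
        finally:
            os._exit(exit_code)
    # Parent: close our copy of the write end first, otherwise read()
    # below would never see end-of-file.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    status, value = pickle.loads(data)
    if status == "error":
        raise value
    return value
```

Note that a child forked per call still gets a copy of every descriptor open
in the parent, so this only shows the result and exception passing. A
long-lived worker started early would additionally need request dispatch,
which is where most of the complexity would come from.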


On Thu, Jun 5, 2014 at 9:39 PM, Yuriy Taraday <yorik.sar at gmail.com> wrote:

> This behavior of os.pipe() has changed in Python 3.x so it won't be an
> issue on newer Python (if only it was accessible for us).
>
> From the looks of it you can mitigate the problem by running libguestfs
> requests in a separate process (multiprocessing.managers comes to mind).
> This way the only descriptors child process could theoretically inherit
> would be long-lived pipes to main process although they won't leak because
> they should be marked with CLOEXEC before any libguestfs request is run.
> The other benefit is that this separate process won't be busy opening and
> closing tons of fds so the problem with inheriting will be avoided.
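
For reference, marking a pipe with CLOEXEC by hand could look roughly like the
sketch below. The helper names set_cloexec and cloexec_pipe are made up for
illustration. This assumes a POSIX system, and there is still a small window
between os.pipe() and fcntl() in which another thread could fork and exec.

```python
import fcntl
import os


def set_cloexec(fd):
    """Set the close-on-exec flag so fd is closed in any exec'd program."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def cloexec_pipe():
    """Hypothetical helper: create a pipe whose two ends are both CLOEXEC."""
    read_fd, write_fd = os.pipe()
    # Not atomic: a fork+exec in another thread between os.pipe() and
    # these calls would still inherit the descriptors.
    set_cloexec(read_fd)
    set_cloexec(write_fd)
    return read_fd, write_fd
```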
>
>
> On Thu, Jun 5, 2014 at 2:17 PM, laserjetyang <laserjetyang at gmail.com>
> wrote:
>
>>   Will this patch of Python fix your problem?
>> http://bugs.python.org/issue7213
>>
>>
>> On Wed, Jun 4, 2014 at 10:41 PM, Qin Zhao <chaochin at gmail.com> wrote:
>>
>>>  Hi Zhu Zhu,
>>>
>>> Thank you for reading my diagram!   I need to clarify that this problem
>>> does not occur during data injection.  Before creating the ISO, the driver
>>> code will extend the disk. Libguestfs is invoked in that time frame.
>>>
>>> And now I think this problem may occur at any time, if the code uses
>>> tpool to invoke libguestfs, and an external command is executed in another
>>> green thread simultaneously.  Please correct me if I am wrong.
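
The leak itself can be reproduced outside Nova. The sketch below is only a
simplified illustration, not the Nova or libguestfs code path: demo_leak and
eof_visible are hypothetical names, and pass_fds is used to imitate the
Python 2 behaviour in which an unrelated external command inherits the write
end of a pipe.

```python
import os
import select
import subprocess
import sys


def eof_visible(read_fd, timeout=0.2):
    """Return True if reading read_fd hits end-of-file within timeout."""
    ready, _, _ = select.select([read_fd], [], [], timeout)
    return bool(ready) and os.read(read_fd, 1) == b""


def demo_leak():
    """Show that a leaked write end hides EOF from the pipe's reader."""
    read_fd, write_fd = os.pipe()
    # An unrelated command that inherits write_fd, imitating the leak.
    bystander = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        pass_fds=[write_fd],
    )
    os.close(write_fd)  # the intended writer is finished
    before = eof_visible(read_fd)  # bystander still holds the write end
    bystander.terminate()
    bystander.wait()
    after = eof_visible(read_fd)  # last copy of the write end is gone
    os.close(read_fd)
    return before, after
```

While the unrelated command is alive, the reader cannot see end-of-file even
though the intended writer has closed its end. A reader that blocks until EOF
would hang in the same way.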
>>>
>>> I think one simple solution for this issue is to call the libguestfs
>>> routine in a green thread, rather than in another native thread. But that
>>> will hurt performance badly, so I do not think it is an acceptable solution.
>>>
>>>
>>>
>>>  On Wed, Jun 4, 2014 at 12:00 PM, Zhu Zhu <bjzzu.zz at gmail.com> wrote:
>>>
>>>>   Hi Qin Zhao,
>>>>
>>>> Thanks for raising this issue and for the analysis. According to the issue
>>>> description and the scenario in which it happens (
>>>> https://docs.google.com/drawings/d/1pItX9urLd6fmjws3BVovXQvRg_qMdTHS-0JhYfSkkVc/pub?w=960&h=720
>>>> ), if multiple KVM instances are spawned concurrently (*with
>>>> both config drive and data injection enabled*), the
>>>> issue is very likely to happen.
>>>> In the libvirt/driver.py _create_image method, right after the ISO is made by "cdb.make_drive",
>>>> the driver will attempt "data injection", which will call the libguestfs
>>>> launch in another thread.
>>>>
>>>> It looks like there were also a couple of libguestfs hang issues on
>>>> Launchpad, listed below. I am not sure whether libguestfs itself could have
>>>> some mechanism to free/close the fds inherited from the parent process,
>>>> instead of requiring an explicit call to tear down. Maybe open a defect
>>>> against libguestfs to see what their thoughts are?
>>>>
>>>>  https://bugs.launchpad.net/nova/+bug/1286256
>>>> https://bugs.launchpad.net/nova/+bug/1270304
>>>>
>>>> ------------------------------
>>>>  Zhu Zhu
>>>> Best Regards
>>>>
>>>>
>>>>  *From:* Qin Zhao <chaochin at gmail.com>
>>>> *Date:* 2014-05-31 01:25
>>>>  *To:* OpenStack Development Mailing List (not for usage questions)
>>>> <openstack-dev at lists.openstack.org>
>>>> *Subject:* [openstack-dev] [Nova] nova-compute deadlock
>>>>    Hi all,
>>>>
>>>> When I ran Icehouse code, I encountered a strange problem. The
>>>> nova-compute service becomes stuck when I boot instances. I reported this
>>>> bug in https://bugs.launchpad.net/nova/+bug/1313477.
>>>>
>>>> After thinking for several days, I feel I know its root cause. This bug
>>>> should be a deadlock caused by a leaked pipe fd.  I drew a diagram to
>>>> illustrate this problem.
>>>> https://docs.google.com/drawings/d/1pItX9urLd6fmjws3BVovXQvRg_qMdTHS-0JhYfSkkVc/pub?w=960&h=720
>>>>
>>>> However, I have not found a very good solution to prevent this deadlock.
>>>> This problem involves the Python runtime, libguestfs, and eventlet. The
>>>> situation is a little complicated. Is there any expert who can help me
>>>> look for a solution? I would appreciate your help!
>>>>
>>>> --
>>>> Qin Zhao
>>>>
>>>>
>>>> _______________________________________________
>>>> OpenStack-dev mailing list
>>>> OpenStack-dev at lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>
>>>>
>>>
>>>
>>> --
>>> Qin Zhao
>>>
>>>
>>>
>>
>>
>>
>
>
> --
>
> Kind regards, Yuriy.
>
>
>


-- 
Qin Zhao