[openstack-dev] [Nova] nova-compute deadlock

Qin Zhao chaochin at gmail.com
Fri Jun 6 19:27:52 UTC 2014


Yuriy,

Also, I think that if we use a multiprocessing proxy object, the green thread
will not switch while we are calling libguestfs.  Is that correct?


On Fri, Jun 6, 2014 at 2:44 AM, Qin Zhao <chaochin at gmail.com> wrote:

> Hi Yuriy,
>
> I read the multiprocessing source code just now, and I now feel it may not
> solve this problem very easily.  For example, let us assume that we use the
> proxy object in the Manager's process to call libguestfs.  In managers.py, I
> see that it needs to create a pipe before forking the child process. The
> write end of this pipe is required by the child process.
>
>
> http://sourcecodebrowser.com/python-multiprocessing/2.6.2.1/classmultiprocessing_1_1managers_1_1_base_manager.html#a57fe9abe7a3d281286556c4bf3fbf4d5
>
> And in Process._bootstrap(), I think we will need to register a function to
> be called by _run_after_forkers(), in order to close the fds inherited
> from the Nova process.
>
>
> http://sourcecodebrowser.com/python-multiprocessing/2.6.2.1/classmultiprocessing_1_1process_1_1_process.html#ae594800e7bdef288d9bfbf8b79019d2e
>
> We also cannot close the write-end fd created by the Manager in
> _run_after_forkers(). One feasible way may be to get that fd from the 5th
> element of the _args attribute of the Process object, and then skip closing
> it.  I have not investigated whether the Manager needs to use other fds
> besides this pipe. Personally, I feel such an implementation would be a
> little tricky and risky, because it depends tightly on the Manager code. If
> the Manager opens other files, or changes the argument order, our code will
> fail to run. Am I wrong?  Is there any safer way?
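
To make the idea above concrete, here is a minimal sketch of "close the
inherited fds but skip the one the Manager needs". It is only an illustration:
close_fds_except() and is_open() are hypothetical helpers, not part of
multiprocessing, and the fd to keep is passed in explicitly instead of being
dug out of Process._args, which is exactly the fragile part. The pipes merely
stand in for fds inherited from Nova and for the Manager's pipe.

```python
import os


def close_fds_except(candidates, keep):
    """Close every fd in candidates that is not in keep; return those closed."""
    closed = []
    for fd in candidates:
        if fd in keep:
            continue
        try:
            os.close(fd)
            closed.append(fd)
        except OSError:
            # Already closed (or never valid): nothing to do.
            pass
    return closed


def is_open(fd):
    """Return True if fd currently refers to an open file description."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False


# Stand-ins: one pipe "inherited from Nova", one pipe "created by the Manager".
nova_r, nova_w = os.pipe()
manager_r, manager_w = os.pipe()

closed = close_fds_except([nova_r, nova_w, manager_r, manager_w],
                          keep={manager_w})
manager_w_open = is_open(manager_w)
nova_r_open = is_open(nova_r)

os.close(manager_w)  # tidy up the one fd we deliberately kept
```

The sketch only works because the caller knows which fd to keep; learning that
from inside the Manager's child process is the part that would depend on
Manager internals.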
>
>
> On Thu, Jun 5, 2014 at 11:40 PM, Yuriy Taraday <yorik.sar at gmail.com>
> wrote:
>
>> Please take a look at
>> https://docs.python.org/2.7/library/multiprocessing.html#managers -
>> everything is already implemented there.
>> All you need is to start one manager that will serve all your requests
>> to libguestfs. The implementation in the stdlib will pass all exceptions
>> and return values back to you, with minimal code changes on the Nova side.
>> Create a new Manager, register a libguestfs "endpoint" in it and call
>> start(). It will spawn a separate process that talks to the calling
>> process over a very simple RPC.
>> From the looks of it, all you need to do is replace the tpool.Proxy calls
>> in the VFSGuestFS.setup method with calls to this new Manager.
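
A rough sketch of the Manager approach described above, under some
assumptions: GuestFSEndpoint and GuestFSManager are hypothetical names, the
endpoint only pretends to call libguestfs, and the code uses Python 3 syntax
(the ctx argument does not exist in 2.7). It shows a return value and an
exception coming back through the proxy; it is not the Nova implementation.

```python
import multiprocessing
from multiprocessing.managers import BaseManager


class GuestFSEndpoint(object):
    """Hypothetical stand-in for an object that would wrap libguestfs."""

    def launch(self, image_path):
        # A real endpoint would drive libguestfs here.
        return "launched %s" % image_path

    def fail(self):
        raise RuntimeError("simulated libguestfs failure")


class GuestFSManager(BaseManager):
    """Manager that serves GuestFSEndpoint objects from its own process."""


# Register the "endpoint"; this adds a GuestFS() factory to the manager.
GuestFSManager.register("GuestFS", GuestFSEndpoint)


def demo():
    # Assumption: a POSIX host, so the "fork" start method is available.
    manager = GuestFSManager(ctx=multiprocessing.get_context("fork"))
    manager.start()  # spawns the separate server process
    try:
        guestfs = manager.GuestFS()  # proxy; the real object lives in the server
        result = guestfs.launch("/tmp/disk.img")  # return value comes back over RPC
        try:
            guestfs.fail()
            error = None
        except RuntimeError as exc:  # the server-side exception is re-raised here
            error = str(exc)
        return result, error
    finally:
        manager.shutdown()
```

Calling demo() returns the string produced in the server process together
with the message of the exception raised there, so the caller needs no
hand-written marshalling code for either.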
>>
>>
>> On Thu, Jun 5, 2014 at 7:21 PM, Qin Zhao <chaochin at gmail.com> wrote:
>>
>>> Hi Yuriy,
>>>
>>> Thanks for reading my bug!  You are right. Python 3.3 or 3.4 should not
>>> have this issue, since they can secure the file descriptor. Until
>>> OpenStack moves to Python 3, we may still need a solution. Calling
>>> libguestfs in a separate process seems to be one way. That way, Nova code
>>> can close those fds by itself, without depending on CLOEXEC. However, that
>>> will be an expensive solution, since it requires a lot of code changes. At
>>> least we need to write code to pass return values and exceptions between
>>> the two processes. That will make this solution very complex. Do you agree?
>>>
>>>
>>> On Thu, Jun 5, 2014 at 9:39 PM, Yuriy Taraday <yorik.sar at gmail.com>
>>> wrote:
>>>
>>>> This behavior of os.pipe() has changed in Python 3.x, so it won't be an
>>>> issue on newer Python (if only it were available to us).
>>>>
>>>> From the looks of it, you can mitigate the problem by running libguestfs
>>>> requests in a separate process (multiprocessing.managers comes to mind).
>>>> This way, the only descriptors the child process could theoretically
>>>> inherit would be the long-lived pipes to the main process, and those won't
>>>> leak because they should be marked with CLOEXEC before any libguestfs
>>>> request is run. The other benefit is that this separate process won't be
>>>> busy opening and closing tons of fds, so the inheritance problem will be
>>>> avoided.
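
For illustration, a small sketch of the CLOEXEC point above. It assumes a
POSIX host and Python 3.4 or later, where (as far as I know, via PEP 446)
os.pipe() returns non-inheritable fds. is_cloexec() and set_cloexec() are
hypothetical helper names; set_cloexec() is the kind of manual marking that
Python 2 code would have to do itself.

```python
import fcntl
import os


def is_cloexec(fd):
    """Return True if fd has the close-on-exec flag set."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def set_cloexec(fd):
    """Mark fd close-on-exec by hand."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


read_fd, write_fd = os.pipe()

# On Python 3.4+ both ends start out close-on-exec (non-inheritable).
pipe_is_private = is_cloexec(read_fd) and is_cloexec(write_fd)

# Recreate the older behaviour by making one end inheritable again...
os.set_inheritable(write_fd, True)
leaky = not is_cloexec(write_fd)

# ...and then apply the manual fix.
set_cloexec(write_fd)
fixed = is_cloexec(write_fd)

os.close(read_fd)
os.close(write_fd)
```

Note that the manual fix is not atomic: a fork and exec in another thread
between os.pipe() and set_cloexec() can still inherit the fd.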
>>>>
>>>>
>>>> On Thu, Jun 5, 2014 at 2:17 PM, laserjetyang <laserjetyang at gmail.com>
>>>> wrote:
>>>>
>>>>>   Will this Python patch fix your problem?
>>>>> http://bugs.python.org/issue7213
>>>>>
>>>>>
>>>>> On Wed, Jun 4, 2014 at 10:41 PM, Qin Zhao <chaochin at gmail.com> wrote:
>>>>>
>>>>>>  Hi Zhu Zhu,
>>>>>>
>>>>>> Thank you for reading my diagram!  I need to clarify that this problem
>>>>>> does not occur during data injection.  Before creating the ISO, the
>>>>>> driver code extends the disk, and libguestfs is invoked in that time frame.
>>>>>>
>>>>>> I now think this problem may occur at any time, if the code uses tpool
>>>>>> to invoke libguestfs and an external command is executed in another
>>>>>> green thread simultaneously.  Please correct me if I am wrong.
>>>>>>
>>>>>> I think one simple solution for this issue is to call the libguestfs
>>>>>> routine in a green thread, rather than in another native thread. But that
>>>>>> would hurt performance badly, so I do not think it is an acceptable
>>>>>> solution.
>>>>>>
>>>>>>
>>>>>>
>>>>>>  On Wed, Jun 4, 2014 at 12:00 PM, Zhu Zhu <bjzzu.zz at gmail.com> wrote:
>>>>>>
>>>>>>>   Hi Qin Zhao,
>>>>>>>
>>>>>>> Thanks for raising this issue and for the analysis. According to the
>>>>>>> issue description and the scenario in which it happens (
>>>>>>> https://docs.google.com/drawings/d/1pItX9urLd6fmjws3BVovXQvRg_qMdTHS-0JhYfSkkVc/pub?w=960&h=720
>>>>>>> ), if that is the case, the issue is very likely to happen when
>>>>>>> multiple KVM instances are spawned concurrently (*with both config
>>>>>>> drive and data injection enabled*).
>>>>>>> In the libvirt/driver.py _create_image method, right after making the
>>>>>>> ISO with "cdb.make_drive", the driver will attempt "data injection",
>>>>>>> which will call the libguestfs launch in another thread.
>>>>>>>
>>>>>>> It looks like there were also a couple of libguestfs hang issues on
>>>>>>> Launchpad, listed below. I am not sure whether libguestfs itself could
>>>>>>> have some mechanism to free/close the fds inherited from the parent
>>>>>>> process, instead of requiring an explicit call to tear down. Maybe we
>>>>>>> should open a defect against libguestfs to see what they think?
>>>>>>>
>>>>>>>  https://bugs.launchpad.net/nova/+bug/1286256
>>>>>>> https://bugs.launchpad.net/nova/+bug/1270304
>>>>>>>
>>>>>>> ------------------------------
>>>>>>>  Zhu Zhu
>>>>>>> Best Regards
>>>>>>>
>>>>>>>
>>>>>>>  *From:* Qin Zhao <chaochin at gmail.com>
>>>>>>> *Date:* 2014-05-31 01:25
>>>>>>>  *To:* OpenStack Development Mailing List (not for usage questions)
>>>>>>> <openstack-dev at lists.openstack.org>
>>>>>>> *Subject:* [openstack-dev] [Nova] nova-compute deadlock
>>>>>>>    Hi all,
>>>>>>>
>>>>>>> When running Icehouse code, I encountered a strange problem. The
>>>>>>> nova-compute service becomes stuck when I boot instances. I reported this
>>>>>>> bug in https://bugs.launchpad.net/nova/+bug/1313477.
>>>>>>>
>>>>>>> After thinking about it for several days, I feel I know its root cause.
>>>>>>> This bug should be a deadlock caused by a leaked pipe fd.  I drew a
>>>>>>> diagram to illustrate this problem.
>>>>>>> https://docs.google.com/drawings/d/1pItX9urLd6fmjws3BVovXQvRg_qMdTHS-0JhYfSkkVc/pub?w=960&h=720
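
The general mechanism behind a pipe fd leak can be sketched in a single
process. This is not a reproduction of the Nova bug: os.dup() merely stands in
for the copy of the write end that a forked child would inherit, and
read_nonblocking() is a hypothetical helper. The point is that the reader sees
no EOF for as long as any copy of the write end stays open.

```python
import os


def read_nonblocking(fd):
    """Return the bytes read, b"" on EOF, or None if the read would block."""
    try:
        return os.read(fd, 1024)
    except BlockingIOError:
        return None


read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)

# Stand-in for the copy of the write end that a forked child would inherit.
leaked_fd = os.dup(write_fd)

os.close(write_fd)                        # the intended writer is finished
while_leaked = read_nonblocking(read_fd)  # no EOF yet: a write end is still open

os.close(leaked_fd)                       # the "child" finally lets go
after_close = read_nonblocking(read_fd)   # now the reader gets EOF

os.close(read_fd)
```

With a blocking read, the first read_nonblocking() call would simply hang,
which is what a deadlock of this kind looks like from the reader's side.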
>>>>>>>
>>>>>>> However, I have not found a very good solution to prevent this
>>>>>>> deadlock. This problem is related to the Python runtime, libguestfs, and
>>>>>>> eventlet. The situation is a little complicated. Is there any expert who
>>>>>>> can help me look for a solution? I would appreciate your help!
>>>>>>>
>>>>>>> --
>>>>>>> Qin Zhao
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> OpenStack-dev mailing list
>>>>>>> OpenStack-dev at lists.openstack.org
>>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Qin Zhao
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Kind regards, Yuriy.
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Qin Zhao
>>>
>>>
>>>
>>
>>
>> --
>>
>> Kind regards, Yuriy.
>>
>>
>>
>
>
> --
> Qin Zhao
>



-- 
Qin Zhao