[openstack-dev] [Nova] nova-compute deadlock

laserjetyang laserjetyang at gmail.com
Thu Jun 5 10:17:46 UTC 2014


  Will this Python patch fix your problem? http://bugs.python.org/issue7213
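
To make the question concrete, here is a rough sketch, assuming the relevant part of that issue is that subprocess lets a child inherit open descriptors unless close_fds=True is passed. The helper name reader_sees_eof and the sleeping child are made up for illustration and are not nova or libguestfs code. The pipe is explicitly marked inheritable to mimic Python 2, since Python 3 makes new descriptors non-inheritable by default. POSIX only.

```python
import os
import select
import subprocess
import sys


def reader_sees_eof(close_fds, timeout=0.5):
    """Return True if the pipe's read end reaches EOF after the parent
    closes its own write end while a long-lived child is still running."""
    r, w = os.pipe()
    # Assumption: emulate Python 2, where new fds were inheritable.
    os.set_inheritable(w, True)
    # Hypothetical stand-in for an external command started at the wrong moment.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        close_fds=close_fds,
    )
    os.close(w)  # the parent is done writing
    try:
        ready, _, _ = select.select([r], [], [], timeout)
        # EOF shows up as a readable fd that returns zero bytes.
        return bool(ready) and os.read(r, 1) == b""
    finally:
        child.kill()
        child.wait()
        os.close(r)


if __name__ == "__main__":
    print("close_fds=False, EOF seen:", reader_sees_eof(False))
    print("close_fds=True,  EOF seen:", reader_sees_eof(True))
```

With close_fds=False the child keeps a copy of the write end, so the reader never sees EOF, and a blocking read would hang at that point. With close_fds=True the child does not hold the descriptor and EOF arrives as soon as the parent closes its end.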

On Wed, Jun 4, 2014 at 10:41 PM, Qin Zhao <chaochin at gmail.com> wrote:

>  Hi Zhu Zhu,
>
> Thank you for reading my diagram! I need to clarify that this problem
> does not occur during data injection. Before creating the ISO, the driver
> code extends the disk, and libguestfs is invoked during that step.
>
> I now think this problem can occur at any time, if the code uses tpool
> to invoke libguestfs while an external command is being executed in another
> green thread.  Please correct me if I am wrong.
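
As an illustration of that timing problem, the sketch below checks whether a command started while a pipe is open ends up holding that pipe. This is only a sketch under assumptions: PROBE, child_inherits and demo are hypothetical names, the probe child stands in for the external command, and inheritance is switched on by hand to mimic Python 2 defaults. POSIX only.

```python
import os
import subprocess
import sys

# Probe run in the child: exit 0 if the given fd number is open, 1 otherwise.
PROBE = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
)


def child_inherits(fd):
    """Return True if a child started without close_fds can still use `fd`."""
    rc = subprocess.call(
        [sys.executable, "-c", PROBE, str(fd)],
        close_fds=False,
    )
    return rc == 0


def demo():
    r, w = os.pipe()
    try:
        os.set_inheritable(w, True)   # Python 2 style: fd leaks across exec
        leaked = child_inherits(w)
        os.set_inheritable(w, False)  # close-on-exec: fd is dropped at exec
        protected = child_inherits(w)
    finally:
        os.close(r)
        os.close(w)
    return leaked, protected


if __name__ == "__main__":
    print(demo())
```

Any child that is exec'd while the inheritable pipe is open gets its own copy of the write end, whichever thread opened the pipe. Marking the descriptor close-on-exec keeps it out of unrelated children.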
>
> One simple solution is to call the libguestfs routines in a green thread
> rather than in a native thread. But that would hurt performance badly,
> so I do not think it is an acceptable solution.
>
>
>
>  On Wed, Jun 4, 2014 at 12:00 PM, Zhu Zhu <bjzzu.zz at gmail.com> wrote:
>
>>   Hi Qin Zhao,
>>
>> Thanks for raising this issue and for the analysis. According to the issue
>> description and the scenario in your diagram (
>> https://docs.google.com/drawings/d/1pItX9urLd6fmjws3BVovXQvRg_qMdTHS-0JhYfSkkVc/pub?w=960&h=720
>> ), if that's the case, the issue is very likely to happen when multiple
>> KVM instances (*with both config drive and data injection enabled*) are
>> spawned concurrently.
>> In the _create_image method of libvirt/driver.py, right after the ISO is made by "cdb.make_drive",
>> the driver attempts "data injection", which calls the libguestfs
>> launch in another thread.
>>
>> It looks like there are also a couple of libguestfs hang issues on Launchpad,
>> listed below. I am not sure whether libguestfs itself could have some mechanism
>> to free/close the fds inherited from the parent process, instead of requiring
>> an explicit teardown call. Maybe open a defect against libguestfs to see
>> what they think?
>>
>>  https://bugs.launchpad.net/nova/+bug/1286256
>> https://bugs.launchpad.net/nova/+bug/1270304
>>
>> ------------------------------
>>  Zhu Zhu
>> Best Regards
>>
>>
>>  *From:* Qin Zhao <chaochin at gmail.com>
>> *Date:* 2014-05-31 01:25
>>  *To:* OpenStack Development Mailing List (not for usage questions)
>> <openstack-dev at lists.openstack.org>
>> *Subject:* [openstack-dev] [Nova] nova-compute deadlock
>>    Hi all,
>>
>> When I ran the Icehouse code, I encountered a strange problem: the
>> nova-compute service becomes stuck when I boot instances. I reported this
>> bug at https://bugs.launchpad.net/nova/+bug/1313477.
>>
>> After thinking about it for several days, I believe I know its root cause.
>> This bug appears to be a deadlock caused by a leaked pipe fd.  I drew a
>> diagram to illustrate the problem.
>> https://docs.google.com/drawings/d/1pItX9urLd6fmjws3BVovXQvRg_qMdTHS-0JhYfSkkVc/pub?w=960&h=720
>>
>> However, I have not found a good solution to prevent this deadlock.
>> The problem involves the Python runtime, libguestfs, and eventlet, so the
>> situation is a little complicated. Is there any expert who can help me
>> look for a solution? I would appreciate your help!
>>
>> --
>> Qin Zhao
>>
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
>
> --
> Qin Zhao