<p dir="ltr"><br>

On Nov 5, 2013 3:33 PM, "Avishay Traeger" <<a href="mailto:AVISHAY@il.ibm.com">AVISHAY@il.ibm.com</a>> wrote:<br>

><br>

> So while doubling the timeout will fix some cases, there will be cases with<br>

> larger volumes and/or slower systems where the bug will still hit.  Even<br>

> timing out on the download progress can lead to unnecessary timeouts (if<br>

> it's really slow, or the volume is really big, it can stay at 5% for some<br>

> time).<br>

><br>

> I think the proper fix is to make sure that Cinder is moving the volume<br>

> into 'error' state in all cases where there is an error.  Nova can then<br>

> poll as long as it's in the 'downloading' state, until it's 'available' or<br>

> 'error'. </p>

<p dir="ltr">Agree</p>
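<p dir="ltr">A minimal sketch of the polling loop described above, assuming Cinder reliably moves a failed volume to 'error'. The function name wait_for_volume and the get_status callable are hypothetical stand-ins for whatever client call Nova would actually use; only the 'downloading', 'available' and 'error' states come from the thread.</p>

```python
import time
from typing import Callable


def wait_for_volume(get_status: Callable[[], str],
                    poll_interval: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll while the volume is 'downloading'; return 'available' or 'error'.

    get_status is a hypothetical callable that returns the current volume
    status string. There is deliberately no wall-clock timeout here: the
    loop relies on the backend reporting 'error' whenever something fails.
    """
    while True:
        status = get_status()
        if status in ("available", "error"):
            return status
        if status != "downloading":
            # Any other state is outside what this sketch covers.
            raise RuntimeError(f"unexpected volume status: {status!r}")
        sleep(poll_interval)
```

<p dir="ltr">The caller would then boot from the volume on 'available' and fail the request on 'error'. The loop only terminates if the error path is complete, which is why the question of whether 'downloading' can legitimately get stuck matters.</p>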

<p dir="ltr">> Is there a reason why Cinder would legitimately get stuck in<br>

> 'downloading'?<br>

><br>

> Thanks,<br>

> Avishay<br>

><br>

><br>

><br>

> From:   John Griffith <<a href="mailto:john.griffith@solidfire.com">john.griffith@solidfire.com</a>><br>

> To:     "OpenStack Development Mailing List (not for usage questions)"<br>

>             <<a href="mailto:openstack-dev@lists.openstack.org">openstack-dev@lists.openstack.org</a>>,<br>

> Date:   11/05/2013 07:41 AM<br>

> Subject:        Re: [openstack-dev] Improvement of Cinder API wrt<br>

>             <a href="https://bugs.launchpad.net/nova/+bug/1213953">https://bugs.launchpad.net/nova/+bug/1213953</a><br>

><br>

><br>

><br>

> On Tue, Nov 5, 2013 at 7:27 AM, John Griffith<br>

> <<a href="mailto:john.griffith@solidfire.com">john.griffith@solidfire.com</a>> wrote:<br>

> > On Tue, Nov 5, 2013 at 6:29 AM, Chris Friesen<br>

> > <<a href="mailto:chris.friesen@windriver.com">chris.friesen@windriver.com</a>> wrote:<br>

> >> On 11/04/2013 03:49 PM, Solly Ross wrote:<br>

> >>><br>

> >>> So, there's currently an outstanding issue with regard to a Nova<br>

> >>> shortcut command that creates a volume from an image and then boots<br>

> >>> from it in one fell swoop.  The gist of the issue is that there is<br>

> >>> currently a set timeout which can time out before the volume creation<br>

> >>> has finished (it's designed to time out in case there is an error),<br>

> >>> in cases where the image download or volume creation takes an<br>

> >>> extended period of time (e.g. under a Gluster backend for Cinder with<br>

> >>> certain network conditions).<br>

> >>><br>

> >>> The proposed solution is a modification to the Cinder API to provide<br>

> >>> more detail on what exactly is going on, so that we could<br>

> >>> programmatically tune the timeout.  My initial thought is to create a<br>

> >>> new column in the Volume table called 'status_detail' to provide more<br>

> >>> detailed information about the current status.  For instance, for the<br>

> >>> 'downloading' status, we could have 'status_detail' be the completion<br>

> >>> percentage or JSON containing the total size and the current amount<br>

> >>> copied.  This way, at each interval we could check to see if the<br>

> >>> amount copied had changed, and trigger the timeout if it had not,<br>

> >>> instead of blindly assuming that the operation will complete within a<br>

> >>> given amount of time.<br>

> >>><br>

> >>> What do people think?  Would there be a better way to do this?<br>

> >><br>

> >><br>

> >> The only other option I can think of would be some kind of callback that<br>

> >> cinder could explicitly call to drive updates and/or notifications of faults<br>

> >> rather than needing to wait for a timeout.  Possibly a combination of both<br>

> >> would be best, that way you could add a --poll option to the "create volume<br>

> >> and boot" CLI command.<br>

> >><br>

> >> I come from the kernel-hacking world and most things there involve<br>

> >> event-driven callbacks.  Looking at the openstack code I was kind of<br>

> >> surprised to see hardcoded timeouts and RPC casts with no callbacks to<br>

> >> indicate completion.<br>

> >><br>

> >> Chris<br>

> >><br>

> >><br>

> >> _______________________________________________<br>

> >> OpenStack-dev mailing list<br>

> >> <a href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a><br>

> >> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

><br>

> I believe you're referring to [1], which was closed after a patch was<br>

> added to nova to double the timeout length.  Based on the comments, it sounds<br>

> like you're still seeing issues on some Gluster (and maybe other) setups?<br>

><br>

> Rather than mess with the API in order to debug, why don't you use<br>

> the info in the cinder-logs?<br>

><br>

> [1] <a href="https://bugs.launchpad.net/nova/+bug/1213953">https://bugs.launchpad.net/nova/+bug/1213953</a><br>

><br>


</p>
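<p dir="ltr">For comparison, the progress-based timeout from the original proposal in the quoted thread could look roughly like the sketch below. It assumes the proposed (not existing) 'status_detail' field carries a progress value, and that get_volume is a hypothetical callable returning a dict with 'status' and, optionally, 'status_detail'. The timeout fires only when that value has not changed for stall_timeout seconds.</p>

```python
import time
from typing import Callable


def wait_with_stall_timeout(get_volume: Callable[[], dict],
                            stall_timeout: float,
                            poll_interval: float = 1.0,
                            clock: Callable[[], float] = time.monotonic,
                            sleep: Callable[[float], None] = time.sleep) -> str:
    """Return the final status, or raise TimeoutError if progress stalls.

    'status_detail' is the hypothetical progress field from the proposal;
    any value that compares unequal between polls counts as progress.
    """
    last_progress = None
    last_change = clock()
    while True:
        volume = get_volume()
        status = volume["status"]
        if status != "downloading":
            return status
        progress = volume.get("status_detail")
        now = clock()
        if progress != last_progress:
            # Progress moved: restart the stall window.
            last_progress = progress
            last_change = now
        elif now - last_change >= stall_timeout:
            raise TimeoutError(
                "no download progress for %.0f seconds" % stall_timeout)
        sleep(poll_interval)
```

<p dir="ltr">As noted in the thread, a slow backend or a large volume can sit at the same percentage for a long time, so stall_timeout would still have to be generous. This narrows the window for false timeouts but does not remove it.</p>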