[ironic][tripleo] RFC: deprecate the iSCSI deploy interface?

Arne Wiebalck arne.wiebalck at cern.ch
Mon Aug 24 09:03:15 UTC 2020

Hi Dmitry,

On 24.08.20 10:32, Dmitry Tantsur wrote:
> Hi,
> On Mon, Aug 24, 2020 at 10:24 AM Arne Wiebalck <arne.wiebalck at cern.ch> wrote:
>     Hi!
>     CERN's deployment has been using the iscsi deploy interface since we started
>     with Ironic a couple of years ago (and we have installed around 5000 nodes
>     with it by now). The reason we chose it at the time was simplicity: we
>     did not (and still do not) have a Swift backend to Glance, and the iscsi
>     interface provided a straightforward alternative.
>     While we have not seen obscure bugs/issues with it, I can certainly back
>     the scalability issues mentioned by Dmitry: the tunneling of the images
>     through the controllers can create issues when deploying hundreds of
>     nodes at the same time. The security of the iscsi interface is less of a
>     concern in our specific environment.
>     So, why did we not move to direct (yet)? In addition to the lack of
>     Swift, mostly since iscsi works for us and the scalability issues were
>     not that much of a burning problem ... so we focused on other things :)
>     Here are some thoughts/suggestions for this discussion:
>     How would 'direct' work with other Glance backends (like Ceph/RBD in our
>     case)? If using direct requires duplicating images from Glance to
>     Ironic (or somewhere else) to be served, I think this would be an
>     argument against deprecating iscsi.
> With image_download_source=http, ironic will download the image to the
> conductor so that it can serve it to the node. This is exactly what the
> iscsi deploy does, so not much of a change for you (except for
> s/iSCSI/HTTP/ as a means of serving the image).
> Would it be an option for you to test direct deploy with 
> image_download_source=http?

Oh, absolutely! I was not aware that setting this option would make 
Ironic act as an image buffer (I thought this would expect some URL the 
admin had to provide) ... I will try this and let you know.
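
For the test, a minimal sketch of the settings I expect to touch, written
as a small Python script that renders the ironic.conf fragment. The option
names are the ones mentioned in this thread; putting
enabled_deploy_interfaces in [DEFAULT] and keeping iscsi enabled next to
direct during the test are my assumptions, and render_direct_http_conf is
just a hypothetical helper name.

```python
import configparser
import io


def render_direct_http_conf() -> str:
    """Render a hypothetical ironic.conf fragment for testing direct deploy."""
    cp = configparser.ConfigParser()
    # Assumption: keep iscsi enabled so existing nodes are unaffected.
    cp["DEFAULT"] = {"enabled_deploy_interfaces": "direct,iscsi"}
    # From the thread: let the conductor cache and serve the image over HTTP.
    cp["agent"] = {"image_download_source": "http"}
    buf = io.StringIO()
    cp.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_direct_http_conf())
```

The test nodes would still need to be switched to the direct deploy
interface individually, and the conductor's HTTP server has to be
reachable from the deploy ramdisk; neither is covered by this sketch.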

>     Equally, if this would require completely moving the Glance backend to
>     something else, like from RBD to RadosGW, I would not expect happy
>     operators. (Does anyone know if RadosGW could even replace Swift for
>     this specific use case?)
> AFAIK ironic works with RadosGW, we have some support code for it.

I was mostly asking to see if RadosGW is a (longer term) option to fully 
benefit from direct's inherent scaling.

>     Do we have numbers on how many deployments use iscsi vs direct? If many
>     rely on iscsi, I would also suggest establishing a migration guide for
>     operators on how to move from iscsi to direct, for the various configs.
>     Recent versions of Glance support multiple backends, so a migration path
>     may be to add a new (direct compatible) backend for new images.
> I don't have any numbers, but a migration guide is a must in any case.
> I expect most TripleO consumers to use the iscsi deploy, but only 
> because it's the default. Their Edge solution uses the direct deploy. 
> I've polled a few operators I know, they all (except for you, obviously 
> :) seem to use the direct deploy. Metal3 uses direct deploy.


> Dmitry
>     Cheers,
>        Arne
>     On 20.08.20 17:49, Julia Kreger wrote:
>      > I'm having a sense of deja vu!
>      >
>      > Because of the way the mechanics work, the iscsi deploy driver is in
>      > an unfortunate position of being harder to troubleshoot and diagnose
>      > failures. Which basically means we've not been able to really identify
>      > common failures and add logic to handle them appropriately, like we
>      > are able to with a tcp socket and file download. Based on this alone,
>      > I think it makes a solid case for us to seriously consider
>      > deprecation.
>      >
>      > Overall, I'm +1 for the proposal and I believe over two cycles is the
>      > right way to go.
>      >
>      > I suspect we're going to have lots of push back from the TripleO
>      > community because there has been resistance to change their default
>      > usage in the past. As such I'm adding them to the subject so hopefully
>      > they will be at least aware.
>      >
>      > I guess my other worry is operators who already have a substantial
>      > operational infrastructure investment built around the iscsi deploy
>      > interface. I wonder why they didn't use direct, but maybe they have
>      > all migrated in the past ?5? years. This could just be a non-concern
>      > in reality, I'm just not sure.
>      >
>      > Of course, if someone is willing to step up and make the iscsi
>      > deployment interface their primary focus, that also shifts the
>      > discussion to making direct the default interface?
>      >
>      > -Julia
>      >
>      >
>      > On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur <dtantsur at redhat.com> wrote:
>      >>
>      >> Hi all,
>      >>
>      >> Side note for those lacking context: this proposal concerns
>      >> deprecating one of the ironic deploy interfaces detailed in
>      >> https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html.
>      >> It does not affect the boot-from-iSCSI feature.
>      >>
>      >> I would like to propose deprecating and removing the 'iscsi'
>      >> deploy interface over the course of the next 2 cycles. The reasons are:
>      >> 1) The iSCSI deploy is a source of occasional cryptic bugs when
>      >> a target cannot be discovered or mounted properly.
>      >> 2) Its security is questionable: I don't think we even use
>      >> authentication.
>      >> 3) Operator confusion: right now we default to the iSCSI deploy
>      >> but point pretty much everyone who cares about scalability or
>      >> security to the 'direct' deploy.
>      >> 4) Cost of maintenance: our feature set is growing, our team -
>      >> not so much. iscsi_deploy.py is 800 lines of code that can be
>      >> removed, and some dependencies that can be dropped as well.
>      >>
>      >> As far as I can remember, we've kept the iSCSI deploy for two
>      >> reasons:
>      >> 1) The direct deploy used to require Glance with a Swift backend.
>      >> The recently added [agent]image_download_source option allows
>      >> caching and serving images via ironic's HTTP server, eliminating
>      >> this problem. I guess we'll have to switch to 'http' by default for
>      >> this option to keep the out-of-box experience.
>      >> 2) Memory footprint of the direct deploy. With raw image
>      >> streaming we no longer have to cache the downloaded images in the
>      >> agent memory, removing this problem as well (I'm not even sure how
>      >> much of a problem it is in 2020, even my phone has 4GiB of RAM).
>      >>
>      >> If this proposal is accepted, I suggest executing it as follows:
>      >> Victoria release:
>      >> 1) Put an early deprecation warning in the release notes.
>      >> 2) Announce the future change of the default value for
>      >> [agent]image_download_source.
>      >> W release:
>      >> 3) Change [agent]image_download_source to 'http' by default.
>      >> 4) Remove iscsi from the default enabled_deploy_interfaces and
>      >> move it to the back of the supported list (effectively making direct
>      >> deploy the default).
>      >> X release:
>      >> 5) Remove the iscsi deploy code from both ironic and IPA.
>      >>
>      >> Thoughts, opinions, suggestions?
>      >>
>      >> Dmitry
>      >
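
PS: On step 4 of the plan quoted above, my understanding (an assumption on
my side, not checked against the code) is that when no default deploy
interface is configured explicitly, the first entry of the hardware type's
supported list that is also enabled wins. A rough sketch of that rule, with
a hypothetical helper name and hypothetical list orderings:

```python
from typing import Optional, Sequence


def pick_default(supported: Sequence[str],
                 enabled: Sequence[str]) -> Optional[str]:
    """Return the first supported interface that is also enabled."""
    enabled_set = set(enabled)
    for name in supported:
        if name in enabled_set:
            return name
    return None


# Hypothetical "today": iscsi is first in the supported list and enabled.
today = pick_default(["iscsi", "direct"], ["iscsi", "direct"])
# Hypothetical "W release": iscsi moved to the back and not enabled by default.
w_release = pick_default(["direct", "iscsi"], ["direct"])
# An operator who explicitly re-enables iscsi would still get direct by default.
w_opt_in = pick_default(["direct", "iscsi"], ["iscsi", "direct"])

if __name__ == "__main__":
    print(today, w_release, w_opt_in)
```

If that reading is right, operators who want to stay on iscsi for a while
would have to both re-enable it and select it explicitly on their nodes,
which is something the migration guide should spell out.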
