[ironic][tripleo] RFC: deprecate the iSCSI deploy interface?
Arne Wiebalck
arne.wiebalck at cern.ch
Mon Aug 24 09:03:15 UTC 2020
Hi Dmitry,
On 24.08.20 10:32, Dmitry Tantsur wrote:
> Hi,
>
> On Mon, Aug 24, 2020 at 10:24 AM Arne Wiebalck <arne.wiebalck at cern.ch> wrote:
>
> Hi!
>
> CERN's deployment has been using the iscsi deploy interface since we
> started with Ironic a couple of years ago (and we have installed around
> 5000 nodes with it by now). The reason we chose it at the time was
> simplicity: we did not (and still do not) have a Swift backend to Glance,
> and the iscsi interface provided a straightforward alternative.
>
> While we have not seen obscure bugs/issues with it, I can certainly back
> the scalability issues mentioned by Dmitry: the tunneling of the images
> through the controllers can create issues when deploying hundreds of
> nodes at the same time. The security of the iscsi interface is less of a
> concern in our specific environment.
>
> So, why did we not move to direct (yet)? In addition to the lack of
> Swift, mostly since iscsi works for us and the scalability issues were
> not that much of a burning problem ... so we focused on other things :)
>
> Here are some thoughts/suggestions for this discussion:
>
> How would 'direct' work with other Glance backends (like Ceph/RBD in our
> case)? If using direct requires duplicating images from Glance to
> Ironic (or somewhere else) to be served, I think this would be an
> argument against deprecating iscsi.
>
>
> With image_download_source=http ironic will download the image to the
> conductor to be able to serve it to the node. This is exactly what the
> iscsi deploy is doing, so not much of a change for you (except for
> s/iSCSI/HTTP/ as a means of serving the image).
>
> Would it be an option for you to test direct deploy with
> image_download_source=http?
Oh, absolutely! I was not aware that setting this option would make
Ironic act as an image buffer (I thought this would expect some URL the
admin had to provide) ... I will try this and let you know.
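
For reference, a minimal sketch of the ironic.conf settings such a test
would likely involve. [agent]image_download_source and
enabled_deploy_interfaces are the options named in this thread; the
[deploy] http_root and http_url values are placeholder assumptions and
would have to match the conductor's existing HTTP server setup:

```ini
[DEFAULT]
# Keep iscsi enabled next to direct while testing, so existing nodes
# are not affected.
enabled_deploy_interfaces = direct,iscsi

[agent]
# Cache the image on the conductor and serve it from the conductor's
# HTTP server, instead of relying on a Swift temporary URL.
image_download_source = http

[deploy]
# Assumed example values: the directory the conductor's HTTP server
# serves from, and the URL under which nodes can reach it.
http_root = /httpboot
http_url = http://192.0.2.10:8080
```

Nodes keep whatever deploy interface they already have, so a single test
node could be switched with something like
`openstack baremetal node set <node> --deploy-interface direct` while the
rest stay on iscsi.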
>
>
> Equally, if this would require completely moving the Glance backend to
> something else, like from RBD to RadosGW, I would not expect happy
> operators. (Does anyone know if RadosGW could even replace Swift for
> this specific use case?)
>
>
> AFAIK ironic works with RadosGW, we have some support code for it.
I was mostly asking to see if RadosGW is a (longer term) option to fully
benefit from direct's inherent scaling.
>
>
> Do we have numbers on how many deployments use iscsi vs direct? If many
> rely on iscsi, I would also suggest establishing a migration guide for
> operators on how to move from iscsi to direct, for the various configs.
> Recent versions of Glance support multiple backends, so a migration path
> may be to add a new (direct compatible) backend for new images.
>
>
> I don't have any numbers, but a migration guide is a must in any case.
>
> I expect most TripleO consumers to use the iscsi deploy, but only
> because it's the default. Their Edge solution uses the direct deploy.
> I've polled a few operators I know, they all (except for you, obviously
> :) seem to use the direct deploy. Metal3 uses direct deploy.
Thanks!
Arne
> Dmitry
>
>
> Cheers,
> Arne
>
> On 20.08.20 17:49, Julia Kreger wrote:
> > I'm having a sense of deja vu!
> >
> > Because of the way the mechanics work, the iscsi deploy driver is in
> > an unfortunate position of being harder to troubleshoot and diagnose
> > failures. Which basically means we've not been able to really identify
> > common failures and add logic to handle them appropriately, like we
> > are able to with a tcp socket and file download. Based on this alone,
> > I think it makes a solid case for us to seriously consider
> > deprecation.
> >
> > Overall, I'm +1 for the proposal and I believe over two cycles is the
> > right way to go.
> >
> > I suspect we're going to have lots of push back from the TripleO
> > community because there has been resistance to change their default
> > usage in the past. As such I'm adding them to the subject so hopefully
> > they will be at least aware.
> >
> > I guess my other worry is operators who already have a substantial
> > operational infrastructure investment built around the iscsi deploy
> > interface. I wonder why they didn't use direct, but maybe they have
> > all migrated in the past ?5? years. This could just be a non-concern
> > in reality, I'm just not sure.
> >
> > Of course, if someone is willing to step up and make the iscsi
> > deployment interface their primary focus, that also shifts the
> > discussion to making direct the default interface?
> >
> > -Julia
> >
> >
> > On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur <dtantsur at redhat.com> wrote:
> >>
> >> Hi all,
> >>
> >> Side note for those lacking context: this proposal concerns deprecating
> >> one of the ironic deploy interfaces detailed in
> >> https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html.
> >> It does not affect the boot-from-iSCSI feature.
> >>
> >> I would like to propose deprecating and removing the 'iscsi' deploy
> >> interface over the course of the next 2 cycles. The reasons are:
> >> 1) The iSCSI deploy is a source of occasional cryptic bugs when a
> >>    target cannot be discovered or mounted properly.
> >> 2) Its security is questionable: I don't think we even use
> >>    authentication.
> >> 3) Operator confusion: right now we default to the iSCSI deploy but
> >>    pretty much direct everyone who cares about scalability or security
> >>    to the 'direct' deploy.
> >> 4) Cost of maintenance: our feature set is growing, our team - not so
> >>    much. iscsi_deploy.py is 800 lines of code that can be removed, and
> >>    some dependencies can be dropped as well.
> >>
> >> As far as I can remember, we've kept the iSCSI deploy for two reasons:
> >> 1) The direct deploy used to require Glance with a Swift backend. The
> >>    recently added [agent]image_download_source option allows caching
> >>    and serving images via ironic's HTTP server, eliminating this
> >>    problem. I guess we'll have to switch to 'http' by default for this
> >>    option to keep the out-of-box experience.
> >> 2) Memory footprint of the direct deploy. With raw image streaming we
> >>    no longer have to cache the downloaded images in the agent memory,
> >>    removing this problem as well (I'm not even sure how much of a
> >>    problem it is in 2020, even my phone has 4GiB of RAM).
> >>
> >> If this proposal is accepted, I suggest executing it as follows:
> >> Victoria release:
> >> 1) Put an early deprecation warning in the release notes.
> >> 2) Announce the future change of the default value for
> >>    [agent]image_download_source.
> >> W release:
> >> 3) Change [agent]image_download_source to 'http' by default.
> >> 4) Remove iscsi from the default enabled_deploy_interfaces and move it
> >>    to the back of the supported list (effectively making direct deploy
> >>    the default).
> >> X release:
> >> 5) Remove the iscsi deploy code from both ironic and IPA.
> >>
> >> Thoughts, opinions, suggestions?
> >>
> >> Dmitry
> >
>