[ironic][tripleo] RFC: deprecate the iSCSI deploy interface?
Dmitry Tantsur
dtantsur at redhat.com
Mon Aug 24 08:32:57 UTC 2020
Hi,
On Mon, Aug 24, 2020 at 10:24 AM Arne Wiebalck <arne.wiebalck at cern.ch>
wrote:
> Hi!
>
> CERN's deployment has been using the iscsi deploy interface since we
> started with Ironic a couple of years ago (and we have installed around
> 5000 nodes with it by now). The reason we chose it at the time was
> simplicity: we did not (and still do not) have a Swift backend to Glance,
> and the iscsi interface provided a straightforward alternative.
>
> While we have not seen obscure bugs/issues with it, I can certainly back
> the scalability issues mentioned by Dmitry: the tunneling of the images
> through the controllers can create issues when deploying hundreds of
> nodes at the same time. The security of the iscsi interface is less of a
> concern in our specific environment.
>
> So, why did we not move to direct (yet)? In addition to the lack of
> Swift, mostly since iscsi works for us and the scalability issues were
> not that much of a burning problem ... so we focused on other things :)
>
> Here are some thoughts/suggestions for this discussion:
>
> How would 'direct' work with other Glance backends (like Ceph/RBD in our
> case)? If using direct requires duplicating images from Glance to
> Ironic (or somewhere else) to be served, I think this would be an
> argument against deprecating iscsi.
>
With image_download_source=http, ironic will download the image to the
conductor to be able to serve it to the node. This is exactly what the iscsi
deploy is doing, so not much of a change for you (except for s/iSCSI/HTTP/
as a means of serving the image).
Would it be an option for you to test direct deploy with
image_download_source=http?
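
For such a test, the overrides could look roughly like the fragment rendered
by the sketch below. The [agent]image_download_source and
enabled_deploy_interfaces option names come from this thread; keeping iscsi
enabled next to direct during the test, and the helper name
render_test_overrides, are assumptions for illustration rather than a
recommended configuration.

```python
import configparser
import io


def render_test_overrides() -> str:
    """Render a hypothetical ironic.conf fragment for trying direct deploy."""
    conf = configparser.ConfigParser()
    # Assumption: keep iscsi enabled so nodes that still use it are unaffected.
    conf["DEFAULT"] = {"enabled_deploy_interfaces": "direct,iscsi"}
    # Have the conductor cache the image and serve it over HTTP (no Swift).
    conf["agent"] = {"image_download_source": "http"}
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


print(render_test_overrides())
```

A test node would then still need its deploy interface switched to direct
before redeploying; the other nodes can stay on iscsi in the meantime.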
>
> Equally, if this would require completely moving the Glance backend to
> something else, like from RBD to RadosGW, I would not expect happy
> operators. (Does anyone know if RadosGW could even replace Swift for
> this specific use case?)
>
AFAIK ironic works with RadosGW; we have some support code for it.
>
> Do we have numbers on how many deployments use iscsi vs direct? If many
> rely on iscsi, I would also suggest establishing a migration guide for
> operators on how to move from iscsi to direct, for the various configs.
> Recent versions of Glance support multiple backends, so a migration path
> may be to add a new (direct compatible) backend for new images.
>
I don't have any numbers, but a migration guide is a must in any case.
I expect most TripleO consumers to use the iscsi deploy, but only because
it's the default. Their Edge solution uses the direct deploy. I've polled a
few operators I know, and they all (except for you, obviously :) seem to use
the direct deploy. Metal3 uses direct deploy.
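
For rough numbers within a single cloud, one could tally the deploy
interface per node. A minimal sketch, assuming the node list has already
been exported as a list of dicts; the "deploy_interface" key and the helper
name tally_deploy_interfaces are assumptions for illustration, and the
sample data is made up.

```python
from collections import Counter


def tally_deploy_interfaces(nodes, key="deploy_interface"):
    """Count nodes per deploy interface; a missing or empty value counts as 'unknown'."""
    return Counter(node.get(key) or "unknown" for node in nodes)


# Made-up sample data, only to show the expected input shape.
sample = [
    {"uuid": "node-1", "deploy_interface": "iscsi"},
    {"uuid": "node-2", "deploy_interface": "direct"},
    {"uuid": "node-3", "deploy_interface": "direct"},
]
print(tally_deploy_interfaces(sample))
```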
Dmitry
>
> Cheers,
> Arne
>
> On 20.08.20 17:49, Julia Kreger wrote:
> > I'm having a sense of deja vu!
> >
> > Because of the way the mechanics work, the iscsi deploy driver is in
> > the unfortunate position of being harder to troubleshoot, with failures
> > that are harder to diagnose. This basically means we've not been able to
> > really identify common failures and add logic to handle them
> > appropriately, like we are able to with a tcp socket and file download.
> > I think this alone makes a solid case for us to seriously consider
> > deprecation.
> >
> > Overall, I'm +1 for the proposal and I believe over two cycles is the
> > right way to go.
> >
> > I suspect we're going to have lots of push back from the TripleO
> > community because there has been resistance to changing their default
> > usage in the past. As such I'm adding them to the subject so hopefully
> > they will be at least aware.
> >
> > I guess my other worry is operators who already have a substantial
> > operational infrastructure investment built around the iscsi deploy
> > interface. I wonder why they didn't use direct, but maybe they have
> > all migrated in the past ?5? years. This could just be a non-concern
> > in reality, I'm just not sure.
> >
> > Of course, if someone is willing to step up and make the iscsi
> > deployment interface their primary focus, that also shifts the
> > discussion to making direct the default interface?
> >
> > -Julia
> >
> >
> > On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur <dtantsur at redhat.com> wrote:
> >>
> >> Hi all,
> >>
> >> Side note for those lacking context: this proposal concerns deprecating
> >> one of the ironic deploy interfaces detailed in
> >> https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html. It
> >> does not affect the boot-from-iSCSI feature.
> >>
> >> I would like to propose deprecating and removing the 'iscsi' deploy
> >> interface over the course of the next 2 cycles. The reasons are:
> >> 1) The iSCSI deploy is a source of occasional cryptic bugs when a
> >>    target cannot be discovered or mounted properly.
> >> 2) Its security is questionable: I don't think we even use
> >>    authentication.
> >> 3) Operator confusion: right now we default to the iSCSI deploy but
> >>    pretty much direct everyone who cares about scalability or security
> >>    to the 'direct' deploy.
> >> 4) Cost of maintenance: our feature set is growing, our team - not so
> >>    much. iscsi_deploy.py is 800 lines of code that can be removed, and
> >>    some dependencies can be dropped as well.
> >>
> >> As far as I can remember, we've kept the iSCSI deploy for two reasons:
> >> 1) The direct deploy used to require Glance with a Swift backend. The
> >>    recently added [agent]image_download_source option allows caching and
> >>    serving images via ironic's HTTP server, eliminating this problem. I
> >>    guess we'll have to switch to 'http' by default for this option to
> >>    keep the out-of-box experience.
> >> 2) Memory footprint of the direct deploy. With raw image streaming we
> >>    no longer have to cache the downloaded images in the agent memory,
> >>    removing this problem as well (I'm not even sure how much of a
> >>    problem it is in 2020, even my phone has 4GiB of RAM).
> >>
> >> If this proposal is accepted, I suggest executing it as follows:
> >> Victoria release:
> >> 1) Put an early deprecation warning in the release notes.
> >> 2) Announce the future change of the default value for
> >>    [agent]image_download_source.
> >> W release:
> >> 3) Change [agent]image_download_source to 'http' by default.
> >> 4) Remove iscsi from the default enabled_deploy_interfaces and move it
> >>    to the back of the supported list (effectively making direct deploy
> >>    the default).
> >> X release:
> >> 5) Remove the iscsi deploy code from both ironic and IPA.
> >> Thoughts, opinions, suggestions?
> >>
> >> Dmitry
> >
>
>