[ironic][tripleo] RFC: deprecate the iSCSI deploy interface?

Dmitry Tantsur dtantsur at redhat.com
Tue Aug 25 07:46:42 UTC 2020


On Mon, Aug 24, 2020 at 1:52 PM Sean Mooney <smooney at redhat.com> wrote:

> On Mon, 2020-08-24 at 10:32 +0200, Dmitry Tantsur wrote:
> > Hi,
> >
> > On Mon, Aug 24, 2020 at 10:24 AM Arne Wiebalck <arne.wiebalck at cern.ch>
> > wrote:
> >
> > > Hi!
> > >
> > > CERN's deployment has been using the iscsi deploy interface since we
> > > started with Ironic a couple of years ago (and we have installed around
> > > 5000 nodes with it by now). The reason we chose it at the time was
> > > simplicity: we did not (and still do not) have a Swift backend to Glance,
> > > and the iscsi interface provided a straightforward alternative.
> > >
> > > While we have not seen obscure bugs/issues with it, I can certainly back
> > > the scalability issues mentioned by Dmitry: the tunneling of the images
> > > through the controllers can create issues when deploying hundreds of
> > > nodes at the same time. The security of the iscsi interface is less of a
> > > concern in our specific environment.
> > >
> > > So, why did we not move to direct (yet)? In addition to the lack of
> > > Swift, mostly since iscsi works for us and the scalability issues were
> > > not that much of a burning problem ... so we focused on other things :)
> > >
> > > Here are some thoughts/suggestions for this discussion:
> > >
> > > How would 'direct' work with other Glance backends (like Ceph/RBD in our
> > > case)? If using direct requires duplicating images from Glance to
> > > Ironic (or somewhere else) to be served, I think this would be an
> > > argument against deprecating iscsi.
> > >
> >
> > With image_download_source=http ironic will download the image to the
> > conductor to be able to serve it to the node. Which is exactly what the
> > iscsi deploy is doing, so not much of a change for you (except for
> > s/iSCSI/HTTP/ as a means of serving the image).
> >
> > Would it be an option for you to test direct deploy with
> > image_download_source=http?
> I think if there is still an option that does not force deployments to
> alter any of their other services, this is likely OK. But I think the onus
> should be on the ironic and ooo teams to ensure there is an upgrade path for
> those users before this deprecation becomes a removal, without deploying
> Swift or a Swift-compatible API, e.g. RadosGW.
>
> Perhaps a CI job could be put in place, maybe using grenade, that starts
> with iscsi and moves to direct with http, provided to show that just
> setting that will allow the conductor to download the image from Glance and
> serve it to the IPA.
>

This is the CI job with direct deploy in a low RAM environment with a large
image (CentOS) without Swift:
https://zuul.opendev.org/t/openstack/build/58f623d90435470f9095eb68202c25f8

The change is https://review.opendev.org/#/c/747413/
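
For anyone who wants to try the same setup outside of CI, here is a minimal
sketch of the ironic.conf change discussed in this thread. It is an
illustration only and is not taken from the job above: the helper name
switch_to_direct_http is made up, and it assumes the options are spelled
[DEFAULT]enabled_deploy_interfaces, [DEFAULT]default_deploy_interface and
[agent]image_download_source. Note that configparser drops comments when
writing the file back.

```python
import configparser
import io


def switch_to_direct_http(conf_text: str) -> str:
    """Return ironic.conf text with direct deploy served over HTTP.

    Hypothetical helper for illustration; review the result before use.
    """
    # interpolation=None keeps any literal '%' in existing values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    # Enable 'direct' but keep whatever is already enabled (e.g. iscsi),
    # so existing nodes keep working while they are migrated.
    enabled = parser.get("DEFAULT", "enabled_deploy_interfaces", fallback="")
    interfaces = [i.strip() for i in enabled.split(",") if i.strip()]
    if "direct" not in interfaces:
        interfaces.insert(0, "direct")
    parser.set("DEFAULT", "enabled_deploy_interfaces", ",".join(interfaces))
    parser.set("DEFAULT", "default_deploy_interface", "direct")

    # Serve images from the conductor's HTTP server instead of Swift.
    if not parser.has_section("agent"):
        parser.add_section("agent")
    parser.set("agent", "image_download_source", "http")

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(switch_to_direct_http(
        "[DEFAULT]\nenabled_deploy_interfaces = iscsi\n"))
```

Nodes that have the deploy interface set explicitly would probably still
need to be switched one by one, e.g. with
"openstack baremetal node set <node> --deploy-interface direct".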

Dmitry


>
>
> Unlike CERN, I just use ironic in a tiny home deployment where I have an
> all-in-one deployment + 4 additional nodes for ironic. I can't deploy Swift,
> as all my disks are already in use for Cinder, so down the line, when I
> eventually upgrade to Victoria and Wallaby, I would either have to drop
> ironic or not upgrade it if there is no option to just pull the image from
> Glance, or from Glance via the conductor. Enhancing the IPA to pull directly
> from Glance would also probably work for many who use iscsi today, but that
> would depend on your network topology, I guess.
> >
> >
> > >
> > > Equally, if this required completely moving the Glance backend to
> > > something else, like from RBD to RadosGW, I would not expect happy
> > > operators. (Does anyone know if RadosGW could even replace Swift for
> > > this specific use case?)
> > >
> >
> > AFAIK ironic works with RadosGW, we have some support code for it.
> >
> >
> > >
> > > Do we have numbers on how many deployments use iscsi vs direct? If many
> > > rely on iscsi, I would also suggest establishing a migration guide for
> > > operators on how to move from iscsi to direct, for the various configs.
> > > Recent versions of Glance support multiple backends, so a migration path
> > > may be to add a new (direct compatible) backend for new images.
> > >
> >
> > I don't have any numbers, but a migration guide is a must in any case.
> >
> > I expect most TripleO consumers to use the iscsi deploy, but only because
> > it's the default. Their Edge solution uses the direct deploy. I've polled a
> > few operators I know, they all (except for you, obviously :) seem to use
> > the direct deploy. Metal3 uses direct deploy.
> >
> > Dmitry
> >
> >
> > >
> > > Cheers,
> > >   Arne
> > >
> > > On 20.08.20 17:49, Julia Kreger wrote:
> > > > I'm having a sense of deja vu!
> > > >
> > > > Because of the way the mechanics work, the iscsi deploy driver is in
> > > > an unfortunate position of being harder to troubleshoot and diagnose
> > > > failures. Which basically means we've not been able to really identify
> > > > common failures and add logic to handle them appropriately, like we
> > > > are able to with a tcp socket and file download. Based on this alone,
> > > > I think it makes a solid case for us to seriously consider
> > > > deprecation.
> > > >
> > > > Overall, I'm +1 for the proposal and I believe over two cycles is the
> > > > right way to go.
> > > >
> > > > I suspect we're going to have lots of push back from the TripleO
> > > > community because there has been resistance to change their default
> > > > usage in the past. As such I'm adding them to the subject so hopefully
> > > > they will be at least aware.
> > > >
> > > > I guess my other worry is operators who already have a substantial
> > > > operational infrastructure investment built around the iscsi deploy
> > > > interface. I wonder why they didn't use direct, but maybe they have
> > > > all migrated in the past ?5? years. This could just be a non-concern
> > > > in reality, I'm just not sure.
> > > >
> > > > Of course, if someone is willing to step up and make the iscsi
> > > > deployment interface their primary focus, that also shifts the
> > > > discussion to making direct the default interface?
> > > >
> > > > -Julia
> > > >
> > > >
> > > > On Thu, Aug 20, 2020 at 1:57 AM Dmitry Tantsur <dtantsur at redhat.com>
> > > > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > Side note for those lacking context: this proposal concerns
> > > > > deprecating one of the ironic deploy interfaces detailed in
> > > > > https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html.
> > > > > It does not affect the boot-from-iSCSI feature.
> > > > >
> > > > > I would like to propose deprecating and removing the 'iscsi' deploy
> > > > > interface over the course of the next 2 cycles. The reasons are:
> > > > > 1) The iSCSI deploy is a source of occasional cryptic bugs when a
> > > > > target cannot be discovered or mounted properly.
> > > > > 2) Its security is questionable: I don't think we even use
> > > > > authentication.
> > > > > 3) Operator confusion: right now we default to the iSCSI deploy but
> > > > > pretty much direct everyone who cares about scalability or security
> > > > > to the 'direct' deploy.
> > > > > 4) Cost of maintenance: our feature set is growing, our team - not so
> > > > > much. iscsi_deploy.py is 800 lines of code that can be removed, and
> > > > > some dependencies that can be dropped as well.
> > > > >
> > > > > As far as I can remember, we've kept the iSCSI deploy for two reasons:
> > > > > 1) The direct deploy used to require Glance with Swift backend. The
> > > > > recently added [agent]image_download_source option allows caching and
> > > > > serving images via the ironic's HTTP server, eliminating this problem.
> > > > > I guess we'll have to switch to 'http' by default for this option to
> > > > > keep the out-of-box experience.
> > > > > 2) Memory footprint of the direct deploy. With the raw images streaming
> > > > > we no longer have to cache the downloaded images in the agent memory,
> > > > > removing this problem as well (I'm not even sure how much of a problem
> > > > > it is in 2020, even my phone has 4GiB of RAM).
> > > > >
> > > > > If this proposal is accepted, I suggest to execute it as follows:
> > > > > Victoria release:
> > > > > 1) Put an early deprecation warning in the release notes.
> > > > > 2) Announce the future change of the default value for
> > > > > [agent]image_download_source.
> > > > > W release:
> > > > > 3) Change [agent]image_download_source to 'http' by default.
> > > > > 4) Remove iscsi from the default enabled_deploy_interfaces and move it
> > > > > to the back of the supported list (effectively making direct deploy the
> > > > > default).
> > > > > X release:
> > > > > 5) Remove the iscsi deploy code from both ironic and IPA.
> > > > >
> > > > > Thoughts, opinions, suggestions?
> > > > >
> > > > > Dmitry
> > >
> > >
>
>