<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>Hi Tony,</p>

    <p>If I understand correctly, your Magnum environment can now create
      k8s clusters successfully, but the auto-scaling failure caused the
      UPDATE_FAILED status, is that right? If so, a cluster resize should
      be able to bring the cluster back: you can simply resize the cluster
      to its current node count, and in that case Magnum should be able
      to fix the Heat stack.</p>

    <p>If the resize fails, it's best to check the Heat logs to
      understand why the Heat stack update failed.</p>
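    <p>As a minimal sketch of the above with the OpenStack CLI, assuming
      python-magnumclient and python-heatclient are installed and your
      credentials are loaded. <code>mycluster</code> and <code>3</code>
      are placeholders for your cluster name and its current worker node
      count:</p>
    <pre>
# Resize to the current worker count (placeholder values) so Magnum can
# attempt to repair the Heat stack
openstack coe cluster resize mycluster 3

# If the resize fails, look up the cluster's Heat stack ID and list the
# stack's failed resources
STACK_ID=$(openstack coe cluster show mycluster -c stack_id -f value)
openstack stack failures list "$STACK_ID"
</pre>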

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 11/08/21 7:04 pm, Tony Pearce wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAK1+d=7PNu0AwsOixnro2qtuEBvkPFmOY=s_XodoeeJwOM8N-A@mail.gmail.com">


      <div dir="ltr">

        <div class="gmail_default"

          style="font-family:verdana,sans-serif;color:#666666">I sent
          this mail last week looking for some insight into a Magnum
          issue we had. I hadn't seen any reply, so I searched for my
          sent mail and found that I had not completed the subject line.
          Sorry about that. </div>

        <div class="gmail_default"

          style="font-family:verdana,sans-serif;color:#666666"><br>

        </div>

        <div class="gmail_default"

          style="font-family:verdana,sans-serif;color:#666666">I'm
          resending it here with a subject. If anyone has any insight
          into this, I'd be grateful to hear from you. </div>

        <div class="gmail_default"

          style="font-family:verdana,sans-serif;color:#666666"><br>

        </div>

        <div class="gmail_default"

          style="font-family:verdana,sans-serif;color:#666666">Kind

          regards,</div>

        <div class="gmail_default"

          style="font-family:verdana,sans-serif;color:#666666"><br>

        </div>

        <div>

          <div dir="ltr" class="gmail_signature"

            data-smartmail="gmail_signature">

            <div dir="ltr">Tony Pearce<br>

              <br>

              <br>

            </div>

          </div>

        </div>

        <br>

        <br>

        <div class="gmail_quote">

          <div dir="ltr" class="gmail_attr">---------- Forwarded message

            ---------<br>

            From: <strong class="gmail_sendername" dir="auto">Tony

              Pearce</strong> <span dir="auto"><<a

                href="mailto:tonyppe@gmail.com" moz-do-not-send="true">tonyppe@gmail.com</a>></span><br>

            Date: Thu, 5 Aug 2021 at 14:22<br>

            Subject: [magnum] [kolla-ansible] [kayobe] [Victoria]<br>

            To: OpenStack Discuss <<a

              href="mailto:openstack-discuss@lists.openstack.org"

              moz-do-not-send="true">openstack-discuss@lists.openstack.org</a>><br>

          </div>

          <br>

          <br>

          <div dir="ltr">While testing Kubernetes with the Magnum
            project (deployed via Kayobe on Victoria), we deployed an
            auto-scaling cluster and ran into a problem, and I'm not
            sure how to proceed. I understand that the cluster tried to
            scale up but the OpenStack project did not have enough CPU
            core quota to accommodate it (error: "Quota exceeded for
            cores: Requested 4, but already used 20 of 20 cores").<br>

            <br>

            So the cluster now shows "healthy" and "UPDATE_FAILED", yet
            kubectl commands are failing [1].<br>

            <br>

            What is required to return the cluster to a working state
            at this point? I have tried:<br>
            - resizing the cluster to reduce the number of workers<br>
            - resizing the cluster to increase the number of workers
            after increasing the project quota<br>
            - resizing the cluster while keeping the same number of
            workers<br>

            <br>

            When trying any of the above, Horizon immediately shows the
            error "Unable to resize given cluster", but the Magnum and
            Heat logs show no new entries at all at that time.<br>

            <br>

            - using "Check Stack" and "Resume Stack" in the Horizon
            stack menu gives this error [2]<br>

            <br>

            While investigating the kubectl issue, we noticed that some
            services had failed on the master node [3]. Neither starting
            them manually nor rebooting the node brought the services
            up. Unfortunately, I don't have SSH access to the master, and
            no further information about logs for those service failures
            has been forthcoming, so I am unable to provide anything on
            that here.<br>

            <br>

            I found this link [4], so I decided to delete the master
            node and then run "check" on the cluster again. The check
            fails in the same way, except that this time it says it
            cannot find the master [5], whereas the previous error was
            that it could not find a node.<br>

            <br>

            Ideally I would prefer to recover the cluster, though I am
            unsure whether this is still possible. I can probably
            recreate this scenario. What steps should be performed in
            this case to restore the cluster?<br>

            <br>

            <br>

            <br>

            [1]<br>

            kubectl get no<br>

            Error from server (Timeout): the server was unable to return

            a response in the time allotted, but may still be processing

            the request (get nodes)<br>

            <br>

            [2]<br>

            Resource CHECK failed: ["['NotFound:

            resources[4].resources.kube-minion: Instance None could not

            be found. (HTTP 404) (Request-ID:

            req-6069ff6a-9eb6-4bce-bb25-4ef001ebc428)']. 'CHECK' not

            fully supported (see resources)"]<br>

            <br>

            [3]<br>

            <br>

            [systemd]<br>

            Failed Units: 3<br>

              etcd.service<br>

              heat-container-agent.service<br>

              logrotate.service<br>

            <br>

            [4] <a

              href="https://bugzilla.redhat.com/show_bug.cgi?id=1459854"

              target="_blank" moz-do-not-send="true">https://bugzilla.redhat.com/show_bug.cgi?id=1459854</a><br>

            <br>

            [5]<br>

            <br>

            ["['NotFound:

            resources.kube_masters.resources[0].resources.kube-master:

            Instance c6185e8e-1a98-4925-959b-0a56210b8c9e could not be

            found. (HTTP 404) (Request-ID:

            req-bdfcc853-7dbb-4022-9208-68b1ab31008a)']. 'CHECK' not

            fully supported (see resources)"].<br>

            <br>

            Kind regards,<br>

            <br>

            Tony Pearce</div>

        </div>

      </div>

    </blockquote>

    <pre class="moz-signature" cols="72">-- 

Cheers & Best regards,

------------------------------------------------------------------------------

Feilong Wang (王飞龙) (he/him)

Head of Research & Development

Catalyst Cloud

Aotearoa's own

Mob: +64 21 0832 6348 | <a class="moz-txt-link-abbreviated" href="http://www.catalystcloud.nz">www.catalystcloud.nz</a>

Level 6, 150 Willis Street, Wellington 6011, New Zealand

CONFIDENTIALITY NOTICE: This email is intended for the named recipients only.

It may contain privileged, confidential or copyright information. If you are 

not the named recipient, any use, reliance upon, disclosure or copying of this 

email or its attachments is unauthorised. If you have received this email in 

error, please reply via email or call +64 21 0832 6348.

------------------------------------------------------------------------------</pre>

  </body>

</html>