[OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia

Gaël THEROND gael.therond at gmail.com
Tue Jun 11 12:15:46 UTC 2019


Oh, really sorry, I was looking at your answer from my mobile mailing app
and it didn't show, sorry ^^

Many thanks for your help!

On Tue, Jun 11, 2019 at 2:13 PM, Carlos Goncalves <cgoncalves at redhat.com>
wrote:

> You can find the commit hash from the link I provided. The patch is
> available from Queens onwards, so it is also available in Stein.
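>
> A quick way to double-check that on your side (a sketch, assuming a local
> clone of the openstack/octavia repository) is to ask git which remote
> branches contain the commit:
>
>     git fetch origin
>     git branch -r --contains <commit-hash>
>
> If origin/stable/stein shows up in the output, the patch is in Stein.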
>
> On Tue, Jun 11, 2019 at 2:10 PM Gaël THEROND <gael.therond at gmail.com>
> wrote:
> >
> > Ok nice, do you have the commit hash? I would like to look at it and
> validate that it has been committed to Stein too, so I can bump my service
> to Stein using Kolla.
> >
> > Thanks!
> >
> > On Tue, Jun 11, 2019 at 12:59 PM, Carlos Goncalves <cgoncalves at redhat.com>
> wrote:
> >>
> >> On Mon, Jun 10, 2019 at 3:14 PM Gaël THEROND <gael.therond at gmail.com>
> wrote:
> >> >
> >> > Hi guys,
> >> >
> > > Just a quick question regarding this bug: someone told me that it
> has been patched within stable/rocky, BUT were you talking about the
> openstack/octavia repository or the openstack/kolla repository?
> >>
> >> Octavia.
> >>
> >>
> https://review.opendev.org/#/q/Ief97ddda8261b5bbc54c6824f90ae9c7a2d81701
> >>
> >> >
> >> > Many Thanks!
> >> >
> >> > On Tue, Jun 4, 2019 at 3:19 PM, Gaël THEROND <gael.therond at gmail.com>
> wrote:
> >> >>
> >> >> Oh, that's perfect then, I'll just update my image and my platform; as
> we're using kolla-ansible that's super easy.
> >> >>
> >> >> You guys rock!! (Pun intended ;-)).
> >> >>
> >> >> Many many thanks to all of you, that really reassures me a lot
> regarding Octavia's solidity and Kolla's flexibility actually ^^.
> >> >>
> >> >> On Tue, Jun 4, 2019 at 3:17 PM, Carlos Goncalves <cgoncalves at redhat.com>
> wrote:
> >> >>>
> >> >>> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND <gael.therond at gmail.com>
> wrote:
> >> >>> >
> >> >>> > Hi Lingxian Kong,
> >> >>> >
> >> >>> > That’s actually very interesting, as I came to the same
> conclusion this morning during my investigation and was starting to think
> about a fix, which it seems you already made!
> >> >>> >
> >> >>> > Is there a reason why it wasn’t backported to Rocky?
> >> >>>
> >> >>> The patch was merged into the master branch during the Rocky
> >> >>> development cycle, hence it is included in stable/rocky as well.
> >> >>>
> >> >>> >
> >> >>> > Very helpful, many many thanks to you, you clearly spared me hours
> of work! I’ll review your patch and test it in our lab.
> >> >>> >
> >> >>> > On Tue, Jun 4, 2019 at 11:06 AM, Gaël THEROND <gael.therond at gmail.com>
> wrote:
> >> >>> >>
> >> >>> >> Hi Felix,
> >> >>> >>
> >> >>> >> « Glad » you had the same issue before, and yes of course I
> looked at the HM logs, which is where I actually found out that this event
> was triggered by Octavia (besides the DB data that validated that). Here is
> my log trace related to this event; it doesn't really show a major issue
> IMHO.
> >> >>> >>
> >> >>> >> Here is the stacktrace that our Octavia service archived on both
> of our controller servers, with the initial loadbalancer creation trace
> (Worker.log) and the tasks triggered on both controllers (Health-Manager.log).
> >> >>> >>
> >> >>> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/
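> >> >>> >>
> >> >>> >> To pull the relevant events out of those logs, one option (a
> >> >>> >> sketch; the path assumes a kolla-ansible layout and may differ on
> >> >>> >> other deployments) is:
> >> >>> >>
> >> >>> >>     grep -iE 'failover|delete' /var/log/kolla/octavia/octavia-health-manager.log
> >> >>> >>
> >> >>> >> and then to correlate the timestamps with the instance deletion
> >> >>> >> times on the Nova side.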
> >> >>> >>
> >> >>> >> I may well have missed something in it, but I don't see anything
> strange from my point of view.
> >> >>> >> Feel free to tell me if you spot something weird.
> >> >>> >>
> >> >>> >>
> >> >>> >> On Tue, Jun 4, 2019 at 10:38 AM, Felix Hüttner
> <felix.huettner at mail.schwarz> wrote:
> >> >>> >>>
> >> >>> >>> Hi Gael,
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> we had a similar issue in the past.
> >> >>> >>>
> >> >>> >>> You could check the octavia health manager log (should be on the
> same node where the worker is running).
> >> >>> >>>
> >> >>> >>> This component monitors the status of the Amphorae and restarts
> them if they don’t trigger a callback after a specific time. This might
> also happen if there is some connection issue between the two components.
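> >> >>> >>>
> >> >>> >>> The knobs for that behaviour live in the [health_manager] section
> >> >>> >>> of octavia.conf; as a sketch (option names from recent releases,
> >> >>> >>> defaults may vary):
> >> >>> >>>
> >> >>> >>>     [health_manager]
> >> >>> >>>     # how often each amphora reports a heartbeat
> >> >>> >>>     heartbeat_interval = 10
> >> >>> >>>     # seconds without a heartbeat before an amphora is considered failed
> >> >>> >>>     heartbeat_timeout = 60
> >> >>> >>>     # how often the health manager scans for stale amphorae
> >> >>> >>>     health_check_interval = 3
> >> >>> >>>
> >> >>> >>> If heartbeat_timeout is too aggressive, or the amphorae cannot
> >> >>> >>> reach the health manager over the management network, perfectly
> >> >>> >>> healthy instances may be failed over.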
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> But normally it should at least restart the LB with new
> Amphorae…
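> >> >>> >>>
> >> >>> >>> (The same failover path can also be exercised on demand, assuming
> >> >>> >>> a reasonably recent python-octaviaclient:
> >> >>> >>>
> >> >>> >>>     openstack loadbalancer failover <loadbalancer-id>
> >> >>> >>>
> >> >>> >>> which should replace the amphorae without touching the listeners
> >> >>> >>> or pools.)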
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> Hope that helps
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> Felix
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> From: Gaël THEROND <gael.therond at gmail.com>
> >> >>> >>> Sent: Tuesday, June 4, 2019 9:44 AM
> >> >>> >>> To: Openstack <openstack at lists.openstack.org>
> >> >>> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances
> unexpectedly deleted by octavia
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> Hi guys,
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> I’ve a weird situation here.
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> I smoothly operate a large-scale multi-region Octavia service
> using the default amphora driver, which implies the use of Nova instances
> as loadbalancers.
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> Everything is running really well and our customers (K8s and
> traditional users) are really happy with the solution so far.
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> However, yesterday one of those customers, using the
> loadbalancer in front of their ElasticSearch cluster, poked me because this
> loadbalancer suddenly went from ONLINE/OK to ONLINE/ERROR, meaning the
> amphorae were no longer available, yet the anchor/member/pool and
> listener settings still existed.
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> So I investigated and found out that the loadbalancer amphorae
> had been destroyed by the octavia user.
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> The weird part is, both the master and the backup instances were
> destroyed at the same moment by the octavia service user.
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> Are there specific circumstances where the Octavia service could
> decide to delete the instances but not the anchor/members/pool?
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> It’s worrying me a bit, as there is no clear way to trace why
> Octavia took this action.
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> I dug into the Nova and Octavia DBs in order to correlate
> the action, but apart from validating my investigation it doesn’t really
> help, as there is no clue as to why the Octavia service triggered the
> deletion.
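> >> >>> >>>
> >> >>> >>> For anyone wanting to do the same correlation, a query along
> >> >>> >>> these lines against the octavia database (a sketch, assuming the
> >> >>> >>> standard schema; column names may differ per release) gives the
> >> >>> >>> amphora-to-instance mapping to match against Nova's deletion
> >> >>> >>> records:
> >> >>> >>>
> >> >>> >>>     SELECT id, compute_id, role, status, updated_at
> >> >>> >>>     FROM amphora
> >> >>> >>>     WHERE load_balancer_id = '<lb-uuid>';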
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> If someone has any clue or tips to give me, I’ll be more than
> happy to discuss this situation.
> >> >>> >>>
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> Cheers guys!
> >> >>> >>>
>