[OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia

Gaël THEROND gael.therond at gmail.com
Tue Jun 4 13:19:58 UTC 2019


Oh, that's perfect then. I'll just update my image and my platform; we're
using kolla-ansible, so that's super easy.

You guys rock!! (Pun intended ;-)).

Many, many thanks to all of you; that will really back me up a lot regarding
Octavia's solidity and Kolla's flexibility ^^.

On Tue, Jun 4, 2019 at 15:17, Carlos Goncalves <cgoncalves at redhat.com>
wrote:

> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND <gael.therond at gmail.com>
> wrote:
> >
> > Hi Lingxian Kong,
> >
> > That’s actually very interesting as I’ve come to the same conclusion
> this morning during my investigation and was starting to think about a fix,
> which it seems you already made!
> >
> > Is there a reason why it wasn’t backported to Rocky?
>
> The patch was merged into the master branch during the Rocky development
> cycle, hence it is included in stable/rocky as well.
>
> >
> > Very helpful, many, many thanks to you; you clearly spared me hours of
> work! I’ll review your patch and test it in our lab.
> >
> > On Tue, Jun 4, 2019 at 11:06, Gaël THEROND <gael.therond at gmail.com>
> wrote:
> >>
> >> Hi Felix,
> >>
> >> « Glad » you had the same issue before, and yes, of course I looked at
> the HM logs, which is where I actually found out that this event was
> triggered by Octavia (besides the DB data that validated it). Here is my
> log trace related to this event; it doesn't really show any major issue IMHO.
> >>
> >> Here is the stack trace that our Octavia service archived for both of our
> controller servers, with the initial loadbalancer creation trace
> (Worker.log) and the tasks triggered on both controllers (Health-Manager.log).
> >>
> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/
> >>
> >> I may well have missed something in it, but I don't see anything strange
> from my point of view.
> >> Feel free to tell me if you spot something weird.
> >>
> >>
> >> On Tue, Jun 4, 2019 at 10:38, Felix Hüttner <felix.huettner at mail.schwarz>
> wrote:
> >>>
> >>> Hi Gael,
> >>>
> >>>
> >>>
> >>> we had a similar issue in the past.
> >>>
> >>> You could check the Octavia health manager log (should be on the same
> node where the worker is running).
> >>>
> >>> This component monitors the status of the Amphorae and restarts them
> if they don’t trigger a callback after a specific time. This might also
> happen if there is some connection issue between the two components.
> >>>
> >>>
> >>>
> >>> But normally it should at least restart the LB with new Amphorae…
> >>>
> >>>
> >>>
> >>> Hope that helps
> >>>
> >>>
> >>>
> >>> Felix
> >>>
> >>>
> >>>
> >>> From: Gaël THEROND <gael.therond at gmail.com>
> >>> Sent: Tuesday, June 4, 2019 9:44 AM
> >>> To: Openstack <openstack at lists.openstack.org>
> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly
> deleted by octavia
> >>>
> >>>
> >>>
> >>> Hi guys,
> >>>
> >>>
> >>>
> >>> I’ve a weird situation here.
> >>>
> >>>
> >>>
> >>> I smoothly operate a large-scale multi-region Octavia service using
> the default amphora driver, which implies the use of nova instances as
> loadbalancers.
> >>>
> >>>
> >>>
> >>> Everything is running really well and our customers (K8s and
> traditional users) are really happy with the solution so far.
> >>>
> >>>
> >>>
> >>> However, yesterday one of those customers using the loadbalancer in
> front of their ElasticSearch cluster poked me because this loadbalancer
> suddenly went from ONLINE/OK to ONLINE/ERROR, meaning the amphorae were
> no longer available, yet the anchor/member/pool and listener settings
> still existed.
> >>>
> >>>
> >>>
> >>> So I investigated and found out that the loadbalancer amphorae had
> been destroyed by the octavia user.
> >>>
> >>>
> >>>
> >>> The weird part is that both the master and the backup instances were
> destroyed at the same moment by the octavia service user.
> >>>
> >>>
> >>>
> >>> Are there specific circumstances in which the Octavia service could
> decide to delete the instances but not the anchor/members/pool?
> >>>
> >>>
> >>>
> >>> It worries me a bit, as there is no clear way to trace why Octavia
> took this action.
> >>>
> >>>
> >>>
> >>> I dug into the nova and Octavia DBs in order to correlate the
> action, but apart from validating my investigation it doesn’t really help,
> as there is no clue as to why the Octavia service triggered the deletion.
> >>>
> >>>
> >>>
> >>> If anyone has any clue or tips to give me, I’ll be more than happy to
> discuss this situation.
> >>>
> >>>
> >>>
> >>> Cheers guys!
> >>>
>