[OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly deleted by octavia

Gaël THEROND gael.therond at gmail.com
Mon Jun 10 13:14:06 UTC 2019


Hi guys,

Just a quick question regarding this bug: someone told me that it has been
patched within stable/rocky, BUT were you talking about the
openstack/octavia repository or the openstack/kolla repository?
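
(For anyone who wants to double-check on their side: one way is to verify
whether the fix commit is reachable from stable/rocky in a local clone of
openstack/octavia. A minimal sketch below; the commit hash is a placeholder,
not the actual change, and the helper name is made up for the example.)

    # check_backport.py - list the remote branches that contain a given commit
    import subprocess

    def branches_containing(repo_path, commit):
        """Return the remote branches of repo_path that contain commit."""
        out = subprocess.run(
            ["git", "-C", repo_path, "branch", "-r", "--contains", commit],
            capture_output=True, text=True, check=True,
        )
        return [line.strip() for line in out.stdout.splitlines()]

    # branches_containing("./octavia", "<fix-commit-sha>") should list
    # "origin/stable/rocky" if the patch landed there (either by backport or
    # because it merged before the branch was cut).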

Many Thanks!

On Tue, Jun 4, 2019 at 3:19 PM Gaël THEROND <gael.therond at gmail.com> wrote:

> Oh, that's perfect then, I'll just update my image and my platform as we're
> using kolla-ansible and that's super easy.
>
> You guys rock!! (Pun intended ;-)).
>
> Many many thanks to all of you, that really reassures me a lot regarding the
> solidity of Octavia and the flexibility of Kolla ^^.
>
> On Tue, Jun 4, 2019 at 3:17 PM Carlos Goncalves <cgoncalves at redhat.com>
> wrote:
>
>> On Tue, Jun 4, 2019 at 3:06 PM Gaël THEROND <gael.therond at gmail.com>
>> wrote:
>> >
>> > Hi Lingxian Kong,
>> >
>> > That’s actually very interesting, as I came to the same conclusion
>> this morning during my investigation and was starting to think about a fix,
>> which it seems you already made!
>> >
>> > Is there a reason why it wasn’t backported to Rocky?
>>
>> The patch was merged in the master branch during the Rocky development
>> cycle, hence it is included in stable/rocky as well.
>>
>> >
>> > Very helpful, many many thanks to you; you clearly spared me hours of
>> work! I’ll review your patch and test it in our lab.
>> >
>> > On Tue, Jun 4, 2019 at 11:06 AM Gaël THEROND <gael.therond at gmail.com>
>> wrote:
>> >>
>> >> Hi Felix,
>> >>
>> >> « Glad » you had the same issue before, and yes, of course I looked at
>> the HM logs, which is where I actually found out that this event was
>> triggered by Octavia (besides the DB data that validated that). Here is my
>> log trace related to this event; it doesn't really show a major issue IMHO.
>> >>
>> >> Here is the stack trace that our Octavia service archived for both of
>> our controller servers, with the initial loadbalancer creation trace
>> (Worker.log) and the tasks triggered on both controllers (Health-Manager.log).
>> >>
>> >> http://paste.openstack.org/show/7z5aZYu12Ttoae3AOhwF/
>> >>
>> >> I may well have missed something in it, but I don't see anything
>> strange from my point of view.
>> >> Feel free to tell me if you spot something weird.
>> >>
>> >>
>> >> On Tue, Jun 4, 2019 at 10:38 AM Felix Hüttner <felix.huettner at mail.schwarz>
>> wrote:
>> >>>
>> >>> Hi Gael,
>> >>>
>> >>>
>> >>>
>> >>> we had a similar issue in the past.
>> >>>
>> >>> You could check the Octavia health manager log (it should be on the same
>> node where the worker is running).
>> >>>
>> >>> This component monitors the status of the Amphorae and restarts them
>> if they don’t trigger a callback after a specific time. This might also
>> happen if there is some connection issue between the two components.
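>> >>>
>> >>> To make the timing concrete, the mechanism is roughly the sketch below.
>> >>> This is only an illustration, not Octavia's actual code; the 60-second
>> >>> value mirrors the heartbeat_timeout default of the [health_manager]
>> >>> section in octavia.conf, and the function names are made up:
>> >>>
>> >>> import time
>> >>>
>> >>> HEARTBEAT_TIMEOUT = 60   # seconds without a heartbeat before failover
>> >>> last_heartbeat = {}      # amphora_id -> timestamp of the last callback
>> >>>
>> >>> def record_heartbeat(amphora_id):
>> >>>     # Called whenever an amphora's health message reaches the manager.
>> >>>     last_heartbeat[amphora_id] = time.time()
>> >>>
>> >>> def find_stale_amphorae(now=None):
>> >>>     # Amphorae that missed the timeout become failover candidates,
>> >>>     # i.e. they get replaced by new Nova instances.
>> >>>     now = now or time.time()
>> >>>     return [amp for amp, seen in last_heartbeat.items()
>> >>>             if now - seen > HEARTBEAT_TIMEOUT]
>> >>>
>> >>> So from the health manager's point of view, a broken callback path (for
>> >>> example connectivity problems on the management network) looks exactly
>> >>> like a dead amphora.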
>> >>>
>> >>>
>> >>>
>> >>> But normally it should at least restart the LB with new Amphorae…
>> >>>
>> >>>
>> >>>
>> >>> Hope that helps
>> >>>
>> >>>
>> >>>
>> >>> Felix
>> >>>
>> >>>
>> >>>
>> >>> From: Gaël THEROND <gael.therond at gmail.com>
>> >>> Sent: Tuesday, June 4, 2019 9:44 AM
>> >>> To: Openstack <openstack at lists.openstack.org>
>> >>> Subject: [OCTAVIA][ROCKY] - MASTER & BACKUP instances unexpectedly
>> deleted by octavia
>> >>>
>> >>>
>> >>>
>> >>> Hi guys,
>> >>>
>> >>>
>> >>>
>> >>> I’ve a weird situation here.
>> >>>
>> >>>
>> >>>
>> >>> I smoothly operate a large-scale, multi-region Octavia service using
>> the default amphora driver, which implies the use of Nova instances as
>> loadbalancers.
>> >>>
>> >>>
>> >>>
>> >>> Everything is running really well and our customers (K8s and
>> traditional users) are really happy with the solution so far.
>> >>>
>> >>>
>> >>>
>> >>> However, yesterday one of those customers, who uses a loadbalancer in
>> front of their ElasticSearch cluster, poked me because this loadbalancer
>> suddenly went from ONLINE/OK to ONLINE/ERROR, meaning the amphorae were
>> no longer available, yet the anchor/member/pool and listener settings
>> still existed.
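>> >>>
>> >>> (Side note in case it helps anyone reproducing this: those statuses can
>> >>> also be pulled programmatically. A minimal sketch with openstacksdk,
>> >>> assuming a clouds.yaml entry named "mycloud" and a placeholder load
>> >>> balancer ID:
>> >>>
>> >>> import openstack
>> >>>
>> >>> conn = openstack.connect(cloud="mycloud")
>> >>> lb = conn.load_balancer.get_load_balancer("<lb-uuid>")
>> >>> # Print the current operating and provisioning status of the LB.
>> >>> print(lb.name, lb.operating_status, lb.provisioning_status)
>> >>>
>> >>> which is a quick way to watch the load balancer state from a script.)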
>> >>>
>> >>>
>> >>>
>> >>> So I investigated and found out that the loadbalancer amphorae had
>> been destroyed by the octavia user.
>> >>>
>> >>>
>> >>>
>> >>> The weird part is that both the master and the backup instance were
>> destroyed at the same moment by the octavia service user.
>> >>>
>> >>>
>> >>>
>> >>> Are there specific circumstances where the Octavia service could
>> decide to delete the instances but not the anchor/members/pool?
>> >>>
>> >>>
>> >>>
>> >>> It’s worrying me a bit, as there is no clear way to trace why
>> Octavia took this action.
>> >>>
>> >>>
>> >>>
>> >>> I dug into the Nova and Octavia DBs in order to correlate the
>> action, but apart from validating my investigation it doesn’t really help,
>> as there is no clue as to why the Octavia service triggered the deletion.
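>> >>>
>> >>> For reference, the kind of cross-check I mean is roughly the sketch
>> >>> below (credentials are placeholders, and the table/column names are
>> >>> from memory of the Rocky schemas, so double-check them against your own
>> >>> deployment before running anything):
>> >>>
>> >>> import pymysql
>> >>>
>> >>> octavia_db = pymysql.connect(host="dbhost", user="octavia",
>> >>>                              password="...", database="octavia")
>> >>> nova_db = pymysql.connect(host="dbhost", user="nova",
>> >>>                           password="...", database="nova")
>> >>>
>> >>> LB_ID = "<load-balancer-uuid>"
>> >>>
>> >>> with octavia_db.cursor() as cur:
>> >>>     # Amphora records keep the Nova server UUID in compute_id.
>> >>>     cur.execute("SELECT id, compute_id, role, status FROM amphora "
>> >>>                 "WHERE load_balancer_id = %s", (LB_ID,))
>> >>>     amphorae = cur.fetchall()
>> >>>
>> >>> with nova_db.cursor() as cur:
>> >>>     for amp_id, compute_id, role, status in amphorae:
>> >>>         # When was the matching Nova instance (soft-)deleted?
>> >>>         cur.execute("SELECT deleted_at FROM instances WHERE uuid = %s",
>> >>>                     (compute_id,))
>> >>>         print(amp_id, role, status, cur.fetchone())
>> >>>
>> >>> (Nova's instance_actions table is then the place to look for which user
>> >>> and request triggered the delete.) That confirms the what and the when,
>> >>> but not the why.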
>> >>>
>> >>>
>> >>>
>> >>> If someone has any clue or tips to give me, I’ll be more than happy
>> to discuss this situation.
>> >>>
>> >>>
>> >>>
>> >>> Cheers guys!
>> >>>
>>
>