<div>We use keepalived and exabgp to manage failover for haproxy. That works, but it takes a few minutes, and during those minutes customers are impacted. We tell them not to build or delete VMs during patching, but they still do, and then complain about the failures.<br><br>We're planning to experiment with adding a "manual" haproxy failover to our patching automation, but I'm wondering whether anything on the controller needs to be failed over or disabled before rebooting the KVM host. I looked at the "remove from cluster" and "add to cluster" procedures, but they seem unnecessarily cumbersome just for rebooting a KVM host.<br></div><div class="yahoo_quoted" style="margin:10px 0px 0px 0.8ex;border-left:1px solid #ccc;padding-left:1ex;"><div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;"><div>On Friday, May 12, 2023, 03:42:42 AM EDT, Eugen Block &lt;eblock@nde.ag&gt; wrote:</div><div><br></div><div><br></div><div>Hi Albert,<br><br>How is your haproxy placement controlled? Something like pacemaker or similar? I would always do a failover when I know an interruption is coming (a maintenance window); that should speed things up for clients. We have a pacemaker-controlled HA control plane, and if I just reboot a server without failing over first, it takes longer for pacemaker to realize that the resource is gone. I have no benchmarks, though. There's always a risk of losing a couple of requests during the failover, but we haven't had complaints yet; I believe most of the components try to resend lost messages. In the cluster of one of our customers, which has many resources (they also use terraform), I haven't seen issues during a regular maintenance window. 
When they had a DNS outage a few months back, it resulted in a mess and manual cleanup was necessary, but the regular failovers seem to work just fine.<br>I don't see RabbitMQ issues after rebooting a server either; usually the haproxy (and virtual IP) failover suffices to prevent interruptions.<br><br>Regards,<br>Eugen<br><br>Quoting Satish Patel &lt;<a ymailto="mailto:satish.txt@gmail.com" href="mailto:satish.txt@gmail.com">satish.txt@gmail.com</a>&gt;:<br><br>> Are you running your stack on top of KVM virtual machines? How many<br>> controller nodes do you have? It's mostly RabbitMQ that causes issues when<br>> you restart controller nodes.<br>><br>> On Thu, May 11, 2023 at 8:34 AM Albert Braden &lt;<a ymailto="mailto:ozzzo@yahoo.com" href="mailto:ozzzo@yahoo.com">ozzzo@yahoo.com</a>&gt; wrote:<br>><br>>> We have our haproxy and controller nodes on KVM hosts. When those KVM<br>>> hosts are restarted, customers who are building or deleting VMs see impact.<br>>> VMs may go into error status, fail to get DNS records, fail to delete, etc.<br>>> The obvious reason is that traffic being routed to the haproxy on the<br>>> restarting KVM host is lost. If we manually fail over haproxy before<br>>> restarting the KVM host, will that be sufficient to stop traffic from being<br>>> lost, or do we also need to do something with the controller?<br></div></div></div>
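For the "manual" haproxy failover Albert plans to add to the patching automation, a pre-reboot step might look roughly like the following Python sketch. It is hypothetical and not taken from keepalived, OpenStack, or anyone's tooling in this thread. It assumes the haproxy VIP is held by keepalived on the haproxy VM, that stopping keepalived makes that node drop the VIP so a peer takes over, and that the script runs locally with root privileges. It does not touch the exabgp side at all. The names `vip_present` and `release_vip`, and the timeout defaults, are illustrative only.

```python
"""Hypothetical pre-reboot helper: move a keepalived-managed VIP off this node.

Assumptions (not from the thread): keepalived holds the VIP on this haproxy VM,
stopping keepalived releases it to a peer, and we run locally as root.
exabgp / BGP route withdrawal is NOT handled here.
"""
import subprocess
import time
from typing import Callable, Sequence


def vip_present(ip_output: str, vip: str) -> bool:
    """Return True if `vip` is listed as an address in `ip -o addr show` output."""
    for line in ip_output.splitlines():
        fields = line.split()
        # Each address line contains "inet <addr>/<prefix>" or "inet6 <addr>/<prefix>".
        for i, field in enumerate(fields[:-1]):
            if field in ("inet", "inet6") and fields[i + 1].split("/")[0] == vip:
                return True
    return False


def run(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(list(cmd), check=True, capture_output=True, text=True).stdout


def release_vip(
    vip: str,
    runner: Callable[[Sequence[str]], str] = run,
    timeout_s: float = 30.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Stop keepalived, then poll until the VIP is gone from local interfaces.

    Returns True once the VIP has left this node, False if it is still
    present after roughly `timeout_s` seconds.
    """
    runner(["systemctl", "stop", "keepalived"])
    waited = 0.0
    while waited <= timeout_s:
        if not vip_present(runner(["ip", "-o", "addr", "show"]), vip):
            return True
        sleep(poll_s)
        waited += poll_s
    return False
```

In the patching automation this would run on each haproxy VM hosted on the KVM host before the reboot, for example `release_vip("203.0.113.10")` with a placeholder address, and the reboot would be skipped if it returns False. Checking only that the VIP has left this node keeps the sketch simple; a more careful version would also confirm that a peer now holds the VIP before proceeding. As Eugen notes, a few in-flight requests can still be lost at the moment of failover.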