VIP switch causing connections that refuse to die
Hi,
I found that a keepalived VIP switch causes TCP connections that refuse to die on the host where the VIP was located before the switch.
I filed a bug here -> https://bugs.launchpad.net/kolla-ansible/+bug/1917068
Fixed here -> https://review.opendev.org/c/openstack/kolla-ansible/+/777772
Video presentation of the bug here -> https://download.kevko.ultimum.cloud/video_debug.mp4
I was just curious and wanted to ask here on openstack-discuss:
Has anyone seen this issue before? If yes, do you tweak the net.ipv4.tcp_retries2 kernel parameter? If not, how did you solve it?
Thank you,
Michal Arbet (kevko)
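For anyone who wants to compare hosts before answering, the value currently in effect can be read from procfs. A minimal Python sketch, assuming a Linux host; the helper name `read_sysctl` and its `proc_root` parameter are made up for illustration and are not part of kolla-ansible:

```python
from pathlib import Path


def read_sysctl(name: str, proc_root: str = "/proc/sys") -> str:
    """Return the current value of a sysctl such as 'net.ipv4.tcp_retries2'.

    Hypothetical helper: it maps the dotted name onto the procfs path
    (net.ipv4.tcp_retries2 -> /proc/sys/net/ipv4/tcp_retries2).
    Names with dots inside a single component (for example VLAN
    interface names) are not handled.
    """
    path = Path(proc_root).joinpath(*name.split("."))
    return path.read_text().strip()


# Example call on a Linux host:
#   read_sysctl("net.ipv4.tcp_retries2")
```

The `proc_root` argument exists only so the helper can be pointed at a copy of the tree, for instance when inspecting data collected from another node.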
On Fri, 26 Feb 2021 at 17:08, Michal Arbet <michal.arbet@ultimum.io> wrote:
Has anyone seen this issue before? If yes, do you tweak the net.ipv4.tcp_retries2 kernel parameter? If not, how did you solve it?
Hi Michal, thanks for the investigation here. There is a nice tool, which I found far too late, that helps answer questions like this: https://codesearch.opendev.org/
Hi,
I really like the tool, thank you. It also confirms my idea, as this kernel option is also used by other projects, where they explicitly mention the keepalived/VIP switch case. Now I definitely think this should also be handled in kolla-ansible.
Thanks,
Michal Arbet (kevko)
On Mon, 1 Mar 2021 at 9:53, Mark Goddard <mark@stackhpc.com> wrote:
Hi Michal, thanks for the investigation here. There is a nice tool, which I found far too late, that helps answer questions like this: https://codesearch.opendev.org/
On Mon, Mar 1, 2021 at 11:06 AM Michal Arbet <michal.arbet@ultimum.io> wrote:
I really like the tool, thank you. It also confirms my idea, as this kernel option is also used by other projects, where they explicitly mention the keepalived/VIP switch case.
FWIW, I can see only StarlingX and Airship; none of the general-purpose tools seem to mention customising this variable.
-yoctozepto
Well, but that doesn't mean it's right that they don't have it configured. If you google "net.ipv4.tcp_retries2 keepalive" and read the results, you will see that this option is widely used.
I think we have to discuss the option's value (not the fix itself), to find some golden middle.
https://www.programmersought.com/article/724162740/
https://knowledge.broadcom.com/external/article/142410/tuning-tcp-keepalive-...
https://www.ibm.com/support/knowledgecenter/ko/SSEPGG_9.7.0/com.ibm.db2.luw....
https://www.suse.com/support/kb/doc/?id=000019293
https://programmer.group/kubeadm-build-highly-available-kubernetes-1.15.1.ht...
On Mon, 1 Mar 2021 at 14:09, Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
FWIW, I can see only StarlingX and Airship; none of the general-purpose tools seem to mention customising this variable.
-yoctozepto
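One way to ground the discussion about the value: the kernel's ip-sysctl documentation states that the default of 15 yields a hypothetical timeout of 924.6 seconds, and that RFC 1122 recommends at least 100 seconds, which corresponds to a value of at least 8. The Python sketch below reproduces those numbers with a simplified model of the retransmission backoff. It assumes the retransmission timeout starts at the kernel minimum of 200 ms and doubles until it reaches the 120 s cap; real connections derive the timeout from the measured RTT, so the results are lower-bound estimates only. The function and constant names are illustrative.

```python
RTO_MIN_MS = 200       # assumed starting RTO (kernel TCP_RTO_MIN)
RTO_MAX_MS = 120_000   # upper bound on a single RTO (kernel TCP_RTO_MAX)


def hypothetical_timeout_ms(retries2: int) -> int:
    """Estimate how long an unacknowledged connection lingers, in ms.

    The RTO doubles after every retransmission until it hits RTO_MAX_MS;
    from then on each further retry adds a full RTO_MAX_MS.
    """
    # Number of doublings that still fit under the cap: floor(log2(600)) == 9.
    thresh = (RTO_MAX_MS // RTO_MIN_MS).bit_length() - 1
    if retries2 <= thresh:
        # Geometric series: 200 + 400 + ... over (retries2 + 1) intervals.
        return ((2 << retries2) - 1) * RTO_MIN_MS
    # Exponential phase up to the cap, then linear growth at the cap.
    return ((2 << thresh) - 1) * RTO_MIN_MS + (retries2 - thresh) * RTO_MAX_MS


if __name__ == "__main__":
    for n in (3, 5, 8, 15):
        print(f"tcp_retries2={n:2d} -> ~{hypothetical_timeout_ms(n) / 1000:.1f} s")
```

Under these assumptions a value of 15 gives 924.6 s and a value of 8 gives 102.2 s, which matches the figures in the kernel documentation. Any candidate value for kolla-ansible can be fed through the same function to see roughly how long a dead connection would survive a VIP switch.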
On Mon, 2021-03-01 at 17:02 +0100, Michal Arbet wrote:
Well, but that doesn't mean it's right that they don't have it configured. If you google "net.ipv4.tcp_retries2 keepalive" and read the results, you will see that this option is widely used.
This is something that operators can fix themselves externally, however. I'm not against having kolla or other tools be able to configure it automatically per se, but it's not kolla-ansible's job to configure every possible tuning. This one might make sense to do by default or optionally, but host config is largely out of scope for kolla-ansible. In ooo (TripleO), which is meant to handle all host config as well as OpenStack configuration, or in other tools where that is explicitly in scope, people may also want to tweak it, but it should likely be configurable.
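For operators who prefer to handle this outside of kolla-ansible, as suggested above, one option is a sysctl.d drop-in file. A minimal Python sketch that renders and writes such a fragment; the helper names and the example value 8 are assumptions for illustration, not a recommendation made in this thread:

```python
from pathlib import Path


def render_sysctl_fragment(settings: dict[str, int]) -> str:
    """Render 'key = value' lines in the format used by sysctl.d files."""
    return "".join(f"{key} = {value}\n" for key, value in sorted(settings.items()))


def write_sysctl_fragment(settings: dict[str, int], path: Path) -> None:
    """Write the fragment to disk.

    The kernel does not pick it up by itself: it is applied on the next
    boot or when 'sysctl --system' is run.
    """
    path.write_text(render_sysctl_fragment(settings))


if __name__ == "__main__":
    # Print instead of writing under /etc so the sketch is safe to run anywhere.
    print(render_sysctl_fragment({"net.ipv4.tcp_retries2": 8}), end="")
```

Writing the output to a file under /etc/sysctl.d/ (the exact file name is up to the operator) and running `sysctl --system` makes the setting persistent. Keeping the value in a dictionary makes it easy to expose as a configurable option rather than a hard-coded default.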
participants (4)
- Mark Goddard
- Michal Arbet
- Radosław Piliszek
- Sean Mooney