<div dir="ltr">I had some time this morning to set up an environment to reproduce the issue. The environment is just as described before: all services go through haproxy except for memcached. <div><br></div><div>This is all Ubuntu 12.04 and OpenStack Havana.</div>
<div><br></div><div>Keystone is using the memcached backend for tokens:</div><div><br></div><div><div>[token]</div><div>driver = keystone.token.backends.memcache.Token</div></div><div><div>[memcache]</div><div>servers=hnl-memcached-1:11211,hnl-memcached-2:11211</div>
</div><div><br></div><div>nova.conf also has a memcached setting:</div><div><br></div><div>memcached_servers=hnl-memcached-1:11211,hnl-memcached-2:11211<br></div><div><br></div><div>Normal requests take less than a second:</div>
<div><br></div><div><div>[ root@hnl-nova-1 ~ ] # time keystone token-get</div><div>+-----------+----------------------------------+</div><div>| Property | Value |</div><div>+-----------+----------------------------------+</div>
<div>| expires | 2014-08-22T15:10:20Z |</div><div>| id | be78e6ee4992435d8bcde6860810b2db |</div><div>| tenant_id | 20f49a5947d94d2ca25f68e8c790b9d6 |</div><div>| user_id | 79c71416b20f453eb1d48ad57454cdb7 |</div>
<div>+-----------+----------------------------------+</div><div><br></div><div>real 0m0.498s</div><div>user 0m0.254s</div><div>sys 0m0.072s</div></div><div><br></div><div><div>[ root@hnl-nova-1 ~ ] # time nova list</div>
<div><br></div><div><br></div><div>real 0m0.969s</div><div>user 0m0.417s</div><div>sys 0m0.104s</div></div><div><br></div><div>If I shut off hnl-memcached-1, requests take an extra 3 seconds:</div><div><br></div>
<div><div>[ root@hnl-nova-1 ~ ] # time keystone token-get</div><div>+-----------+----------------------------------+</div><div>| Property | Value |</div><div>+-----------+----------------------------------+</div>
<div>| expires | 2014-08-22T15:20:24Z |</div><div>| id | 8112e925344e4401830b205c60fc4f26 |</div><div>| tenant_id | 20f49a5947d94d2ca25f68e8c790b9d6 |</div><div>| user_id | 79c71416b20f453eb1d48ad57454cdb7 |</div>
<div>+-----------+----------------------------------+</div><div><br></div><div>real 0m3.549s</div><div>user 0m0.273s</div><div>sys 0m0.076s</div></div><div><br></div><div>I'm assuming the extra 3 seconds come from the 3-second SYN retry described <a href="https://code.google.com/p/memcached/wiki/Timeouts">here</a>.</div>
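<div><br></div><div>To check where the extra time goes, the TCP connect to each memcached server can be timed on its own. The following is only a minimal sketch using the Python standard library; the helper names <code>probe</code> and <code>probe_servers</code> and the 1-second default timeout are made up for illustration and are not part of any OpenStack or memcached tooling.</div>

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Try one TCP connect and return (reachable, seconds_elapsed)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        reachable = False
    return reachable, time.monotonic() - start


def probe_servers(server_list, timeout=1.0):
    """Probe every entry of a comma-separated "host:port,host:port" string."""
    results = {}
    for entry in server_list.split(","):
        entry = entry.strip()
        host, _, port = entry.rpartition(":")
        results[entry] = probe(host, int(port), timeout)
    return results
```

<div>Calling <code>probe_servers("hnl-memcached-1:11211,hnl-memcached-2:11211", timeout=5.0)</code> from a controller should show whether the offline server fails fast (connection refused) or only after the timeout (packets silently dropped), which would help confirm or rule out the SYN retry theory.</div>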
<div><br></div><div>If I turn hnl-memcached-1 back on and shut down hnl-memcached-2, I get mixed results. Initially I don't see a delay in requests, but repeated requests take longer and longer until they hit the 3-second mark.</div>
<div><br></div><div>If I switch the order of the memcached servers (so hnl-memcached-2 is first and hnl-memcached-1 is second), I immediately see the 3-second delay.</div><div><br></div><div>If I play around with the memcached settings in nova.conf, I see similar results, including a delay of up to 10 seconds.</div>
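<div><br></div><div>One possible reason for mixed results is that memcached clients typically choose a server per key by hashing the key, so only the requests whose keys land on the offline server pay the connect timeout. The sketch below is a simplified model of that idea, assuming a plain CRC32-modulo scheme; it is not the actual hashing code of the client library OpenStack uses, and <code>pick_server</code> and <code>affected_fraction</code> are hypothetical names.</div>

```python
import zlib
from collections import Counter


def pick_server(key, servers):
    """Map a cache key to one server by hashing it (simplified sharding model)."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


def affected_fraction(keys, servers, offline):
    """Return the fraction of keys whose chosen server is the offline one."""
    hits = Counter(pick_server(key, servers) for key in keys)
    return hits[offline] / len(keys)
```

<div>Under this model, with two servers in the list only part of the keys map to the offline one, so some requests are fast and others wait for the timeout. It does not by itself explain why the order of the servers matters.</div>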
<div><br></div><div>So in summary, I'm seeing:</div><div><br></div><div>* Delay in requests when the first memcached server in a list is offline</div><div>* Sporadic delays in requests when the second memcached server in a list is offline</div>
<div>* Even longer delays for subsequent services using the same memcached servers.</div><div><br></div><div>Any ideas on what's going on?</div><div><br></div><div>Thanks,</div><div>Joe</div></div><div class="gmail_extra">
<br><br><div class="gmail_quote">On Thu, Aug 14, 2014 at 1:58 PM, Juan José Pavlik Salles <span dir="ltr"><<a href="mailto:jjpavlik@gmail.com" target="_blank">jjpavlik@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Joe, I'm not running it in my Grizzly cloud because I still have just one controller node. However, I remember testing repcached in two VMs and it worked OK. Anyway, it may not be the best solution for a big production cloud, since the repcached project is pretty much dead already (last update 2012).<div>
<br></div><div>Cheers. </div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2014-08-14 16:38 GMT-03:00 Joe Topjian <span dir="ltr"><<a href="mailto:joe@topjian.net" target="_blank">joe@topjian.net</a>></span>:<div>
<div class="h5"><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thanks all for the input! Aggregating all replies:<div><br></div><div>Juan: I've seen that patch and was curious about it. When you say it worked for you, have you used it in OpenStack for similar reasons?</div>
<div><br></div><div>Daneyon: ah - sticky sessions are a good idea. I will look into that.</div><div><br></div><div>John: Another component is a very good possibility. I'll try this out in a dev environment, or ahead of our next planned maintenance window. Thank you for the insight on what the memcached client should be doing.</div>
<div><br></div><div>Thanks,</div><div>Joe</div></div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Aug 14, 2014 at 1:22 PM, Juan José Pavlik Salles <span dir="ltr"><<a href="mailto:jjpavlik@gmail.com" target="_blank">jjpavlik@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">It might not be the best idea ever, but it worked for me. There's a memcached patch called repcached that lets you keep 2 memcached servers synchronized with each other (as far as I know, it only works for 2 servers); this way you get HA and LB. The patch source is <span style="color:rgb(68,68,68);font-family:Arial,Tahoma,Helvetica,FreeSans,sans-serif;font-size:13px;line-height:18px"> </span><a href="https://github.com/usecide/repcached/blob/master/repcached-2.3.1-1.4.13.patch" style="text-decoration:none;color:rgb(77,70,156);font-family:Arial,Tahoma,Helvetica,FreeSans,sans-serif;font-size:13px;line-height:18px" target="_blank">https://github.com/usecide/repcached/blob/master/repcached-2.3.1-1.4.13.patch</a> and here is a small example <a href="http://viviendolared.blogspot.com.ar/2014/01/memcached-replicado-en-ubuntu-1204-lts.html" target="_blank">http://viviendolared.blogspot.com.ar/2014/01/memcached-replicado-en-ubuntu-1204-lts.html</a> running on Ubuntu 12.04 (sorry about the Spanish link, but the steps are pretty straightforward).</div>
<div class="gmail_extra"><br><br><div class="gmail_quote">2014-08-14 16:05 GMT-03:00 Daneyon Hansen (danehans) <span dir="ltr"><<a href="mailto:danehans@cisco.com" target="_blank">danehans@cisco.com</a>></span>:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>
<div style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif">
<div>
<div>
<div><br>
</div>
<div>It has been a while, but I believe I load-balanced memcached using HAProxy (using sticky sessions) and observed no issues with fail-over.</div>
<div><br>
</div>
<div>
<div>Regards,</div>
<div>Daneyon Hansen</div>
<div>Software Engineer</div>
<div>Email: <a href="mailto:danehans@cisco.com" target="_blank">danehans@cisco.com</a></div>
<div>Phone: <a href="tel:303-718-0400" value="+13037180400" target="_blank">303-718-0400</a></div>
<div><a href="http://about.me/daneyon_hansen" target="_blank">http://about.me/daneyon_hansen</a></div>
</div>
</div>
</div>
<div><br>
</div>
<span>
<div style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium none;BORDER-LEFT:medium none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df 1pt solid;BORDER-RIGHT:medium none;PADDING-TOP:3pt">
<span style="font-weight:bold">From: </span>Joe Topjian <<a href="mailto:joe@topjian.net" target="_blank">joe@topjian.net</a>><br>
<span style="font-weight:bold">Date: </span>Thursday, August 14, 2014 10:09 AM<br>
<span style="font-weight:bold">To: </span>"<a href="mailto:openstack-operators@lists.openstack.org" target="_blank">openstack-operators@lists.openstack.org</a>" <<a href="mailto:openstack-operators@lists.openstack.org" target="_blank">openstack-operators@lists.openstack.org</a>><br>
<span style="font-weight:bold">Subject: </span>[Openstack-operators] memcached redundancy<br>
</div><div><div>
<div><br>
</div>
<div>
<div>
<div dir="ltr">Hello,
<div><br>
</div>
<div>I have an OpenStack cloud with two HA cloud controllers. Each controller runs the standard controller components: glance, keystone, nova minus compute and network, cinder, horizon, mysql, rabbitmq, and memcached.</div>
<div><br>
</div>
<div>Everything except memcached is accessed through haproxy and everything is working great (well, rabbit can be finicky ... I might post about that if it continues).</div>
<div><br>
</div>
<div>The problem I currently have is how to work effectively with memcached in this environment. Since all components are load balanced, they need access to the same memcached servers. That's solved by the ability to specify multiple memcached servers in the
various OpenStack config files.</div>
<div><br>
</div>
<div>But if I take a server down for maintenance, I notice a 2-3 second delay in all requests. I've confirmed it's memcached: when I remove the offline server from the list of memcached servers in the config files, the delay goes away.</div>
<div><br>
</div>
<div>I'm wondering how people deploy memcached in environments like this? Are you using some type of memcached replication between servers? Or if a memcached server goes offline are you reconfiguring OpenStack to remove the offline memcached server?</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Joe</div>
</div>
</div>
</div>
</div></div></span>
</div>
<br></div></div><div>_______________________________________________<br>
OpenStack-operators mailing list<br>
<a href="mailto:OpenStack-operators@lists.openstack.org" target="_blank">OpenStack-operators@lists.openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators</a><br>
<br></div></blockquote></div><span><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Pavlik Salles Juan José<div>Blog - <a href="http://viviendolared.blogspot.com" target="_blank">http://viviendolared.blogspot.com</a></div>
</div>
</font></span></div>
<br></blockquote></div><br></div>
</div></div>
<br></blockquote></div></div></div><div><div class="h5"><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Pavlik Salles Juan José<div>Blog - <a href="http://viviendolared.blogspot.com" target="_blank">http://viviendolared.blogspot.com</a></div>
</div>
</div></div></div>
<br></blockquote></div><br></div>