[Openstack-operators] memcached redundancy
joe at topjian.net
Thu Aug 21 18:57:44 UTC 2014
Thank you for your input.
If I understand you correctly, the core cause seems to be internal to
OpenStack? If that's true, I will create a bug report about this. I'm
guessing Oslo would be the correct project to file the bug?
On Thu, Aug 21, 2014 at 12:19 PM, Clint Byrum <clint at fewbar.com> wrote:
> Excerpts from Joe Topjian's message of 2014-08-14 09:09:59 -0700:
> > Hello,
> > I have an OpenStack cloud with two HA cloud controllers. Each controller
> > runs the standard controller components: glance, keystone, nova minus
> > compute and network, cinder, horizon, mysql, rabbitmq, and memcached.
> > Everything except memcached is accessed through haproxy and everything is
> > working great (well, rabbit can be finicky ... I might post about that if
> > it continues).
> > The problem I currently have is how to effectively work with memcached in
> > this environment. Since all components are load balanced, they need access
> > to the same memcached servers. That's solved by the ability to specify
> > multiple memcached servers in the various OpenStack config files.
> > But if I take a server down for maintenance, I notice a 2-3 second delay on
> > all requests. I've confirmed it's memcached: when I edit the list of
> > memcached servers in the config files, the delay goes away.
> I've seen a few responses to this that show a _massive_ misunderstanding
> of how memcached is intended to work.
> Memcached should never need to be load balanced at the connection
> level. It has a consistent hash ring based on the keys to handle
> load balancing and failover. If you have 2 servers, and 1 is gone,
> the failover should happen automatically. This gets important when you
> have, say, 5 memcached servers, as it means that given 1 failed server,
> you still retain (n-1)/n of your RAM for caching.
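
The key-based failover described above can be illustrated with a toy consistent hash ring. This is only a sketch of the general idea, not the hashing that any particular memcached client library implements; the class name `HashRing`, the `replicas` count, and the `cacheN:11211` host names are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring (illustrative only, not a real client's code)."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas  # virtual points per server (assumed value)
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> server name
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            point = self._hash("%s#%d" % (server, i))
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        for i in range(self.replicas):
            point = self._hash("%s#%d" % (server, i))
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


servers = ["cache%d:11211" % n for n in range(1, 6)]  # hypothetical hosts
ring = HashRing(servers)
keys = ["token-%d" % n for n in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove("cache3:11211")                           # simulate one failed server
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch only the keys that lived on the removed server change owner; keys on the other four servers stay where they were, which is why the remaining cache stays warm.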
> What I suspect is happening is that we're not doing that right by
> either not keeping persistent connections, or retrying dead servers
> too aggressively.
> In fact, it looks like the default one used in oslo-incubator's
> 'memorycache', the 'memcache' driver, will by default retry dead servers
> every 30 seconds, and wait 3 seconds for a timeout, which probably
> matches the behavior you see. None of the places I looked in Nova seem
> to allow passing in a different dead_retry or timeout. In my experience,
> you probably want something like dead_retry == 600, so only one slow
> operation every 10 minutes per process (so if you have 10 nova-api's
> running, that's 10 requests every 10 minutes).
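
The arithmetic above can be written out as a rough upper bound. Assuming the 'memcache' driver here is the python-memcached library, the two values correspond to the `dead_retry` and `socket_timeout` keyword arguments of `memcache.Client`. The function names below are hypothetical, and the model assumes every process is busy enough to retry the dead server as soon as its retry window expires.

```python
def slow_requests_per_hour(dead_retry, processes):
    """Upper bound on requests per hour that retry the dead server."""
    # Each process retries the dead server at most once per dead_retry seconds.
    return processes * 3600.0 / dead_retry


def seconds_lost_per_hour(dead_retry, socket_timeout, processes):
    """Upper bound on total time per hour spent waiting for timeouts."""
    return slow_requests_per_hour(dead_retry, processes) * socket_timeout


# Defaults quoted in the message: retry every 30 s, 3 s timeout, 10 processes.
default_slow = slow_requests_per_hour(30, 10)       # 1200.0
default_lost = seconds_lost_per_hour(30, 3, 10)     # 3600.0

# Suggested setting: dead_retry == 600.
tuned_slow = slow_requests_per_hour(600, 10)        # 60.0, i.e. 10 per 10 minutes
tuned_lost = seconds_lost_per_hour(600, 3, 10)      # 180.0
```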
> It is also possible that some of these objects are being re-created on
> every request, as is common if caching is implemented too deep inside
> "middleware" and not at the edges of a solution. I haven't dug in deeply
> enough, but suffice it to say, replicating and load balancing may be a
> cheaper alternative to auditing the code and fixing it at this point.
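
Why re-creating the client on every request would hurt can be shown with a small simulation. This is a sketch under stated assumptions, not OpenStack or python-memcached code: `FakeClient` and `total_wait` are hypothetical names, the defaults mirror the 30 second retry and 3 second timeout quoted above, and one request arrives per simulated second while the only server is down.

```python
class FakeClient:
    """Stand-in for a memcached client; models only dead-server bookkeeping."""

    def __init__(self, dead_retry=30, socket_timeout=3):
        self.dead_retry = dead_retry
        self.socket_timeout = socket_timeout
        self.dead_until = 0.0  # this memory is lost whenever a new client is built

    def get(self, now):
        """Return the seconds this lookup spends waiting on the dead server."""
        if now < self.dead_until:
            return 0.0  # server still marked dead: skip it immediately
        self.dead_until = now + self.dead_retry
        return float(self.socket_timeout)  # connection attempt times out


def total_wait(requests, per_request_client):
    """Total seconds spent waiting across `requests` one-per-second requests."""
    shared = FakeClient()
    waited = 0.0
    for now in range(requests):
        client = FakeClient() if per_request_client else shared
        waited += client.get(float(now))
    return waited


shared_wait = total_wait(60, per_request_client=False)   # 6.0
fresh_wait = total_wait(60, per_request_client=True)     # 180.0
```

With a long-lived client only two requests per minute pay the timeout; with a fresh client per request every request pays it, which would match a delay on all requests.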