[Openstack-operators] memcached redundancy

Joshua Harlow harlowja at outlook.com
Thu Aug 21 20:19:30 UTC 2014


+1 for this; remember, the 'cache' in memcache *strongly* indicates what it should be used for.

A useful link to read over @ http://joped.com/2009/03/a-rant-about-proper-memcache-usage/

-Josh

On Aug 21, 2014, at 11:19 AM, Clint Byrum <clint at fewbar.com> wrote:

> Excerpts from Joe Topjian's message of 2014-08-14 09:09:59 -0700:
>> Hello,
>> 
>> I have an OpenStack cloud with two HA cloud controllers. Each controller
>> runs the standard controller components: glance, keystone, nova minus
>> compute and network, cinder, horizon, mysql, rabbitmq, and memcached.
>> 
>> Everything except memcached is accessed through haproxy and everything is
>> working great (well, rabbit can be finicky ... I might post about that if
>> it continues).
>> 
>> The problem I currently have is how to effectively work with memcached in
>> this environment. Since all components are load balanced, they need access
>> to the same memcached servers. That's solved by the ability to specify
>> multiple memcached servers in the various OpenStack config files.
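>> 
>> For example, in nova.conf (hostnames are placeholders; the option name
>> is as I recall it for this era of Nova):
>> 
>>     [DEFAULT]
>>     memcached_servers = controller1:11211,controller2:11211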
>> 
>> But if I take a server down for maintenance, I notice a 2-3 second delay in
>> all requests. I've confirmed memcached is the cause: if I remove the downed
>> server from the memcached list in the config files, the delay goes away.
> 
> I've seen a few responses to this that show a _massive_ misunderstanding
> of how memcached is intended to work.
> 
> Memcached should never need to be load balanced at the connection
> level. Memcached clients maintain a consistent hash ring over the
> server list, keyed on the cache keys, to handle load balancing and
> failover. If you have 2 servers and 1 is gone, the failover should
> happen automatically. This gets important when you have, say, 5
> memcached servers: with 1 failed server, you retain the remaining n-1
> servers' worth of RAM for caching.
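> 
> A minimal sketch of that behavior, assuming the python-memcached
> client (the server names are placeholders):
> 
>     import memcache
> 
>     # The client hashes each key to one server in the list, so no
>     # external load balancer sits in front of memcached.
>     mc = memcache.Client(['controller1:11211', 'controller2:11211'])
> 
>     mc.set('token-abc', 'cached-value')
>     # If the server that owns 'token-abc' dies, the client marks it
>     # dead and rehashes the key to a surviving server.
>     print(mc.get('token-abc'))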
> 
> What I suspect is happening is that we're not doing that right: we're
> either not keeping persistent connections, or we're retrying dead
> servers too aggressively.
> 
> In fact, it looks like the 'memcache' driver used by default in
> oslo-incubator's 'memorycache' will retry dead servers every 30
> seconds and wait 3 seconds for a timeout, which probably matches the
> behavior you see. None of the places I looked in Nova seem to allow
> passing in a different dead_retry or timeout. In my experience, you
> probably want something like dead_retry == 600, so there's only one
> slow operation every 10 minutes per process (so if you have 10
> nova-api processes running, that's 10 slow requests every 10 minutes).
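> 
> With the python-memcached client, those knobs would look something
> like this (a sketch; server names are placeholders):
> 
>     import memcache
> 
>     mc = memcache.Client(
>         ['controller1:11211', 'controller2:11211'],
>         dead_retry=600,   # library default is 30: retry a dead server
>                           # only every 10 minutes instead
>         socket_timeout=3, # library default: the stall paid per retry
>     )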
> 
> It is also possible that some of these client objects are being
> re-created on every request, as is common when caching is implemented
> too deep inside "middleware" rather than at the edges of a solution. I
> haven't dug in deep enough to say, but suffice it to say, replicating
> and load balancing may be cheaper than auditing the code and fixing it
> at this point.
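> 
> To illustrate that failure mode, a hypothetical sketch (not actual
> Nova code):
> 
>     import memcache
> 
>     SERVERS = ['controller1:11211', 'controller2:11211']
> 
>     # Anti-pattern: a fresh client per request forgets which servers
>     # are dead, so every request re-probes the downed server and stalls.
>     def handle_request(key):
>         return memcache.Client(SERVERS).get(key)
> 
>     # Better: one long-lived client per process keeps its connections
>     # and its dead-server bookkeeping across requests.
>     MC = memcache.Client(SERVERS, dead_retry=600)
> 
>     def handle_request_cached(key):
>         return MC.get(key)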
> 