[Openstack-operators] memcached redundancy

Joe Topjian joe at topjian.net
Thu Aug 21 18:57:44 UTC 2014


Hi Clint,

Thank you for your input.

If I understand you correctly, the root cause seems to be internal to
OpenStack? If that's true, I'll file a bug report about this. I'm guessing
Oslo would be the correct project to file it against?

Thanks,
Joe


On Thu, Aug 21, 2014 at 12:19 PM, Clint Byrum <clint at fewbar.com> wrote:

> Excerpts from Joe Topjian's message of 2014-08-14 09:09:59 -0700:
> > Hello,
> >
> > I have an OpenStack cloud with two HA cloud controllers. Each controller
> > runs the standard controller components: glance, keystone, nova (minus
> > compute and network), cinder, horizon, mysql, rabbitmq, and memcached.
> >
> > Everything except memcached is accessed through haproxy and everything is
> > working great (well, rabbit can be finicky ... I might post about that if
> > it continues).
> >
> > The problem I currently have is how to work effectively with memcached in
> > this environment. Since all components are load balanced, they need access
> > to the same memcached servers. That's solved by the ability to specify
> > multiple memcached servers in the various OpenStack config files.
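> >
> > For example, something like this in nova.conf (the addresses here are
> > just placeholders for the two controllers):
> >
> >     [DEFAULT]
> >     memcached_servers = 192.0.2.11:11211,192.0.2.12:11211
> >
> >     [keystone_authtoken]
> >     memcached_servers = 192.0.2.11:11211,192.0.2.12:11211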
> >
> > But if I take a server down for maintenance, I notice a 2-3 second delay in
> > all requests. I've confirmed it's memcached: if I remove the downed server
> > from the memcached server list in the config files, the delay goes away.
>
> I've seen a few responses to this that show a _massive_ misunderstanding
> of how memcached is intended to work.
>
> Memcached should never need to be load balanced at the connection
> level. The client keeps a consistent hash ring over the keys to handle
> load balancing and failover. If you have 2 servers and 1 is gone,
> failover should happen automatically. This becomes important when you
> have, say, 5 memcached servers: with 1 failed server you still retain
> the other n-1 servers' worth of RAM (4/5 of your cache, in that case).
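>
> For instance (a sketch using the python-memcached client with placeholder
> addresses, and no load balancer in front of memcached):
>
>     import memcache
>
>     # Each key is hashed client-side onto one server in the list.
>     mc = memcache.Client(['192.0.2.11:11211', '192.0.2.12:11211'])
>
>     mc.set('some-key', 'some-value')  # lands on exactly one of the servers
>     mc.get('some-key')                # read back from that same server
>     # If that server becomes unreachable, the client marks it dead and
>     # rehashes its keys onto the surviving server until it comes back.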
>
> What I suspect is happening is that we're not doing that right by
> either not keeping persistent connections, or retrying dead servers
> too aggressively.
>
> In fact, it looks like the driver used by default in oslo-incubator's
> 'memorycache', the 'memcache' driver, will retry dead servers every
> 30 seconds and wait 3 seconds for a timeout, which probably matches
> the behavior you see. None of the places I looked in Nova seem to
> allow passing in a different dead_retry or timeout. In my experience,
> you probably want something like dead_retry == 600, so only one slow
> operation every 10 minutes per process (so if you have 10 nova-api
> processes running, that's 10 slow requests every 10 minutes).
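>
> Something like this is what I mean (a sketch against the python-memcached
> client that memorycache wraps; dead_retry and socket_timeout are its
> keyword arguments, defaulting to 30 and 3 seconds):
>
>     import memcache
>
>     mc = memcache.Client(
>         ['192.0.2.11:11211', '192.0.2.12:11211'],  # placeholder addresses
>         dead_retry=600,    # default 30: re-probe a dead server every 10 minutes
>         socket_timeout=1,  # default 3: give up on an unresponsive server sooner
>     )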
>
> It is also possible that some of these client objects are being re-created
> on every request, as is common if caching is implemented too deep inside
> "middleware" and not at the edges of a solution. I haven't dug in deep
> enough, but suffice it to say, replicating and load balancing may be a
> cheaper solution than auditing the code and fixing it at this point.
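>
> As a hypothetical illustration of the difference (the handler functions
> are invented; the client is python-memcached):
>
>     import memcache
>
>     SERVERS = ['192.0.2.11:11211', '192.0.2.12:11211']  # placeholders
>
>     # Anti-pattern: a new client per request throws away the persistent
>     # connections and the dead-server bookkeeping, so every request
>     # re-pays the connect/timeout cost while a server is down.
>     def cached_lookup_bad(key):
>         return memcache.Client(SERVERS).get(key)
>
>     # One client per process keeps connections open and remembers which
>     # servers are dead until dead_retry expires.
>     _MC = memcache.Client(SERVERS, dead_retry=600)
>
>     def cached_lookup_good(key):
>         return _MC.get(key)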
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>