[Openstack-operators] memcached redundancy

Joe Topjian joe at topjian.net
Fri Aug 22 20:00:07 UTC 2014


Hi Morgan,

Thank you very much for the detailed reply.

What's the relationship between the dogpile family of drivers and the SQL
and memcached token storage backends? The Keystone dev docs say that
dogpile is a caching layer on top of other keystone functions. You mention
that dogpile can also work as a key/value store. So is the dogpile stuff
superseding the "original" token backends? Or are those still needed but
dogpile sits on top and ultimately does a better job?

I'm asking the above from the point of view of running a pre-Icehouse
cloud. So I'm just trying to figure out what options I currently have, what
I will have, and whether I should plan for some type of backend switch (even
if it's just additional configuration in keystone.conf).

Thanks,
Joe


On Fri, Aug 22, 2014 at 1:25 PM, Morgan Fainberg <morgan.fainberg at gmail.com>
wrote:

> For Keystone, there will be a MongoDB backend in Juno that uses the
> Dogpile-based key-value storage. The Dogpile storage of tokens (available
> in icehouse) requires a simple backend that implements the basic types of
> interfaces (get, set, delete, get_multi, set_multi, delete_multi, etc.) and
> that can communicate with whatever storage/cache system you want to use.
> Obviously it’s optimized for caching (the library is named dogpile.cache),
> but it also works well as a key-value-store implementation.
>
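
To make the interface list above concrete, here is a minimal sketch of a
key-value store exposing those six operations. It is an illustration only:
the class name DictTokenStore, the NO_VALUE sentinel, and the ttl/clock
parameters are assumptions made for this example, and it does not use or
reproduce the real dogpile.cache backend API or any Keystone driver.

```python
import time

# Sentinel returned when a key is absent or expired. The name is an
# assumption for this sketch, not a library constant.
NO_VALUE = object()


class DictTokenStore:
    """Toy in-memory key-value store exposing the six operations named above."""

    def __init__(self, clock=time.monotonic):
        self._data = {}        # key -> (value, expires_at or None)
        self._clock = clock    # injectable so expiry can be tested

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return NO_VALUE
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Expired entries are dropped lazily on read.
            del self._data[key]
            return NO_VALUE
        return value

    def delete(self, key):
        self._data.pop(key, None)

    def get_multi(self, keys):
        return [self.get(key) for key in keys]

    def set_multi(self, mapping, ttl=None):
        for key, value in mapping.items():
            self.set(key, value, ttl)

    def delete_multi(self, keys):
        for key in keys:
            self.delete(key)
```

A real backend would forward these calls to memcached, MongoDB, or SQL
instead of a dict; the point is only that the surface a token store needs
is small.
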
> There are also significant strides towards supporting no-persistence (when
> using PKI tokens). There are still some roadblocks to getting us clear of
> needing the token persistence backends at all.
>
> With that said… Back on the original topic.
>
> We definitely are using memcached incorrectly (as a persistent store), but
> at the time we needed to provide some alternatives to alleviate the issues
> you are highlighting with storing tokens in SQL (there are ways to make the
> SQL backend better as well). This incorrect use of memcached does drive
> towards wanting connection *AND* storage level redundancy.
>
> With regards to the MemoryCache oslo-incubator library (and oslo.cache
> basic library) there is some work that has been proposed (a spec) to move
> to dogpile.cache and really focus on using caching backends (such as
> memcached) correctly across OpenStack. This opens the door to having more
> control on how we work with memcache (or any other backend) that we use for
> caching. This change is a tentative target for Kilo and the subsequent
> cycles.
>
> Now with all of that in mind, some of the issue comes from the basic
> python memcache library and how it handles dead servers (with socket
> timeouts, marking them dead, etc) and probably how we’re setting those
> timeouts / limits.
>
> There is a lot of room for improvement in how we cache; just remember
> caching is one of the hardest things to do right. Doing caching wrong opens
> up the potential for a lot of bugs.
>
> --Morgan Fainberg
>
>
> -----Original Message-----
> From: Joe Topjian <joe at topjian.net>
> Reply: Joe Topjian <joe at topjian.net>
> Date: August 22, 2014 at 12:03:56
> To: Morgan Fainberg <morgan.fainberg at gmail.com>
> Cc: openstack-operators <openstack-operators at lists.openstack.org>
> Subject:  Re: [Openstack-operators] memcached redundancy
>
> > It sounds like there are two incorrect uses of memcached: The actual
> > communication of the openstack components to memcached and using
> > memcached itself as a persistent token store. Though from what it sounds
> > like, if the former was done better, the latter wouldn't be too much of
> > an issue?
> >
> > I do agree that using something like memcached, which explicitly
> > advertises itself as a bad solution for persistent storage, can
> > ultimately be asking for trouble.
> >
> > With that said, though, it looks like there are currently two choices
> > for a keystone token backend: memcached and SQL. Both have obvious
> > downsides. Personally I'd rather deal with my current memcached issues
> > than go back to storing tokens in SQL.
> >
> > ... unless I'm missing something? Is there more to the current state of
> > Keystone token backends than the memcached and SQL backends that have
> > been around for the past few years?
> >
> > On Fri, Aug 22, 2014 at 12:39 PM, Morgan Fainberg wrote:
> >
> > > While keystone uses memcache as a possible token storage backend we are
> > > working towards eliminating the design that makes memcache a desirable
> > > token backend.
> > >
> > > Using memcache for the token backend is not the best approach as the
> > > token backend (up through icehouse and in some cases will hold true for
> > > Juno) assumes stable storage for at least the life of the token.
> > >
> > > I agree with Josh, we are likely using memcached incorrectly in a
> > > number of cases.
> > >
> > > --Morgan
> > >
> > >
> > > On Thursday, August 21, 2014, Joshua Harlow wrote:
> > >
> > >> +1 for this, remember the 'cache' in memcache *strongly* indicates
> > >> what it should be used for.
> > >>
> > >> A useful link to read over @
> > >> http://joped.com/2009/03/a-rant-about-proper-memcache-usage/
> > >>
> > >> -Josh
> > >>
> > >> On Aug 21, 2014, at 11:19 AM, Clint Byrum wrote:
> > >>
> > >> > Excerpts from Joe Topjian's message of 2014-08-14 09:09:59 -0700:
> > >> >> Hello,
> > >> >>
> > >> >> I have an OpenStack cloud with two HA cloud controllers. Each
> > >> >> controller runs the standard controller components: glance,
> > >> >> keystone, nova minus compute and network, cinder, horizon, mysql,
> > >> >> rabbitmq, and memcached.
> > >> >>
> > >> >> Everything except memcached is accessed through haproxy and
> > >> >> everything is working great (well, rabbit can be finicky ... I might
> > >> >> post about that if it continues).
> > >> >>
> > >> >> The problem I currently have is how to effectively work with
> > >> >> memcached in this environment. Since all components are load
> > >> >> balanced, they need access to the same memcached servers. That's
> > >> >> solved by the ability to specify multiple memcached servers in the
> > >> >> various openstack config files.
> > >> >>
> > >> >> But if I take a server down for maintenance, I notice a 2-3 second
> > >> >> delay in all requests. I've confirmed it's memcached by editing the
> > >> >> list of memcached servers in the config files and the delay goes
> > >> >> away.
> > >> >
> > >> > I've seen a few responses to this that show a _massive_
> > >> > misunderstanding of how memcached is intended to work.
> > >> >
> > >> > Memcached should never need to be load balanced at the connection
> > >> > level. It has a consistent hash ring based on the keys to handle
> > >> > load balancing and failover. If you have 2 servers, and 1 is gone,
> > >> > the failover should happen automatically. This gets important when
> > >> > you have, say, 5 memcached servers as it means that given 1 failed
> > >> > server, you retain n-1 RAM for caching.
> > >> >
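
The hash-ring behaviour described above can be sketched in a few lines. This
is a generic illustration of consistent hashing with virtual nodes, not the
code of any particular memcached client library (clients differ in how they
map keys to servers); the HashRing class, its method names, and the server
addresses in the usage note are all made up for the example.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, servers, replicas=100):
        self._replicas = replicas
        self._ring = []            # sorted list of (hash, server) points
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(text):
        return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

    def add(self, server):
        # Each server owns several points so load spreads evenly.
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        # Dropping a server removes only its own points from the ring.
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def server_for(self, key):
        if not self._ring:
            raise LookupError("no servers available")
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

With five hypothetical servers, calling ring.remove() on one of them moves
only the keys that server owned; every other key keeps its server, so the
remaining four caches stay warm.
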
> > >> > What I suspect is happening is that we're not doing that right by
> > >> > either not keeping persistent connections, or retrying dead servers
> > >> > too aggressively.
> > >> >
> > >> > In fact, it looks like the default one used in oslo-incubator's
> > >> > 'memorycache', the 'memcache' driver, will by default retry dead
> > >> > servers every 30 seconds, and wait 3 seconds for a timeout, which
> > >> > probably matches the behavior you see. None of the places I looked
> > >> > in Nova seem to allow passing in a different dead_retry or timeout.
> > >> > In my experience, you probably want something like
> > >> > dead_retry == 600, so only one slow operation every 10 minutes per
> > >> > process (so if you have 10 nova-api's running, that's 10 requests
> > >> > every 10 minutes).
> > >> >
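
The arithmetic behind the dead_retry suggestion can be checked with a small
simulation. This is a simplified model, not the memcache driver's actual
code: DeadServerTracker and slow_requests are hypothetical names, the model
assumes one request per second per process, and it marks a server dead for
dead_retry seconds counted from the moment of the failed attempt.

```python
class DeadServerTracker:
    """Tracks when a failed server may be retried, using caller-supplied time."""

    def __init__(self, dead_retry=30, socket_timeout=3):
        self.dead_retry = dead_retry
        self.socket_timeout = socket_timeout
        self._dead_until = 0.0

    def request_delay(self, now, server_is_up):
        """Return the seconds a request at time `now` waits on this server."""
        if now < self._dead_until:
            return 0.0             # still marked dead: skipped without waiting
        if server_is_up:
            return 0.0
        # The attempt times out; mark the server dead for dead_retry seconds.
        self._dead_until = now + self.dead_retry
        return float(self.socket_timeout)


def slow_requests(dead_retry, duration=3600, processes=10, socket_timeout=3):
    """Count timed-out requests across all processes while a server stays down."""
    total = 0
    for _ in range(processes):
        tracker = DeadServerTracker(dead_retry, socket_timeout)
        total += sum(
            1
            for t in range(duration)
            if tracker.request_delay(t, server_is_up=False) > 0
        )
    return total
```

Under these assumptions, slow_requests(30) gives 1200 timed-out requests per
hour across 10 processes, while slow_requests(600) gives 60, which matches
the "10 requests every 10 minutes" figure above.
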
> > >> > It is also possible that some of these objects are being re-created
> > >> > on every request, as is common if caching is implemented too deep
> > >> > inside "middleware" and not at the edges of a solution. I haven't
> > >> > dug deep enough in, but suffice to say, replicating and load
> > >> > balancing may be the cheaper solution to auditing the code and
> > >> > fixing it at this point.
> > >> >
> > >> > _______________________________________________
> > >> > OpenStack-operators mailing list
> > >> > OpenStack-operators at lists.openstack.org
> > >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators