Open Stack

Fri Jul 30 20:23:15 UTC 2010

Hi Eric,

Well said.  Assuming I understood your argument below, I think we are
actually on the same page.  Thanks for taking the time to explain your
position in detail.

I think that you and I are in agreement if the modules that you describe are
always simply bindings to another service's API.  For example, I completely
agree that we should have a caching module that any layer of the stack can
use as needed -- but it should just be a language-specific binding to
memcached or a similar service, *running somewhere outside the binary*.  That
service may be on localhost, on another machine, or behind a load balancer
with 10 memcached servers.  Similarly, any layer of the stack can use the
authorization module, the authentication module, and so on, and these just
facilitate calling out to the authorization or authentication services.

I think it would be a *bad* idea if the caching module was just a standalone
caching library, which any layer of the stack could "strap on" as needed.
Now that program does two jobs: its original purpose, and caching.  It's
more complex, and you can't scale it as precisely (you end up scaling for
the slower of the two jobs).  If we also strapped on authentication,
authorization, and so on, we would end up with one complex binary that does
everything.

Are we in agreement?  Modules provide language bindings to a URI endpoint
but don't provide the logic itself?
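
A minimal sketch of such a binding, under loud assumptions: the HTTP-style
endpoint, the class name CacheBinding, and the injectable transport are all
hypothetical, chosen only to illustrate "binding to a URI endpoint".  (Real
memcached speaks its own protocol, not HTTP.)  The point is that the module
holds no cached data and no caching logic; every call is a roundtrip to a
service running somewhere else.

```python
import json
from typing import Callable, Optional, Tuple
from urllib import request as urlrequest
from urllib.error import HTTPError
from urllib.parse import quote

# Hypothetical transport signature: (method, url, body) -> (status, body).
Transport = Callable[[str, str, Optional[bytes]], Tuple[int, bytes]]


def http_transport(method: str, url: str, body: Optional[bytes]) -> Tuple[int, bytes]:
    """Default transport: one plain HTTP roundtrip via the standard library."""
    req = urlrequest.Request(url, data=body, method=method)
    try:
        with urlrequest.urlopen(req, timeout=2.0) as resp:
            return resp.status, resp.read()
    except HTTPError as err:
        # Non-2xx responses (e.g. 404 for a cache miss) arrive as exceptions.
        return err.code, err.read()


class CacheBinding:
    """Thin binding to an external cache service; stores nothing locally."""

    def __init__(self, endpoint: str, transport: Transport = http_transport):
        # The endpoint may be localhost, another machine, or a load balancer.
        self.endpoint = endpoint.rstrip("/")
        self.transport = transport

    def _url(self, key: str) -> str:
        return f"{self.endpoint}/{quote(key, safe='')}"

    def get(self, key: str):
        status, body = self.transport("GET", self._url(key), None)
        if status == 404:
            return None  # cache miss
        if status != 200:
            raise RuntimeError(f"cache service returned {status}")
        return json.loads(body)

    def set(self, key: str, value) -> None:
        payload = json.dumps(value).encode("utf-8")
        status, _ = self.transport("PUT", self._url(key), payload)
        if status not in (200, 201, 204):
            raise RuntimeError(f"cache service returned {status}")
```

Any layer of the stack could construct CacheBinding("http://localhost:8080/cache")
(a made-up address) and scale the cache service independently of itself.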

I think my argument for "proxies" came across as if one layer of binaries
should wrap the API, and everything had to happen at that point:

request
  |
  v
hardware LB
  |
  v
proxy layer(s)
  |
  v
API endpoint
  |
  v
deeper...

and I agree that that would inevitably cause us to bubble too much
information up the request chain.  Your module argument is more what I had
in mind, where individual services still stay simple while being able to
make roundtrip calls to other services as needed:

# In this diagram, => means a roundtrip call

request
  |
  v
hardware LB
  |
  v
service A # does the absolute minimum it can,
  |       # then forwards request downstream
  v
service B
      => service C # e.g. authentication
              => cache service
              => storage service
                      => some database
  |
  v
service D # e.g. API endpoint
      => cache service
  |
  v
service E # e.g. more specific endpoint, maybe cloud servers endpoint
      => cache service
      => service F # e.g. authorization
  |
  v
deeper... # the more links in the chain, the better:
          # each link is a simpler piece to code and to understand.

and what I had *thought* you meant looked like this in my head:

request
  |
  v
hardware LB
  |
  v
monolithic binary that did more than one thing
  |
  v
deeper...

In my experience, drawing #2 (the chain of small services) is the best, and a
huge reason is that I can actually explain to you what each service does.  I
can tell you what inputs he expects, what outputs he produces, and the one
thing that he calculates -- and you can just throw more instances of him in
place to scale him.  Whenever a service has accreted too much functionality,
you split his code into two new services, and either put them serially in the
request chain or have one call the other to do some work.
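
The chain in the second drawing can be sketched roughly as follows.  This is
an illustration only: plain in-process functions stand in for what would be
separately deployed services, and the token table, the request fields, and
every function name are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Request = Dict[str, object]
Service = Callable[[Request], Request]


class Unauthorized(Exception):
    """Raised when the authentication roundtrip rejects the request."""


def authentication_service(token: Optional[str]) -> Optional[str]:
    """Stand-in for service C: maps a token to a user, or None if unknown."""
    known_tokens = {"secret-token": "michael"}  # hypothetical data
    return known_tokens.get(token)


def service_a(req: Request) -> Request:
    """Does the absolute minimum (normalises the path), then forwards."""
    return {**req, "path": str(req["path"]).rstrip("/")}


def service_b(req: Request) -> Request:
    """Makes a roundtrip call (=>) to authentication, then forwards."""
    user = authentication_service(req.get("token"))
    if user is None:
        raise Unauthorized("unknown token")
    return {**req, "user": user}


def service_d(req: Request) -> Request:
    """Stand-in for the API endpoint: produces the response."""
    return {**req, "response": f"hello {req['user']} at {req['path']}"}


def run_chain(req: Request, chain: List[Service]) -> Request:
    """Pass the request through each link in order."""
    for service in chain:
        req = service(req)
    return req


result = run_chain(
    {"path": "/servers/", "token": "secret-token"},
    [service_a, service_b, service_d],
)
print(result["response"])
```

Each link has one input, one output, and one job, so splitting a service that
has grown too large means replacing one entry in the chain with two.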

Let me know if I've misunderstood you again,
Michael

On Fri, Jul 30, 2010 at 3:09 PM, Eric Day <eday at oddments.org> wrote:

> Hi Everyone,
>
> A number of us have been discussing different ways of sharing
> components between different projects in OpenStack, such as auth*,
> caching, rate limiting, and so on. There are a few ways to approach
> this, and we thought it would be best to put this out on the mailing
> list for folks to discuss.
>
> The two approaches proposed so far are a proxy layer that would
> sit in front of the service APIs and a library/module that would
> encapsulate these shared components and allow the services to consume
> it at any layer. The problem we are trying to solve is re-usability
> across different services, as well as a layer that can scale along
> with the service. I'm leaning towards the library approach so I'll
> explain that side.
>
> The basic idea is to provide a set of libraries or modules that could
> be reused and expose a common API each service can consume (auth,
> caching, ...). They will be modular themselves in that they could have
> different backends providing each service. These interfaces will need
> to be written for each language we plan to support (or written once
> in something like C, with extensions written on top of it). Tools like
> SWIG can help in this area.
>
> The reasoning behind this approach over the proxy is that you're not
> forced to answer questions out of context. Having the appropriate
> amount of context, and doing checks at the appropriate layer, are key
> in building efficient systems that scale. If we use the proxy model,
> we will inevitably need to push more service-specific context up
> into that layer to handle requests efficiently (URL structure for the
> service, peeking into the JSON/XML request to interpret the request
> parameters, and so on). I think questions around authorization and
> cached responses can sometimes best be handled deeper in the system.
>
> If we have this functionality wrapped in a library, we can make
> calls from the service software at any layer (when the context is
> relevant). We still solve the re-usability problem, but in a way that
> can both be more efficient and doesn't require context to bubble up
> into a generic proxy layer.
>
> As for scalability, the libraries provided can use any methods needed
> to ensure they scale across projects. For example, if we're talking
> about authentication systems, the module can manage caching, either
> local or network based, and still perform any optimizations it needs
> to. The library may expose a simple API to the applications, but it
> can have its own scalable architecture underneath.
>
> The service API software will already need the ability to scale out
> horizontally, so I don't see this as a potential bottleneck. For
> example, in Nova, the API servers essentially act as an HTTP<->message
> queue proxy, so you can easily start up as many as are needed with
> some form of load balancing in front, while workers on the other side
> of the queues carry out the bulk of the work. Having the service API
> also handle tasks like rate limiting and auth should not be an issue.
>
> You could even write a generic proxy layer for services that need it
> based on the set of libraries we would use elsewhere in the system.
>
> Having worked on systems that took both approaches in the past, I can
> say the library approach was both more efficient and maintainable. I'm
> sure we can make either work, but I want to make sure we consider
> alternatives and think through the specifics a bit more first.
>
> Thanks,
> -Eric

[Openstack] Architecture for Shared Components