[Openstack] Architecture for Shared Components

Jorge Williams jorge.williams at rackspace.com
Tue Aug 3 22:34:59 UTC 2010


On Aug 3, 2010, at 9:06 AM, Michael Gundlach wrote:

Howdy Eric,

On Mon, Aug 2, 2010 at 6:57 PM, Eric Day <eday at oddments.org> wrote:
Hi Jorge,

I think we may not be on the same page here.

But I think we're getting close :)  All three of us have slightly different approaches in mind, but we're narrowing down the areas where we disagree.  I've tried listing our agreements and disagreements below -- we'll see if this style is helpful or confusing.

Thanks for summing this up, Michael.

I'll place my comments here rather than replying to previous messages.


I can't speak for what
Michael meant, but this is what I had in mind:

http://oddments.org/tmp/os_module_arch.png

This removes the intermediate proxies and instead relies on
scaling out the API endpoint and the services it uses. Different
services or different parts of the same service could consume the
same API/service. See my original email for my reasoning, but the
main ones are to keep the APIs consistent across different parts of
a service and to reduce the number of hops a request must go through.


OK, I think we all agree that it is good that

 *   code in the request chain can call out sideways to services
 *   we provide language bindings to those services
 *   the language bindings talk to services using their normal wire protocol, or REST if we write the service ourselves
 *   the language bindings allow dropping in different implementations of the service, including local mocks for testing
 *   a lower layer in the request chain should never have to call up to a higher layer to service a request

Here's where I think we disagree (pardon if I put words in your mouths incorrectly):

 1.  Jorge suggests putting some specific things in the request chain (e.g. caching) that you and I would suggest making into services.

I should note that there are two kinds of caches we may employ: memcache-like object caching, which works much like a service, and varnish/squid-like HTTP caching, which is by definition a proxy.  I don't think these approaches are mutually exclusive; they are both useful, and we can employ both of them in the same system.  For example, we can use memcache to cache authentication tokens and varnish to cache representation documents.

Proxy caching makes a lot of sense because a good HTTP caching proxy understands conditional GET operations.  If we are being polled continuously for whatever reason, those poll requests can be answered at a higher level instead of hitting the API endpoint.
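To make the conditional GET point concrete: a caching proxy holds a validator (an ETag) for each cached representation and can answer a matching poll without touching the origin.  A minimal sketch in Python -- the function and its arguments are hypothetical, not from any existing OpenStack code:

def serve_from_cache(request_headers, cached_etag, cached_body):
    # The client already has this version; answer 304 Not Modified
    # so the poll never reaches the API endpoint behind us.
    if request_headers.get('If-None-Match') == cached_etag:
        return 304, {'ETag': cached_etag}, b''
    # Otherwise serve the cached representation (a real proxy would
    # also revalidate with the origin when the entry is stale).
    return 200, {'ETag': cached_etag}, cached_body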

Authentication is an example where we want to take a dual approach: have it work as a service AND as a proxy.  The authentication service can validate user credentials or authentication tokens; the proxy can encapsulate the business logic of interacting with the service and can potentially cache authentication tokens to avoid continuously calling the service.
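Here's a rough sketch of that dual approach in Python; AuthService.validate is a stand-in for whatever the real authentication service exposes, and a production proxy would use memcache rather than a local dict:

import time

class AuthProxy:
    """Proxy layer that caches validated tokens so we don't
    call the authentication service on every request."""

    def __init__(self, auth_service, ttl=300):
        self.auth_service = auth_service  # the sideways service call
        self.ttl = ttl                    # seconds a token stays cached
        self.cache = {}                   # token -> expiry timestamp

    def is_authenticated(self, token):
        expiry = self.cache.get(token)
        if expiry is not None and expiry > time.time():
            return True                   # cache hit: no service call
        if self.auth_service.validate(token):
            self.cache[token] = time.time() + self.ttl
            return True
        return False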


 2.  Jorge and I would suggest making the request chain longer when possible to simplify each part, while you would suggest collapsing it to reduce the number of hops.

I don't see these as mutually exclusive points either.  Ideally we can set up our proxy services so that if they are deployed on different hosts, they forward HTTP requests, but they can also be packaged together in a single deployment so that communication between them is, say, a method call.  We can set up a framework that will allow us to easily develop them in this manner:

proxyService.service(httpRequest, httpResponse, chain) {

    //
    // Do this proxy's work...
    //
    (........)

    //
    // Call the next proxy in the chain...
    //
    nextProxy = chain.nextProxy();
    nextProxy.service(httpRequest, httpResponse, chain);
}


If the services are deployed together, we can chain them so that communication between them is simply a method call. We'll have a "Reader" proxy that can receive an HTTP request and a "Dispatcher" proxy that can dispatch it to the network.  With these we can dispatch to separate hosts.
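A hypothetical Dispatcher, sketched with Python's standard http.client, just forwards the request to the next host rather than making a method call (the host, port, and service signature here are illustrative, not an existing interface):

import http.client

class Dispatcher:
    """Terminates the local chain by forwarding the request
    to the next host over HTTP."""

    def __init__(self, next_host, next_port=80):
        self.next_host = next_host
        self.next_port = next_port

    def service(self, method, path, headers, body=None):
        conn = http.client.HTTPConnection(self.next_host, self.next_port)
        conn.request(method, path, body=body, headers=headers)
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders()), resp.read()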

So one deployment can have everything in a single box:


                         +-[HOST A]---------+
                         |        |         |
                         |        v         |
                         |  [   Reader   ]  |
                         |        |         |
                         |        v         |
                         |  [   Prox 1   ]  |
                         |        |         |
                         |        v         |
                         |  [   Prox 2   ]  |
                         |        |         |
                         |        v         |
                         |  [   Prox 3   ]  |
                         |        |         |
                         |        v         |
                         |  [ Dispatcher ]  |
                         |        |         |
                         |        v         |
                         +------------------+


In another, we can have Prox 1 on its own box:


                         +-[HOST B]---------+
                         |        |         |
                         |        v         |
                         |  [   Reader   ]  |
                         |        |         |
                         |        v         |
                         |  [   Prox 1   ]  |
                         |        |         |
                         |        v         |
                         |  [ Dispatcher ]  |
                         |        |         |
                         |        v         |
                         +------------------+


                         +-[HOST C]---------+
                         |        |         |
                         |        v         |
                         |  [   Reader   ]  |
                         |        |         |
                         |        v         |
                         |  [   Prox 2   ]  |
                         |        |         |
                         |        v         |
                         |  [   Prox 3   ]  |
                         |        |         |
                         |        v         |
                         |  [ Dispatcher ]  |
                         |        |         |
                         |        v         |
                         +------------------+


In fact, we can have multiple Prox 1s load balanced.  The API endpoint is always at the end of each processing chain.  With this we can scale horizontally at each proxy, but only if we need to.  How you chain the proxies up depends on the specific deployment of OpenStack.
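In other words, the chain itself is deployment configuration.  A hypothetical sketch of the wiring in Python (the registry of proxy classes and the next_proxy attribute are assumptions, not an existing interface):

def build_chain(proxy_names, registry):
    """Assemble a processing chain from a per-deployment list,
    e.g. ['reader', 'prox1', 'dispatcher'] for HOST B above."""
    proxies = [registry[name]() for name in proxy_names]
    # Link each proxy to its successor; the Dispatcher (or the
    # API endpoint itself) terminates the chain.
    for current, following in zip(proxies, proxies[1:]):
        current.next_proxy = following
    return proxies[0]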

Thoughts?


Let me suggest that #1 isn't really something we need to settle right now: whether auth, or caching, or rate limiting, or SSL termination is implemented as a layer in the request chain or as a sideways-called service can be individually argued when we get to that piece.  I think we're looking to settle the higher-level shape of the architecture.  #2 is below:


If we find we do need an extra proxy layer because caching or SSL
termination becomes too expensive within the endpoints, we can
easily write a new service layer utilizing the same APIs to provide
this. It would be nice to keep these proxies optional though, as some
installations will not require them.

What makes me uncomfortable about http://oddments.org/tmp/os_module_arch.png is that the "API Endpoint" server is a single rectangle that does multiple things, when we could instead have it be a chain of 2 or 3 rectangles that each do a simpler thing.  For example, if there does exist some basic auth* that can be done on the incoming HTTP request, I would want that teased out of the API Endpoint and put in front of it as a layer that calls sideways to the Auth Service.  Now the API Endpoint is a little simpler, and we know that any requests to it have already passed basic auth* checks.  The API Endpoint can of course call out to the Auth Service again to do more fine-grained auth if necessary.  As another example, I haven't looked at the code for "Compute Worker" yet, so I really have no idea what it does -- but if it were a stack of named layers I would be able to grok it a piece at a time.

You want to avoid extra hops, and to avoid making simpler installations think about extra layers... maybe WSGI is the solution that lets us all meet in the middle.  Build the API Endpoint as a stack of as many WSGI apps as possible, so that each one is as simple as possible.  If it turns out we never need to scale an individual piece, we have at least broken a larger problem into smaller ones.  If we do need to scale an individual piece, we can turn that WSGI app into its own layer and scale it independently.  What do you (Jorge and Eric) think?

I'm not that familiar with WSGI, but it looks like you're proposing something similar to what I have above.
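From a quick look at the spec, the parallel seems to hold: a WSGI middleware wraps the next application in the stack and calls it, much like nextProxy.service() in the pseudocode above.  A minimal sketch, assuming nothing beyond the WSGI spec (PEP 333):

class ProxyMiddleware:
    """A WSGI 'proxy' layer: do this layer's work, then hand
    the request to the next app in the stack."""

    def __init__(self, next_app):
        self.next_app = next_app  # the rest of the chain

    def __call__(self, environ, start_response):
        # Do this layer's work here (auth check, rate limiting, ...)
        # then delegate, exactly like chain.nextProxy() above.
        return self.next_app(environ, start_response)

Stacking the middlewares mirrors the single-box diagram: app = Prox1(Prox2(Prox3(api_endpoint))).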


Summarized as bullet points, can we agree that

 *   large problems should be broken into small problems
 *   if a layer of the request chain can have a piece broken into a WSGI app, let's do that unless there's a good reason not to
 *   by the time we release 1.0, we should figure out which WSGI apps, if any, need to become independent proxies for scaling reasons

WSGI may indeed be the solution, but I would say that what gets split into a separate app should be determined by the specific deployment.  We should have the capability to split things as needed as an inherent feature of the architecture.

?  If not, please push back :)

Hope this helped and didn't muddy the waters,
Michael
