[openstack-dev] [savanna] scalable architecture

Sergey Lukjanov slukjanov at mirantis.com
Thu Jul 25 15:07:49 UTC 2013


Hi Matt,

thank you for your comments. 

First of all, I want to say that I personally like the agent-based approach because it is, in theory, more flexible and scalable. I'd like to share some overall comments on using agents; we've already discussed this approach several times, including at the previous meeting. Let's take a look at its pros and cons.

I see the following pros:

1. we can provision several agents per Hadoop cluster, so launching large clusters will not affect other users;
2. agents will be deployed close to the target VMs, so I/O could be faster.

And here is a list of cons:

1. we'll need to add one more service to the architecture: savanna-engine would create the initial virtual machines, provision agents on them, and then use the agents for all subsequent cluster operations;
2. we'll need to move agents whenever a VM is removed from the Hadoop cluster; for example, if we deploy several agents per cluster, some of them will land on worker nodes that may be removed while the cluster is scaled down;
3. agents need a way to communicate with savanna: either direct access to the MQ (which is impossible for security reasons, and the VMs may not even have access to the network where the MQ is installed), or communication through savanna-api, which would require an auth mechanism for agents that isn't easy to build (see the sketch after this list);
4. agents would have to expose an API so that savanna can pass tasks to them, meaning we would need not only an internal RPC API but also an internal REST API for communicating with agents;
5. we'll need to add a scheduling mechanism to route tasks to the right hosts;
6. agents will consume resources on the machines, though I don't think that is a real problem.
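
To make cons 3 and 4 a bit more concrete, here is a minimal sketch of one way they could be addressed: the agent pulls tasks from savanna-api using a pre-shared per-cluster token, instead of reaching the MQ or exposing a push-style agent API. All names here (endpoint paths, header, token scheme) are hypothetical and only illustrate the shape of the problem:

    # Hypothetical agent-side loop; the endpoints, header name and
    # per-cluster token scheme are assumptions, not part of savanna-api.
    import time
    import requests

    SAVANNA_API = "http://savanna.example:8386/v1.0"
    AGENT_TOKEN = "token-issued-to-this-cluster-at-provisioning-time"

    def run_task(task):
        # Placeholder: would execute the provisioning step locally.
        return "completed"

    def main():
        headers = {"X-Savanna-Agent-Token": AGENT_TOKEN}
        while True:
            resp = requests.get(SAVANNA_API + "/agent/tasks", headers=headers)
            resp.raise_for_status()
            for task in resp.json().get("tasks", []):
                status = run_task(task)
                requests.post("%s/agent/tasks/%s" % (SAVANNA_API, task["id"]),
                              headers=headers, json={"status": status})
            time.sleep(5)  # simple polling interval

Even this simple pull model needs token issuance, rotation and revocation on the savanna-api side, which is exactly the part that isn't easy to do.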

I want to say once more that while the agent-based approach looks much more flexible to me, it is also much harder to implement. I think we should start work on scaling Savanna in version 0.3 with a simple “pilot” approach that uses only engines. We also shouldn't forget about the very big feature that is the main one for 0.3, Elastic Data Processing, and it's important not to overestimate the team's bandwidth for that release. The simple approach will prepare the base for future improvements.

Once we finish work on both EDP and the simple architecture, the remaining requirements for the architecture and the task execution framework will be much clearer; then we can look at the pros and cons again and understand how important agents really are.
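
For reference, the “only engines” flow could look roughly like the sketch below: savanna-api puts the whole cluster operation on a shared queue, and any savanna-engine instance picks it up, which is what makes adding engines a horizontal scaling knob. The queue name, payload format and broker URL are illustrative assumptions, not actual Savanna internals:

    # Rough sketch of api -> engine task hand-off over a shared queue.
    from kombu import Connection

    BROKER_URL = "amqp://guest:guest@localhost//"  # assumed broker location

    def submit(action, **kwargs):
        # savanna-api side: enqueue the whole cluster operation.
        with Connection(BROKER_URL) as conn:
            queue = conn.SimpleQueue("savanna-tasks")
            queue.put({"action": action, "args": kwargs})
            queue.close()

    def work():
        # savanna-engine side: any idle engine takes the next task.
        with Connection(BROKER_URL) as conn:
            queue = conn.SimpleQueue("savanna-tasks")
            msg = queue.get(block=True, timeout=60)
            handle(msg.payload)  # run the provisioning operation
            msg.ack()            # remove the task from the queue
            queue.close()

    def handle(task):
        # Placeholder for the actual provisioning logic.
        print("engine got task:", task)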

Thank you for the comment about statistics; I'll update the architecture blueprint once the wiki is unfrozen.
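
As a concrete illustration of that suggestion (a hypothetical helper, not existing Savanna code), given raw per-second request counts we could report the summary statistics alongside the rate itself:

    # Summarize a list of per-second request counts.
    import statistics

    def summarize(samples):
        q1, q2, q3 = statistics.quantiles(samples, n=4)  # quartile cut points
        return {
            "mean": statistics.mean(samples),
            "median": statistics.median(samples),
            "stddev": statistics.pstdev(samples),
            "quartiles": (q1, q2, q3),
            "range": (min(samples), max(samples)),
        }

    print(summarize([120, 98, 140, 133, 87, 101, 115]))  # req/sec samples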

P.S. We can discuss more details on IRC meeting.

Sincerely yours,
Sergey Lukjanov
Savanna Technical Lead
Mirantis Inc.

On Jul 25, 2013, at 8:05, Matthew Farrellee <matt at redhat.com> wrote:

> On 07/23/2013 12:32 PM, Sergey Lukjanov wrote:
>> Hi everyone,
>> 
>> We’ve started working on upgrading Savanna architecture in version
>> 0.3 to make it horizontally scalable.
>> 
>> The most part of information is in the wiki page -
>> https://wiki.openstack.org/wiki/Savanna/NextGenArchitecture.
>> 
>> Additionally there are several blueprints created for this activity -
>> https://blueprints.launchpad.net/savanna?searchtext=ng-
>> 
>> We are looking for comments / questions / suggestions.
> 
> Some comments on "Why not provision agents to Hadoop cluster's to provision all other stuff?"
> 
> Re problems with scaling agents for launching large clusters - launching large clusters may be resource intensive, and those resources must be provided by someone. They're either going to be provided by a) the hardware running the savanna infrastructure or b) the instance hardware provided to the tenant. If they are provided by (a) then the cost of launching the cluster is incurred by all users of savanna. If (b) then the cost is incurred by the user trying to launch the large cluster. It is true that some instance recommendations may be necessary, e.g. if you want to run a 500 instance cluster then your head node should be large (vs medium or small). That sizing decision needs to happen for (a) or (b) because enough virtual resources must be present to maintain the large cluster after it is launched. There are accounting and isolation benefits to (b).
> 
> Re problems migrating agents while cluster is scaling - will you expand on this point?
> 
> Re unexpected resource consumers - during launch, maybe; during execution the agent should be a minimal consumer of resources. sshd may also be an unexpected resource consumer.
> 
> Re security vulnerability - the agents should only communicate within the instance network, primarily w/ the head node. The head node can relay information to the savanna infrastructure outside the instances in the same way savanna-api gets information now. So there should be no difference in vulnerability assessment.
> 
> Re support multiple distros - yes, but I'd argue this is at most a small incremental complexity on what already exists today w/ properly creating savanna plugin compatible instances.
> 
> -
> 
> Concretely, the architecture of using instance resources for provisioning is no different than spinning up an instance w/ ambari and then telling that instance to provision the rest of the cluster and report back status.
> 
> -
> 
> Re metrics - wherever you gather Hz (# req per sec, # queries per sec, etc), also gather standard summary statistics (mean, median, std dev, quartiles, range)
> 
> Best,
> 
> 
> matt



