[openstack-dev] [savanna] scalable architecture

Matthew Farrellee matt at redhat.com
Thu Jul 25 04:05:41 UTC 2013

On 07/23/2013 12:32 PM, Sergey Lukjanov wrote:
> Hi everyone,
> We’ve started working on upgrading the Savanna architecture in version
> 0.3 to make it horizontally scalable.
> Most of the information is on the wiki page -
> https://wiki.openstack.org/wiki/Savanna/NextGenArchitecture.
> Additionally there are several blueprints created for this activity -
> https://blueprints.launchpad.net/savanna?searchtext=ng-
> We are looking for comments / questions / suggestions.

Some comments on "Why not provision agents to Hadoop cluster's to 
provision all other stuff?"

Re problems with scaling agents for launching large clusters - launching 
large clusters may be resource-intensive, and those resources must be 
provided by someone. They will be provided either by a) the hardware 
running the savanna infrastructure or b) the instance hardware provided 
to the tenant. If they are provided by (a), then the cost of launching 
the cluster is incurred by all users of savanna. If (b), then the cost 
is incurred by the user trying to launch the large cluster. It is true 
that some instance recommendations may be necessary, e.g. if you want to 
run a 500-instance cluster, then your head node should be large (vs. 
medium or small). That sizing decision needs to happen for either (a) or 
(b), because enough virtual resources must be present to maintain the 
large cluster after it is launched. There are accounting and isolation 
benefits to (b).

Re problems migrating agents while the cluster is scaling - will you 
expand on this point?

Re unexpected resource consumers - during launch, maybe; during 
execution, the agent should be a minimal consumer of resources. sshd may 
also be an unexpected resource consumer.

Re security vulnerability - the agents should only communicate within 
the instance network, primarily w/ the head node. The head node can 
relay information to the savanna infrastructure outside the instances in 
the same way savanna-api gets information now. So there should be no 
difference in vulnerability assessment.

Re supporting multiple distros - yes, but I'd argue this adds at most a 
small increment of complexity to what already exists today for properly 
creating savanna-plugin-compatible instances.


Concretely, the architecture of using instance resources for 
provisioning is no different from spinning up an instance w/ Ambari and 
then telling that instance to provision the rest of the cluster and 
report back status.
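
To make that pattern concrete, here is a minimal Python sketch, assuming a head node that drives per-instance agents inside the instance network and relays a single aggregated status outward. `Agent`, `HeadNode`, `provision`, `provision_cluster`, `report` and the `healthy` flag are hypothetical names for illustration only; they are not part of savanna or Ambari.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical per-instance agent. It talks only to the head node."""
    instance_id: str
    healthy: bool = True  # hypothetical stand-in for whether provisioning succeeds

    def provision(self) -> str:
        # A real agent would install and configure services on its own
        # instance, using that instance's resources; this sketch only
        # reports an outcome.
        return "ACTIVE" if self.healthy else "ERROR"


@dataclass
class HeadNode:
    """Hypothetical head node: provisions the cluster from inside the
    instance network and relays one aggregated status outward."""
    agents: list[Agent]
    statuses: dict[str, str] = field(default_factory=dict)

    def provision_cluster(self) -> None:
        # Agent-to-head-node traffic stays within the instance network.
        for agent in self.agents:
            self.statuses[agent.instance_id] = agent.provision()

    def report(self) -> dict[str, object]:
        # The only message that leaves the instance network for the
        # savanna infrastructure.
        active = sum(1 for s in self.statuses.values() if s == "ACTIVE")
        return {
            "instances": len(self.agents),
            "active": active,
            "complete": active == len(self.agents),
        }


head = HeadNode([Agent("worker-0"), Agent("worker-1"), Agent("worker-2", healthy=False)])
head.provision_cluster()
print(head.report())  # {'instances': 3, 'active': 2, 'complete': False}
```

In this sketch the provisioning work runs on tenant instances, so its cost stays with the tenant, as in option (b), and only the head node's summary crosses into the savanna infrastructure.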


Re metrics - wherever you gather Hz (# req per sec, # queries per sec, 
etc.), also gather standard summary statistics (mean, median, std dev, 
quartiles, range).
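
As an illustration, a minimal Python sketch of computing those summary statistics over a series of rate samples, using only the standard library `statistics` module (`quantiles` needs Python 3.8+). The `summarize` function and the sample values are made up for illustration.

```python
import statistics


def summarize(rates: list[float]) -> dict[str, float]:
    """Summary statistics for a series of rate samples (e.g. req/s).

    stdev and quantiles need at least two samples.
    """
    q1, _q2, q3 = statistics.quantiles(rates, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(rates),
        "median": statistics.median(rates),
        "stdev": statistics.stdev(rates),  # sample standard deviation
        "q1": q1,
        "q3": q3,
        "min": min(rates),
        "max": max(rates),
        "range": max(rates) - min(rates),
    }


samples = [10, 12, 11, 50, 13]  # made-up req/s samples with one spike
stats = summarize(samples)
print(stats["mean"], stats["median"])  # 19.2 12
```

A single spike pulls the mean well above the median, which is exactly what a bare Hz average would hide.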
