[openstack-dev] TC Meeting / Savanna Incubation Follow-Up

Michael Basnight mbasnight at gmail.com
Fri Sep 13 16:17:14 UTC 2013


On Sep 13, 2013, at 9:05 AM, Alexander Kuznetsov wrote:

> On Fri, Sep 13, 2013 at 7:26 PM, Michael Basnight <mbasnight at gmail.com> wrote:
> On Sep 13, 2013, at 6:56 AM, Alexander Kuznetsov wrote:
> > On Thu, Sep 12, 2013 at 7:30 PM, Michael Basnight <mbasnight at gmail.com> wrote:
> > On Sep 12, 2013, at 2:39 AM, Thierry Carrez wrote:
> >
> > > Sergey Lukjanov wrote:
> > >
> > >> [...]
> > >> As you can see, resource provisioning is just one of the features, and its implementation details are not critical to the overall architecture. It performs only the first step of the cluster setup. We’ve been considering Heat for a while, but ended up using direct API calls in favor of speed and simplicity. Going forward, Heat integration will be done by implementing the extension mechanism [3] and [4] as part of the Icehouse release.
> > >>
> > >> The next part, Hadoop cluster configuration, is already extensible, and we have several plugins: Vanilla and Hortonworks Data Platform, with a Cloudera plugin started as well. This allows us to unify management of different Hadoop distributions under a single control plane. The plugins are responsible for correct Hadoop ecosystem configuration on the already-provisioned resources, and they use different Hadoop management tools, like Ambari, to set up and configure all cluster services; so there are no actual provisioning configs on the Savanna side in this case. Savanna and its plugins encapsulate the knowledge of Hadoop internals and the default configuration for Hadoop services.
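
For concreteness, here is a rough sketch of what such a plugin contract could look like. This is a minimal illustration under assumed names, not Savanna's actual plugin SPI:

    import abc

    class ProvisioningPluginBase(abc.ABC):
        """Contract a Hadoop-distribution plugin would implement.

        Savanna provisions the VMs; the plugin only configures and
        starts Hadoop services on the resources it is handed.
        """

        @abc.abstractmethod
        def get_versions(self):
            """Return the Hadoop versions this plugin can deploy."""

        @abc.abstractmethod
        def configure_cluster(self, cluster):
            """Push configs to already-provisioned instances, e.g. by
            driving a management tool such as Ambari."""

        @abc.abstractmethod
        def start_cluster(self, cluster):
            """Start all cluster services in dependency order."""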
> > >
> > > My main gripe with Savanna is that it combines (in its upcoming release)
> > > what sounds to me like two very different services: a Hadoop cluster
> > > provisioning service (like what Trove does for databases) and a
> > > MapReduce+ data API service (like what Marconi does for queues).
> > >
> > > Making it part of the same project (rather than two separate projects,
> > > potentially sharing the same program) makes discussions about shifting
> > > some of its clustering ability to another library/project more complex
> > > than they should be (see below).
> > >
> > > Could you explain the benefit of having them within the same service,
> > > rather than two services with one consuming the other ?
> >
> > And for the record, I don't think that Trove is the perfect fit for it today. We are still working on a clustering API. But when we create it, I would love the Savanna team's input, so we can try to make a pluggable API that's usable for people who want MySQL or Cassandra or even Hadoop. I'm less a fan of a clustering library, because in the end we will both have API calls like POST /clusters and GET /clusters, and there will be API duplication between the projects.
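
To make that duplication concrete: both services would end up accepting near-identical create calls. The payload shapes below are hypothetical, not either project's documented API:

    # POST /clusters against a Savanna-style endpoint (hypothetical shape)
    savanna_request = {
        "cluster": {
            "name": "hadoop-demo",
            "plugin_name": "vanilla",
            "hadoop_version": "1.2.1",
            "node_groups": [{"flavor_id": "42", "count": 3}],
        }
    }

    # POST /clusters against a Trove-style endpoint (hypothetical shape)
    trove_request = {
        "cluster": {
            "name": "mysql-demo",
            "datastore": {"type": "mysql", "version": "5.5"},
            "instances": [{"flavor_id": "42", "count": 3}],
        }
    }

Everything outside the datastore-specific block is the same, which is the argument for sharing one API rather than a library.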
> >
> > I think that a Cluster API (if it were created) would be helpful not only for Trove and Savanna. NoSQL databases, RDBMSs and Hadoop are not the only software that can be clustered. What about different kinds of messaging solutions, like RabbitMQ and ActiveMQ, or J2EE containers, like JBoss, WebLogic and WebSphere, which are often installed in clustered mode? Messaging systems, databases, J2EE containers and Hadoop each have their own management cycle. It would be confusing to make a Cluster API part of Trove, which has a different mission: database management and provisioning.
> 
> Are you suggesting a 3rd program, cluster as a service? Trove is trying to target a generic enough™ API to tackle different technologies with plugins or some sort of extensions. This will include a scheduler to determine rack awareness. Even if we decide that both Savanna and Trove need their own API for building clusters, I still want to understand what makes the Savanna API and implementation different, and how Trove can build an API/system that can encompass multiple datastore technologies. So regardless of how this shakes out, I would urge you to go to the Trove clustering summit session [1] so we can share ideas.
> 
> A generic enough™ API shouldn't contain database-specific calls like backup and restore (already in Trove). Why would we need backup and restore operations for J2EE or messaging solutions?

I don't mean to encompass J2EE or messaging solutions. Let me amend my email to say "to tackle different datastore technologies". But going with this point… do you not need to back things up in a J2EE container? I'd assume a backup is needed by all clusters, personally. I would not like a system that didn't have a way to back up and restore "things" in my cluster.
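
One way to reconcile this, sketched below: keep backup/restore in the generic API but delegate the datastore-specific mechanics to a per-datastore strategy, so a technology with nothing meaningful to back up just ships a no-op. Hypothetical names throughout, not Trove's implementation:

    import abc

    class BackupStrategy(abc.ABC):
        """Datastore-specific half of a generic backup/restore call."""

        @abc.abstractmethod
        def backup(self, cluster, target_url):
            """Stream a consistent snapshot of the cluster to storage."""

        @abc.abstractmethod
        def restore(self, cluster, source_url):
            """Rebuild cluster state from a previously taken backup."""

    class MySQLBackupStrategy(BackupStrategy):
        def backup(self, cluster, target_url):
            # e.g. take the dump on a replica and upload the archive
            pass

        def restore(self, cluster, source_url):
            # e.g. fetch the archive and replay it before rejoining
            pass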