[openstack-dev] TC Meeting / Savanna Incubation Follow-Up
Clint Byrum
clint at fewbar.com
Fri Sep 13 18:35:43 UTC 2013
Excerpts from Michael Basnight's message of 2013-09-13 08:26:07 -0700:
> On Sep 13, 2013, at 6:56 AM, Alexander Kuznetsov wrote:
> > On Thu, Sep 12, 2013 at 7:30 PM, Michael Basnight <mbasnight at gmail.com> wrote:
> > On Sep 12, 2013, at 2:39 AM, Thierry Carrez wrote:
> >
> > > Sergey Lukjanov wrote:
> > >
> > >> [...]
> > >> As you can see, resource provisioning is just one of the features, and the implementation details are not critical for the overall architecture. It performs only the first step of the cluster setup. We’ve been considering Heat for a while, but ended up using direct API calls in favor of speed and simplicity. Going forward, Heat integration will be done by implementing the extension mechanisms [3] and [4] as part of the Icehouse release.
> > >>
> > >> The next part, Hadoop cluster configuration, is already extensible, and we have several plugins: Vanilla, Hortonworks Data Platform, and a Cloudera plugin has been started too. This allows us to unify management of different Hadoop distributions under a single control plane. The plugins are responsible for correct Hadoop ecosystem configuration on already provisioned resources and use different Hadoop management tools like Ambari to set up and configure all cluster services, so in this case there are no actual provisioning configs on the Savanna side. Savanna and its plugins encapsulate the knowledge of Hadoop internals and the default configuration for Hadoop services.
> > >
> > > My main gripe with Savanna is that it combines (in its upcoming release)
> > > what sounds to me like two very different services: Hadoop cluster
> > > provisioning service (like what Trove does for databases) and a
> > > MapReduce+ data API service (like what Marconi does for queues).
> > >
> > > Making it part of the same project (rather than two separate projects,
> > > potentially sharing the same program) makes discussions about shifting
> > > some of its clustering ability to another library/project more complex
> > > than they should be (see below).
> > >
> > > Could you explain the benefit of having them within the same service,
> > > rather than two services with one consuming the other?
> >
> > And for the record, I don't think that Trove is the perfect fit for it today. We are still working on a clustering API. But when we create it, I would love the Savanna team's input, so we can try to make a pluggable API that's usable for people who want MySQL or Cassandra or even Hadoop. I'm less of a fan of a clustering library, because in the end we will both have API calls like POST /clusters and GET /clusters, and there will be API duplication between the projects.
> >
> > I think that a Cluster API (if one were created) would be helpful not only for Trove and Savanna. NoSQL, RDBMS and Hadoop are not the only software that can be clustered. What about the different kinds of messaging solutions, like RabbitMQ and ActiveMQ, or J2EE containers like JBoss, WebLogic and WebSphere, which are often installed in clustered mode? Messaging, databases, J2EE containers and Hadoop each have their own management cycle. It would be confusing to make the Cluster API a part of Trove, which has a different mission: database management and provisioning.
>
> Are you suggesting a 3rd program, cluster as a service? Trove is trying to target a generic enough™ API to tackle different technologies with plugins or some sort of extensions. This will include a scheduler to determine rack awareness. Even if we decide that both Savanna and Trove need their own API for building clusters, I still want to understand what makes the Savanna API and implementation different, and how Trove can build an API/system that can encompass multiple datastore technologies. So regardless of how this shakes out, I would urge you to go to the Trove clustering summit session [1] so we can share ideas.
>
Kudos to Trove for pushing forward on their Heat implementation. I'd
like to see Savanna go in the same direction. I read the "why not Heat"
page and it is all a bug list for Heat. Let's fix those bugs so that the next
clusterable solution that needs a simplified API can just grab Heat and
get it done without a special domain-specific orchestration backend.
If the backend were shared, would we care so much that there is no common
"clustering" imperative API for users?
This way Savanna's API is focused on helping users solve their "data
processing" problems, and Trove is focused on helping users solve their
"data storage" problems. And if users need to build a cluster of things
that don't exist yet as a handy simplified API, Heat is there for them
as a general purpose tool for building clusters.