[openstack-dev] TC Meeting / Savanna Incubation Follow-Up

Alexander Kuznetsov akuznetsov at mirantis.com
Fri Sep 13 17:00:43 UTC 2013


The Hadoop ecosystem is not only a datastore technology. Hadoop has other
components: the MapReduce framework, a distributed coordinator (ZooKeeper),
workflow management (Oozie), high-level query and scripting languages (Hive
and Pig), and a scalable machine learning library (Apache Mahout). All these
components are tightly coupled, and the datastore part can't be considered
separately from the other components. This is the main reason why Hadoop
installation and management require a separate solution, distinct from a
generic enough™ datastore API. Otherwise, that API would contain a huge
part that has nothing to do with datastore technologies.
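
To make this concrete, here is a minimal sketch of the kind of cluster
definition Savanna deals with (the field names are illustrative assumptions,
not the actual Savanna API). Note how little of it concerns the datastore
(HDFS) alone:

    # Hypothetical, simplified cluster definition; the field names are
    # assumptions for illustration and do not reflect the real Savanna API.
    cluster_template = {
        "plugin": "vanilla",
        "hadoop_version": "1.2.1",
        "node_groups": [
            {"name": "master", "count": 1,
             "processes": ["namenode", "jobtracker", "oozie"]},
            {"name": "worker", "count": 10,
             "processes": ["datanode", "tasktracker"]},
        ],
        "configs": {
            # Only this section is about the datastore proper.
            "HDFS": {"dfs.replication": 3},
            # The rest is MapReduce / workflow / query configuration.
            "MapReduce": {"mapred.reduce.tasks": 20},
            "Oozie": {"oozie.system.id": "oozie-example"},
            "Hive": {"hive.exec.scratchdir": "/tmp/hive"},
        },
    }

A generic datastore API would have a natural place for the HDFS section,
but the node processes and the MapReduce, Oozie, and Hive settings would
be dead weight in it.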


On Fri, Sep 13, 2013 at 8:17 PM, Michael Basnight <mbasnight at gmail.com> wrote:

>
> On Sep 13, 2013, at 9:05 AM, Alexander Kuznetsov wrote:
>
> >
> >
> >
> > On Fri, Sep 13, 2013 at 7:26 PM, Michael Basnight <mbasnight at gmail.com>
> wrote:
> > On Sep 13, 2013, at 6:56 AM, Alexander Kuznetsov wrote:
> > > On Thu, Sep 12, 2013 at 7:30 PM, Michael Basnight <mbasnight at gmail.com>
> wrote:
> > > On Sep 12, 2013, at 2:39 AM, Thierry Carrez wrote:
> > >
> > > > Sergey Lukjanov wrote:
> > > >
> > > >> [...]
> > > >> As you can see, resource provisioning is just one of the features,
> > > >> and its implementation details are not critical to the overall
> > > >> architecture. It performs only the first step of the cluster setup.
> > > >> We've been considering Heat for a while, but ended up with direct
> > > >> API calls in favor of speed and simplicity. Going forward, Heat
> > > >> integration will be done by implementing the extension mechanism [3]
> > > >> and [4] as part of the Icehouse release.
> > > >>
> > > >> The next part, Hadoop cluster configuration, is already extensible,
> > > >> and we have several plugins: Vanilla, Hortonworks Data Platform, and
> > > >> a Cloudera plugin has been started too. This allows unified
> > > >> management of different Hadoop distributions under a single control
> > > >> plane. The plugins are responsible for correct configuration of the
> > > >> Hadoop ecosystem on already provisioned resources, and they use
> > > >> Hadoop management tools like Ambari to set up and configure all
> > > >> cluster services, so there are no actual provisioning configs on the
> > > >> Savanna side in this case. Savanna and its plugins encapsulate the
> > > >> knowledge of Hadoop internals and the default configuration for
> > > >> Hadoop services.
> > > >
> > > > My main gripe with Savanna is that it combines (in its upcoming
> > > > release) what sounds to me like two very different services: a Hadoop
> > > > cluster provisioning service (like what Trove does for databases) and
> > > > a MapReduce+ data API service (like what Marconi does for queues).
> > > >
> > > > Making it part of the same project (rather than two separate
> > > > projects, potentially sharing the same program) makes discussions
> > > > about shifting some of its clustering ability to another
> > > > library/project more complex than they should be (see below).
> > > >
> > > > Could you explain the benefit of having them within the same service,
> > > > rather than two services with one consuming the other ?
> > >
> > > And for the record, I don't think that Trove is the perfect fit for it
> > > today. We are still working on a clustering API. But when we create it,
> > > I would love the Savanna team's input, so we can try to make a pluggable
> > > API that's usable for people who want MySQL or Cassandra or even Hadoop.
> > > I'm less a fan of a clustering library, because in the end, we will both
> > > have API calls like POST /clusters, GET /clusters, and there will be API
> > > duplication between the projects.
> > >
> > > I think that a Cluster API (if it were created) would be helpful not
> > > only for Trove and Savanna. NoSQL, RDBMS, and Hadoop are not the only
> > > software that can be clustered. What about messaging solutions like
> > > RabbitMQ and ActiveMQ, or J2EE containers like JBoss, WebLogic, and
> > > WebSphere, which are often installed in clustered mode? Messaging,
> > > databases, J2EE containers, and Hadoop each have their own management
> > > cycle. It would be confusing to make a Cluster API part of Trove, which
> > > has a different mission: database management and provisioning.
> >
> > Are you suggesting a 3rd program, cluster as a service? Trove is trying
> > to target a generic enough™ API to tackle different technologies with
> > plugins or some sort of extensions. This will include a scheduler to
> > determine rack awareness. Even if we decide that both Savanna and Trove
> > need their own API for building clusters, I still want to understand what
> > makes the Savanna API and implementation different, and how Trove can
> > build an API/system that can encompass multiple datastore technologies.
> > So regardless of how this shakes out, I would urge you to go to the Trove
> > clustering summit session [1] so we can share ideas.
> >
> > A generic enough™ API shouldn't contain database-specific calls like
> > backup and restore (already in Trove). Why would we need backup and
> > restore operations for J2EE or messaging solutions?
>
> I don't mean to encompass J2EE or messaging solutions. Let me amend my
> email to say "to tackle different datastore technologies". But going with
> this point… Do you not need to back up things in a J2EE container? I'd
> assume a backup is needed by all clusters, personally. I would not like a
> system that didn't have a way to back up and restore "things" in my
> cluster.
>
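
As a minimal sketch of the pluggable, generic enough™ shape discussed
above (class and method names here are illustrative assumptions, not actual
Trove or Savanna code), the common surface could be a per-datastore plugin
interface:

    from abc import ABC, abstractmethod

    # Hypothetical sketch only: names are assumptions for illustration,
    # not actual Trove or Savanna code.
    class ClusterPlugin(ABC):
        """Common cluster operations; one subclass per datastore."""

        @abstractmethod
        def provision(self, node_count):
            """Create and wire up the cluster nodes."""

        @abstractmethod
        def backup(self, target_url):
            """Snapshot cluster state to the given location."""

        @abstractmethod
        def restore(self, source_url):
            """Rebuild the cluster from a previous backup."""

    class MySQLCluster(ClusterPlugin):
        def provision(self, node_count):
            print("provisioning %d MySQL nodes" % node_count)

        def backup(self, target_url):
            print("dumping databases to %s" % target_url)

        def restore(self, source_url):
            print("restoring databases from %s" % source_url)

Whether backup() and restore() belong on the common interface, behind an
optional capability, or only on specific plugins is exactly the question
the thread leaves open.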

