[openstack-dev] TC Meeting / Savanna Incubation Follow-Up

Sergey Lukjanov slukjanov at mirantis.com
Wed Sep 18 23:53:48 UTC 2013


Hi folks,

I have a few comments on Hadoop cluster provisioning in Savanna.

Currently, Savanna provisions instances, installs a management console (such as Apache Ambari) on one of them, and communicates with it through the console's REST API to prepare and run all requested services on all instances. So the only provisioning Savanna itself does is creating instances and volumes, plus their initial configuration, such as generating /etc/hosts for all instances. Most or even all of these operations should eventually be replaced by Heat integration during the potential incubation in the Icehouse cycle; after that, we'll concentrate on EDP (Elastic Data Processing) operations.
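To make the "initial configuration" step more concrete, here is a minimal sketch of /etc/hosts generation, assuming each instance is known by a hostname and an internal IP. The `Instance` type and `generate_etc_hosts` function are hypothetical illustrations, not Savanna's actual code.

```python
from typing import Iterable, NamedTuple


class Instance(NamedTuple):
    """Hypothetical description of a provisioned cluster instance."""
    hostname: str
    internal_ip: str


def generate_etc_hosts(instances: Iterable[Instance]) -> str:
    """Build /etc/hosts content that lets every instance resolve every other one.

    The same content is meant to be written to all instances of the cluster.
    """
    lines = ["127.0.0.1 localhost"]
    # Sort by hostname so every instance receives an identical file.
    for inst in sorted(instances, key=lambda i: i.hostname):
        lines.append(f"{inst.internal_ip} {inst.hostname}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cluster = [
        Instance("worker-1", "10.0.0.11"),
        Instance("master", "10.0.0.10"),
    ]
    print(generate_etc_hosts(cluster), end="")
```

Once Heat integration lands, a step like this would be expressed in the Heat template rather than implemented in Savanna itself.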

I was surprised by how much time was spent on the clustering discussion at the last TC meeting, and by how few other questions came up. I think it would be better to separate the clustering discussion, which is a long-term activity with plans to be discussed at the design summit, from the Savanna incubation request, which should be finally discussed at the next TC meeting. Of course, I think it is right for Savanna to participate in the clustering discussions. From our perspective, clustering should be implemented as additional functionality in underlying services like Nova, Cinder and Heat, and in libraries like Oslo and TaskFlow, which would help projects like Savanna, Trove, etc. provision resources for clusters, scale them and terminate them. So our role is to collaborate on implementing such features. One more interesting idea is clustering API standardization. It sounds appealing, but such APIs could be very different; compare, for example, our current working API [0] and Trove's draft Cluster API [1].

I would also like to assure you that the Savanna team is 100% behind the idea of full integration with all applicable OpenStack projects during incubation.

Thanks.

[0] https://savanna.readthedocs.org/en/latest/userdoc/rest_api_v1.0.html#node-group-templates
[1] https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API

Sincerely yours,
Sergey Lukjanov
Savanna Technical Lead
Mirantis Inc.

On Sep 13, 2013, at 22:35, Clint Byrum <clint at fewbar.com> wrote:

> Excerpts from Michael Basnight's message of 2013-09-13 08:26:07 -0700:
>> On Sep 13, 2013, at 6:56 AM, Alexander Kuznetsov wrote:
>>> On Thu, Sep 12, 2013 at 7:30 PM, Michael Basnight <mbasnight at gmail.com> wrote:
>>> On Sep 12, 2013, at 2:39 AM, Thierry Carrez wrote:
>>> 
>>>> Sergey Lukjanov wrote:
>>>> 
>>>>> [...]
>>>>> As you can see, resource provisioning is just one of the features, and its implementation details are not critical for the overall architecture. It performs only the first step of the cluster setup. We’ve been considering Heat for a while, but ended up with direct API calls in favor of speed and simplicity. Going forward, Heat integration will be done by implementing the extension mechanism [3] and [4] as part of the Icehouse release.
>>>>> 
>>>>> The next part, Hadoop cluster configuration, is already extensible: we have several plugins (Vanilla and Hortonworks Data Platform), and a Cloudera plugin has been started too. This allows us to unify management of different Hadoop distributions under a single control plane. The plugins are responsible for correctly configuring the Hadoop ecosystem on already-provisioned resources and use Hadoop management tools like Ambari to set up and configure all cluster services, so there are no actual provisioning configs on the Savanna side in this case. Savanna and its plugins encapsulate the knowledge of Hadoop internals and the default configuration for Hadoop services.
>>>> 
>>>> My main gripe with Savanna is that it combines (in its upcoming release)
>>>> what sounds to me like two very different services: a Hadoop cluster
>>>> provisioning service (like what Trove does for databases) and a
>>>> MapReduce+ data API service (like what Marconi does for queues).
>>>> 
>>>> Making it part of the same project (rather than two separate projects,
>>>> potentially sharing the same program) makes discussions about shifting
>>>> some of its clustering ability to another library/project more complex
>>>> than they should be (see below).
>>>> 
>>>> Could you explain the benefit of having them within the same service,
>>>> rather than two services with one consuming the other?
>>> 
>>> And for the record, I don't think that Trove is the perfect fit for it today. We are still working on a clustering API. But when we create it, I would love the Savanna team's input, so we can try to make a pluggable API that's usable for people who want MySQL or Cassandra or even Hadoop. I'm less of a fan of a clustering library, because in the end we will both have API calls like POST /clusters and GET /clusters, and there will be API duplication between the projects.
>>> 
>>> I think a Cluster API (if one is created) would be helpful not only for Trove and Savanna. NoSQL, RDBMS and Hadoop are not the only software that can be clustered. What about different kinds of messaging solutions like RabbitMQ and ActiveMQ, or J2EE containers like JBoss, WebLogic and WebSphere, which are often installed in clustered mode? Messaging, databases, J2EE containers and Hadoop each have their own management cycle. It would be confusing to make a Cluster API part of Trove, which has a different mission: database management and provisioning.
>> 
>> Are you suggesting a 3rd program, cluster as a service? Trove is trying to target a generic enough™ API to tackle different technologies with plugins or some sort of extensions. This will include a scheduler to determine rack awareness. Even if we decide that both Savanna and Trove need their own API for building clusters, I still want to understand what makes the Savanna API and implementation different, and how Trove can build an API/system that can encompass multiple datastore technologies. So regardless of how this shakes out, I would urge you to go to the Trove clustering summit session [1] so we can share ideas.
>> 
> 
> Kudos to Trove for pushing forward on their Heat implementation. I'd
> like to see Savanna go in the same direction. I read the "why not heat"
> and it is all a bug list for Heat. Let's fix those bugs so that the next
> clusterable solution that needs a simplified API can just grab Heat and
> get it done without a special domain-specific orchestration backend.
> 
> If the backend were shared, would we care so much that there is no common
> "clustering" imperative API for users?
> 
> This way Savanna's API is focused on helping users solve their "data
> processing" problems, and Trove is focused on helping users solve their
> "data storage" problems. And if users need to build a cluster of things
> that don't exist yet as a handy simplified API, Heat is there for them
> as a general purpose tool for building clusters.
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
