<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Sep 12, 2013, at 10:30 AM, Michael Basnight wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>On Sep 12, 2013, at 2:39 AM, Thierry Carrez wrote:<br><br><blockquote type="cite">Sergey Lukjanov wrote:<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><blockquote type="cite">[...]<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">As you can see, resource provisioning is just one of the features, and the implementation details are not critical to the overall architecture. It performs only the first step of the cluster setup. We’ve been considering Heat for a while, but ended up using direct API calls in favor of speed and simplicity. Going forward, Heat integration will be done by implementing the extension mechanism [3] and [4] as part of the Icehouse release.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">The next part, Hadoop cluster configuration, is already extensible, and we have several plugins - Vanilla and Hortonworks Data Platform, with a Cloudera plugin started too. This allows unifying management of different Hadoop distributions under a single control plane. The plugins are responsible for correct Hadoop ecosystem configuration on already provisioned resources, and they use different Hadoop management tools like Ambari to set up and configure all cluster services, so there are no actual provisioning configs on the Savanna side in this case.
Savanna and its plugins encapsulate the knowledge of Hadoop internals and the default configuration for Hadoop services.<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">My main gripe with Savanna is that it combines (in its upcoming release)<br></blockquote><blockquote type="cite">what sound to me like two very different services: a Hadoop cluster<br></blockquote><blockquote type="cite">provisioning service (like what Trove does for databases) and a<br></blockquote><blockquote type="cite">MapReduce+ data API service (like what Marconi does for queues).<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Making it part of the same project (rather than two separate projects,<br></blockquote><blockquote type="cite">potentially sharing the same program) makes discussions about shifting<br></blockquote><blockquote type="cite">some of its clustering ability to another library/project more complex<br></blockquote><blockquote type="cite">than they should be (see below).<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Could you explain the benefit of having them within the same service,<br></blockquote><blockquote type="cite">rather than two services with one consuming the other?<br></blockquote><br>And for the record, I don't think that Trove is the perfect fit for it today. We are still working on a clustering API. But when we create it, I would love the Savanna team's input, so we can try to make a pluggable API that's usable for people who want MySQL or Cassandra or even Hadoop. I'm less a fan of a clustering library, because in the end, we will both have API calls like POST /clusters and GET /clusters, and there will be API duplication between the projects.<br></div></blockquote><div><br></div><div><br></div><div><div>+1. I am looking at the new cluster provisioning API in Trove [1] and the one in Savanna [2], and they look quite different right now.
Definitely some collaboration is needed, even on the API spec, not just the backend.</div><div><br></div><div>[1] <a href="https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API#POST_.2Fclusters">https://wiki.openstack.org/wiki/Trove-Replication-And-Clustering-API#POST_.2Fclusters</a></div><div>[2] <a href="https://savanna.readthedocs.org/en/latest/userdoc/rest_api_v1.0.html#start-cluster">https://savanna.readthedocs.org/en/latest/userdoc/rest_api_v1.0.html#start-cluster</a></div><div><br></div></div><br><blockquote type="cite"><div><br><blockquote type="cite"><br></blockquote><blockquote type="cite"><blockquote type="cite">The next topic is “Cluster API”.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">The concern that was raised is how to extract general clustering functionality into a common library. The cluster provisioning and management topic is currently relevant for a number of projects within the OpenStack ecosystem: Savanna, Trove, TripleO, Heat, TaskFlow.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Still, each of the projects has its own understanding of what cluster provisioning is. The idea of extracting common functionality sounds reasonable, but the details still need to be worked out.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">I’ll try to highlight the Savanna team’s current perspective on this question. The notion of “cluster management”, in my perspective, has several levels:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">1. Resource provisioning and configuration (instances, networks, storage). Heat is the main tool, possibly with additional support from underlying services.
For example, the instance grouping API extension [5] in Nova would be very useful.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">2. Distributed communication/task execution. There is a project in the OpenStack ecosystem with the mission of providing a framework for distributed task execution - TaskFlow [6]. It was started quite recently. In Savanna we are really looking forward to using more and more of its functionality in the I and J cycles as TaskFlow itself gets more mature.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">3. Higher-level clustering - management of the actual services working on top of the infrastructure: for example, configuring HDFS data nodes in Savanna, or setting up a MySQL cluster with Percona or Galera in Trove. These operations are typically very specific to the project domain. As for Savanna specifically, we rely heavily on knowledge of Hadoop internals to deploy and configure it properly.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">The overall conclusion seems to be that it makes sense to enhance Heat's capabilities and invest in TaskFlow development, leaving domain-specific operations to the individual projects.<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">The thing we'd need to clarify (and the incubation period would be used<br></blockquote><blockquote type="cite">to achieve that) is how to reuse as much as possible between the various<br></blockquote><blockquote type="cite">cluster provisioning projects (Trove, the cluster side of Savanna, and<br></blockquote><blockquote type="cite">possibly future projects).
A solution could be to create a library used by<br></blockquote><blockquote type="cite">Trove and Savanna, to extend Heat, or to make Trove the clustering thing<br></blockquote><blockquote type="cite">beyond just databases...<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">One way of making sure smart and non-partisan decisions are taken in<br></blockquote><blockquote type="cite">that area would be to make Trove and Savanna part of the same program,<br></blockquote><blockquote type="cite">or make the clustering part of Savanna part of the same program as<br></blockquote><blockquote type="cite">Trove, while the data API part of Savanna could live separately (hence<br></blockquote><blockquote type="cite">my question about two different projects vs. one project above).<br></blockquote><br>Trove is not, nor will it be, a data API. I'd like to keep Savanna in its own program, but I could easily see them as being a big data / data processing program, while Trove is a cluster provisioning / scaling / administration / "keep it online" program.<br><br><blockquote type="cite"><br></blockquote><blockquote type="cite"><blockquote type="cite">I also would like to emphasize that in Savanna, Hadoop cluster management is already implemented, including scaling support.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">With all this, I do believe Savanna fills an important gap in OpenStack by providing data processing capabilities in a cloud environment in general, with integration with the Hadoop ecosystem as the first particular step.<br></blockquote></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">For incubation we bless the goal of the project and the promise that it<br></blockquote><blockquote type="cite">will integrate well with the other existing projects.
A<br></blockquote><blockquote type="cite">perfectly-working project can stay in incubation until it achieves<br></blockquote><blockquote type="cite">proper integration and avoids duplication of functionality with other<br></blockquote><blockquote type="cite">integrated projects. A perfectly-working project can also happily live<br></blockquote><blockquote type="cite">outside of the OpenStack integrated release if it prefers a more standalone<br></blockquote><blockquote type="cite">approach.<br></blockquote><br>A good example: our instance provisioning was also implemented in Trove, but the goal is to use Heat. So the TC asked us to use Heat for instance provisioning, and we outlined a set of goals to achieve before we went to Integrated status.<br><br><blockquote type="cite">I think there is value in having Savanna in incubation so that we can<br></blockquote><blockquote type="cite">explore those avenues of collaboration between projects. It may take<br></blockquote><blockquote type="cite">more than one cycle of incubation to get it right (in fact, I would not<br></blockquote><blockquote type="cite">be surprised at all if it took us more than one cycle to properly<br></blockquote><blockquote type="cite">separate the roles between Trove / TaskFlow / Heat / clusterlib). During<br></blockquote><blockquote type="cite">this exploration, Savanna devs may also decide that integration is very<br></blockquote><blockquote type="cite">costly and that their immediate time is better spent adding key<br></blockquote><blockquote type="cite">features, and drop from the incubation track.
But in all cases,<br></blockquote><blockquote type="cite">incubation sounds like the right first step to get everyone around the<br></blockquote><blockquote type="cite">same table.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">-- <br></blockquote><blockquote type="cite">Thierry Carrez (ttx)<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">_______________________________________________<br></blockquote><blockquote type="cite">OpenStack-dev mailing list<br></blockquote><blockquote type="cite"><a href="mailto:OpenStack-dev@lists.openstack.org">OpenStack-dev@lists.openstack.org</a><br></blockquote><blockquote type="cite"><a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br></blockquote></div></blockquote></div><br></body></html>