[openstack-dev] [trove] Adding support for HBase in Trove

Amrith Kumar amrith at tesora.com
Thu Jan 7 16:59:55 UTC 2016


Michael, Pete, please see comments interspersed below.

From what you and Pete (Peter MacKinnon) are saying, I don't understand why there is an objection to accepting the currently proposed implementation, which is clearly for single-node deployments. Both Standalone and Pseudo-Distributed are, by definition, explicitly, necessarily, absolutely, positively, definitely single node. I can't be more explicit about that. That's all that is being proposed at this time. See more comments below.

Further, the current proposal chooses an implementation strategy that makes it much easier to handle fully-distributed differently in the future. Consider this: Trove could equally well have dealt with HBase using a single datastore for all operating modes. In the current implementation, one would create an HBase standalone instance using a command that included:

	--datastore hbase-standalone 

And a pseudo-distributed instance by including:

	--datastore hbase-pseudo-distributed

Trove could equally well function with a single datastore (hbase), but that would make it harder to implement hbase-fully-distributed differently in the future. I consciously eschewed that path for this very specific reason: it would limit choice in the future.
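
To make the datastore-per-mode point concrete, here is a minimal Python sketch. The manager classes, the registry, and the register()/manager_for() helpers are hypothetical illustrations, not Trove's actual guest agent code or API. The assumption is only that each --datastore value resolves to exactly one implementation:

```python
from typing import Dict, Type


class HBaseManager:
    """Hypothetical base class; each subclass serves exactly one operating mode."""

    mode = "abstract"
    max_nodes = 1  # both modes proposed today are single-node

    def describe(self) -> str:
        return f"{self.mode} (max nodes: {self.max_nodes})"


class StandaloneManager(HBaseManager):
    mode = "standalone"


class PseudoDistributedManager(HBaseManager):
    mode = "pseudo-distributed"


# One registry entry per --datastore value; no manager branches on a mode flag.
DATASTORE_REGISTRY: Dict[str, Type[HBaseManager]] = {
    "hbase-standalone": StandaloneManager,
    "hbase-pseudo-distributed": PseudoDistributedManager,
}


def register(name: str, manager: Type[HBaseManager]) -> None:
    """Add a new datastore later without modifying the existing entries."""
    if name in DATASTORE_REGISTRY:
        raise ValueError(f"datastore {name!r} is already registered")
    DATASTORE_REGISTRY[name] = manager


def manager_for(datastore: str) -> HBaseManager:
    """Resolve a --datastore value to a fresh (hypothetical) manager instance."""
    try:
        return DATASTORE_REGISTRY[datastore]()
    except KeyError:
        raise ValueError(f"unknown datastore {datastore!r}") from None
```

Here manager_for("hbase-standalone").describe() returns "standalone (max nodes: 1)". Adding hbase-fully-distributed later would be one more register() call with whichever manager we settle on, and the two existing entries stay untouched. With a single "hbase" datastore, one manager would instead have to branch on the mode.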

Now, the implementation behind hbase-fully-distributed could be a custom Trove guest agent that could (if we decided to go that route) interact with Sahara. However, an alternative implementation of hbase-fully-distributed could orchestrate everything natively in Trove. There is much flexibility in the current proposal, and I submit to you that this is being lost in your reading of the specification and the current implementation as proposed.
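
A rough Python sketch of that flexibility follows. FullyDistributedBackend, SaharaBackend, NativeTroveBackend, create_cluster() and the node labels are all hypothetical; no real Trove or Sahara calls are made, and the topology is deliberately simplified (a real deployment would also need HDFS and ZooKeeper). The point is only that callers code against one interface, so either back end can sit behind hbase-fully-distributed:

```python
from abc import ABC, abstractmethod
from typing import List


def planned_roles(region_servers: int) -> List[str]:
    """Deliberately simplified topology: one HMaster plus N RegionServers.

    A real deployment would also need HDFS and ZooKeeper nodes.
    """
    if region_servers < 1:
        raise ValueError("need at least one RegionServer")
    return ["hmaster"] + ["regionserver"] * region_servers


class FullyDistributedBackend(ABC):
    """What hbase-fully-distributed needs, whichever service does the work."""

    @abstractmethod
    def provision(self, cluster_name: str, region_servers: int) -> List[str]:
        """Return labels for the nodes this back end would create."""


class SaharaBackend(FullyDistributedBackend):
    """Would hand provisioning and scaling to Sahara (no real Sahara calls here)."""

    def provision(self, cluster_name: str, region_servers: int) -> List[str]:
        return [f"sahara:{cluster_name}/{r}" for r in planned_roles(region_servers)]


class NativeTroveBackend(FullyDistributedBackend):
    """Would orchestrate the same nodes with Trove's own clustering support."""

    def provision(self, cluster_name: str, region_servers: int) -> List[str]:
        return [f"trove:{cluster_name}/{r}" for r in planned_roles(region_servers)]


def create_cluster(backend: FullyDistributedBackend, name: str, size: int) -> List[str]:
    # Callers depend only on the interface, so either back end can be plugged in.
    return backend.provision(name, size)
```

Swapping SaharaBackend() for NativeTroveBackend() changes only who does the provisioning, not the code that requests the cluster.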

-amrith

> -----Original Message-----
> From: michael mccune [mailto:msm at redhat.com]
> Sent: Thursday, January 07, 2016 11:18 AM
> To: openstack-dev at lists.openstack.org
> Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove
> 
> thanks for bringing this up Amrith,
> 
> On 01/06/2016 07:31 PM, Fox, Kevin M wrote:
> > Having a simple plugin that doesn't depend on all of Sahara, for the case a
> > user only wants a single node HBase does make sense. It's much easier for an
> > Op to support that case if that's all their users ever want. But, that's probably
> > as far as that plugin ever should go. If you need scale up/down, etc, then
> > you're starting to reimplement large swaths of Sahara, and like the Cinder
> > plugin for Nova, there could be a plugin that works identically to the standalone
> > one that converts the same api over to a Sahara compatible one. You
> > then farm the work over to Sahara.
> 
> i think this sounds reasonable, as long as we are limiting it to standalone
> mode. if the deployments start to take on a larger scope i agree it would be
> useful to leverage sahara for provisioning and scaling.

Why only standalone? The current proposal explicitly covers only standalone and pseudo-distributed, which are both valid, strictly (add other adjectives here to taste) single-node topologies. The currently submitted specification specifically carves out fully-distributed operation as requiring further thought and contemplation.

> 
> as the hbase installation grows beyond the standalone mode there will
> necessarily need to be hdfs and zookeeper support to allow for a proper
> production deployment. this also brings up questions of allowing the end-
> users to supply configurations for the hdfs and zookeeper processes, not to
> mention enabling support for high availability hdfs.

These are things that Trove already addresses, albeit differently from Sahara. Users can, as it turns out, specify configuration groups, which can then be used to launch new instances and can also be associated with groups of instances.
 
> 
> i can envision a scenario where trove could use sahara to provision and
> manage the clusters for hbase/hdfs/zk. this does pose some questions as
> we'd have to determine how the trove guest agent would be installed on the
> nodes, if there will need to be custom configurations used by trove, and if
> sahara will need to provide a plugin for bare (meaning no data processing
> framework) hbase/hdfs/zk clusters. but, i think these could be solved by
> either using custom images or a plugin in sahara that would install the
> necessary agents/configurations.

Let us not underestimate the effort for an end user to now deploy one more project. To a user already using Trove for a myriad of databases, requiring Sahara just to support HBase Standalone is (to put it bluntly) a burden. Requiring it for Fully-Distributed mode may have some development benefits, but it remains to be seen whether those benefits are really worth the contortions that Trove would have to go through. And, as described above, the Trove architecture is flexible enough to allow multiple implementations of fully-distributed: one that interfaces with Sahara and another that does not have to.

Let's be clear that for a person who wants a fully configurable Hadoop-based deployment with more control, Sahara may be the best option. And for one who wants even more control, doing it themselves with Nova and custom Glance images may be the way to go. Similarly, a Database-as-a-Service comes with the understood boundaries imposed by the "as-a-Service" deployment. Not all configuration options may be tweakable with a DBaaS; that's well known and understood, not just in Trove but also, for example, in Amazon RDS, Redshift, or any of the other database-as-a-service implementations. The same would be true of fully-distributed mode in the proposal that is currently under review. I submit to you that this nuance is being lost in your reading.

> 
> of course, this does add a layer of complexity as operators who wish this type
> of deployment will need to have both trove and sahara, but imo this would
> be easier than replicating the work that sahara has done with these
> technologies.

I think this is where our opinions differ: the 'replication' isn't all that much, given that Trove already provides the ability to cluster databases. That said, nothing in the current specification locks us into a specific deployment strategy in the future, nor does it preclude multiple implementations of fully-distributed, one that leverages Sahara and one that doesn't.

> 
> regards,
> mike
> 