[openstack-dev] [trove] [sahara] Adding support for HBase in Trove

Amrith Kumar amrith at tesora.com
Fri Jan 8 13:34:59 UTC 2016


As Kevin suggests, I'm adding [sahara] to the subject line. 

Others in sahara who now see this thread, apologies for sending you a delayed invitation to the party. There's still lots of food and beer so come on in!

-amrith

> -----Original Message-----
> From: Fox, Kevin M [mailto:Kevin.Fox at pnnl.gov]
> Sent: Thursday, January 07, 2016 7:32 PM
> To: OpenStack Development Mailing List (not for usage questions)
> <openstack-dev at lists.openstack.org>
> Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove
> 
> While I applaud raising the issue on the mailing list to get more folks to weigh
> in, I think part of the problem may be the lack of a [sahara] tag on the subject.
> The thread is still tagged as a Trove-centric conversation. All respondents,
> please consider adding [sahara] to the subject.
> 
> Thanks,
> Kevin
> ________________________________________
> From: Amrith Kumar [amrith at tesora.com]
> Sent: Thursday, January 07, 2016 1:59 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove
> 
> > -----Original Message-----
> > From: michael mccune [mailto:msm at redhat.com]
> > Sent: Thursday, January 07, 2016 3:12 PM
> > To: openstack-dev at lists.openstack.org
> > Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove
> >
> > On 01/07/2016 11:59 AM, Amrith Kumar wrote:
> > > From the things that you and Pete (Peter MacKinnon) are saying, I
> > > don't understand why there is an objection to accepting the currently
> > > proposed implementation which is clearly for single node deployments?
> > > Both Standalone and Pseudo-Distributed are by definition, explicitly,
> > > necessarily, absolutely, positively, definitely single node. I can't
> > > be more explicit about that. That's all that is being proposed at this
> > > time. See more comments below.
> >
> > i didn't think i explicitly objected to the spec, if it seems that way
> > then i apologize. after reading the spec and the comments, it seemed
> > that there was some question about engagement with the sahara team. i
> > wanted to help bring some light to the issues surrounding deploying
> > hbase and thought it would be good to participate in the discussion.
> 
> You are correct Michael. There was a suggestion that we should engage with
> the Sahara team (in the Trove team meeting yesterday) and that is what
> prompted this email thread. So I appreciate your participation as one who is a
> member of the Sahara team.
> 
> >
> > > Further, the current proposal also chooses an implementation
> > > strategy that makes it much easier to handle fully-distributed in a
> > > different way in the future. Consider this: Trove could equally well
> > > have dealt with HBase using a single datastore for all operating
> > > modes. In the current implementation, one would create an HBase
> > > standalone instance using a command that included:
> > >
> > >     --datastore hbase-standalone
> > >
> > > And a pseudo-distributed instance by including
> > >
> > >     --datastore hbase-pseudo-distributed.
> > >
> >
> > and this delineation sounds reasonable to me
> >
> > > Trove could equally well function by having a single datastore
> > > (hbase) but this would make hbase-fully-distributed harder to do in a
> > > different way in the future. I consciously eschewed that path, for
> > > this very specific reason; it would limit choice in the future.
> >
> > agreed
> >
> > > Now, the implementation behind hbase-fully-distributed could be a
> > > custom Trove guest agent that could (if we decided to go that route)
> > > interact with Sahara. However, an alternative implementation of
> > > hbase-fully-distributed could orchestrate everything natively in
> > > Trove. There is much flexibility in the current proposal, and I submit
> > > to you that this is being lost in your reading of the specification
> > > and the current implementation as proposed.
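
One way to picture the point above: because each operating mode is its own datastore name, each name can be bound to its own implementation, and a fully-distributed variant could be added or swapped later without touching the single-node ones. The sketch below is illustrative only. The registry, the class names, and the `resolve` helper are all hypothetical and are not Trove's actual manager-loading code.

```python
# Hypothetical sketch: one implementation per datastore name.
# None of these classes exist in Trove; they only illustrate the design.


class StandaloneHBase:
    topology = "single-node"


class PseudoDistributedHBase:
    topology = "single-node"


class NativeFullyDistributedHBase:
    """Imagined variant that orchestrates the cluster inside Trove."""
    topology = "multi-node"


class SaharaBackedFullyDistributedHBase:
    """Imagined variant that delegates provisioning to Sahara."""
    topology = "multi-node"


# Datastore name (as passed to --datastore) -> implementation class.
DATASTORE_IMPLEMENTATIONS = {
    "hbase-standalone": StandaloneHBase,
    "hbase-pseudo-distributed": PseudoDistributedHBase,
}


def resolve(datastore_name, registry=DATASTORE_IMPLEMENTATIONS):
    """Return the implementation class registered for a datastore name."""
    try:
        return registry[datastore_name]
    except KeyError:
        raise ValueError("unsupported datastore: %s" % datastore_name)


# Adding fully-distributed later is one new entry, and either imagined
# variant could back it without changing the single-node entries.
with_cluster = dict(DATASTORE_IMPLEMENTATIONS)
with_cluster["hbase-fully-distributed"] = NativeFullyDistributedHBase
```

Had all modes shared a single `hbase` datastore name, the choice of fully-distributed backend would have to be made inside one implementation rather than by registering a different one.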
> >
> > i don't think your characterization of my reading comprehension is fair.
> > as i stated earlier, i wanted to participate in the discussion
> > surrounding deploying a technology that sahara currently deploys.
> > fwiw, i agree with what you are saying here, but i also think it is
> > axiomatic, the trove team can choose whichever path it would like for
> > implementation.
> >
> > >> i think this sounds reasonable, as long as we are limiting it to
> > >> standalone mode. if the deployments start to take on a larger scope
> > >> i agree it would be useful to leverage sahara for provisioning and scaling.
> > >
> > > Why only standalone? The current proposal explicitly covers only
> > > standalone and pseudo-distributed which are both valid strictly (add
> > > other adjectives here to taste) single node topologies and the
> > > currently submitted specification specifically carves out
> > > fully-distributed operation as requiring further thought and
> > > contemplation.
> >
> > i think starting with standalone mode (and not pseudo-distributed) is
> > a more conservative approach to this. my reason for suggesting
> > limiting this to standalone is that even in pseudo-distributed mode
> > the need for managing hdfs and zookeeper is present. i wanted to
> > highlight some of the overlap and the issues that will start to creep
> > in surrounding this deployment.
> >
> 
> The current code (submitted for review) provides both standalone and
> pseudo-distributed support. You will observe that the standalone and
> pseudo-distributed implementations do install zookeeper. As you are no
> doubt aware, one of the recommended ways to force the HBase Master
> server to always bind to a well-known port rather than to ephemeral ports
> is to set hbase.cluster.distributed to true (see
> https://review.openstack.org/#/c/262048/5/scripts/files/elements/ubuntu-hbase-standalone/install.d/20-install-hbase
> line 121). So, as it turns out, the code to deploy hdfs and zookeeper is
> already part of the proposed implementation.
> 
> 
> > >> as the hbase installation grows beyond the standalone mode there
> > >> will necessarily need to be hdfs and zookeeper support to allow for
> > >> a proper production deployment. this also brings up questions of
> > >> allowing the end-users to supply configurations for the hdfs and
> > >> zookeeper processes, not to mention enabling support for high
> > >> availability hdfs.
> > >
> > > These are things that Trove already addresses, albeit in a different
> > > way than Sahara. Users can, as it turns out, specify configuration
> > > groups which can then be used to launch new instances, and can also
> > > be associated with groups of instances.
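
To make the configuration-group idea concrete, here is a minimal sketch of checking user-supplied overrides against a list of supported options before they are applied to instances. The rules structure, the numeric bounds, and `validate_overrides` are hypothetical and simplified. They are not Trove's actual validation code, and the two options listed are only examples.

```python
# Hypothetical, simplified rules: which options a user may override.
# The bounds are made up for illustration.
SUPPORTED_OPTIONS = {
    "hbase.regionserver.handler.count": {"min": 1, "max": 1000},
    "zookeeper.session.timeout": {"min": 1000, "max": 600000},
}


def validate_overrides(overrides, rules=SUPPORTED_OPTIONS):
    """Return a list of error strings; an empty list means valid."""
    errors = []
    for name, value in sorted(overrides.items()):
        rule = rules.get(name)
        if rule is None:
            errors.append("%s: option is not supported" % name)
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append("%s: expected an integer" % name)
            continue
        if not rule["min"] <= value <= rule["max"]:
            errors.append(
                "%s: %d is outside [%d, %d]"
                % (name, value, rule["min"], rule["max"])
            )
    return errors
```

Anything not named in the rules is rejected, which matches the "as-a-Service" boundary discussed elsewhere in this thread: only the options the service chooses to expose are tweakable.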
> >
> > i am merely identifying issues that trove will need to reproduce, i'm
> > not deeply familiar with the configuration options that trove exposes
> > but i am guessing that it is currently not generating the
> > configurations specific to hdfs and zookeeper.
> >
> 
> It is equally important, I think, to realize that Trove doesn't have to produce a
> whole lot of new code to handle this as it already has a robust framework
> that handles a number of databases. With a relatively small code footprint,
> a prototype that allows much more flexible configuration support has been
> built (it has not been sent up for review yet). The majority of that code is
> a codec for XML; the rest is almost completely handled by the framework, with
> the exception of a file specifying the configuration options that are to be
> supported.
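
The prototype itself is not shown in this thread. As a rough idea of what an XML codec for Hadoop-style configuration files involves, the sketch below converts between a flat dict and the `<configuration><property><name/><value/></property></configuration>` layout used by hbase-site.xml. The class name and its methods are hypothetical and are not the prototype's code; the port number is just an example value.

```python
import xml.etree.ElementTree as ET


class HadoopXmlCodec:
    """Hypothetical codec: flat dict <-> Hadoop-style configuration XML."""

    def serialize(self, options):
        """Render a dict of option name -> value as configuration XML."""
        root = ET.Element("configuration")
        for name, value in sorted(options.items()):
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
            ET.SubElement(prop, "value").text = self._to_text(value)
        return ET.tostring(root, encoding="unicode")

    def deserialize(self, text):
        """Parse configuration XML into a dict of name -> string value."""
        root = ET.fromstring(text)
        options = {}
        for prop in root.findall("property"):
            name = prop.findtext("name")
            if name is None:
                continue  # skip malformed entries without a name
            options[name.strip()] = (prop.findtext("value") or "").strip()
        return options

    @staticmethod
    def _to_text(value):
        # Hadoop-style files spell booleans in lowercase.
        if isinstance(value, bool):
            return "true" if value else "false"
        return str(value)


codec = HadoopXmlCodec()
xml_text = codec.serialize(
    {"hbase.cluster.distributed": True, "hbase.master.port": 16000}
)
round_trip = codec.deserialize(xml_text)
```

Values come back as strings after a round trip; in this sketch, converting them to typed values would be left to whatever consumes the file of supported options.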
> 
> I'd like to reiterate that Trove, by its very design, was intended to
> support a number of databases and so already has much of the framework
> in place to add support for a new database. There isn't a lot of new code
> that must be 'reproduced' to add this support.
> 
> > >> i can envision a scenario where trove could use sahara to provision
> > >> and manage the clusters for hbase/hdfs/zk. this does pose some
> > >> questions as we'd have to determine how the trove guest agent would
> > >> be installed on the nodes, if there will need to be custom
> > >> configurations used by trove, and if sahara will need to provide a
> > >> plugin for bare (meaning no data processing framework)
> > >> hbase/hdfs/zk clusters. but, i think these could be solved by
> > >> either using custom images or a plugin in sahara that would install
> > >> the necessary agents/configurations.
> > >
> > > Let us not underestimate the effort for an end user to now deploy
> > > one more project. To a user already using Trove for a myriad of
> > > databases, requiring Sahara for supporting HBase Standalone sounds
> > > (to put it bluntly) like a burden. Requiring it for Fully-Distributed
> > > mode may have some development benefits but it remains to be seen
> > > whether those benefits are really worth the contortions that Trove
> > > would have to go through. And in the Trove architecture, there is
> > > flexibility as described above to have multiple possible
> > > implementations for fully-distributed, one that would interface with
> > > Sahara and another that didn't have to.
> >
> > i agree about the installation issues when we are talking about
> > standalone versus distributed. as for the contortions that trove may
> > have to go through to integrate with sahara, i think it would be worth
> > it, but i'm probably biased here ;)
> >
> > > Let's be clear that for a person who wants a fully configurable
> > > Hadoop based deployment with more control, Sahara may be the best
> > > option. And to one who wants even more control, maybe doing it
> > > themselves with Nova and custom Glance images is the way to go.
> > > Similarly, a Database-as-a-Service comes with the understood
> > > boundaries imposed by the "as-a-Service" deployment. Not all
> > > configuration options may be tweakable with a DBaaS; that's well
> > > known and understood, not just in Trove but also, for example, in
> > > Amazon RDS, Redshift or any of the other database-as-a-service
> > > implementations. The same would be true in fully-distributed as well,
> > > in the proposal that is currently under review. I submit to you that
> > > this nuance is being lost in your reading.
> >
> > i'd like to think that for someone who wants a fully configurable
> > hadoop based deployment, sahara is the best option =)
> >
> > i think we generally agree here about the deployment of "-aaS"
> > services in openstack, and again i disagree with your characterization
> > of my reading comprehension...
> >
> > >> of course, this does add a layer of complexity as operators who
> > >> wish this type of deployment will need to have both trove and
> > >> sahara, but imo this would be easier than replicating the work that
> > >> sahara has done with these technologies.
> > >
> > > I think this is where our opinions differ, as the 'replication'
> > > isn't all that much given the fact that Trove already provides
> > > capabilities to cluster databases. But, with that said, nothing in
> > > the current specification locks us into a specific deployment
> > > strategy in the future, nor does it preclude multiple implementations
> > > of fully-distributed, one which could leverage Sahara and one which
> > > didn't.
> >
> > respectfully, i think there is more effort involved with the
> > management of the pseudo-distributed mode than standalone, and that is
> > where my comments are mostly oriented. mind you, provisioning
> > might be a simple matter for trove as it stands now, but i think the
> > potential for issues could get deeper with pseudo-distributed.
> 
> Here, again, I want to point out that there will definitely be more issues with
> pseudo-distributed than with standalone. But Trove is already a
> multi-database framework and therefore adding support for one more database
> doesn't require a whole new implementation.
> 
> >
> > i'm glad that you are open to the idea of implementations that may
> > involve other projects (namely sahara) in the future. as i said in the
> > beginning, given the comments about sahara in the spec and the review
> > i wanted to make sure we got a few more eyes on this to bring our
> > experience to the table.
> 
> Absolutely, that's the intent of the ML conversation.
> 
> >
> > regards,
> > mike
> >
> >
> __________________________________________________________
> > ________________
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: OpenStack-dev-
> > request at lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


