[openstack-dev] [trove] Adding support for HBase in Trove

Peter MacKinnon pmackinn at redhat.com
Thu Jan 7 15:44:37 UTC 2016


On 1/6/16 8:20 PM, Amrith Kumar wrote:
> Kevin Fox writes:
>
>> as far as that plugin ever should go. If you need scale up/down, etc, then
>> you're starting to reimplement large swaths of Sahara, and like the Cinder
>> plugin for Nova, there could be a plugin that works identically to the
>> standalone one that converts the same api over to a Sahara compatible one. You
>> then farm the work over to Sahara.
> I believe that this is not the case. The entire framework for integration with Cinder, Nova etc., already exists in Trove.
>
> Recall that Trove already deals with about a dozen databases, several of which have support for clusters.
>
> The code to add HBase support to Trove doesn't have to implement all of this framework that already exists.
>
> All that is being implemented is (literally) a Trove 'plugin' for HBase and a mechanism to build an HBase guest image.
>
> -amrith

Right, I think that's the concern. A plugin for integration with a
standalone/pseudo-distributed HBase deployment arguably has a reasonable
scale to be managed by a Trove guestagent. That agent would also fire up
the client RPC services necessary for an end user to interact with HBase
remotely. But even the HBase project views standalone mode as a
devel/test capability only. The fully distributed model gets orders of
magnitude more complex. Is the agent plugin just wiring into an existing
multi-node HBase deployment somewhere? Is it spawning/growing/shrinking
HDFS endpoints itself?

The "we already have cluster support in Trove" argument doesn't really 
track in a production Hadoop space, IMHO. That's why Sahara was developed.

My $0.02,
\Pete

>
>> -----Original Message-----
>> From: Fox, Kevin M [mailto:Kevin.Fox at pnnl.gov]
>> Sent: Wednesday, January 06, 2016 7:32 PM
>> To: OpenStack Development Mailing List (not for usage questions)
>> <openstack-dev at lists.openstack.org>
>> Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove
>>
>> just my 2 cents... I think you can do both. The great thing about Trove is that
>> it's providing an abstract api so users just deal with provisioning db's, scaling
>> db's, etc.
>>
>> Having a simple plugin that doesn't depend on all of Sahara, for the case a
>> user only wants a single node HBase does make sense. It's much easier for an
>> Op to support that case if that's all their users ever want. But, that's probably
>> as far as that plugin ever should go. If you need scale up/down, etc, then
>> you're starting to reimplement large swaths of Sahara, and like the Cinder
>> plugin for Nova, there could be a plugin that works identically to the
>> standalone one that converts the same api over to a Sahara compatible one. You
>> then farm the work over to Sahara.
>>
>> Then, it's up to the ops to choose features and the overhead of supporting
>> Sahara, or not, and you don't have to support implementing a whole cluster
>> management system for Trove that already exists.
>>
>> Thanks,
>> Kevin
>> ________________________________________
>> From: Amrith Kumar [amrith at tesora.com]
>> Sent: Wednesday, January 06, 2016 3:15 PM
>> To: OpenStack Development Mailing List (not for usage questions)
>> Subject: [openstack-dev] [trove] Adding support for HBase in Trove
>>
>> TL;DR Should Trove treat HBase as a special database because one use case is
>> as part of a large multi-node Hadoop cluster, and therefore either not
>> support it at all, or necessarily use Sahara to provision and manage a cluster?
>> There are pros and cons, and it is argued that the cons outweigh the pros.
>> A blueprint/specification and an implementation of basic Trove support
>> for HBase independent of Sahara have been submitted for review. See [3], [4]
>> and [5]. The benefits include the ability to provide the commonly used (in
>> development) standalone mode of operation, and eliminating the dependency
>> on an additional OpenStack project, thereby simplifying deployment.
>> Comments and feedback are welcome on the implementation, as well as the
>> specification and the approach.
>>
>> The long version follows below.
>>
>> The OpenStack Trove mission is to provide scalable and reliable Cloud
>> Database as a Service provisioning functionality for both relational and non-
>> relational database engines, and to continue to improve its fully-featured
>> and extensible open source framework [1].
>>
>> An important aspect of the Trove value proposition is that it provides a
>> common control plane, a common API, and a common set of abstractions that
>> are used to manage a number of different relational and non-relational
>> database technologies. The common API contains primitives to create
>> database instances and clusters of a number of databases including MySQL
>> (MariaDB, Percona too), PostgreSQL, MongoDB, Cassandra, CouchDB,
>> Couchbase, IBM DB2, Vertica, and Redis.
>>
>> Cluster support is also available for a number of databases including
>> MongoDB, Percona XtraDB cluster and Vertica, with more to come
>> imminently.
>>
>> In effect, Trove is a framework for provisioning and managing the lifecycle of
>> a number of different database technologies; it provides only the control
>> plane. Users can do things like provision instances and clusters, resize
>> them, take backups and create new instances and clusters from previous
>> backups, and establish and manage complex topologies including replication
>> and clustering.
>>
>> Trove does not interfere with the data plane; applications interact directly
>> with the database using the native APIs for each database technology.
>>
>> Users of OpenStack look to Trove to provide a consistent set of interfaces for
>> managing their database resources in a variety of use-cases ranging from
>> small-scale prototyping, development, testing, and all the way through
>> production. Apache HBase is an open-source, distributed, versioned, non-
>> relational database [2] and users of HBase face many of the challenges that
>> Trove addresses for other databases. Therefore adding support for HBase in
>> Trove seems not only reasonable, but also consistent with the goal of the
>> (Trove) project.
>>
>> A spec proposing the addition of HBase support for Trove was submitted [3]
>> and a first phase of code implementing this HBase support has also been
>> submitted for review [4], [5]. The process that has been followed is
>> consistent with other Trove datastores; add basic support and then
>> progressively augment it in subsequent releases. The code submitted allows
>> you to provision an HBase instance (which will launch on a Nova instance),
>> build an HBase guest image using the elements provided, resize the storage
>> and the instance, take a "backup" of the instance and store that backup on
>> Swift, and at a later time you can launch a new instance from that "backup".
>>
>> One can operate HBase with or without HDFS; in fact HBase documents the
>> standalone mode of operation [6] where HBase is completely operational on
>> a single node and data is stored on the local file system. This standalone
>> mode provides a very useful construct for development and testing, and at a
>> later stage an application can be seamlessly migrated to work with an HBase
>> installation of some other "run mode" like "Fully Distributed".
>>
>> The code submitted in [4] and [5], as described in [3], implements support for
>> two modes of operation, namely "Standalone" and "Pseudo-Distributed". At a
>> later stage, support will be added for "Fully Distributed", consistent with the
>> way in which clustering support was delivered for other datastores like
>> MySQL and MongoDB.
>>
>> Some have opined that Trove should not directly get into the business of
>> orchestrating Hadoop Clusters or anything to do with HBase, arguing that this
>> is something that Sahara already does, and should remain the sole domain of
>> Sahara.
>>
>> Since HBase is perfectly operable without HDFS, I believe it is
>> inappropriate to tightly couple HBase with Sahara, whose primary motivation
>> is to provision 'data-intensive application clusters' [7]. Furthermore, as we
>> have found with other datastores, it is my belief that having a common
>> implementation model across multiple deployment topologies is a benefit for
>> Trove. Other considerations, such as similarity to other databases supported
>> by Trove, motivated a choice as illustrated in the specification. An architecture
>> where Trove can function entirely independently of Sahara is also a benefit for
>> end users, and a model where Trove has dependencies only on other core
>> OpenStack services considerably simplifies the deployment.
>>
>> Comments and feedback are welcome on the code, as well as the
>> specification and the approach.
>>
>> References:
>>
>> [1] https://wiki.openstack.org/wiki/Trove#Mission_Statement
>> [2] https://hbase.apache.org/
>> [3] https://review.openstack.org/#/c/256079
>> [4] https://review.openstack.org/#/c/262048/
>> [5] https://review.openstack.org/#/c/262815/
>> [6] http://hbase.apache.org/0.94/book/standalone_dist.html
>> [7] https://wiki.openstack.org/wiki/Sahara
>>
>> Thanks,
>>
>> -amrith
>>
>> --
>> Amrith Kumar, CTO                   | amrith at tesora.com
>> Tesora, Inc                         | @amrithkumar
>> 125 CambridgePark Drive, Suite 400  | http://www.tesora.com
>> Cambridge, MA. 02140                |
>>
>> __________________________________________________________
>> ________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-
>> request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



