[openstack-dev] [trove][all][tc] A proposal to rearchitect Trove

Amrith Kumar amrith.kumar at gmail.com
Wed Jun 21 19:40:02 UTC 2017

Thank you, Kevin. Lots of container (specific?) goodness here.



Amrith Kumar
Phone: +1-978-563-9590

On Mon, Jun 19, 2017 at 2:34 PM, Fox, Kevin M <Kevin.Fox at pnnl.gov> wrote:

> Thanks for starting this difficult discussion.
> I think I agree with all the lessons learned except the Nova one. While
> you can treat containers and VMs the same, after years of using both, I
> really don't think it's a good idea to treat them equally. Containers can't
> work properly if used as a VM. (Really, really.)
> I agree wholeheartedly with your statement that it's mostly an
> orchestration problem and that Trove should reuse existing tools now that
> there are options. I would propose the following, which I think meets your
> goals and could widen your contributor base substantially:
> Look at the Kubernetes (k8s) concept of Operator ->
> https://coreos.com/blog/introducing-operators.html
> They allow application-specific logic to be added to Kubernetes while
> reusing the rest of k8s for what it's good at: container orchestration.
> etcd is just a clustered database, and if the operator concept works for
> it, it should also work for other databases such as Galera.
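The operator pattern referenced above is, at its core, a control loop: observe the running members, compare them with a desired state, and act on the difference. The following is a minimal Python sketch of one pass of such a loop. The `replicas` and `version` fields and the action names are hypothetical, and the function only returns planned actions rather than calling any real Kubernetes API.

```python
def reconcile(desired, observed):
    """Compute the actions one pass of an operator-style control loop would take.

    desired:  dict with hypothetical keys "replicas" (int) and "version" (str).
    observed: list of running members, each a dict with "name" and "version".
    Returns (action, detail) tuples; nothing here talks to Kubernetes.
    """
    actions = []
    want = desired["replicas"]
    # Scale up: create members for each missing index.
    for index in range(len(observed), want):
        actions.append(("create_member", index))
    # Scale down: remove members beyond the desired count.
    for member in observed[want:]:
        actions.append(("delete_member", member["name"]))
    # Upgrade any surviving member that runs the wrong version.
    for member in observed[:want]:
        if member["version"] != desired["version"]:
            actions.append(("upgrade_member", member["name"]))
    return actions
```

A real operator would run this repeatedly (for example, on every watch event) and apply the returned actions; database-specific knowledge such as safe upgrade ordering for Galera would live in the action handlers.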
> The point where containers and VMs are incompatible is exactly the thing I
> think will make Trove's life easier. You can think of a member of the
> database as a few different components, such as:
>  * main database process
>  * metrics gatherer (such as https://github.com/prometheus/mysqld_exporter)
>  * trove_guest_agent
> With the current approach, all of these are baked into the same VM image,
> making it very difficult to update the trove_guest_agent (needed when you
> upgrade the Trove controllers) without touching the main database process.
> With the k8s sidecar concept, each would be a separate container loaded
> into the same pod.
> So rather than needing to maintain a Trove image for every possible
> combination of database version, Trove version, etc., you can reuse upstream
> database containers along with Trove-provided guest agents.
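The sidecar layout described above can be sketched as a Pod manifest built as a plain Python dict. This is an illustration under assumptions, not a Trove artifact: the `app: trove-db` label and the guest agent image name are hypothetical. Container names use `guest-agent` rather than `guest_agent` because Kubernetes container names must be DNS labels, which do not allow underscores. The helper shows the point being made: upgrading the agent changes one container entry and leaves the database container untouched.

```python
import copy
import json


def trove_pod_manifest(name, db_image, exporter_image, agent_image):
    """Return a Pod manifest (as a plain dict) with the database plus sidecars.

    Container names use hyphens because Kubernetes container names must be
    DNS labels (no underscores). All image names are caller-supplied.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # Hypothetical label, used only so other objects could select the pod.
        "metadata": {"name": name, "labels": {"app": "trove-db"}},
        "spec": {
            "containers": [
                # Main database process, from an unmodified upstream image.
                {"name": "db", "image": db_image},
                # Metrics gatherer sidecar (e.g. a mysqld_exporter image).
                {"name": "metrics", "image": exporter_image},
                # Trove guest agent sidecar, versioned with the Trove controllers.
                {"name": "guest-agent", "image": agent_image},
            ]
        },
    }


def with_agent_image(manifest, new_image):
    """Return a copy of the manifest in which only the guest agent image changes."""
    updated = copy.deepcopy(manifest)
    for container in updated["spec"]["containers"]:
        if container["name"] == "guest-agent":
            container["image"] = new_image
    return updated


if __name__ == "__main__":
    pod = trove_pod_manifest(
        "mydb-0",
        "mysql:5.7",
        "prom/mysqld-exporter",
        "example.org/trove-guest-agent:1.0",  # hypothetical image
    )
    print(json.dumps(with_agent_image(pod, "example.org/trove-guest-agent:1.1"), indent=2))
```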
> There's already a secure channel between kube-apiserver and kubelet, so you
> can reuse it for secure communication; there is no need to add anything.
> For example: trove engine -> kubectl exec xxxxx-db -c guest_agent some
> command.
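The "trove engine -> kubectl exec" path above can be sketched as a small Python wrapper. `kubectl exec` requests go through kube-apiserver, which forwards them to the kubelet, so no extra transport is needed. Assumptions: `kubectl` is on `PATH` and configured, the sidecar is named `guest-agent` (a DNS-label-safe spelling), and `agent-cli` in the usage is a hypothetical guest agent command. The `--` separator keeps kubectl from parsing the agent command's flags as its own.

```python
import shlex
import subprocess


def guest_agent_exec_argv(pod, command, container="guest-agent", namespace=None):
    """Build the argv for running a command inside the guest agent sidecar.

    `command` may be a string (split shell-style) or a sequence of arguments.
    The "--" separates kubectl's own flags from the command to run.
    """
    argv = ["kubectl", "exec"]
    if namespace:
        argv += ["-n", namespace]
    argv += [pod, "-c", container, "--"]
    if isinstance(command, str):
        argv += shlex.split(command)
    else:
        argv += list(command)
    return argv


def run_in_guest_agent(pod, command, **kwargs):
    """Run the command via kubectl (which must be on PATH) and return stdout."""
    argv = guest_agent_exec_argv(pod, command, **kwargs)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For instance, `guest_agent_exec_argv("mydb-0", "agent-cli status")` yields the argument list for `kubectl exec mydb-0 -c guest-agent -- agent-cli status`. Building an argv list instead of a shell string avoids quoting problems when arguments come from API requests.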
> Kubernetes also supports federation, so if the operator were started at the
> federation level, it could span multiple OpenStack regions.
> Another big feature that hasn't been mentioned yet, and that I think is
> critical: in our performance tests, databases in VMs have never performed
> particularly well. Using k8s as a base, bare-metal nodes could be pulled in
> easily, with dedicated disks or SSDs, so the pods land on storage that is
> very close to the database. This should give native performance.
> So, my suggestion would be to strongly consider basing Trove v2 on
> Kubernetes. It can provide a huge bang for the buck, simplifying the Trove
> architecture substantially while gaining the new features you list as
> important. The Trove v2 OpenStack API could be exposed as a very thin
> wrapper over k8s Third Party Resources (TPR), which would make Trove
> entirely stateless: k8s maintains all state for everything in etcd.
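A thin, stateless wrapper of the kind proposed above might do little more than translate an API request into a custom resource object and hand it to Kubernetes for storage. Note that Third Party Resources were superseded by CustomResourceDefinitions (CRDs) starting with Kubernetes 1.7. In this sketch the API group `trove.example.org/v1alpha1`, the kind `TroveDatabase`, the `replicas` field, and the spec field names are all hypothetical; the request body is only loosely modeled on Trove v1's create-instance body (`datastore` type/version and `volume` size).

```python
# Hypothetical API group/version and kind; nothing in the thread defines these.
API_VERSION = "trove.example.org/v1alpha1"
KIND = "TroveDatabase"


def create_request_to_custom_resource(tenant, body):
    """Translate a Trove v2 create request into a custom resource object.

    The wrapper keeps no state of its own: everything it knows goes into the
    returned object, which Kubernetes would persist in etcd. The tenant is
    mapped to a namespace, which must itself be a valid DNS label.
    """
    inst = body["instance"]
    return {
        "apiVersion": API_VERSION,
        "kind": KIND,
        "metadata": {"name": inst["name"], "namespace": tenant},
        "spec": {
            "datastore": inst["datastore"]["type"],
            "version": inst["datastore"]["version"],
            # Hypothetical field; a single instance is the default.
            "replicas": inst.get("replicas", 1),
            "volumeSizeGB": inst["volume"]["size"],
        },
    }
```

Under this design, an operator (such as the reconcile loop sketched earlier) watches `TroveDatabase` objects and does the actual work, while the API layer can be restarted or scaled freely.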
> Please consider this architecture.
> Thanks,
> Kevin
> ------------------------------
> *From:* Amrith Kumar [amrith.kumar at gmail.com]
> *Sent:* Sunday, June 18, 2017 4:35 AM
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* [openstack-dev] [trove][all][tc] A proposal to rearchitect Trove
> Trove has evolved rapidly over the past several years, since its integration
> in Icehouse, when it supported only single instances of a few databases.
> Today it supports a dozen databases, including clusters and replication.
> The user survey [1] indicates that while there is strong interest in the
> project, there are few large production deployments known to the
> development team.
> Recent changes in the OpenStack community at large (company realignments,
> acquisitions, layoffs) and the Trove community in particular, coupled with
> a mounting burden of technical debt have prompted me to make this proposal
> to re-architect Trove.
> This email summarizes several of the issues that face the project, both
> structurally and architecturally. This email does not claim to include a
> detailed specification for what the new Trove would look like, merely the
> recommendation that the community should come together and develop one so
> that the project can be sustainable and useful to those who wish to use it
> in the future.
> Trove, with support for a dozen or so databases today, finds itself in a
> bind because there are few developers and a codebase with a significant
> amount of technical debt.
> Some architectural choices which the team made over the years have
> consequences which make the project less than ideal for deployers.
> Given that there are no major production deployments of Trove at present,
> this provides us an opportunity to reset the project, learn from our v1 and
> come up with a strong v2.
> An important aspect of making this proposal work is that we seek to
> eliminate the effort (planning and coding) involved in migrating existing
> Trove v1 deployments to the proposed Trove v2. Effectively, with work
> beginning on Trove v2 as proposed here, Trove v1 as released with Pike will
> be marked as deprecated and users will have to migrate to Trove v2 when it
> becomes available.
> While I would very much like to continue to support the users on Trove v1
> through this transition, the simple fact is that absent community
> participation this will be impossible. Furthermore, given that there are no
> production deployments of Trove at this time, it seems pointless to build
> that upgrade path from Trove v1 to Trove v2; it would be the proverbial
> bridge from nowhere.
> This (previous) statement is, I realize, contentious. There are those who
> have told me that an upgrade path must be provided, and there are those who
> have told me of unnamed deployments of Trove that would suffer. To this,
> all I can say is that if an upgrade path is of value to you, then please
> commit the development resources to participate in the community to make
> that possible. But equally, preventing a v2 of Trove or delaying it will
> only make the v1 that we have today less valuable.
> We have learned a lot from v1, and the hope is that we can address that in
> v2. Some of the more significant things that I have learned are:
> - We should adopt a versioned front-end API from the very beginning;
> making the REST API versioned is not a ‘v2 feature’
> - A guest agent running on a tenant instance, with connectivity to a
> shared management message bus is a security loophole; encrypting traffic,
> per-tenant passwords, or any other scheme is merely lipstick on a security
> hole
> - Reliance on Nova for compute resources is fine, but dependence on Nova
> VM specific capabilities (like instance rebuild) is not; it makes things
> like containers or bare metal second-class citizens
> - A fair portion of what Trove does is resource orchestration; don’t
> reinvent the wheel; there’s Heat for that. Admittedly, Heat wasn’t as far
> along when Trove got started, but that’s not the case today and we have an
> opportunity to fix that now
> - A similarly significant portion of what Trove does is to implement a
> state-machine that will perform specific workflows involved in implementing
> database specific operations. This makes the Trove taskmanager a stateful
> entity. Some of the operations could take a fair amount of time. This is a
> serious architectural flaw
> - Tenants should not ever be able to directly interact with the underlying
> storage and compute used by database instances; that should be the default
> configuration, not an untested deployment alternative
> - The CI should test all databases that are considered to be ‘supported’
> without excessive use of resources in the gate; better code modularization
> will help determine the tests which can safely be skipped in testing changes
> - Clusters should be first-class citizens, not an afterthought;
> single-instance databases may be the ‘special case’, not the other way around
> - The project must provide guest images (or at least complete tooling for
> deployers to build these); while the project can’t distribute operating
> systems and database software, the current deployment model merely impedes
> adoption
> - Clusters spanning OpenStack deployments are a real thing that must be
> supported
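The stateful-taskmanager concern in the list above is commonly addressed by keeping workflow state outside the worker, so that any worker can resume a long-running operation where another left off. The proposal does not prescribe this (it leaves Mistral and other options open); the following is only a sketch of that general idea. The state names are hypothetical, and an in-memory dict stands in for a durable store such as a database row or an etcd key.

```python
# Hypothetical workflow for creating a cluster; each state maps to the next one.
TRANSITIONS = {
    "NEW": "PROVISIONING",
    "PROVISIONING": "CONFIGURING",
    "CONFIGURING": "ACTIVE",
}


class InMemoryStore:
    """Stand-in for an external, durable store (a DB row, an etcd key, ...)."""

    def __init__(self):
        self._data = {}

    def load(self, key):
        return self._data.get(key, "NEW")

    def compare_and_set(self, key, expected, new):
        # Only advance if nobody else has changed the state in the meantime.
        if self.load(key) != expected:
            return False
        self._data[key] = new
        return True


def advance(store, key, do_step):
    """Run one workflow step and persist the result; the worker keeps no state."""
    state = store.load(key)
    next_state = TRANSITIONS.get(state)
    if next_state is None:
        return state  # terminal state, nothing to do
    do_step(state)  # the actual (possibly slow) work for this state
    if not store.compare_and_set(key, state, next_state):
        return store.load(key)  # another worker advanced it first
    return next_state
```

Because each call reads the current state, does one step, and writes the next state back, a worker that crashes mid-workflow loses at most one step, which another worker can retry. In practice each step would need to be idempotent for that retry to be safe.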
> This might sound harsh, but that isn’t the intent. Each of these is the
> consequence of one or more perfectly rational decisions. Some of those
> decisions have had unintended consequences, and others were made knowing
> that we would be incurring some technical debt; debt we have not had the
> time or resources to address. Fixing all these is not impossible; it just
> takes the dedication of resources by the community.
> I do not have a complete design for what the new Trove would look like.
> For example, I don’t know how we will interact with other projects (like
> Heat). Many questions remain to be explored and answered.
> Would it suffice to just use the existing Heat resources and build
> templates around those, or will it be better to implement custom Trove
> resources and then orchestrate things based on those resources?
> Would Trove implement the workflows required for multi-stage database
> operations by itself, or would it rely on some other project (say Mistral)
> for this? Is Mistral really a workflow service, or just cron on steroids? I
> don’t know the answer but I would like to find out.
> While we don’t have the answers to these questions, I think this is a
> conversation that we must have, one that we must decide on, and then as a
> community commit the resources required to make a Trove v2 which delivers
> on the mission of the project: “To provide scalable and reliable Cloud
> Database as a Service provisioning functionality for both relational and
> non-relational database engines, and to continue to improve its
> fully-featured and extensible open source framework.”[2]
> Thanks,
> -amrith
> [1] https://www.openstack.org/assets/survey/April2017SurveyReport.pdf
> [2] https://wiki.openstack.org/wiki/Trove#Mission_Statement