[openstack-dev] [trove][all][tc] A proposal to rearchitect Trove
Doug Hellmann
doug at doughellmann.com
Wed Jul 12 13:57:21 UTC 2017
Excerpts from Amrith Kumar's message of 2017-07-12 06:14:28 -0500:
> All:
>
> First, let me thank all of you who responded and provided feedback
> on what I wrote. I've summarized what I heard below and am posting
> it as one consolidated response rather than responding to each
> of your messages and making this thread even deeper.
>
> As I say at the end of this email, I will be setting up a session at
> the Denver PTG to specifically continue this conversation and hope
> you will all be able to attend. As soon as time slots for PTG are
> announced, I will try and pick this slot and request that you please
> attend.
>
> ----
>
> Thierry: naming issue; call it Hoard if it does not have a migration
> path.
>
> ----
>
> Kevin: use a container approach with k8s as the orchestration
> mechanism, addresses multiple issues including performance. Trove to
> provide containers for multiple components which cooperate to provide
> a single instance of a database or cluster. Don't put all components
> (agent, monitoring, database) in a single VM; decoupling makes
> migration and upgrades easier and allows Trove to reuse database
> vendor-supplied containers. Performance of databases in VMs is poor
> compared to databases on bare metal.
>
> ----
>
> Doug Hellmann:
>
> > Does "service VM" need to be a first-class thing? Akanda creates
> > them, using a service user. The VMs are tied to a "router" which is
> > the billable resource that the user understands and interacts with
> > through the API.
>
> Amrith: Doug, yes because we're looking not just for service VMs but all
> resources provisioned by a service. So, to Matt's comment about a
> blackbox DBaaS, the VMs, storage, snapshots, ... they should all be
> owned by the service, charged to a user's quota but not visible to the
> user directly.
I still don't understand. If you have entities that represent the
DBaaS "host" or "database" or "database backup" or whatever, then
you put a quota on those entities and you bill for them. If the
database actually runs in a VM or the backup is a snapshot, those
are implementation details. You don't want to have to rewrite your
quota management or billing integration if those details change.
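
To make that concrete, here is a minimal sketch of the separation I mean.
Every name in it (DatabaseService, VMBackend, ContainerBackend,
QuotaExceeded) is hypothetical and not taken from Trove or any other
OpenStack project; it only assumes that quota is counted against the
user-visible "database" entity and that the thing actually running the
database is a swappable detail.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class Backend(Protocol):
    """Hypothetical provisioning backend: an implementation detail."""

    def provision(self, name: str) -> str: ...


class VMBackend:
    def provision(self, name: str) -> str:
        # Stand-in for booting a service-owned VM.
        return f"vm:{name}"


class ContainerBackend:
    def provision(self, name: str) -> str:
        # Stand-in for starting a service-owned container.
        return f"container:{name}"


class QuotaExceeded(Exception):
    pass


@dataclass
class DatabaseService:
    """Counts quota per tenant on 'database' entities only."""

    backend: Backend
    quota_per_tenant: int
    _usage: Dict[str, int] = field(default_factory=dict)

    def create_database(self, tenant: str, name: str) -> str:
        used = self._usage.get(tenant, 0)
        if used >= self.quota_per_tenant:
            raise QuotaExceeded(f"{tenant} already has {used} database(s)")
        resource = self.backend.provision(name)
        self._usage[tenant] = used + 1
        return resource

    def usage(self, tenant: str) -> int:
        return self._usage.get(tenant, 0)
```

Swapping VMBackend for ContainerBackend changes what gets provisioned, but
the quota check and anything billing reads from usage() stay the same.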
Doug
>
> ----
>
> Jay:
>
> > Frankly, I believe all of these types of services should be built
> > as applications that run on OpenStack (or other)
> > infrastructure. In other words, they should not be part of the
> > infrastructure itself.
> >
> > There's really no need for a user of a DBaaS to have access to the
> > host or hosts the DB is running on. If the user really wanted
> > that, they would just spin up a VM/baremetal server and install
> > the thing themselves.
>
> and subsequently in follow-up with Zane:
>
> > Think only in terms of what a user of a DBaaS really wants. At the
> > end of the day, all they want is an address in the cloud where they
> > can point their application to write and read data from.
> > ...
> > At the end of the day, I think Trove is best implemented as a hosted
> > application that exposes an API to its users that is entirely
> > separate from the underlying infrastructure APIs like
> > Cinder/Nova/Neutron.
>
> Amrith: Yes, I agree, +1000
>
> ----
>
> Clint (in response to Jay's proposal regarding the service making all
> resources multi-tenant) raised a concern about having multi-tenant
> shared resources. The issue is with ensuring separation between
> tenants (don't want to use the word isolation because this is database
> related).
>
> Amrith: yes, definitely a concern and one that we don't have today
> because each DB is a VM of its own. Personally, I'd rather stick with
> that construct, one DB per VM/container/baremetal and leave that be
> the separation boundary.
>
> ----
>
> Zane: Discomfort over throwing out working code, grass is greener on
> the other side, is there anything to salvage?
>
> Amrith: Yes, there is certainly a 'grass is greener with a rewrite'
> fallacy. But, there is stuff that can be salvaged. The elements are
> still good, they are separable and can be used with the new
> project. Much of the controller logic however will fall by the
> wayside.
>
> In a similar vein, Clint asks about the elements that Trove provides,
> "how has that worked out".
>
> Amrith: Honestly, not well. Trove only provided reference elements
> suitable for development use. Never really production hardened
> ones. For example, the image elements trove provides don't bake the
> guest agent in; they assume that at VM launch, the guest agent code
> will be slurped (technical term) from the controller and
> launched. Great for debugging, not great for production. That is
> something that should change. But, equally, I've heard disagreements
> saying that slurping the guest agent at runtime is clever and good
> in production.
>
> ----
>
> Zane: consider using Mistral for workflow.
>
> > The disadvantage, obviously, is that it requires the cloud to offer
> > Mistral as-a-Service, which currently doesn't include nearly as many
> > clouds as I'd like.
>
> Amrith: Yes, as we discussed, we are in agreement with both parts of
> this recommendation.
>
> Zane, Jay and Dims: a subtle distinction between Tessmaster and Magnum
> (I want a database, figure out the lower layers, vs. I want a k8s
> cluster).
>
> ----
>
> Zane: Fun fact: Trove started out as a *complete fork* of Nova(!).
>
> Amrith: Not fun at all :) Never, ever, ever, ever f5g do that
> again. Yeah, sure, if you can have i18n, and k8s, I can have f5g :)
>
> ----
>
> Thierry:
>
> > We generally need to be very careful about creating dependencies
> > between OpenStack projects.
> > ...
> > I understand it's a hard trade-off: you want to reuse functionality
> > rather than reinvent it in every project... we just need to
> > recognize the cost of doing that.
>
> Amrith: Yes, this is part of my concern re: Mistral, and earlier in
> trove's life on depending on Manila for Oracle RAC. Clint raised a
> similar concern about the dependency on Heat.
>
> In response, Kevin:
>
> > That view of dependencies is why Kubernetes development is outpacing
> > OpenStack's and some users are leaving IMO. Not trying to be mean
> > here but trying to shine some light on this issue.
>
> I disagree, but that's a topic for another email thread and maybe not
> even an email thread but an in-person conversation with suitable
> beverages. It is a religious discussion which is best handled in a
> different forum; such as the emacs-vi forum.
>
> ----
>
> I wrote:
>
> > - A guest agent running on a tenant instance, with connectivity to a
> > shared management message bus is a security loophole; encrypting
> > traffic, per-tenant-passwords, and any other scheme is merely
> > lipstick on a security hole
>
> Clint asks:
>
> This is a broad statement, and I'm not sure I understand the actual
> risk you're presenting here as "a security loophole".
>
> How else would you administer a database server than through some
> kind of agent? Whether that agent is a python daemon of our making,
> sshd, or whatever kubernetes component lets you change things,
> they're all administrative pieces that sit next to the resource.
>
> Amrith: The issue is that the guest agent (currently) running in a
> tenant's context needs to establish a connection to a shared rabbitmq
> server running in the service (control plane) context. I am fine with
> a guest agent running in the control plane establishing a connection
> into a guest VM if required, not the other way around.
>
> ----
>
> Clint makes a distinction between a database cluster within an
> OpenStack deployment and an uber database cluster spanning clouds,
> recommending that the latter is best left to a tertiary
> orchestrator. Further, these are two distinct things, pick one and do
> it well.
>
> Amrith: A valid approach and one that will allow Trove to focus on the
> high value single OpenStack deployment of a db cluster (and to Jay's
> point, do it well).
>
> ----
>
> Consensus:
>
> Trove should expose (what Matt Fischer calls) BlackBox DB, not storage +
> compute.
>
> Address rabbitmq security concerns differently; move guest agent off
> instance.
>
> Don't reinvent the orchestration piece.
>
> Fewer DBs, better support
>
> Clusters are first class citizens, not an afterthought
>
> Clusters spanning regions and openstack deployments
>
> Restart the service VMs discussion:
> https://review.openstack.org/#/c/438134/
>
> ----
>
> Several people emailed me privately and said they (or their companies)
> would like to invest resources in Trove. Some indicated that they (or
> their companies) would like to invest resources in Trove if the
> commitment was towards a certain direction or technology choice.
> Others have offered resources if the direction would be to provide
> an AWS compatible API.
>
> To anyone who wants to contribute resources to a project, please do
> it. Big companies considering contributing one or two people to a
> project and making it seem like a big decision is really an indication
> of a lack of seriousness. If the project is really valuable to you,
> you'd have put people on it already. The fact that you haven't speaks
> volumes.
>
> To those who want to place pre-conditions on technology choice, I have
> no (good) words for you.
>
> Thanks to all who participated, I appreciate all the input. I will be
> setting up a session at the Denver PTG to specifically continue this
> conversation and hope you will all be able to attend. As soon as time
> slots for PTG are announced, I will try and pick this slot and request
> that you please attend.
>
> Thanks,
>
> -amrith
>
>
>
>
> On Sun, Jun 18, 2017 at 6:35 AM, Amrith Kumar <amrith.kumar at gmail.com>
> wrote:
>
> > Trove has evolved rapidly over the past several years, since integration
> > in IceHouse when it only supported single instances of a few databases.
> > Today it supports a dozen databases including clusters and replication.
> >
> > The user survey [1] indicates that while there is strong interest in the
> > project, there are few large production deployments known to the
> > development team.
> >
> > Recent changes in the OpenStack community at large (company realignments,
> > acquisitions, layoffs) and the Trove community in particular, coupled with
> > a mounting burden of technical debt have prompted me to make this proposal
> > to re-architect Trove.
> >
> > This email summarizes several of the issues that face the project, both
> > structurally and architecturally. This email does not claim to include a
> > detailed specification for what the new Trove would look like, merely the
> > recommendation that the community should come together and develop one so
> > that the project can be sustainable and useful to those who wish to use it
> > in the future.
> >
> > TL;DR
> >
> > Trove, with support for a dozen or so databases today, finds itself in a
> > bind because there are few developers, and a code-base with a significant
> > amount of technical debt.
> >
> > Some architectural choices which the team made over the years have
> > consequences which make the project less than ideal for deployers.
> >
> > Given that there are no major production deployments of Trove at present,
> > this provides us an opportunity to reset the project, learn from our v1 and
> > come up with a strong v2.
> >
> > An important aspect of making this proposal work is that we seek to
> > eliminate the effort (planning, and coding) involved in migrating existing
> > Trove v1 deployments to the proposed Trove v2. Effectively, with work
> > beginning on Trove v2 as proposed here, Trove v1 as released with Pike will
> > be marked as deprecated and users will have to migrate to Trove v2 when it
> > becomes available.
> >
> > While I would very much like to continue to support the users on Trove v1
> > through this transition, the simple fact is that absent community
> > participation this will be impossible. Furthermore, given that there are no
> > production deployments of Trove at this time, it seems pointless to build
> > that upgrade path from Trove v1 to Trove v2; it would be the proverbial
> > bridge from nowhere.
> >
> > This (previous) statement is, I realize, contentious. There are those who
> > have told me that an upgrade path must be provided, and there are those who
> > have told me of unnamed deployments of Trove that would suffer. To this,
> > all I can say is that if an upgrade path is of value to you, then please
> > commit the development resources to participate in the community to make
> > that possible. But equally, preventing a v2 of Trove or delaying it will
> > only make the v1 that we have today less valuable.
> >
> > We have learned a lot from v1, and the hope is that we can address that in
> > v2. Some of the more significant things that I have learned are:
> >
> > - We should adopt a versioned front-end API from the very beginning;
> > making the REST API versioned is not a ‘v2 feature’
> >
> > - A guest agent running on a tenant instance, with connectivity to a
> > shared management message bus is a security loophole; encrypting traffic,
> > per-tenant-passwords, and any other scheme is merely lipstick on a security
> > hole
> >
> > - Reliance on Nova for compute resources is fine, but dependence on Nova
> > VM specific capabilities (like instance rebuild) is not; it makes things
> > like containers or bare-metal second class citizens
> >
> > - A fair portion of what Trove does is resource orchestration; don’t
> > reinvent the wheel, there’s Heat for that. Admittedly, Heat wasn’t as far
> > along when Trove got started but that’s not the case today and we have an
> > opportunity to fix that now
> >
> > - A similarly significant portion of what Trove does is to implement a
> > state-machine that will perform specific workflows involved in implementing
> > database specific operations. This makes the Trove taskmanager a stateful
> > entity. Some of the operations could take a fair amount of time. This is a
> > serious architectural flaw
> >
> > - Tenants should not ever be able to directly interact with the underlying
> > storage and compute used by database instances; that should be the default
> > configuration, not an untested deployment alternative
> >
> > - The CI should test all databases that are considered to be ‘supported’
> > without excessive use of resources in the gate; better code modularization
> > will help determine the tests which can safely be skipped in testing changes
> >
> > - Clusters should be first class citizens not an afterthought, single
> > instance databases may be the ‘special case’, not the other way around
> >
> > - The project must provide guest images (or at least complete tooling for
> > deployers to build these); while the project can’t distribute operating
> > systems and database software, the current deployment model merely impedes
> > adoption
> >
> > - Clusters spanning OpenStack deployments are a real thing that must be
> > supported
> >
> > This might sound harsh; that isn’t the intent. Each of these is the
> > consequence of one or more perfectly rational decisions. Some of those
> > decisions have had unintended consequences, and others were made knowing
> > that we would be incurring some technical debt; debt we have not had the
> > time or resources to address. Fixing all these is not impossible, it just
> > takes the dedication of resources by the community.
> >
> > I do not have a complete design for what the new Trove would look like.
> > For example, I don’t know how we will interact with other projects (like
> > Heat). Many questions remain to be explored and answered.
> >
> > Would it suffice to just use the existing Heat resources and build
> > templates around those, or will it be better to implement custom Trove
> > resources and then orchestrate things based on those resources?
> >
> > Would Trove implement the workflows required for multi-stage database
> > operations by itself, or would it rely on some other project (say Mistral)
> > for this? Is Mistral really a workflow service, or just cron on steroids? I
> > don’t know the answer but I would like to find out.
> >
> > While we don’t have the answers to these questions, I think this is a
> > conversation that we must have, one that we must decide on, and then as a
> > community commit the resources required to make a Trove v2 which delivers
> > on the mission of the project; “To provide scalable and reliable Cloud
> > Database as a Service provisioning functionality for both relational and
> > non-relational database engines, and to continue to improve its
> > fully-featured and extensible open source framework.”[2]
> >
> > Thanks,
> >
> > -amrith
> >
> >
> > [1] https://www.openstack.org/assets/survey/April2017SurveyReport.pdf
> > [2] https://wiki.openstack.org/wiki/Trove#Mission_Statement
> >
> >
> >