[openstack-dev] [trove][all][tc] A proposal to rearchitect Trove
Fox, Kevin M
Kevin.Fox at pnnl.gov
Wed Jul 12 22:46:53 UTC 2017
There is a use case where some sites have folks buy whole bricks of compute nodes that get added to the overarching cloud, with AZs or HostAggregates/Flavors used to dedicate the hardware to those users.
You might want to land the DB VM on that project's hardware, and one would expect the normal quota to be dinged for it rather than a special Trove quota. Otherwise they may have more quota than the hosts can actually handle.
Thanks,
Kevin
________________________________________
From: Doug Hellmann [doug at doughellmann.com]
Sent: Wednesday, July 12, 2017 6:57 AM
To: openstack-dev
Subject: Re: [openstack-dev] [trove][all][tc] A proposal to rearchitect Trove
Excerpts from Amrith Kumar's message of 2017-07-12 06:14:28 -0500:
> All:
>
> First, let me thank all of you who responded and provided feedback
> on what I wrote. I've summarized what I heard below and am posting
> it as one consolidated response rather than responding to each
> of your messages and making this thread even deeper.
>
> As I say at the end of this email, I will be setting up a session at
> the Denver PTG to specifically continue this conversation and hope
> you will all be able to attend. As soon as time slots for PTG are
> announced, I will try and pick this slot and request that you please
> attend.
>
> ----
>
> Thierry: naming issue; call it Hoard if it does not have a migration
> path.
>
> ----
>
> Kevin: use a container approach with k8s as the orchestration
> mechanism, addresses multiple issues including performance. Trove to
> provide containers for multiple components which cooperate to provide
> a single instance of a database or cluster. Don't put all components
> (agent, monitoring, database) in a single VM, decoupling makes
> migration and upgrades easier and allows trove to reuse database
> vendor supplied containers. Performance of databases in VMs is poor
> compared to databases on bare-metal.
>
> ----
>
> Doug Hellmann:
>
> > Does "service VM" need to be a first-class thing? Akanda creates
> > them, using a service user. The VMs are tied to a "router" which is
> > the billable resource that the user understands and interacts with
> > through the API.
>
> Amrith: Doug, yes because we're looking not just for service VMs but all
> resources provisioned by a service. So, to Matt's comment about a
> blackbox DBaaS, the VMs, storage, snapshots, ... they should all be
> owned by the service, charged to a user's quota but not visible to the
> user directly.
I still don't understand. If you have entities that represent the
DBaaS "host" or "database" or "database backup" or whatever, then
you put a quota on those entities and you bill for them. If the
database actually runs in a VM or the backup is a snapshot, those
are implementation details. You don't want to have to rewrite your
quota management or billing integration if those details change.
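To make that concrete, here is a minimal Python sketch of quota being charged against the service-level entity ("database") while the backing resource stays swappable. Every name in it (TenantQuota, DatabaseService, VMBackend, ContainerBackend) is hypothetical and invented for illustration; none of it is Trove, Nova, or any other OpenStack API.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


class Backend(Protocol):
    """Hypothetical interface for whatever actually hosts the database."""

    def provision(self, name: str) -> str: ...


class VMBackend:
    def provision(self, name: str) -> str:
        return f"vm:{name}"


class ContainerBackend:
    def provision(self, name: str) -> str:
        return f"container:{name}"


class QuotaExceeded(Exception):
    pass


@dataclass
class TenantQuota:
    # Limits and usage are keyed by user-visible entity kind
    # ("database", "backup", ...), never by VM or snapshot.
    limits: Dict[str, int]
    used: Dict[str, int] = field(default_factory=dict)

    def charge(self, kind: str, amount: int = 1) -> None:
        current = self.used.get(kind, 0)
        if current + amount > self.limits.get(kind, 0):
            raise QuotaExceeded(kind)
        self.used[kind] = current + amount


@dataclass
class DatabaseService:
    backend: Backend
    quota: TenantQuota

    def create_database(self, name: str) -> str:
        # The charge is made before provisioning and does not depend
        # on which backend ends up hosting the database.
        self.quota.charge("database")
        return self.backend.provision(name)
```

Swapping VMBackend for ContainerBackend changes what provision() returns but leaves the quota bookkeeping untouched, which is the property being argued for here.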
Doug
>
> ----
>
> Jay:
>
> > Frankly, I believe all of these types of services should be built
> > as applications that run on OpenStack (or other)
> > infrastructure. In other words, they should not be part of the
> > infrastructure itself.
> >
> > There's really no need for a user of a DBaaS to have access to the
> > host or hosts the DB is running on. If the user really wanted
> > that, they would just spin up a VM/baremetal server and install
> > the thing themselves.
>
> and subsequently in follow-up with Zane:
>
> > Think only in terms of what a user of a DBaaS really wants. At the
> > end of the day, all they want is an address in the cloud where they
> > can point their application to write and read data from.
> > ...
> > At the end of the day, I think Trove is best implemented as a hosted
> > application that exposes an API to its users that is entirely
> > separate from the underlying infrastructure APIs like
> > Cinder/Nova/Neutron.
>
> Amrith: Yes, I agree, +1000
>
> ----
>
> Clint (in response to Jay's proposal regarding the service making all
> resources multi-tenant) raised a concern about having multi-tenant
> shared resources. The issue is with ensuring separation between
> tenants (don't want to use the word isolation because this is database
> related).
>
> Amrith: yes, definitely a concern and one that we don't have today
> because each DB is a VM of its own. Personally, I'd rather stick with
> that construct, one DB per VM/container/baremetal and leave that be
> the separation boundary.
>
> ----
>
> Zane: Discomfort over throwing out working code, grass is greener on
> the other side, is there anything to salvage?
>
> Amrith: Yes, there is certainly a 'grass is greener with a rewrite'
> fallacy. But, there is stuff that can be salvaged. The elements are
> still good, they are separable and can be used with the new
> project. Much of the controller logic however will fall by the
> wayside.
>
> In a similar vein, Clint asks about the elements that Trove provides,
> "how has that worked out".
>
> Amrith: Honestly, not well. Trove only provided reference elements
> suitable for development use. Never really production hardened
> ones. For example, the image elements trove provides don't bake the
> guest agent in; they assume that at VM launch, the guest agent code
> will be slurped (technical term) from the controller and
> launched. Great for debugging, not great for production. That is
> something that should change. But, equally, I've heard disagreements
> saying that slurping the guest agent at runtime is clever and good
> in production.
>
> ----
>
> Zane: consider using Mistral for workflow.
>
> > The disadvantage, obviously, is that it requires the cloud to offer
> > Mistral as-a-Service, which currently doesn't include nearly as many
> > clouds as I'd like.
>
> Amrith: Yes, as we discussed, we are in agreement with both parts of
> this recommendation.
>
> Zane, Jay and Dims: a subtle distinction between Tessmaster and Magnum
> (I want a database, figure out the lower layers, vs. I want a k8s
> cluster).
>
> ----
>
> Zane: Fun fact: Trove started out as a *complete fork* of Nova(!).
>
> Amrith: Not fun at all :) Never, ever, ever, ever f5g do that
> again. Yeah, sure, if you can have i18n, and k8s, I can have f5g :)
>
> ----
>
> Thierry:
>
> > We generally need to be very careful about creating dependencies
> > between OpenStack projects.
> > ...
> > I understand it's a hard trade-off: you want to reuse functionality
> > rather than reinvent it in every project... we just need to
> > recognize the cost of doing that.
>
> Amrith: Yes, this is part of my concern re: Mistral, and earlier in
> trove's life on depending on Manila for Oracle RAC. Clint raised a
> similar concern about the dependency on Heat.
>
> In response, Kevin:
>
> > That view of dependencies is why Kubernetes development is outpacing
> > OpenStack's and some users are leaving IMO. Not trying to be mean
> > here but trying to shine some light on this issue.
>
> I disagree, but that's a topic for another email thread and maybe not
> even an email thread but an in-person conversation with suitable
> beverages. It is a religious discussion which is best handled in a
> different forum; such as the emacs-vi forum.
>
> ----
>
> I wrote:
>
> > - A guest agent running on a tenant instance, with connectivity to a
> > shared management message bus is a security loophole; encrypting
> > traffic, per-tenant-passwords, and any other scheme is merely
> > lipstick on a security hole
>
> Clint asks:
>
> > This is a broad statement, and I'm not sure I understand the actual
> > risk you're presenting here as "a security loophole".
> >
> > How else would you administer a database server than through some
> > kind of agent? Whether that agent is a python daemon of our making,
> > sshd, or whatever kubernetes component lets you change things,
> > they're all administrative pieces that sit next to the resource.
>
> Amrith: The issue is that the guest agent (currently) running in a
> tenant's context needs to establish a connection to a shared rabbitmq
> server running in the service (control plane) context. I am fine with
> a guest agent running in the control plane establishing a connection
> into a guest VM if required, not the other way around.
>
> ----
>
> Clint makes a distinction between a database cluster within an
> OpenStack deployment and an uber database cluster spanning clouds,
> recommending that the latter is best left to a tertiary
> orchestrator. Further, these are two distinct things, pick one and do
> it well.
>
> Amrith: A valid approach and one that will allow Trove to focus on the
> high value single OpenStack deployment of a db cluster (and to Jay's
> point, do it well).
>
> ----
>
> Consensus:
>
> Trove should expose (what Matt Fischer calls) BlackBox DB, not storage +
> compute.
>
> Address rabbitmq security concerns differently; move guest agent off
> instance.
>
> Don't reinvent the orchestration piece.
>
> Fewer DBs, better support
>
> Clusters are first class citizens, not an afterthought
>
> Clusters spanning regions and openstack deployments
>
> Restart the service VMs discussion:
> https://review.openstack.org/#/c/438134/
>
> ----
>
> Several people emailed me privately and said they (or their companies)
> would like to invest resources in Trove. Some indicated that they (or
> their companies) would like to invest resources in Trove if the
> commitment was towards a certain direction or technology choice.
> Others have offered resources if the direction would be to provide
> an AWS compatible API.
>
> To anyone who wants to contribute resources to a project, please do
> it. Big companies considering contributing one or two people to a
> project and making it seem like a big decision is really an indication
> of a lack of seriousness. If the project is really valuable to you,
> you'd have put people on it already. The fact that you haven't speaks
> volumes.
>
> To those who want to place pre-conditions on technology choice, I have
> no (good) words for you.
>
> Thanks to all who participated, I appreciate all the input. I will be
> setting up a session at the Denver PTG to specifically continue this
> conversation and hope you will all be able to attend. As soon as time
> slots for PTG are announced, I will try and pick this slot and request
> that you please attend.
>
> Thanks,
>
> -amrith
>
>
>
>
> On Sun, Jun 18, 2017 at 6:35 AM, Amrith Kumar <amrith.kumar at gmail.com>
> wrote:
>
> > Trove has evolved rapidly over the past several years, since integration
> > in IceHouse when it only supported single instances of a few databases.
> > Today it supports a dozen databases including clusters and replication.
> >
> > The user survey [1] indicates that while there is strong interest in the
> > project, there are few large production deployments that are known of (by
> > the development team).
> >
> > Recent changes in the OpenStack community at large (company realignments,
> > acquisitions, layoffs) and the Trove community in particular, coupled with
> > a mounting burden of technical debt have prompted me to make this proposal
> > to re-architect Trove.
> >
> > This email summarizes several of the issues that face the project, both
> > structurally and architecturally. This email does not claim to include a
> > detailed specification for what the new Trove would look like, merely the
> > recommendation that the community should come together and develop one so
> > that the project can be sustainable and useful to those who wish to use it
> > in the future.
> >
> > TL;DR
> >
> > Trove, with support for a dozen or so databases today, finds itself in a
> > bind because there are few developers, and a code-base with a significant
> > amount of technical debt.
> >
> > Some architectural choices which the team made over the years have
> > consequences which make the project less than ideal for deployers.
> >
> > Given that there are no major production deployments of Trove at present,
> > this provides us an opportunity to reset the project, learn from our v1 and
> > come up with a strong v2.
> >
> > An important aspect of making this proposal work is that we seek to
> > eliminate the effort (planning, and coding) involved in migrating existing
> > Trove v1 deployments to the proposed Trove v2. Effectively, with work
> > beginning on Trove v2 as proposed here, Trove v1 as released with Pike will
> > be marked as deprecated and users will have to migrate to Trove v2 when it
> > becomes available.
> >
> > While I would very much like to continue to support the users on Trove v1
> > through this transition, the simple fact is that absent community
> > participation this will be impossible. Furthermore, given that there are no
> > production deployments of Trove at this time, it seems pointless to build
> > that upgrade path from Trove v1 to Trove v2; it would be the proverbial
> > bridge from nowhere.
> >
> > This (previous) statement is, I realize, contentious. There are those who
> > have told me that an upgrade path must be provided, and there are those who
> > have told me of unnamed deployments of Trove that would suffer. To this,
> > all I can say is that if an upgrade path is of value to you, then please
> > commit the development resources to participate in the community to make
> > that possible. But equally, preventing a v2 of Trove or delaying it will
> > only make the v1 that we have today less valuable.
> >
> > We have learned a lot from v1, and the hope is that we can address that in
> > v2. Some of the more significant things that I have learned are:
> >
> > - We should adopt a versioned front-end API from the very beginning;
> > making the REST API versioned is not a ‘v2 feature’
> >
> > - A guest agent running on a tenant instance, with connectivity to a
> > shared management message bus is a security loophole; encrypting traffic,
> > per-tenant-passwords, and any other scheme is merely lipstick on a security
> > hole
> >
> > - Reliance on Nova for compute resources is fine, but dependence on Nova
> > VM specific capabilities (like instance rebuild) is not; it makes things
> > like containers or bare-metal second class citizens
> >
> > - A fair portion of what Trove does is resource orchestration; don’t
> > reinvent the wheel, there’s Heat for that. Admittedly, Heat wasn’t as far
> > along when Trove got started but that’s not the case today and we have an
> > opportunity to fix that now
> >
> > - A similarly significant portion of what Trove does is to implement a
> > state-machine that will perform specific workflows involved in implementing
> > database specific operations. This makes the Trove taskmanager a stateful
> > entity. Some of the operations could take a fair amount of time. This is a
> > serious architectural flaw
> >
> > - Tenants should not ever be able to directly interact with the underlying
> > storage and compute used by database instances; that should be the default
> > configuration, not an untested deployment alternative
> >
> > - The CI should test all databases that are considered to be ‘supported’
> > without excessive use of resources in the gate; better code modularization
> > will help determine the tests which can safely be skipped in testing changes
> >
> > - Clusters should be first class citizens not an afterthought, single
> > instance databases may be the ‘special case’, not the other way around
> >
> > - The project must provide guest images (or at least complete tooling for
> > deployers to build these); while the project can’t distribute operating
> > systems and database software, the current deployment model merely impedes
> > adoption
> >
> > - Clusters spanning OpenStack deployments are a real thing that must be
> > supported
> >
> > This might sound harsh; that isn’t the intent. Each of these is the
> > consequence of one or more perfectly rational decisions. Some of those
> > decisions have had unintended consequences, and others were made knowing
> > that we would be incurring some technical debt; debt we have not had the
> > time or resources to address. Fixing all these is not impossible, it just
> > takes the dedication of resources by the community.
> >
> > I do not have a complete design for what the new Trove would look like.
> > For example, I don’t know how we will interact with other projects (like
> > Heat). Many questions remain to be explored and answered.
> >
> > Would it suffice to just use the existing Heat resources and build
> > templates around those, or will it be better to implement custom Trove
> > resources and then orchestrate things based on those resources?
> >
> > Would Trove implement the workflows required for multi-stage database
> > operations by itself, or would it rely on some other project (say Mistral)
> > for this? Is Mistral really a workflow service, or just cron on steroids? I
> > don’t know the answer but I would like to find out.
> >
> > While we don’t have the answers to these questions, I think this is a
> > conversation that we must have, one that we must decide on, and then as a
> > community commit the resources required to make a Trove v2 which delivers
> > on the mission of the project; “To provide scalable and reliable Cloud
> > Database as a Service provisioning functionality for both relational and
> > non-relational database engines, and to continue to improve its
> > fully-featured and extensible open source framework.”[2]
> >
> > Thanks,
> >
> > -amrith
> >
> >
> > [1] https://www.openstack.org/assets/survey/April2017SurveyReport.pdf
> > [2] https://wiki.openstack.org/wiki/Trove#Mission_Statement
> >
> >
> >