[openstack-dev] [trove][all][tc] A proposal to rearchitect Trove

Amrith Kumar amrith.kumar at gmail.com
Fri Jul 14 04:22:52 UTC 2017


Kevin,

In the interests of 'keeping it simple', I'm going to try to prioritize the
use-cases and pick implementation strategies which target the higher
priority ones without needlessly excluding other (lower priority) ones.

Thanks,

-amrith

--
Amrith Kumar
​
P.S. Verizon is hiring OpenStack engineers nationwide. If you are
interested, please contact me or visit https://t.co/gGoUzYvqbE


On Wed, Jul 12, 2017 at 5:46 PM, Fox, Kevin M <Kevin.Fox at pnnl.gov> wrote:

> There is a use case where some sites have folks buy whole bricks of
> compute nodes that get added to the overarching cloud, but using AZ's or
> HostAggregates/Flavors to dedicate the hardware to the users.
>
> You might want to land the db vm on the hardware for that project and one
> would expect the normal quota would be dinged for it rather than a special
> trove quota. Otherwise they may have more quota than the hosts can actually
> handle.
>
> Thanks,
> Kevin
> ________________________________________
> From: Doug Hellmann [doug at doughellmann.com]
> Sent: Wednesday, July 12, 2017 6:57 AM
> To: openstack-dev
> Subject: Re: [openstack-dev] [trove][all][tc] A proposal to rearchitect
> Trove
>
> Excerpts from Amrith Kumar's message of 2017-07-12 06:14:28 -0500:
> > All:
> >
> > First, let me thank all of you who responded and provided feedback
> > on what I wrote. I've summarized what I heard below and am posting
> > it as one consolidated response rather than responding to each
> > of your messages and making this thread even deeper.
> >
> > As I say at the end of this email, I will be setting up a session at
> > the Denver PTG to specifically continue this conversation and hope
> > you will all be able to attend. As soon as time slots for PTG are
> > announced, I will try and pick this slot and request that you please
> > attend.
> >
> > ----
> >
> > Thierry: naming issue; call it Hoard if it does not have a migration
> > path.
> >
> > ----
> >
> > Kevin: use a container approach with k8s as the orchestration
> > mechanism, addresses multiple issues including performance. Trove to
> > provide containers for multiple components which cooperate to provide
> > a single instance of a database or cluster. Don't put all components
> > (agent, monitoring, database) in a single VM, decoupling makes
> > migration and upgrades easier and allows trove to reuse database
> > vendor supplied containers. Performance of databases in VMs is poor
> > compared to databases on bare-metal.
> >
> > ----
> >
> > Doug Hellmann:
> >
> > > Does "service VM" need to be a first-class thing?  Akanda creates
> > > them, using a service user. The VMs are tied to a "router" which is
> > > the billable resource that the user understands and interacts with
> > > through the API.
> >
> > Amrith: Doug, yes because we're looking not just for service VMs but all
> > resources provisioned by a service. So, to Matt's comment about a
> > blackbox DBaaS, the VMs, storage, snapshots, ... they should all be
> > owned by the service, charged to a user's quota but not visible to the
> > user directly.
>
> I still don't understand. If you have entities that represent the
> DBaaS "host" or "database" or "database backup" or whatever, then
> you put a quota on those entities and you bill for them. If the
> database actually runs in a VM or the backup is a snapshot, those
> are implementation details. You don't want to have to rewrite your
> quota management or billing integration if those details change.
>
> Doug
>
> >
> > ----
> >
> > Jay:
> >
> > > Frankly, I believe all of these types of services should be built
> > > as applications that run on OpenStack (or other)
> > > infrastructure. In other words, they should not be part of the
> > > infrastructure itself.
> > >
> > > There's really no need for a user of a DBaaS to have access to the
> > > host or hosts the DB is running on. If the user really wanted
> > > that, they would just spin up a VM/baremetal server and install
> > > the thing themselves.
> >
> > and subsequently in follow-up with Zane:
> >
> > > Think only in terms of what a user of a DBaaS really wants. At the
> > > end of the day, all they want is an address in the cloud where they
> > > can point their application to write and read data from.
> > > ...
> > > At the end of the day, I think Trove is best implemented as a hosted
> > > application that exposes an API to its users that is entirely
> > > separate from the underlying infrastructure APIs like
> > > Cinder/Nova/Neutron.
> >
> > Amrith: Yes, I agree, +1000
> >
> > ----
> >
> > Clint (in response to Jay's proposal regarding the service making all
> > resources multi-tenant) raised a concern about having multi-tenant
> > shared resources. The issue is with ensuring separation between
> > tenants (don't want to use the word isolation because this is database
> > related).
> >
> > Amrith: yes, definitely a concern and one that we don't have today
> > because each DB is a VM of its own. Personally, I'd rather stick with
> > that construct, one DB per VM/container/baremetal and leave that be
> > the separation boundary.
> >
> > ----
> >
> > Zane: Discomfort over throwing out working code, grass is greener on
> > the other side, is there anything to salvage?
> >
> > Amrith: Yes, there is certainly a 'grass is greener with a rewrite'
> > fallacy. But, there is stuff that can be salvaged. The elements are
> > still good, they are separable and can be used with the new
> > project. Much of the controller logic however will fall by the
> > wayside.
> >
> > In a similar vein, Clint asks about the elements that Trove provides,
> > "how has that worked out".
> >
> > Amrith: Honestly, not well. Trove only provided reference elements
> > suitable for development use. Never really production hardened
> > ones. For example, the image elements trove provides don't bake the
> > guest agent in; they assume that at VM launch, the guest agent code
> > will be slurped (technical term) from the controller and
> > launched. Great for debugging, not great for production. That is
> > something that should change. But, equally, I've heard disagreements
> > saying that slurping the guest agent at runtime is clever and good
> > in production.
> >
> > ----
> >
> > Zane: consider using Mistral for workflow.
> >
> > > The disadvantage, obviously, is that it requires the cloud to offer
> > > Mistral as-a-Service, which currently doesn't include nearly as many
> > > clouds as I'd like.
> >
> > Amrith: Yes, as we discussed, we are in agreement with both parts of
> > this recommendation.
> >
> > Zane, Jay and Dims: a subtle distinction between Tessmaster and Magnum
> > ("I want a database, figure out the lower layers" vs. "I want a k8s
> > cluster").
> >
> > ----
> >
> > Zane: Fun fact: Trove started out as a *complete fork* of Nova(!).
> >
> > Amrith: Not fun at all :) Never, ever, ever, ever f5g do that
> > again. Yeah, sure, if you can have i18n, and k8s, I can have f5g :)
> >
> > ----
> >
> > Thierry:
> >
> > > We generally need to be very careful about creating dependencies
> > > between OpenStack projects.
> > > ...
> > > I understand it's a hard trade-off: you want to reuse functionality
> > > rather than reinvent it in every project... we just need to
> > > recognize the cost of doing that.
> >
> > Amrith: Yes, this is part of my concern re: Mistral, and earlier in
> > Trove's life about depending on Manila for Oracle RAC. Clint raised a
> > similar concern about the dependency on Heat.
> >
> > In response, Kevin:
> >
> > > That view of dependencies is why Kubernetes development is outpacing
> > > OpenStack's and some users are leaving IMO. Not trying to be mean
> > > here but trying to shine some light on this issue.
> >
> > I disagree, but that's a topic for another email thread and maybe not
> > even an email thread but an in-person conversation with suitable
> > beverages. It is a religious discussion which is best handled in a
> > different forum; such as the emacs-vi forum.
> >
> > ----
> >
> > I wrote:
> >
> > > - A guest agent running on a tenant instance, with connectivity to a
> > > shared management message bus is a security loophole; encrypting
> > > traffic, per-tenant-passwords, and any other scheme is merely
> > > lipstick on a security hole
> >
> > Clint asks:
> >
> > > This is a broad statement, and I'm not sure I understand the actual
> > > risk you're presenting here as "a security loophole".
> > >
> > > How else would you administer a database server than through some
> > > kind of agent? Whether that agent is a python daemon of our making,
> > > sshd, or whatever kubernetes component lets you change things,
> > > they're all administrative pieces that sit next to the resource.
> >
> > Amrith: The issue is that the guest agent (currently) running in a
> > tenant's context needs to establish a connection to a shared rabbitmq
> > server running in the service (control plane) context. I am fine with
> > a guest agent running in the control plane establishing a connection
> > into a guest VM if required, not the other way around.
> >
> > ----
> >
> > Clint makes a distinction between a database cluster within an
> > OpenStack deployment and an uber database cluster spanning clouds,
> > recommending that the latter is best left to a tertiary
> > orchestrator. Further, these are two distinct things, pick one and do
> > it well.
> >
> > Amrith: A valid approach and one that will allow Trove to focus on the
> > high value single OpenStack deployment of a db cluster (and to Jay's
> > point, do it well).
> >
> > ----
> >
> > Consensus:
> >
> > Trove should expose (what Matt Fischer calls) BlackBox DB, not storage +
> > compute.
> >
> > Address rabbitmq security concerns differently; move guest agent off
> > instance.
> >
> > Don't reinvent the orchestration piece.
> >
> > Fewer DBs, better support
> >
> > Clusters are first class citizens, not an afterthought
> >
> > Clusters spanning regions and openstack deployments
> >
> > Restart the service VM's discussion:
> > https://review.openstack.org/#/c/438134/
> >
> > ----
> >
> > Several people emailed me privately and said they (or their companies)
> > would like to invest resources in Trove. Some indicated that they (or
> > their companies) would like to invest resources in Trove if the
> > commitment was towards a certain direction or technology choice.
> > Others have offered resources if the direction would be to provide
> > an AWS compatible API.
> >
> > To anyone who wants to contribute resources to a project, please do
> > it. Big companies considering contributing one or two people to a
> > project and making it seem like a big decision is really an indication
> > of a lack of seriousness. If the project is really valuable to you,
> > you'd have put people on it already. The fact that you haven't speaks
> > volumes.
> >
> > To those who want to place pre-conditions on technology choice, I have
> > no (good) words for you.
> >
> > Thanks to all who participated, I appreciate all the input. I will be
> > setting up a session at the Denver PTG to specifically continue this
> > conversation and hope you will all be able to attend. As soon as time
> > slots for PTG are announced, I will try and pick this slot and request
> > that you please attend.
> >
> > Thanks,
> >
> > -amrith
> >
> >
> >
> >
> > On Sun, Jun 18, 2017 at 6:35 AM, Amrith Kumar <amrith.kumar at gmail.com>
> > wrote:
> >
> > > Trove has evolved rapidly over the past several years, since integration
> > > in IceHouse when it only supported single instances of a few databases.
> > > Today it supports a dozen databases including clusters and replication.
> > >
> > > The user survey [1] indicates that while there is strong interest in the
> > > project, there are few large production deployments that are known of (by
> > > the development team).
> > >
> > > Recent changes in the OpenStack community at large (company realignments,
> > > acquisitions, layoffs) and the Trove community in particular, coupled with
> > > a mounting burden of technical debt have prompted me to make this proposal
> > > to re-architect Trove.
> > >
> > > This email summarizes several of the issues that face the project, both
> > > structurally and architecturally. This email does not claim to include a
> > > detailed specification for what the new Trove would look like, merely the
> > > recommendation that the community should come together and develop one so
> > > that the project can be sustainable and useful to those who wish to use it
> > > in the future.
> > >
> > > TL;DR
> > >
> > > Trove, with support for a dozen or so databases today, finds itself in a
> > > bind because there are few developers, and a code-base with a significant
> > > amount of technical debt.
> > >
> > > Some architectural choices which the team made over the years have
> > > consequences which make the project less than ideal for deployers.
> > >
> > > Given that there are no major production deployments of Trove at present,
> > > this provides us an opportunity to reset the project, learn from our v1 and
> > > come up with a strong v2.
> > >
> > > An important aspect of making this proposal work is that we seek to
> > > eliminate the effort (planning, and coding) involved in migrating existing
> > > Trove v1 deployments to the proposed Trove v2. Effectively, with work
> > > beginning on Trove v2 as proposed here, Trove v1 as released with Pike will
> > > be marked as deprecated and users will have to migrate to Trove v2 when it
> > > becomes available.
> > >
> > > While I would very much like to continue to support the users on Trove v1
> > > through this transition, the simple fact is that absent community
> > > participation this will be impossible. Furthermore, given that there are no
> > > production deployments of Trove at this time, it seems pointless to build
> > > that upgrade path from Trove v1 to Trove v2; it would be the proverbial
> > > bridge from nowhere.
> > >
> > > This (previous) statement is, I realize, contentious. There are those who
> > > have told me that an upgrade path must be provided, and there are those who
> > > have told me of unnamed deployments of Trove that would suffer. To this,
> > > all I can say is that if an upgrade path is of value to you, then please
> > > commit the development resources to participate in the community to make
> > > that possible. But equally, preventing a v2 of Trove or delaying it will
> > > only make the v1 that we have today less valuable.
> > >
> > > We have learned a lot from v1, and the hope is that we can address that in
> > > v2. Some of the more significant things that I have learned are:
> > >
> > > - We should adopt a versioned front-end API from the very beginning;
> > > making the REST API versioned is not a ‘v2 feature’
> > >
> > > - A guest agent running on a tenant instance, with connectivity to a
> > > shared management message bus is a security loophole; encrypting traffic,
> > > per-tenant-passwords, and any other scheme is merely lipstick on a security
> > > hole
> > >
> > > - Reliance on Nova for compute resources is fine, but dependence on Nova
> > > VM specific capabilities (like instance rebuild) is not; it makes things
> > > like containers or bare-metal second class citizens
> > >
> > > - A fair portion of what Trove does is resource orchestration; don’t
> > > reinvent the wheel, there’s Heat for that. Admittedly, Heat wasn’t as far
> > > along when Trove got started but that’s not the case today and we have an
> > > opportunity to fix that now
> > >
> > > - A similarly significant portion of what Trove does is to implement a
> > > state-machine that will perform specific workflows involved in implementing
> > > database specific operations. This makes the Trove taskmanager a stateful
> > > entity. Some of the operations could take a fair amount of time. This is a
> > > serious architectural flaw
> > >
> > > - Tenants should not ever be able to directly interact with the underlying
> > > storage and compute used by database instances; that should be the default
> > > configuration, not an untested deployment alternative
> > >
> > > - The CI should test all databases that are considered to be ‘supported’
> > > without excessive use of resources in the gate; better code modularization
> > > will help determine the tests which can safely be skipped in testing changes
> > >
> > > - Clusters should be first class citizens not an afterthought, single
> > > instance databases may be the ‘special case’, not the other way around
> > >
> > > - The project must provide guest images (or at least complete tooling for
> > > deployers to build these); while the project can’t distribute operating
> > > systems and database software, the current deployment model merely impedes
> > > adoption
> > >
> > > - Clusters spanning OpenStack deployments are a real thing that must be
> > > supported
> > >
> > > This might sound harsh, that isn’t the intent. Each of these is the
> > > consequence of one or more perfectly rational decisions. Some of those
> > > decisions have had unintended consequences, and others were made knowing
> > > that we would be incurring some technical debt; debt we have not had the
> > > time or resources to address. Fixing all these is not impossible, it just
> > > takes the dedication of resources by the community.
> > >
> > > I do not have a complete design for what the new Trove would look like.
> > > For example, I don’t know how we will interact with other projects (like
> > > Heat). Many questions remain to be explored and answered.
> > >
> > > Would it suffice to just use the existing Heat resources and build
> > > templates around those, or will it be better to implement custom Trove
> > > resources and then orchestrate things based on those resources?
> > >
> > > Would Trove implement the workflows required for multi-stage database
> > > operations by itself, or would it rely on some other project (say Mistral)
> > > for this? Is Mistral really a workflow service, or just cron on steroids? I
> > > don’t know the answer but I would like to find out.
> > >
> > > While we don’t have the answers to these questions, I think this is a
> > > conversation that we must have, one that we must decide on, and then as a
> > > community commit the resources required to make a Trove v2 which delivers
> > > on the mission of the project; “To provide scalable and reliable Cloud
> > > Database as a Service provisioning functionality for both relational and
> > > non-relational database engines, and to continue to improve its
> > > fully-featured and extensible open source framework.”[2]
> > >
> > > Thanks,
> > >
> > > -amrith
> > >
> > >
> > > [1] https://www.openstack.org/assets/survey/April2017SurveyReport.pdf
> > > [2] https://wiki.openstack.org/wiki/Trove#Mission_Statement
> > >
> > >
> > >
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>