[openstack-dev] [trove][all][tc] A proposal to rearchitect Trove
Fei Long Wang
feilong at catalyst.net.nz
Tue Jun 20 01:52:49 UTC 2017
On 20/06/17 12:56, Curtis wrote:
> On Sun, Jun 18, 2017 at 5:35 AM, Amrith Kumar <amrith.kumar at gmail.com> wrote:
>> Trove has evolved rapidly over the past several years, since integration in
>> Icehouse when it only supported single instances of a few databases. Today
>> it supports a dozen databases including clusters and replication.
>>
>> The user survey [1] indicates that while there is strong interest in the
>> project, few large production deployments are known to the development
>> team.
>>
>> Recent changes in the OpenStack community at large (company realignments,
>> acquisitions, layoffs) and the Trove community in particular, coupled with a
>> mounting burden of technical debt have prompted me to make this proposal to
>> re-architect Trove.
>>
>> This email summarizes several of the issues that face the project, both
>> structurally and architecturally. This email does not claim to include a
>> detailed specification for what the new Trove would look like, merely the
>> recommendation that the community should come together and develop one so
>> that the project can be sustainable and useful to those who wish to use it
>> in the future.
>>
>> TL;DR
>>
>> Trove, with support for a dozen or so databases today, finds itself in a
>> bind because there are few developers, and a code-base with a significant
>> amount of technical debt.
>>
>> Some architectural choices which the team made over the years have
>> consequences which make the project less than ideal for deployers.
>>
>> Given that there are no major production deployments of Trove at present,
>> this provides us an opportunity to reset the project, learn from our v1 and
>> come up with a strong v2.
>>
>> An important aspect of making this proposal work is that we seek to
>> eliminate the effort (planning, and coding) involved in migrating existing
>> Trove v1 deployments to the proposed Trove v2. Effectively, with work
>> beginning on Trove v2 as proposed here, Trove v1 as released with Pike will
>> be marked as deprecated and users will have to migrate to Trove v2 when it
>> becomes available.
>>
>> While I would very much like to continue to support the users on Trove v1
>> through this transition, the simple fact is that absent community
>> participation this will be impossible. Furthermore, given that there are no
>> production deployments of Trove at this time, it seems pointless to build
>> that upgrade path from Trove v1 to Trove v2; it would be the proverbial
>> bridge from nowhere.
>>
>> This (previous) statement is, I realize, contentious. There are those who
>> have told me that an upgrade path must be provided, and there are those who
>> have told me of unnamed deployments of Trove that would suffer. To this, all
>> I can say is that if an upgrade path is of value to you, then please commit
>> the development resources to participate in the community to make that
>> possible. But equally, preventing a v2 of Trove or delaying it will only
>> make the v1 that we have today less valuable.
>>
>> We have learned a lot from v1, and the hope is that we can address that in
>> v2. Some of the more significant things that I have learned are:
>>
>> - We should adopt a versioned front-end API from the very beginning; making
>> the REST API versioned is not a ‘v2 feature’
>>
>> - A guest agent running on a tenant instance, with connectivity to a shared
>> management message bus is a security loophole; encrypting traffic,
>> per-tenant-passwords, and any other scheme is merely lipstick on a security
>> hole
>>
>> - Reliance on Nova for compute resources is fine, but dependence on Nova VM
>> specific capabilities (like instance rebuild) is not; it makes things like
>> containers or bare-metal second class citizens
>>
>> - A fair portion of what Trove does is resource orchestration; don’t
>> reinvent the wheel, there’s Heat for that. Admittedly, Heat wasn’t as far
>> along when Trove got started but that’s not the case today and we have an
>> opportunity to fix that now
>>
>> - A similarly significant portion of what Trove does is to implement a
>> state-machine that will perform specific workflows involved in implementing
>> database specific operations. This makes the Trove taskmanager a stateful
>> entity. Some of the operations could take a fair amount of time. This is a
>> serious architectural flaw
>>
>> - Tenants should not ever be able to directly interact with the underlying
>> storage and compute used by database instances; that should be the default
>> configuration, not an untested deployment alternative
>>
> As an operator I wouldn't run Trove as it is, unless I absolutely had to.
>
> I think it is a good idea to reboot the project. I really think the
> concept of "service VMs" should be a thing. I'm not sure where the
> OpenStack community has landed on that, my fault for not paying close
> attention, but we should be able to create VMs for a tenant that are
> not managed by the tenant but that could be billed to them in some
> fashion. At least that's my opinion.
Re 'service VMs': yep, that could be very useful. In Zaqar, we're
working on a spec to support a 'service queue', similar to 'service
VMs', so that the service user can create queues in a user's tenant. I
can imagine Trove benefiting from that feature as well.
>
>> - The CI should test all databases that are considered to be ‘supported’
>> without excessive use of resources in the gate; better code modularization
>> will help determine the tests which can safely be skipped in testing changes
>>
>> - Clusters should be first-class citizens, not an afterthought; single
>> instance databases may be the ‘special case’, not the other way around
> Definitely agree on that. Cluster first model.
>
>> - The project must provide guest images (or at least complete tooling for
>> deployers to build these); while the project can’t distribute operating
>> systems and database software, the current deployment model merely impedes
>> adoption
>>
>> - Clusters spanning OpenStack deployments are a real thing that must be
>> supported
>>
> I'm curious as to how this will be done. This is a requirement in
> NFV-land as well for other services. Would be very powerful and is
> needed in other areas.
>
> Thanks,
> Curtis.
>
>> This might sound harsh; that isn’t the intent. Each of these is the
>> consequence of one or more perfectly rational decisions. Some of those
>> decisions have had unintended consequences, and others were made knowing
>> that we would be incurring some technical debt; debt we have not had the
>> time or resources to address. Fixing all these is not impossible; it just
>> takes the dedication of resources by the community.
>>
>> I do not have a complete design for what the new Trove would look like. For
>> example, I don’t know how we will interact with other projects (like Heat).
>> Many questions remain to be explored and answered.
>>
>> Would it suffice to just use the existing Heat resources and build templates
>> around those, or will it be better to implement custom Trove resources and
>> then orchestrate things based on those resources?
>>
>> Would Trove implement the workflows required for multi-stage database
>> operations by itself, or would it rely on some other project (say Mistral)
>> for this? Is Mistral really a workflow service, or just cron on steroids? I
>> don’t know the answer but I would like to find out.
>>
>> While we don’t have the answers to these questions, I think this is a
>> conversation that we must have, one that we must decide on, and then as a
>> community commit the resources required to make a Trove v2 which delivers on
>> the mission of the project; “To provide scalable and reliable Cloud Database
>> as a Service provisioning functionality for both relational and
>> non-relational database engines, and to continue to improve its
>> fully-featured and extensible open source framework.”[2]
>>
>> Thanks,
>>
>> -amrith
>>
>>
>> [1] https://www.openstack.org/assets/survey/April2017SurveyReport.pdf
>> [2] https://wiki.openstack.org/wiki/Trove#Mission_Statement
--
Cheers & Best regards,
Feilong Wang (王飞龙)
--------------------------------------------------------------------------
Senior Cloud Software Engineer
Tel: +64-48032246
Email: flwang at catalyst.net.nz
Catalyst IT Limited
Level 6, Catalyst House, 150 Willis Street, Wellington
--------------------------------------------------------------------------