[openstack-dev] [trove][all][tc] A proposal to rearchitect Trove
zbitter at redhat.com
Tue Jun 20 20:57:22 UTC 2017
On 20/06/17 11:45, Jay Pipes wrote:
> Good discussion, Zane. Comments inline.
> On 06/20/2017 11:01 AM, Zane Bitter wrote:
>> On 20/06/17 10:08, Jay Pipes wrote:
>>> On 06/20/2017 09:42 AM, Doug Hellmann wrote:
>>>> Does "service VM" need to be a first-class thing? Akanda creates
>>>> them, using a service user. The VMs are tied to a "router" which
>>>> is the billable resource that the user understands and interacts with
>>>> through the API.
>>> Frankly, I believe all of these types of services should be built as
>>> applications that run on OpenStack (or other) infrastructure. In
>>> other words, they should not be part of the infrastructure itself.
>>> There's really no need for a user of a DBaaS to have access to the
>>> host or hosts the DB is running on. If the user really wanted that,
>>> they would just spin up a VM/baremetal server and install the thing.
>> Hey Jay,
>> I'd be interested in exploring this idea with you, because I think
>> everyone agrees that this would be a good goal, but at least in my
>> mind it's not obvious what the technical solution should be.
>> (Actually, I've read your email a bunch of times now, and I go back
>> and forth on which one you're actually advocating for.) The two
>> options, as I see it, are as follows:
>> 1) The database VMs are created in the user's tena^W project. They
>> connect directly to the tenant's networks, are governed by the user's
>> quota, and are billed to the project as Nova VMs (on top of whatever
>> additional billing might come along with the management services). A
>> [future] feature in Nova (https://review.openstack.org/#/c/438134/)
>> allows the Trove service to lock down access so that the user cannot
>> actually interact with the server using Nova, but must go through the
>> Trove API. On a cloud that doesn't include Trove, a user could run
>> Trove as an application themselves and all it would have to do
>> differently is not pass the service token to lock down the VM.
>> 2) The database VMs are created in a project belonging to the operator
>> of the service. They're connected to the user's network through
>> <magic>, and isolated from other users' databases running in the same
>> project through <security groups? hierarchical projects? magic?>.
>> Trove has its own quota management and billing. The user cannot
>> interact with the server using Nova since it is owned by a different
>> project. On a cloud that doesn't include Trove, a user could run Trove
>> as an application themselves, by giving it credentials for their own
>> project and disabling all of the cross-tenant networking stuff.
> None of the above :)
> Don't think about VMs at all. Or networking plumbing. Or volume storage
> or any of that.
OK, but somebody has to ;)
> Think only in terms of what a user of a DBaaS really wants. At the end
> of the day, all they want is an address in the cloud where they can
> point their application to write and read data from.
> Do they want that data connection to be fast and reliable? Of course,
> but how that happens is irrelevant to them.
> Do they want that data to be safe and backed up? Of course, but how that
> happens is irrelevant to them.
Fair enough. The world has changed a lot since RDS (which was the model
for Trove) was designed; it's certainly worth reviewing the base
assumptions before embarking on a new design.
> The problem with many of these high-level *aaS projects is that they
> consider their user to be a typical tenant of general cloud
> infrastructure -- focused on launching VMs and creating volumes and
> networks etc. And the discussions around the implementation of these
> projects always come back to minutiae about how to set up secure
> communication channels between a control plane message bus and the
> service VMs.
Incidentally, the reason that discussions always come back to that is
because OpenStack isn't very good at it, which is a huge problem not
only for the *aaS projects but for user applications in general running
If we had fine-grained authorisation and ubiquitous multi-tenant
asynchronous messaging in OpenStack then I firmly believe that we, and
application developers, would be in much better shape.
> If you create these projects as applications that run on cloud
> infrastructure (OpenStack, k8s or otherwise),
I'm convinced there's an interesting idea here, but the terminology
you're using doesn't really capture it. When you say 'as applications
that run on cloud infrastructure', it sounds like you mean they should
run in a Nova VM, or in a Kubernetes cluster somewhere, rather than on
the OpenStack control plane. I don't think that's what you mean though,
because you can (and IIUC Rackspace does) deploy OpenStack services that
way already, and it has no real effect on the architecture of those
services.
> then the discussions focus
> instead on how the real end-users -- the ones that actually call the
> APIs and utilize the service -- would interact with the APIs and not the
> underlying infrastructure itself.
> Here's an example to think about...
> What if a provider of this DBaaS service wanted to jam 100 database
> instances on a single VM and provide connectivity to those database
> instances to 100 different tenants?
> Would those tenants know if those databases were all serviced from a
> single database server process running on the VM?
You bet they would when one (or all) of the other 99 decided to run a
really expensive query at an inopportune moment :)
> Or 100 containers each
> running a separate database server process? Or 10 containers running 10
> database server processes each?
> No, of course not. And the tenant wouldn't care at all, because the
Well, if they had any kind of regulatory (or even performance)
requirements then the tenant might care really quite a lot. But I take
your point that many might not and it would be good to be able to offer
them lower cost options.
> point of the DBaaS service is to get a database. It isn't to get one or
> more VMs/containers/baremetal servers.
I'm not sure I entirely agree here. There are two kinds of DBaaS. One is
a data API: a multitenant database a la DynamoDB. Those are very cool,
and I'm excited about the potential to reduce the granularity of billing
to a minimum, in much the same way Swift does for storage, and I'm sad
that OpenStack's attempt in this space (MagnetoDB) didn't work out. But
Trove is not that.
People use Trove because they want to use a *particular* database, but
still have all the upgrades, backups, &c. handled for them. Given that
the choice of database is explicitly *not* abstracted away from them,
things like how many different VMs/containers/baremetal servers the
database is running on are very much relevant IMHO, because what you
want depends on both the database and how you're trying to use it. And
because (afaik) none of them have native multitenancy, it's necessary
that no tenant should have to share a database server with any other.
Essentially Trove operates at a moderate level of abstraction -
somewhere between managing the database + the infrastructure it runs on
yourself and just an API endpoint you poke data into. It also operates
at the coarse end of a granularity spectrum running from
VMs->Containers->pay as you go.
It's reasonable to want to move closer to the middle of the granularity
spectrum. But you can't go all the way to the high abstraction/fine
grained ends of the spectra (which turn out to be equivalent) without
becoming something qualitatively different.
> At the end of the day, I think Trove is best implemented as a hosted
> application that exposes an API to its users that is entirely separate
> from the underlying infrastructure APIs like Cinder/Nova/Neutron.
> This is similar to Kevin's k8s Operator idea, which I support but in a
> generic fashion that isn't specific to k8s.
> In the same way that k8s abstracts the underlying infrastructure (via
> its "cloud provider" concept), I think that Trove and similar projects
> need to use a similar abstraction and focus on providing a different API
> to their users that doesn't leak the underlying infrastructure API
> concepts out.
OK, so trying to summarise (stop me if I'm getting it wrong):
essentially you support option (2) because it is a closed abstraction.
Trove has its own quota management, billing, &c. and the user can't see
the VM, so the operator is free to substitute a different backend that
allocates compute capacity in finer-grained increments than Nova does.
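To make that summary concrete, here is a minimal Python sketch of what such a closed abstraction could look like. Every name in it (ComputeBackend, DatabaseService, DedicatedVMBackend, SharedHostBackend, the per-tenant quota rule, the addresses) is hypothetical and is not Trove's actual code or API. It only illustrates the point: if the service owns quota and hands back nothing but an endpoint, the operator can swap a one-server-per-database backend for one that packs many databases onto a shared host without the user seeing any difference in the API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical default ports, used only to make the endpoints look realistic.
DEFAULT_PORTS = {"mysql": 3306, "postgresql": 5432}


@dataclass(frozen=True)
class Endpoint:
    """All the user ever receives: an address to point their application at."""
    host: str
    port: int


class ComputeBackend(ABC):
    """Operator-side capacity provider; never exposed through the DBaaS API."""

    @abstractmethod
    def provision(self, tenant: str, engine: str) -> Endpoint:
        """Allocate capacity for one database and return where it listens."""


class DedicatedVMBackend(ComputeBackend):
    """One server per database instance (stand-in for a Nova VM per DB)."""

    def __init__(self) -> None:
        self._last_octet = 10

    def provision(self, tenant: str, engine: str) -> Endpoint:
        # Each database gets its own (fake) address and the engine's usual port.
        self._last_octet += 1
        return Endpoint(f"10.0.0.{self._last_octet}", DEFAULT_PORTS[engine])


class SharedHostBackend(ComputeBackend):
    """Many tenants' databases packed onto one host, one port each."""

    def __init__(self, host: str = "10.0.1.5") -> None:
        self.host = host
        self._last_port = 30000

    def provision(self, tenant: str, engine: str) -> Endpoint:
        # Finer-grained than a VM: only a port on a shared host is allocated.
        self._last_port += 1
        return Endpoint(self.host, self._last_port)


class QuotaExceeded(Exception):
    pass


@dataclass
class DatabaseService:
    """The closed abstraction: the service does its own quota accounting."""
    backend: ComputeBackend
    quota_per_tenant: int = 2
    _usage: Dict[str, int] = field(default_factory=dict, init=False)

    def create_database(self, tenant: str, engine: str) -> Endpoint:
        used = self._usage.get(tenant, 0)
        if used >= self.quota_per_tenant:
            raise QuotaExceeded(f"{tenant} already has {used} databases")
        self._usage[tenant] = used + 1
        # The caller gets an endpoint only; which backend served it is invisible.
        return self.backend.provision(tenant, engine)
```

Swapping DatabaseService(DedicatedVMBackend()) for DatabaseService(SharedHostBackend()) changes nothing in the calls the user makes, only the address returned. The noisy-neighbour and regulatory concerns raised above are exactly what the shared backend trades away for finer granularity.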
Interestingly, that's only an issue because there is no finer-grained
compute resource than a VM available through the OpenStack API. If there
were an OpenStack API (or even just a Keystone-authenticated API) to a
shared, multitenant container orchestration cluster, this wouldn't be an
issue. But apart from OpenShift, I can't think of any cloud service
that's doing that - AWS, Google, OpenStack are all using the model where
the COE cluster is deployed on VMs that are owned by a particular
tenant. Of all the things you could run in containers on shared servers,
databases have arguably the most to lose (performance, security) and the
least to gain (since they're by definition stateful). So my question is:
if this is such a good idea for databases, why isn't anybody doing it
for everything container-based? i.e. instead of Magnum/Zun should we
just be working on a Keystone auth gateway for OpenShift (a.k.a. the
_one_ thing that _everyone_ had hitherto agreed was definitely out of
scope :D )?
Until then it seems to me that the tradeoff is between decoupling it
from the particular cloud it's running on so that users can optionally
deploy it standalone (essentially Vish's proposed solution for the *aaS
services from many moons ago) vs. decoupling it from OpenStack in
general so that the operator has more flexibility in how to deploy.
I'd love to be able to cover both - from a user using it standalone to
spin up and manage a DB in containers on a shared PaaS, through to a
user accessing it as a service to provide a DB running on a dedicated VM
or bare metal server, and everything in between. I don't know if such a
thing is feasible. I suspect we're going to have to talk a lot about VMs
and network plumbing and volume storage :)
>> Of course the current situation, as Amrith alluded to, where the
>> default is option (1) except without the lock-down feature in Nova,
>> though some operators are deploying option (2) but it's not tested
>> upstream... clearly that's the worst of all possible worlds, and AIUI
>> nobody disagrees with that.
>> To my mind, (1) sounds more like "applications that run on OpenStack
>> (or other) infrastructure", since it doesn't require stuff like the
>> admin-only cross-project networking that makes it effectively "part of
>> the infrastructure itself" - as evidenced by the fact that
>> unprivileged users can run it standalone with little more than a
>> simple auth middleware change. But I suspect you are going to use
>> similar logic to argue for (2)? I'd be interested to hear your thoughts.
>> OpenStack Development Mailing List (not for usage questions)
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe