[openstack-dev] [tc] Active or passive role with our database layer

Monty Taylor mordred at inaugust.com
Sun May 21 19:38:04 UTC 2017

Hi all!

As the discussion around PostgreSQL has progressed, it has become clear to 
me that there is a decently deep philosophical question on which we do 
not currently share either definition or agreement. I believe that the 
lack of clarity on this point is one of the things that makes the 
PostgreSQL conversation difficult.

I believe the question is between these two things:

* Should OpenStack assume the existence of an external database service 
that it treats as a black box on the other side of a connection string?

* Should OpenStack take an active and/or opinionated role in managing 
the database service?

A potentially obvious question about that (asked by Mike Bayer in a 
different thread) is: "what do you mean by managing?"

What I mean by managing is doing all of the things you can do related to 
database operational controls short of installing the software, writing 
the basic db config files to disk and stopping and starting the 
services. It means being much more prescriptive about what types of 
config we support, validating config settings that cannot be overridden 
at runtime and refusing to operate if they are unworkable.

Why would we want to be 'more active'? When managing and tuning 
databases, there are some things that are driven by the environment and 
some things that are driven by the application.

Things that are driven by the environment include things like the amount 
of RAM actually available, whether or not the machines running the 
database are dedicated or shared, firewall settings, selinux settings 
and what versions of software are available.

Things that are driven by the application are things like character set 
and collation, schema design, data types, schema upgrade and HA strategies.

One might argue that HA strategies are an operator concern, but in 
reality the set of workable HA strategies is tightly constrained by how 
the application works, and pairing an application expecting one HA 
strategy with a deployment implementing a different one can have 
negative results ranging from unexpected downtime to data corruption.

For example: an HA strategy using slave promotion and a VIP that points 
at the current write master, paired with an application that is not 
correctly configured for such a setup, can lead to writes to the wrong 
host after a failover event, and to an application that seems to be 
running fine until the data turns up weird after a while.

For the areas in which the characteristics of the database are tied 
closely to the application behavior, there is a constrained set of valid 
choices at the database level. Sometimes that constrained set only has 
one member.

The approach to those is what I'm talking about when I ask the question 
about "external" or "active".

In the "external" approach, we document the expectations and then write 
the code assuming that the database is set up appropriately. We may 
provide some helper tools, such as 'nova-manage db sync' and 
documentation on the sequence of steps the operator should take.

In the "active" approach, we still document expectations, but we also 
validate them. If they are not what we expect but can be changed at 
runtime, we change them overriding conflicting environmental config, and 
if we can't, we hard-stop indicating an unsuitable environment. Rather 
than providing helper tools, we perform the steps needed ourselves, in 
the order they need to be performed, ensuring that they are done in the 
manner in which they need to be done.
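To make that concrete, here's a rough Python sketch of the 'active' flow. 
The setting names and required values are made up for illustration - this 
is not an actual oslo.db API:

```python
# Settings we require, and whether each can be changed at runtime.
# Both entries here are illustrative, not a real policy proposal.
REQUIRED = {
    "sql_mode": ("TRADITIONAL", True),           # fixable via SET GLOBAL
    "character_set_server": ("utf8mb4", False),  # needs a server restart
}


class UnsuitableEnvironment(Exception):
    """Raised to hard-stop when config cannot be fixed at runtime."""


def reconcile(observed):
    """Compare observed server settings against what we require.

    Returns the list of SET statements needed to fix runtime-changeable
    settings; raises UnsuitableEnvironment for anything we cannot fix.
    """
    fixes = []
    for name, (wanted, runtime_changeable) in REQUIRED.items():
        if observed.get(name) == wanted:
            continue
        if runtime_changeable:
            # Override the conflicting environmental config ourselves.
            fixes.append("SET GLOBAL %s = '%s'" % (name, wanted))
        else:
            # Refuse to operate rather than limp along misconfigured.
            raise UnsuitableEnvironment(
                "%s is %r, need %r; refusing to start"
                % (name, observed.get(name), wanted))
    return fixes
```

The important part is the hard-stop at the end: rather than running with 
a known-unworkable setting, we refuse to start.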

Some examples:

* Character Sets / Collations

We currently enforce at testing time that all database migrations are 
explicit about InnoDB. We also validate in oslo.db that table character 
sets have the string 'utf8' in them (MySQL only). We do not have any 
check for case-sensitive or case-insensitive collations (these affect 
sorting and comparison operations). Because we don't, different server 
config settings or different database backends for different clouds can 
actually behave differently through the REST API.

To deal with that:

First we'd have to decide whether case sensitive or case insensitive was 
what we wanted. If we decided we wanted case sensitive, we could add an 
enforcement of that in oslo.db, and write migrations to get from case 
insensitive indexes to case sensitive indexes on tables where we 
detected that a case insensitive collation had been used. If we decided 
we wanted to stick with case insensitive we could similarly add code to 
enforce it on MySQL. To enforce it actively on PostgreSQL, we'd need to 
either switch our code that's using comparisons to use the sqlalchemy 
case-insensitive versions explicitly, or maybe write some sort of 
overloaded driver for PG that turns all comparisons into 
case-insensitive, which would wrap both sides of comparisons in lower() 
calls (which has some indexing concerns, but let's ignore that for the 
moment). We could also take the 'external' approach and just document it, 
then define API tests and try to tie the insensitive behavior in the API 
to Interop Compliance. I'm not 100% sure how a db operator would 
remediate this - but PG has some fancy computed index features - so 
maybe it would be possible.
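As a rough sketch of what the MySQL side of such a check could look like 
- leaning on the MySQL convention that collation names end in _ci 
(case-insensitive), _cs (case-sensitive) or _bin (binary), and with 
hypothetical helper names:

```python
def is_case_insensitive(collation):
    """Classify a MySQL collation by its naming convention.

    MySQL collation names end in _ci, _cs or _bin; only _ci
    collations compare case-insensitively.
    """
    return collation.endswith("_ci")


def check_collations(table_collations, want_ci=True):
    """Return names of tables whose collation disagrees with policy.

    table_collations is a mapping of table name -> collation name,
    as might be read from information_schema.tables.
    """
    return sorted(
        table for table, coll in table_collations.items()
        if is_case_insensitive(coll) != want_ci)
```

In an 'active' world, a non-empty result here would feed the migration 
(or the hard-stop) described above.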

A similar issue lurks with the fact that MySQL Unicode storage is 3-byte 
by default and 4-byte is opt-in. We could take the 'external' approach, 
document it, and assume the operator has configured their my.cnf with 
the appropriate default, or take an 'active' approach where we override 
it in all the models and write migrations to get us from 3-byte to 
4-byte storage.
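For context on why the default matters: MySQL's legacy 'utf8' charset 
stores at most 3 bytes per character, so any code point outside the 
Basic Multilingual Plane (emoji, some CJK) simply cannot be stored in 
it; 'utf8mb4' is actual UTF-8. A quick illustration:

```python
def fits_in_mysql_utf8mb3(text):
    """True if every character encodes to at most 3 bytes of UTF-8.

    That is the storage limit of MySQL's legacy 'utf8' (utf8mb3)
    charset; anything that fails this check needs utf8mb4.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)
```

Accented Latin text fits; an emoji (4 bytes in UTF-8) does not, and a 
3-byte-configured table would reject or mangle it.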

* Schema Upgrades

The way you roll out online schema changes is highly dependent on your 
database architecture.

Just limiting to the MySQL world:

If you do Galera, you can roll them out in Total Order or Rolling 
fashion. Total Order locks basically everything while it's happening, so 
isn't a candidate for "online". In rolling you apply the schema change 
to one node at a time. If you do that, the application has to be able to 
deal with both forms of the table, and you have to deal with ensuring 
that data can replicate appropriately while the schema change is happening.

If you do DRBD active/passive or a single-node deployment, you only have 
one upgrade operation to perform, but you will only lock certain things, 
depending on which schema change operations you are performing.

If you do master/slave, you can roll out the schema change to your 
slaves one at a time, wait for them all to catch up, then promote a 
slave, taking the current master out of commission, update the old 
master and then put it into the slave pool. Like Galera rolling, the 
app needs to be able to handle old and new versions and the replication 
stream needs to be able to replicate between the versions.

Making sure that the stream is able to replicate puts a set of 
limitations on the types of schema changes you can perform, but it is an 
understandable constrained set.

In either approach the OpenStack service has to be able to talk to both 
old and new versions of the schema. And in either approach we need to 
make sure to limit the schema change operations to the set that can be 
accomplished in an online fashion. We also have to be careful to not 
start writing values to new columns until all of the nodes have been 
updated, because the replication stream can't replicate the new column 
value to nodes that don't have the new column.
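That last guard could be as simple as something like the following 
(hypothetical names and version number; real code would read schema 
versions from a service registry or from the database itself):

```python
# Illustrative schema version at which the new column exists.
NEW_COLUMN_VERSION = 42


def can_write_new_column(node_schema_versions):
    """True only once every node has the new column.

    Until the slowest node reports the new schema version, writing the
    column would put a value on the replication stream that some peers
    cannot apply.
    """
    return min(node_schema_versions) >= NEW_COLUMN_VERSION
```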

In either approach we can decide to limit the number of architectures we 
support for "online" upgrades.

In an 'external' approach, we make sure to do those things, we write 
documentation and we assume the database will be updated appropriately. 
We can document that if the deployer chooses to do Total Order on 
Galera, they will not have online upgrades. There will also have to be a 
deployer step to let the services know that they can start writing 
values to the new schema format once the upgrade is complete.

In an 'active' approach, we can notice that we have an update available 
to run, and we can drive it from code. We can check for Galera, and if 
it's there we can run the upgrade in Rolling fashion one node at a time 
with no work needed on the part of the deployer. Since we're driving the 
upgrade, we know when it's done, so we can signal ourselves to start 
using the new version. We'd obviously have to pick the set of acceptable 
architectures we can handle consistently orchestrating.
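In skeleton form, with the node-level operations passed in as 
hypothetical callables, that orchestration could look like:

```python
def rolling_upgrade(nodes, is_galera, apply_migration, signal_done):
    """Sketch of the 'active' online-upgrade flow described above.

    apply_migration and signal_done are hypothetical hooks standing in
    for the real per-node DDL step and the 'start using the new
    schema' signal.
    """
    if not is_galera:
        # We picked the set of architectures we orchestrate; anything
        # else is refused rather than half-handled.
        raise RuntimeError("architecture not supported for online upgrade")
    for node in nodes:
        # One node at a time; the application must tolerate both the
        # old and the new form of the schema while this runs.
        apply_migration(node)
    # Since we drove the upgrade, we know when it finished and can
    # signal ourselves - no deployer step required.
    signal_done()
```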

* Versions

It's worth noting that the behavior of schema updates and other things 
changes over time with backend database version. We set minimum versions 
of other things, like libvirt and OVS - so we might also want to set 
minimum versions for what we can support in the database. That way we 
can know for a given release of OpenStack what DDL operations are safe 
to use for a rolling upgrade and what are not. That means detecting such 
a version and potentially refusing to perform an upgrade if the version 
isn't acceptable. That reduces the operator's ability to choose what 
version of the database software to run, but increases our ability to be 
able to provide tooling and operations that we can be confident will work.
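A version gate like that is mechanically simple - the sketch below uses 
an illustrative minimum, not a real proposal:

```python
# Hypothetical minimum database version for safe online DDL.
MIN_VERSION = (5, 7, 0)


def parse_version(version_string):
    """Turn e.g. '5.7.29-log' into a comparable tuple (5, 7, 29)."""
    core = version_string.split("-")[0]
    return tuple(int(part) for part in core.split("."))


def check_minimum(version_string):
    """Refuse to perform an upgrade on an unacceptably old server."""
    if parse_version(version_string) < MIN_VERSION:
        raise RuntimeError(
            "database version %s is below minimum %s; refusing upgrade"
            % (version_string, ".".join(map(str, MIN_VERSION))))
```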

== Summary ==

These are just a couple of examples - but I hope they're at least mildly 
useful to explain some of the sorts of issues at hand - and why I think 
we need to clarify what our intent is separate from the issue of what 
databases we "support".

Some operations have one and only one "right" way to be done. For those 
operations if we take an 'active' approach, we can implement them once 
and not make all of our deployers and distributors each implement and 
run them. However, there is a cost to that. Automatic and prescriptive 
behavior has a higher dev cost that is proportional to the number of 
supported architectures. This then implies a need to limit deployer 
architecture choices.

On the other hand, taking an 'external' approach allows us to federate 
the work of supporting the different architectures to the deployers. 
This means more work on the deployer's part, but also potentially a 
greater amount of freedom on their part to deploy supporting services 
the way they want. It means that some of the things that have been 
requested of us - such as easier operation and an increase in the number 
of things that can be upgraded with no downtime - might become 
prohibitively costly for us to implement.

I honestly think that both are acceptable choices we can make and that 
for any given topic there are middle grounds to be found at any given 
moment in time.

BUT - without a decision as to what our long-term philosophical intent 
in this space is that is clear and understandable to everyone, we cannot 
have successful discussions about the impact of implementation choices, 
since we will not have a shared understanding of the problem space or 
the solutions we're talking about.

For my part - I hear complaints that OpenStack is 'difficult' to operate 
and requests for us to make it easier. This is why I have been 
advocating some actions that are clearly rooted in an 'active' worldview.

Finally, this is focused on the database layer, but similar questions 
arise in other places: what is our philosophy on prescriptive/active 
choices on our part, coupled with automated action and ease of 
operation, vs. expanded choices for the deployer at the expense of 
configuration and operational complexity? For now let's see if we can 
answer it for databases, and see where that gets us.

Thanks for reading.
