[openstack-dev] [tc] Active or passive role with our database layer

Octave J. Orgeron octave.orgeron at oracle.com
Tue May 23 17:10:29 UTC 2017

Comments below..

On 5/21/2017 1:38 PM, Monty Taylor wrote:
> Hi all!
> As the discussion around PostgreSQL has progressed, it has come clear 
> to me that there is a decently deep philosophical question on which we 
> do not currently share either definition or agreement. I believe that 
> the lack of clarity on this point is one of the things that makes the 
> PostgreSQL conversation difficult.
> I believe the question is between these two things:
> * Should OpenStack assume the existence of an external database 
> service that it treat as an black-box on the other side of a 
> connection string?
> * Should OpenStack take an active and/or opinionated role in managing 
> the database service?
> A potentially obvious question about that (asked by Mike Bayer in a 
> different thread) is: "what do you mean by managing?"
> What I mean by managing is doing all of the things you can do related 
> to database operational controls short of installing the software, 
> writing the basic db config files to disk and stopping and starting 
> the services. It means being much more prescriptive about what types 
> of config we support, validating config settings that cannot be 
> overridden at runtime and refusing to operate if they are unworkable.

I think it's helpful and important for us to have automation tooling 
like tripleo, puppet, etc. that can stand up a MySQL database. But we 
also have to realize that as shops mature, they will deploy more 
complicated database topologies, clustered configurations, and 
replication scenarios. So I think we shouldn't go overboard with being 
prescriptive. We also have to realize that in the enterprise space, 
databases are usually deployed and managed by a separate database team, 
which means less control over that layer. So we shouldn't force people 
into this model. We should provide best practice documentation, examples 
(tripleo, puppet, ansible, etc.), and leave it up to the operator.

> Why would we want to be 'more active'? When managing and tuning 
> databases, there are some things that are driven by the environment 
> and some things that are driven by the application.
> Things that are driven by the environment include things like the 
> amount of RAM actually available, whether or not the machines running 
> the database are dedicated or shared, firewall settings, selinux 
> settings and what versions of software are available.

This is a good example of an area that we should focus on documenting 
best practices and leave it to the operator to implement. Guidelines 
around cpu, memory, security settings, tunables, etc. are what's needed 
here. Today, there isn't really any guidance or best practices on even 
sizing the database(s) for a given deployment size.

> Things that are driven by the application are things like character 
> set and collation, schema design, data types, schema upgrade and HA 
> strategies.

These are things that we can have a bit more control or direction on.

> One might argue that HA strategies are an operator concern, but in 
> reality the set of workable HA strategies is tightly constrained by 
> how the application works, and the pairing an application expecting 
> one HA strategy with a deployment implementing a different one can 
> have negative results ranging from unexpected downtime to data 
> corruption.
> For example: An HA strategy using slave promotion and a VIP that 
> points at the current write master paired with an application 
> incorrectly configured to do such a thing can lead to writes to the 
> wrong host after a failover event and an application that seems to be 
> running fine until the data turns up weird after a while.

This is definitely a more complicated area that becomes more and more 
specific to the clustering technology being used. Galera vs. MySQL 
Cluster is a good example. Galera has an active/passive architecture 
where the above issues become a concern for sure. While MySQL Cluster 
(NDB) is an active/active architecture, so losing a node only effects 
any uncommitted transactions, that could easily be addressed with a 
retry. These topologies will become more complicated as people start 
looking at cross regional replication and DR.

> For the areas in which the characteristics of the database are tied 
> closely to the application behavior, there is a constrained set of 
> valid choices at the database level. Sometimes that constrained set 
> only has one member.
> The approach to those is what I'm talking about when I ask the 
> question about "external" or "active".
> In the "external" approach, we document the expectations and then 
> write the code assuming that the database is set up appropriately. We 
> may provide some helper tools, such as 'nova-manage db sync' and 
> documentation on the sequence of steps the operator should take.
> In the "active" approach, we still document expectations, but we also 
> validate them. If they are not what we expect but can be changed at 
> runtime, we change them overriding conflicting environmental config, 
> and if we can't, we hard-stop indicating an unsuitable environment. 
> Rather than providing helper tools, we perform the steps needed 
> ourselves, in the order they need to be performed, ensuring that they 
> are done in the manner in which they need to be done.

This might be a trickier situation, especially if the database(s) are in 
a separate or dedicated environment that the OpenStack service processes 
don't have access to. Of course for SQL commands, this isn't a problem. 
But changing the configuration files and restarting the database may be 
a harder thing to expect.

> Some examples:
> * Character Sets / Collations
> We currently enforce at testing time that all database migrations are 
> explicit about InnoDB. We also validate in oslo.db that table 
> character sets have the string 'utf8' in them. (only on MySQL) We do 
> not have any check for case-sensitive or case-insensitive collations 
> (these affect sorting and comparison operations) Because we don't, 
> different server config settings or different database backends for 
> different clouds can actually behave differently through the REST API.
> To deal with that:
> First we'd have to decide whether case sensitive or case insensitive 
> was what we wanted. If we decided we wanted case sensitive, we could 
> add an enforcement of that in oslo.db, and write migrations to get 
> from case insensitive indexes to case sensitive indexes on tables 
> where we detected that a case insensitive collation had been used. If 
> we decided we wanted to stick with case insensitive we could similarly 
> add code to enforce it on MySQL. To enforce it actively on 
> PostgresSQL, we'd need to either switch our code that's using 
> comparisons to use the sqlalchemy case-insensitive versions 
> explicitly, or maybe write some sort of overloaded driver for PG that 
> turns all comparisons into case-insensitive, which would wrap both 
> sides of comparisons in lower() calls (which has some indexing 
> concerns, but let's ignore that for the moment) We could also take the 
> 'external' approach and just document it, then define API tests and 
> try to tie the insensitive behavior in the API to Interop Compliance. 
> I'm not 100% sure how a db operator would remediate this - but PG has 
> some fancy computed index features - so maybe it would be possible.

I think that abstraction with oslo.db would be the right path here. But 
you are also right that we need to have a consistent compliance policy 
at the API layer. We may fix things down at the DB level with oslo.db, 
but everything on top of that needs to also fall in-line. There is a 
very high chance that there are hard-coded workarounds or assumptions in 
the services and apis today.

> A similar issue lurks with the fact that MySQL unicode storage is 
> 3-byte by default and 4-byte is opt-in. We could take the 'external' 
> approach and document it and assume the operator has configured their 
> my.cnf with the appropriate default, or taken an 'active' approach 
> where we override it in all the models and make migrations to get us 
> from 3 to 4 byte.

I think an active approach on this would be ideal, just like the utf8 
and InnoDB settings are today. FYI, not all services are enforcing these 
in a consistent manor today. Another example of something that should be 
abstracted at the oslo.db layer and get the human element out of the way.

> * Schema Upgrades
> The way you roll out online schema changes is highly dependent on your 
> database architecture.
> Just limiting to the MySQL world:
> If you do Galera, you can do roll them out in Total Order or Rolling 
> fashion. Total Order locks basically everything while it's happening, 
> so isn't a candidate for "online". In rolling you apply the schema 
> change to one node at a time. If you do that, the application has to 
> be able to deal with both forms of the table, and you have to deal 
> with ensuring that data can replicate appropriately while the schema 
> change is happening.
> If you do DRBD active/passive or a single-node deployment you only 
> have one upgrade operation to perform, but you will only lock certain 
> things - depending on what schema change operations you were performing.
> If you do master/slave, you can roll out the schema change to your 
> slaves one at a time, wait for them all to catch up, then promote a 
> slave taking the current master out of commission - update the old 
> master then then put it into the slave pool. Like Galera rolling, the 
> app needs to be able to handle old and new versions and the 
> replication stream needs to be able to replicate between the versions.
> Making sure that the stream is able to replicate puts a set of 
> limitations on the types of schema changes you can perform, but it is 
> an understandable constrained set.
> In either approach the OpenStack service has to be able to talk to 
> both old and new versions of the schema. And in either approach we 
> need to make sure to limit the schema change operations to the set 
> that can be accomplished in an online fashion. We also have to be 
> careful to not start writing values to new columns until all of the 
> nodes have been updated, because the replication stream can't 
> replicate the new column value to nodes that don't have the new column.

This is another area where something like MySQL Cluster (NDB) would 
operate differently because it's an active/active architecture. So 
limiting the number of online changes while a table is locked across the 
cluster would be very important. There is also the timeouts for the 
applications to consider, something that could be abstracted again with 

> In either approach we can decide to limit the number of architectures 
> we support for "online" upgrades.
> In an 'external' approach, we make sure to do those things, we write 
> documentation and we assume the database will be updated 
> appropriately. We can document that if the deployer chooses to do 
> Total Order on Galera, they will not have online upgrades. There will 
> also have to be a deployer step to let the services know that they can 
> start writing values to the new schema format once the upgrade is 
> complete.
> In an 'active' approach, we can notice that we have an update 
> available to run, and we can drive it from code. We can check for 
> Galera, and if it's there we can run the upgrade in Rolling fashion 
> one node at a time with no work needed on the part of the deployer. 
> Since we're driving the upgrade, we know when it's done, so we can 
> signal ourselves to start using the new version. We'd obviously have 
> to pick the set of acceptable architectures we can handle consistently 
> orchestrating.

This would be an interesting idea to expand to a autonomic orchestration 
framework within the control plane to handle the database upgrades 
online and the restarting of the dependent services in the correct 
order. If we only focus on the database piece, it may not be as 
interesting for operators.

> * Versions
> It's worth noting that behavior for schema updates and other things 
> change over time with backend database version. We set minimum 
> versions of other things, like libvirt and OVS - so we might also want 
> to set minimum versions for what we can support in the database. That 
> way we can know for a given release of OpenStack what DDL operations 
> are safe to use for a rolling upgrade and what are not. That means 
> detecting such a version and potentially refusing to perform an 
> upgrade if the version isn't acceptable. That reduces the operator's 
> ability to choose what version of the database software to run, but 
> increases our ability to be able to provide tooling and operations 
> that we can be confident will work.

Validating the MySQL database version is a good idea. The features do 
change over time. A good example is how in 5.7, you'll get warnings 
about duplicate indexes being dropped in a future release which will 
definitely affect multiple services today.

> == Summary ==
> These are just a couple of examples - but I hope they're at least 
> mildly useful to explain some of the sorts of issues at hand - and why 
> I think we need to clarify what our intent is separate from the issue 
> of what databases we "support".
> Some operations have one and only one "right" way to be done. For 
> those operations if we take an 'active' approach, we can implement 
> them once and not make all of our deployers and distributors each 
> implement and run them. However, there is a cost to that. Automatic 
> and prescriptive behavior has a higher dev cost that is proportional 
> to the number of supported architectures. This then implies a need to 
> limit deployer architecture choices.
> On the other hand, taking an 'external' approach allows us to federate 
> the work of supporting the different architectures to the deployers. 
> This means more work on the deployer's part, but also potentially a 
> greater amount of freedom on their part to deploy supporting services 
> the way they want. It means that some of the things that have been 
> requested of us - such as easier operation and an increase in the 
> number of things that can be upgraded with no-downtime - might become 
> prohibitively costly for us to implement.
> I honestly think that both are acceptable choices we can make and that 
> for any given topic there are middle grounds to be found at any given 
> moment in time.
> BUT - without a decision as to what our long-term philosophical intent 
> in this space is that is clear and understandable to everyone, we 
> cannot have successful discussions about the impact of implementation 
> choices, since we will not have a shared understanding of the problem 
> space or the solutions we're talking about.
> For my part - I hear complaints that OpenStack is 'difficult' to 
> operate and requests for us to make it easier. This is why I have been 
> advocating some actions that are clearly rooted in an 'active' worldview.
> Finally, this is focused on the database layer but similar questions 
> arise in other places. What is our philosophy on prescriptive/active 
> choices on our part coupled with automated action and ease of 
> operation vs. expanded choices for the deployer at the expense of 
> configuration and operational complexity. For now let's see if we can 
> answer it for databases, and see where that gets us.
> Thanks for reading.
> Monty
> __________________________________________________________________________ 
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: 
> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20170523/2902f8cd/attachment-0001.html>

More information about the OpenStack-dev mailing list