Open Stack

Mon May 22 02:09:28 UTC 2017

On 05/21/2017 03:38 PM, Monty Taylor wrote:
> documentation on the sequence of steps the operator should take.
> 
> In the "active" approach, we still document expectations, but we also 
> validate them. If they are not what we expect but can be changed at 
> runtime, we change them overriding conflicting environmental config, and 
> if we can't, we hard-stop indicating an unsuitable environment. Rather 
> than providing helper tools, we perform the steps needed ourselves, in 
> the order they need to be performed, ensuring that they are done in the 
> manner in which they need to be done.

we do this in places like tripleo.   The MySQL configs and such are 
checked into the source tree, it includes details like 
innodb_file_per_table, timeouts used by haproxy, etc.   I know tripleo 
is not like the service itself like Nova but it's also not exactly 
something we hand off to the operators to figure out from scratch either.

We do some of it in oslo.db as well.  We set things like MySQL SQL_MODE. 
  We try to make sure the unicode-ish flags are set up and that we're 
using utf-8 encoding.

> 
> Some examples:
> 
> * Character Sets / Collations
> 
> We currently enforce at testing time that all database migrations are 
> explicit about InnoDB. We also validate in oslo.db that table character 
> sets have the string 'utf8' in them. (only on MySQL) We do not have any 
> check for case-sensitive or case-insensitive collations (these affect 
> sorting and comparison operations) Because we don't, different server 
> config settings or different database backends for different clouds can 
> actually behave differently through the REST API.
> 
> To deal with that:
> 
> First we'd have to decide whether case sensitive or case insensitive was 
> what we wanted. If we decided we wanted case sensitive, we could add an 
> enforcement of that in oslo.db, and write migrations to get from case 
> insensitive indexes to case sensitive indexes on tables where we 
> detected that a case insensitive collation had been used. If we decided 
> we wanted to stick with case insensitive we could similarly add code to 
> enforce it on MySQL. To enforce it actively on PostgresSQL, we'd need to 
> either switch our code that's using comparisons to use the sqlalchemy 
> case-insensitive versions explicitly, or maybe write some sort of 
> overloaded driver for PG that turns all comparisons into 
> case-insensitive, which would wrap both sides of comparisons in lower() 
> calls (which has some indexing concerns, but let's ignore that for the 
> moment) We could also take the 'external' approach and just document it, 
> then define API tests and try to tie the insensitive behavior in the API 
> to Interop Compliance. I'm not 100% sure how a db operator would 
> remediate this - but PG has some fancy computed index features - so 
> maybe it would be possible.

let's make the case sensitivity explicitly enforced!

> 
> A similar issue lurks with the fact that MySQL unicode storage is 3-byte 
> by default and 4-byte is opt-in. We could take the 'external' approach 
> and document it and assume the operator has configured their my.cnf with 
> the appropriate default, or taken an 'active' approach where we override 
> it in all the models and make migrations to get us from 3 to 4 byte.

let's force MySQL to use utf8mb4!   Although I am curious what is the 
actual use case we want to hit here (which gets into, zzzeek is ignorant 
as to which unicode glyphs actually live in 4-byte utf8 characters).

> 
> * Schema Upgrades
> 
> The way you roll out online schema changes is highly dependent on your 
> database architecture.
> 
> Just limiting to the MySQL world:
> 
> If you do Galera, you can do roll them out in Total Order or Rolling 
> fashion. Total Order locks basically everything while it's happening, so 
> isn't a candidate for "online". In rolling you apply the schema change 
> to one node at a time. If you do that, the application has to be able to 
> deal with both forms of the table, and you have to deal with ensuring 
> that data can replicate appropriately while the schema change is happening.

Galera replicates DDL operations.   If I add a column on a node, it pops 
up on the other nodes too in a similar way as transactions are 
replicated, e.g. nearly synchronous.   I would *assume* it has to do 
this in the context of it's usual transaction ordering, even though 
MySQL doesn't do transactional DDL, so that if the cluster sees 
transaction A, schema change B, transaction C that depends on B, that 
ordering is serialized appropriately.    However, even if it doesn't do 
that, the rolling upgrades we do don't start the services talking to the 
new schema structures until the DDL changes are complete, and Galera is 
near-synchronous replication.

Also speaking to the "active" question, we certainly have all kinds of 
logic in Openstack (the optimistic update strategy in particular) that 
take "Galera" into account.  And of course we have Galera config inside 
of tripleo.  So that's kind of the "active" approach, I think.

> 
> If you do DRBD active/passive or a single-node deployment you only have 
> one upgrade operation to perform, but you will only lock certain things 
> - depending on what schema change operations you were performing.
> 
> If you do master/slave, you can roll out the schema change to your 
> slaves one at a time, wait for them all to catch up, then promote a 
> slave taking the current master out of commission - update the old 
> master then then put it into the slave pool. Like Galera rolling, the 
> app needs to be able to handle old and new versions and the replication 
> stream needs to be able to replicate between the versions.
> 
> Making sure that the stream is able to replicate puts a set of 
> limitations on the types of schema changes you can perform, but it is an 
> understandable constrained set.

My current thinking for online upgrades, the schema changes and the 
application speaking to those schema changes are at least isolated 
states of the openstack cluster as a whole.    That's at least how it 
seems to work right now.   Also right now, Openstack has almost no code 
I'm aware of that takes advantage of true master / asynchronous slaves. 
  While it's been kind of stuck in oslo.db for years, and in 
enginefacade I added new decorators that allow you to declare a method 
as safe to run in a "slave", applications are hardly using this feature 
at all.  I vaguely recall one obscure feature in Nova maybe using it for 
something.  But last I checked, even if you configure Opentack with a 
"master" and "slave" database URL (which we support!), 90% of everything 
is on the "master" anyway (perhaps some projects that I never look at do 
in fact use the "slave", please let me know as I should probably be more 
familiar with that).

> 
> In either approach the OpenStack service has to be able to talk to both 
> old and new versions of the schema. And in either approach we need to 
> make sure to limit the schema change operations to the set that can be 
> accomplished in an online fashion. We also have to be careful to not 
> start writing values to new columns until all of the nodes have been 
> updated, because the replication stream can't replicate the new column 
> value to nodes that don't have the new column.

This is...what everyone (except keystone w/ the evil triggers) does 
already, I thought?

> 
> In either approach we can decide to limit the number of architectures we 
> support for "online" upgrades.
> 
> In an 'external' approach, we make sure to do those things, we write 
> documentation and we assume the database will be updated appropriately. 
> We can document that if the deployer chooses to do Total Order on 
> Galera, they will not have online upgrades. There will also have to be a 
> deployer step to let the services know that they can start writing 
> values to the new schema format once the upgrade is complete.
> 
> In an 'active' approach, we can notice that we have an update available 
> to run, and we can drive it from code. We can check for Galera, and if 
> it's there we can run the upgrade in Rolling fashion one node at a time 
> with no work needed on the part of the deployer. Since we're driving the 
> upgrade, we know when it's done, so we can signal ourselves to start 
> using the new version. We'd obviously have to pick the set of acceptable 
> architectures we can handle consistently orchestrating.
> 
> * Versions
> 
> It's worth noting that behavior for schema updates and other things 
> change over time with backend database version. We set minimum versions 
> of other things, like libvirt and OVS - so we might also want to set 
> minimum versions for what we can support in the database. 

agree though so far I don't think we've hit too many features that have 
an issue here, the MySQL/Mariadb 5.x set of features are ubiquitous now 
and that's pretty much what we target.  In the Postgresql world, they 
are crazy with the new syntaxes every release (to my dismay having to 
support them all) but none of these are really appropriate for Openstack 
as long as we are targeting MySQL also.

That way we
> can know for a given release of OpenStack what DDL operations are safe 
> to use for a rolling upgrade and what are not. That means detecting such 
> a version and potentially refusing to perform an upgrade if the version 
> isn't acceptable. That reduces the operator's ability to choose what 
> version of the database software to run, but increases our ability to be 
> able to provide tooling and operations that we can be confident will work.

We definitely make sure that if we put a migration directive somewhere, 
it's going to work on the MySQL/MariaDB's that are in general use.  I 
think there might have even been some behavior recently that was perhaps 
on the 5.5/5.6 border but I can't recall.

> 
> == Summary ==
> 
> These are just a couple of examples - but I hope they're at least mildly 
> useful to explain some of the sorts of issues at hand - and why I think 
> we need to clarify what our intent is separate from the issue of what 
> databases we "support".
> 
> Some operations have one and only one "right" way to be done. For those 
> operations if we take an 'active' approach, we can implement them once 
> and not make all of our deployers and distributors each implement and 
> run them. However, there is a cost to that. Automatic and prescriptive 
> behavior has a higher dev cost that is proportional to the number of 
> supported architectures. This then implies a need to limit deployer 
> architecture choices.
> 
> On the other hand, taking an 'external' approach allows us to federate 
> the work of supporting the different architectures to the deployers. 
> This means more work on the deployer's part, but also potentially a 
> greater amount of freedom on their part to deploy supporting services 
> the way they want. It means that some of the things that have been 
> requested of us - such as easier operation and an increase in the number 
> of things that can be upgraded with no-downtime - might become 
> prohibitively costly for us to implement.

I think right now we are doing a "hybrid".  If you're on a MySQL 
variant, you get the cadillac version and if you're going with 
Postgresql, you get the stick shift.    I'm not endorsing this but it 
does seem work to some extent.

> 
> I honestly think that both are acceptable choices we can make and that 
> for any given topic there are middle grounds to be found at any given 
> moment in time.

ok i just said that

> 
> BUT - without a decision as to what our long-term philosophical intent 
> in this space is that is clear and understandable to everyone, we cannot 
> have successful discussions about the impact of implementation choices, 
> since we will not have a shared understanding of the problem space or 
> the solutions we're talking about.
> 
> For my part - I hear complaints that OpenStack is 'difficult' to operate 
> and requests for us to make it easier. This is why I have been 
> advocating some actions that are clearly rooted in an 'active' worldview.

I think this goes to a point I typed on the etherpad in the boston 
session, I don't think that MySQL defaults to 3-byte utf8 or that if a 
deployer happens to use Postgresql they suddenly get case sensitive 
comparisons are the big reasons openstack is "difficult".    I find 
openstack to be really difficult but setting the db connection URL and 
running the "db-manage" scripts is kind of the easiest part of it (but 
of course, I'm super biased on that).

> 
> Finally, this is focused on the database layer but similar questions 
> arise in other places. What is our philosophy on prescriptive/active 
> choices on our part coupled with automated action and ease of operation 
> vs. expanded choices for the deployer at the expense of configuration 
> and operational complexity. For now let's see if we can answer it for 
> databases, and see where that gets us.
> 
> Thanks for reading.
> 
> Monty
> 
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Open Stack

[openstack-dev] [tc] Active or passive role with our database layer

OpenStack

Community

Documentation

Branding & Legal