[openstack-dev] [tc] Active or passive role with our database layer

Mike Bayer mbayer at redhat.com
Tue May 23 17:49:51 UTC 2017



On 05/23/2017 01:10 PM, Octave J. Orgeron wrote:
> Comments below..
> 
> On 5/21/2017 1:38 PM, Monty Taylor wrote:

>> For example: An HA strategy using slave promotion and a VIP that 
>> points at the current write master paired with an application 
>> incorrectly configured to do such a thing can lead to writes to the 
>> wrong host after a failover event and an application that seems to be 
>> running fine until the data turns up weird after a while.
> 
> This is definitely a more complicated area that becomes more and more 
> specific to the clustering technology being used. Galera vs. MySQL 
> Cluster is a good example. Galera has an active/passive architecture 
> where the above issues become a concern for sure. 

This is not my understanding; Galera is multi-master, and if you lose a 
node, you don't lose any committed transactions: the writesets are 
validated as acceptable by, and pushed out to, all nodes before your 
COMMIT succeeds.   There's an option to make it wait until those 
writesets are fully written to disk as well, but even with that option 
turned off, if you COMMIT to one node and that node then explodes, you 
lose nothing: your writesets have already been verified as acceptable by 
all the other nodes.

Active/active is the second bullet point on the Galera homepage: 
http://galeracluster.com/products/
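
To make the certification idea concrete, here's a toy Python sketch of 
certification-based replication. This is emphatically not Galera's actual 
implementation — the node names, the `last_seen` bookkeeping, and the 
first-committer-wins conflict rule are all simplified assumptions for 
illustration — but it shows why a COMMIT that succeeds on one node 
survives that node's loss:

```python
# Toy illustration (NOT Galera itself) of certification-based replication:
# a COMMIT succeeds only after every node agrees the writeset does not
# conflict with writesets already certified since the transaction began.

class Node:
    def __init__(self, name):
        self.name = name
        self.certified = []  # writesets this node has accepted, in order

    def certify(self, writeset, last_seen):
        # Reject if any writeset certified after this transaction started
        # touched one of the same rows (first-committer-wins).
        for ws in self.certified[last_seen:]:
            if ws & writeset:
                return False
        return True

class Cluster:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def commit(self, writeset, last_seen=0):
        writeset = frozenset(writeset)
        if all(n.certify(writeset, last_seen) for n in self.nodes):
            for n in self.nodes:
                n.certified.append(writeset)
            return True   # every node now holds the certified writeset
        return False      # certification failure: transaction rolls back

cluster = Cluster(["galera1", "galera2", "galera3"])
assert cluster.commit({"row:42"}) is True

# Two transactions that both began at position 1 and touch the same row:
# the first certifies, the second fails certification and rolls back.
assert cluster.commit({"row:7"}, last_seen=1) is True
assert cluster.commit({"row:7"}, last_seen=1) is False

# Even if the node we committed to now explodes, the other nodes
# already hold the certified writeset - nothing committed is lost.
survivor = cluster.nodes[1]
assert frozenset({"row:42"}) in survivor.certified
```

The point of the sketch is only the ordering: certification across all 
nodes happens before the client's COMMIT returns, not after.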


>>
>> In the "active" approach, we still document expectations, but we also 
>> validate them. If they are not what we expect but can be changed at 
>> runtime, we change them overriding conflicting environmental config, 
>> and if we can't, we hard-stop indicating an unsuitable environment. 
>> Rather than providing helper tools, we perform the steps needed 
>> ourselves, in the order they need to be performed, ensuring that they 
>> are done in the manner in which they need to be done.
> 
> This might be a trickier situation, especially if the database(s) are in 
> a separate or dedicated environment that the OpenStack service processes 
> don't have access to. Of course for SQL commands, this isn't a problem. 
> But changing the configuration files and restarting the database may be 
> a harder thing to expect.

Nevertheless, the HA setup within TripleO does do this, currently using 
Pacemaker and resource agents.    This is within the scope of at least 
parts of OpenStack.

> 
>>
>> In either approach the OpenStack service has to be able to talk to 
>> both old and new versions of the schema. And in either approach we 
>> need to make sure to limit the schema change operations to the set 
>> that can be accomplished in an online fashion. We also have to be 
>> careful to not start writing values to new columns until all of the 
>> nodes have been updated, because the replication stream can't 
>> replicate the new column value to nodes that don't have the new column.
> 
> This is another area where something like MySQL Cluster (NDB) would 
> operate differently because it's an active/active architecture. So 
> limiting the number of online changes while a table is locked across the 
> cluster would be very important. There is also the timeouts for the 
> applications to consider, something that could be abstracted again with 
> oslo.db.

So the DDL we do on Galera, to confirm but also clarify Monty's point, 
runs under "total order isolation", which means it holds up the whole 
cluster while the DDL is applied to all nodes.   Monty says this 
disqualifies it as an "online upgrade", because if you emitted DDL that 
had to write default values into a million rows, your whole cluster 
would temporarily have to wait for that to happen; we handle that by 
making sure we don't do migrations with that kind of data requirement, 
and while yes, the DB has to wait for a schema change to apply, those 
waits are at least very short (in theory).   For practical purposes, it 
is *mostly* an "online" style of migration, because all the services 
that talk to the database can keep on talking to it without being 
stopped, upgraded to a new software version, and restarted, which IMO is 
what's really hard about "online" upgrades.   It does mean that services 
will see a little more latency while the operations proceed.  Maybe we 
need a new term, "quasi-online" or something like that.
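
To illustrate the kind of migration that keeps things quasi-online — 
additive, nullable, no server-side backfill — here is a minimal sketch. 
It uses SQLite purely to stay self-contained (the million-row concern 
really bites on MySQL/Galera, not SQLite), and the table and column 
names are invented for illustration:

```python
import sqlite3

# Hypothetical additive, backward-compatible schema change: old code
# selects only the columns it knows about, so adding a nullable column
# does not break it. All names here are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, host TEXT)")
conn.execute("INSERT INTO instances (host) VALUES ('compute-1')")

def old_service_read(conn):
    # The pre-upgrade service only knows about id and host.
    return conn.execute("SELECT id, host FROM instances ORDER BY id").fetchall()

before = old_service_read(conn)

# The migration: a nullable column with no default-value backfill, so
# there is no long rewrite of existing rows to stall the cluster on.
conn.execute("ALTER TABLE instances ADD COLUMN task_state TEXT")

after = old_service_read(conn)
assert before == after  # old readers are unaffected by the new column
```

Old readers never see the new column, and nothing starts writing it 
until every node runs the new code — which is Monty's replication-stream 
point above.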

Facebook has released a Python version of their "online" schema 
migration tool for MySQL, which takes the full-blown "create a new, 
blank table" approach: the new table contains the newer version of the 
schema, so nothing stops or slows down at all.  To manage between the 
two tables while everything is running, it also makes a "change capture" 
table to keep track of what's going on, and to wire it all together it 
uses...triggers! 
https://github.com/facebookincubator/OnlineSchemaChange/wiki/How-OSC-works. 
   Crazy Facebook kids.  As for how they know that "make two more tables 
and wire it all together with new triggers" is in fact more performant 
than just "add a column to the table", I'm not sure how or when they 
make that determination.   I don't see an OpenStack cluster as quite the 
same thing as hosting a site like Facebook, so I lean towards the more 
liberal interpretation of "online upgrades".
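
Here's a toy sketch of that shadow-table-plus-change-capture-trigger 
idea, using SQLite just to keep it self-contained (the real tool targets 
MySQL, and every name below is invented for illustration — this is the 
general pattern, not the tool's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Original table, its "shadow" copy carrying the new schema (an extra
# column), and a change-capture table. Names are illustrative only.
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
cur.execute("CREATE TABLE users_chg (id INTEGER, name TEXT)")

cur.execute("INSERT INTO users (name) VALUES ('alice')")

# The trigger records writes that land on the old table during the copy.
cur.execute("""
    CREATE TRIGGER users_capture AFTER INSERT ON users
    BEGIN
        INSERT INTO users_chg (id, name) VALUES (NEW.id, NEW.name);
    END
""")

# Bulk-copy the existing rows into the shadow table...
cur.execute("INSERT INTO users_new (id, name) SELECT id, name FROM users")

# ...while the application keeps writing to the old table.
cur.execute("INSERT INTO users (name) VALUES ('bob')")

# Replay the captured changes; in the real pattern the tables would
# then be atomically swapped.
cur.execute("INSERT INTO users_new (id, name) SELECT id, name FROM users_chg")

rows = cur.execute("SELECT id, name, email FROM users_new ORDER BY id").fetchall()
assert rows == [(1, "alice", None), (2, "bob", None)]
```

The real tool has to do this incrementally and handle updates and 
deletes too, which is where the performance question above comes from.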



>>
>> * Versions
>>
>> It's worth noting that behavior for schema updates and other things 
>> change over time with backend database version. We set minimum 
>> versions of other things, like libvirt and OVS - so we might also want 
>> to set minimum versions for what we can support in the database. That 
>> way we can know for a given release of OpenStack what DDL operations 
>> are safe to use for a rolling upgrade and what are not. That means 
>> detecting such a version and potentially refusing to perform an 
>> upgrade if the version isn't acceptable. That reduces the operator's 
>> ability to choose what version of the database software to run, but 
>> increases our ability to be able to provide tooling and operations 
>> that we can be confident will work.
> 
> Validating the MySQL database version is a good idea. The features do 
> change over time. A good example is how in 5.7, you'll get warnings 
> about duplicate indexes being dropped in a future release which will 
> definitely affect multiple services today.
> 
>>
>> == Summary ==
>>
>> These are just a couple of examples - but I hope they're at least 
>> mildly useful to explain some of the sorts of issues at hand - and why 
>> I think we need to clarify what our intent is separate from the issue 
>> of what databases we "support".
>>
>> Some operations have one and only one "right" way to be done. For 
>> those operations if we take an 'active' approach, we can implement 
>> them once and not make all of our deployers and distributors each 
>> implement and run them. However, there is a cost to that. Automatic 
>> and prescriptive behavior has a higher dev cost that is proportional 
>> to the number of supported architectures. This then implies a need to 
>> limit deployer architecture choices.
>>
>> On the other hand, taking an 'external' approach allows us to federate 
>> the work of supporting the different architectures to the deployers. 
>> This means more work on the deployer's part, but also potentially a 
>> greater amount of freedom on their part to deploy supporting services 
>> the way they want. It means that some of the things that have been 
>> requested of us - such as easier operation and an increase in the 
>> number of things that can be upgraded with no-downtime - might become 
>> prohibitively costly for us to implement.
>>
>> I honestly think that both are acceptable choices we can make and that 
>> for any given topic there are middle grounds to be found at any given 
>> moment in time.
>>
>> BUT - without a decision as to what our long-term philosophical intent 
>> in this space is that is clear and understandable to everyone, we 
>> cannot have successful discussions about the impact of implementation 
>> choices, since we will not have a shared understanding of the problem 
>> space or the solutions we're talking about.
>>
>> For my part - I hear complaints that OpenStack is 'difficult' to 
>> operate and requests for us to make it easier. This is why I have been 
>> advocating some actions that are clearly rooted in an 'active' worldview.
>>
>> Finally, this is focused on the database layer but similar questions 
>> arise in other places. What is our philosophy on prescriptive/active 
>> choices on our part coupled with automated action and ease of 
>> operation vs. expanded choices for the deployer at the expense of 
>> configuration and operational complexity. For now let's see if we can 
>> answer it for databases, and see where that gets us.
>>
>> Thanks for reading.
>>
>> Monty
>>
>> __________________________________________________________________________ 
>>
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: 
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


