Thanks for all the replies.

For anyone watching this thread, this was discussed further at the OpenStack TC meeting yesterday:
https://meetings.opendev.org/meetings/tc/2025/tc.2025-09-09-17.00.log.txt

An etherpad for this issue was created as well:
https://etherpad.opendev.org/p/gazpacho-collations-charsets

Hope this helps keep everyone in the loop.

Seunghun Lee
Cloud Engineer, StackHPC

On Mon, 8 Sept 2025 at 12:33, Sean Mooney <smooney@redhat.com> wrote:
On 06/09/2025 08:35, Dmitriy Rabotyagov wrote:
Also for a bit of context, this issue was actually discovered during OpenStack-Ansible's attempt to bump MariaDB version from 11.4 to 11.8 (which is the next LTS), which failed on Magnum deployment [1], as Lee mentioned.
This is something we have been aware of for a number of releases.
Historically, OpenStack has taken the position that the collation and charset are not encoded in the DB schema and are determined by the defaults the operator configures when installing the database.
This was partly because they are not portable across DB backends, i.e. MySQL vs PostgreSQL vs SQLite.
We have considered in the past whether we should change that, but if we started encoding the expected collation in the schema, that would require a data migration of existing databases in some cases, and it would also change API behavior.
For the most part our APIs are intended to be case sensitive, not case insensitive.
In some cases our Python code ends up enforcing that, and in other cases it does not and the behavior is determined by the DB. Cases in point are metadata tags, image properties and flavor extra specs. All three are defined to be case sensitive, and standard extra specs or image properties will be ignored if you do not use lower case. Whether it is valid to have "key" and "Key" as separate keys for a flavor depends on your collation, but from an API point of view those should be two separate keys.
That also extends to the MAC addresses of Neutron ports, which are expected to be lowercase even though they are hex and the two cases could be considered equivalent.
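To make the "key" vs "Key" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an analogy: SQLite's BINARY and NOCASE collations stand in for a case-sensitive collation such as utf8mb4_bin and a case-insensitive one such as utf8mb3_general_ci, and the table and column names are hypothetical, not taken from any OpenStack schema.

```python
import sqlite3


def distinct_keys(collation: str) -> int:
    """Try to store 'key' and 'Key' in a unique column; return how many rows fit."""
    if collation not in ("BINARY", "NOCASE"):
        raise ValueError("this sketch only knows SQLite's BINARY and NOCASE")
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: the unique constraint compares values with the column's collation.
    conn.execute(f"CREATE TABLE extra_specs (k TEXT COLLATE {collation} UNIQUE)")
    stored = 0
    for name in ("key", "Key"):
        try:
            conn.execute("INSERT INTO extra_specs (k) VALUES (?)", (name,))
            stored += 1
        except sqlite3.IntegrityError:
            # The collation considers this value a duplicate of an existing row.
            pass
    conn.close()
    return stored


print(distinct_keys("BINARY"))  # 2: 'key' and 'Key' are separate keys
print(distinct_keys("NOCASE"))  # 1: 'Key' collides with 'key'
```

The same API request therefore succeeds or fails depending only on how the database was configured, which is the behavior difference described above.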
So if we were to encode the expected character set and collation, they would be utf8mb4 and utf8mb4_bin.
The mb4 is important, as we have never actually restricted input to only the 3-byte UTF-8 subset.
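As a quick illustration of why the mb4 part matters, the sketch below checks whether a string stays within the 3-byte UTF-8 subset that utf8mb3 can store. The helper names are made up for this example.

```python
def max_utf8_bytes(text: str) -> int:
    """Return the largest number of UTF-8 bytes used by any single character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_utf8mb3(text: str) -> bool:
    """True if every character encodes to at most 3 bytes in UTF-8."""
    return max_utf8_bytes(text) <= 3


print(fits_utf8mb3("flavor-é"))           # True: 'é' needs 2 bytes
print(fits_utf8mb3("flavor-\U0001F680"))  # False: U+1F680 needs 4 bytes
```

Any value containing a character outside the Basic Multilingual Plane, such as an emoji in a server name or description, needs the 4-byte form.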
The last time we talked about this properly in the Nova project was during the Yoga PTG:
https://etherpad.opendev.org/p/r.e70aa851abf8644c29c8abe4bce32b81#L284
At the time we did not want to force existing installations to adopt a new collation and character set, but we did want to change the default for new installations. We never got the time to fix this, but we should probably bring it up again at the PTG.
Nova has some other technical debt related to using signed 32-bit integer primary keys, which we can exhaust, so if we were to fix this we would probably also work on fixing that: https://etherpad.opendev.org/p/r.22d95e20f7133350dcad573c015ed7da#L431
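For a sense of scale on the signed 32-bit primary key limit, here is a back-of-the-envelope sketch. The insert rate is a made-up number purely for illustration, and the calculation assumes IDs are never reused after rows are deleted.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def days_until_exhausted(current_max_id: int, rows_per_day: int) -> float:
    """Estimate days left before an auto-increment signed 32-bit key runs out."""
    if rows_per_day <= 0:
        raise ValueError("rows_per_day must be positive")
    return (INT32_MAX - current_max_id) / rows_per_day


print(INT32_MAX)  # 2147483647
# Hypothetical busy table: one million inserts per day, starting from an empty table.
print(round(days_until_exhausted(0, 1_000_000) / 365, 1))  # 5.9 (years)
```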
I wonder if we need a cross-project session on this or not?
Rene, it might be good to add the DB topics to the Nova PTG topics in any case, even if there isn't a wider topic, to see if there is interest in finally solving those schema issues.
So far we're planning to cover MariaDB collation inconsistency with [2], but one thing that anybody attempting to upgrade to MariaDB 11.8 today must be aware of is that collation settings no longer work the way they have for decades. This can potentially lead operators into collation compatibility issues if they're using any kind of "manual" deployment.
Some more context on this change I've laid out in a bug report to MariaDB [3].
I think we actually have multiple topics/sub-topics here:
* Usage of the default charset, and whether we want to keep or change our approach to it
* Whether we should attempt to use the default collation with the selected charset, or let operators handle this as it is today
* Informing operators about potential issues with MariaDB 11.8 collations, especially on stable releases.
These are all valid questions and partly what stalled the topic on the Nova side. I think in the long term we should specify the collation and character set in the DB schema and provide a way for admins to opt into it on upgrade. For the MySQL family of databases we should be using utf8mb4 and utf8mb4_bin in most cases, in my opinion, although if we are specifying it via the schema we can do it per column if a project decides some columns should have a different default.
PostgreSQL and SQLite don't really treat this the same way as MariaDB, but we would need to find equivalents to ensure they have the same overall behavior.
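As a rough sketch of what "specify it in the schema, with per-column overrides" could look like for the MySQL family, the snippet below builds a CREATE TABLE statement with a table-level default of utf8mb4 / utf8mb4_bin and one column that opts into a different collation. The Column class, the helper and the table and column names are all hypothetical; real projects would express this through their own migration tooling rather than string building.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    name: str
    length: int
    charset: Optional[str] = None    # None means inherit the table default
    collation: Optional[str] = None  # None means inherit the table default


def create_table_ddl(table: str, columns: List[Column],
                     charset: str = "utf8mb4",
                     collation: str = "utf8mb4_bin") -> str:
    """Build MySQL/MariaDB-style DDL with explicit charset and collation."""
    parts = []
    for col in columns:
        ddl = f"`{col.name}` VARCHAR({col.length})"
        if col.charset:
            ddl += f" CHARACTER SET {col.charset}"
        if col.collation:
            ddl += f" COLLATE {col.collation}"
        parts.append(ddl)
    body = ",\n  ".join(parts)
    return (f"CREATE TABLE `{table}` (\n  {body}\n) "
            f"DEFAULT CHARSET={charset} COLLATE={collation}")


# Hypothetical table: keys stay case sensitive, display_name compares case-insensitively.
print(create_table_ddl("example_extra_specs", [
    Column("key", 255),
    Column("display_name", 255, collation="utf8mb4_general_ci"),
]))
```

Because the charset and collation are spelled out in the statement, the result no longer depends on whatever server or database defaults the operator happens to have.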
https://review.opendev.org/c/openstack/openstack-ansible-galera_server/+/959...
[3] https://jira.mariadb.org/browse/MDEV-37544
On Fri, 5 Sept 2025 at 19:46, Goutham Pacha Ravi <gouthampravi@gmail.com> wrote:
On Fri, Sep 5, 2025 at 8:38 AM Seunghun Lee <seunghun@stackhpc.com> wrote:
>
> Dear OpenStack developers
>
> I'm a Kolla contributor and was trying to bump MariaDB from 10.11 to 11.8 for the Kolla and Kolla-Ansible projects.
> During this, I found problems that can affect all OpenStack services, caused by changes to MariaDB's default charset and collation set.
>
> Here are the notes I made for each problem I found.
>
> Problems with default collation set change
>
> MariaDB changed the default collation set of charset utf8mb3 from utf8mb3_general_ci to utf8mb3_uca1400_ai_ci starting from 11.5 https://jira.mariadb.org/browse/MDEV-25829
>
> K-A currently sets the database level charset to utf8 (which is aliased to utf8mb3) and the collation set to utf8_general_ci (aliased to utf8mb3_general_ci)
> Problem detected when deploying fresh Magnum on MariaDB 11.8: https://bugs.launchpad.net/magnum/+bug/2121797
> This is because of how MariaDB works. If the charset is not specified when creating a table, MariaDB uses the database level charset and collation set (if set) or the server level charset and collation set
> If only the charset is specified when creating a table, MariaDB uses the MariaDB default for that charset (defined as a list of mappings in the variable ``character_set_collations``)
> Magnum specified the charset when creating *some* of its tables, not all -> Collation set inconsistency in the Magnum database.
>
> Server-wide collation set inconsistency (when DB and services are freshly deployed)
>
> Some services such as Nova, Keystone, Cinder and Designate provide a charset when creating tables (and also alter the database level charset to utf8 in the first alembic op). Let's call them group A
> Some services such as Glance, Neutron, Octavia and Placement don't. Let's call them group B
> Then we have Magnum, which mixes both. Let's call it... Magnum
> These differences cause database level collation set inconsistency between group A and group B services because of MariaDB's behaviour when creating a table and a database
>
> Potential problem for MariaDB 10.11 to 11.8 upgrade
>
> Existing databases in the MariaDB server have a collation set of utf8mb3_general_ci
> Because of MariaDB's new default collation set for utf8mb3 and how table creation works, if any service decides to create a new table, it can be created with utf8mb3_uca1400_ai_ci, which causes collation set inconsistency
> This issue can be invisible at first, but if any project decides to create new tables and then do string comparisons between the old tables and the new tables, it will become visible
>
> Potential solution A: Override the MariaDB default for utf8mb3 to utf8mb3_general_ci for each service client
https://mariadb.com/docs/server/reference/data-types/string-data-types/chara...
>
> Pros: Easy
> Cons:
>
> In fact, it doesn't work. Because there are slight differences in how each project creates tables, only some of them (Keystone, Designate) are affected
> Even if it worked, it would just be delaying the problem. Looks like the community pushed forward in the past: https://review.opendev.org/c/openstack/kolla-ansible/+/455154
>
> Potential solution B: Change K-A's server level collation set config to utf8mb3_uca1400_ai_ci, then migrate (ALTER TABLE) all tables to use the new collation on upgrade
>
> Pros: We probably won't need to worry about this matter for a long time
> Cons:
>
> Service disruption; inter-table relationships such as foreign keys make altering tables hard.
>
> Percona-toolkit claims it can be done with near-zero downtime with FK handling strategies. Can be useful.
https://docs.percona.com/percona-toolkit/pt-online-schema-change.html
(GPLv2 license)
>
> Potential solution C: Use MariaDB 11.4 instead
>
> Pros: 11.4 is also an LTS but without this drama
> Cons: Not the latest LTS
>
> Problems with default character set change
>
> MariaDB's new default charset is utf8mb4 https://jira.mariadb.org/browse/MDEV-19123
> This is not urgent, but MariaDB is planning to remove the alias between utf8 and utf8mb3 https://jira.mariadb.org/browse/MDEV-30041
> Currently most OpenStack projects use 'utf8' when specifying a character set, not 'utf8mb3'. These will all need to be updated to 'utf8mb3'.
> Or we can try moving on to utf8mb4, but Ironic needs to use utf8mb3 due to its internal design
https://docs.openstack.org/ironic/latest/install/install.html#set-up-the-dat...
> Transition to utf8mb4 would also need a migration plan
> Using MariaDB 11.4 also helps here because this change is from 11.6
>
>
> I suggest raising this topic at the next OpenStack TC meeting. I think it's worth having an inter-project conversation as these will affect any project that uses a DB.
+1

Thanks for raising this issue. This was on the TC's agenda as a spillover * from the past week's meeting: https://wiki.openstack.org/wiki/Meetings/TechnicalCommittee

* https://meetings.opendev.org/meetings/tc/2025/tc.2025-09-02-17.01.log.html#l...
>
> What do people think about this? Any input is appreciated.
>
> Kind regards,
> Seunghun Lee
> Cloud Engineer, StackHPC
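For readers trying to follow the table-creation rules from the original report, here is a small Python model of them. It is a simplification based only on the behaviour described in this thread, not on MariaDB's source: the function name and dictionaries are hypothetical, and only the utf8mb3 defaults mentioned above are included.

```python
from typing import Dict, Optional

# Per-charset default collations as described in the report (utf8mb3 only).
DEFAULTS_BEFORE_11_5 = {"utf8mb3": "utf8mb3_general_ci"}
DEFAULTS_FROM_11_5 = {"utf8mb3": "utf8mb3_uca1400_ai_ci"}

# 'utf8' is currently an alias of 'utf8mb3'.
ALIASES = {"utf8": "utf8mb3"}


def resolve_table_collation(table_charset: Optional[str],
                            table_collation: Optional[str],
                            db_collation: str,
                            charset_defaults: Dict[str, str]) -> str:
    """Model which collation a new table ends up with."""
    if table_collation is not None:
        # Explicit collation on CREATE TABLE always wins.
        return table_collation
    if table_charset is not None:
        # Only a charset was given: the per-charset default applies,
        # not the database-level collation.
        charset = ALIASES.get(table_charset, table_charset)
        return charset_defaults[charset]
    # Nothing was given: inherit the database (or server) level collation.
    return db_collation


db_collation = "utf8mb3_general_ci"  # what K-A configures at database level

# Table created without a charset: inherits the database collation.
print(resolve_table_collation(None, None, db_collation, DEFAULTS_FROM_11_5))
# utf8mb3_general_ci

# Table created with CHARSET=utf8 only, on a server using the 11.5+ defaults.
print(resolve_table_collation("utf8", None, db_collation, DEFAULTS_FROM_11_5))
# utf8mb3_uca1400_ai_ci
```

With the older defaults both calls return utf8mb3_general_ci, which is why the mix of tables with and without an explicit charset only becomes a problem after the upgrade.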