[neutron] [kolla-ansible] alembic mismatch, schema differences
Hi all,

We've caused ourselves some bother with our neutron database, and I found the following thread instructive, so I hope we can get some help to resolve the bother. https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031312...

We're running kolla-ansible train (yes, upgrades planned). We wanted to test something with a wallaby container, and while mainly just pulling train versions of containers on a single compute node, we also changed the openstack_release variable to wallaby in globals.yml - this was the mistake.

Later that day, we ran `kolla-ansible deploy --limit $HOST` for a new compute node. We stopped at the following error, realising the above mistake:

---
2023-04-05 12:25:50.487 7 ERROR nova nova.exception.TooOldComputeService: Current Nova version does not support computes older than Victoria but the minimum compute service level in your cell is 40 and the oldest supported service level is 52.
2023-04-05 12:25:50.487 7 ERROR nova
---

Now when we deploy a node it gets to the task "Running Neutron bootstrap container" and fails with:

----
"alembic.script.revision.ResolutionError: No such revision or branch '6135a7bd4425'"
----

Using insight from the above list thread, I fetched the DB entries:

MariaDB [neutron]> select * from alembic_version;
+--------------+
| version_num  |
+--------------+
| 5c85685d616d |
| 6135a7bd4425 |
+--------------+

Running the same on the test cluster, which was deployed in the same way:

MariaDB [neutron]> select * from alembic_version;
+--------------+
| version_num  |
+--------------+
| 5c85685d616d |
| c613d0b82681 |
+--------------+

Also, comparing `neutron-db-manage history --verbose` across the two clusters, the output is the same in both test and prod, except for the order of two lines:

----
-Branches into: 56893333aa52, 30018084ed99
-Branches into: c40fbb377ad, 67c8e8d61d5
+Branches into: 30018084ed99, 56893333aa52
+Branches into: 67c8e8d61d5, c40fbb377ad
----

Finally, I looked at the schema of the neutron database across the two. There are significant differences.

for i in `mysql neutron -bNe "show tables;"`; do echo "TABLE: $i"; mysql neutron -e "desc $i;"; done > prod-neutron-descs

Repeat for test.

$ cat prod-neutron-descs | nc termbin.com 9999
https://termbin.com/8ppi
$ cat test-neutron-descs | nc termbin.com 9999
https://termbin.com/cfqz

Our cluster, and Neutron in particular, is operating fine, but we can't deploy new nodes to it.

May I have some help on how to resolve this?

Thank you,
Greg.
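The per-table comparison of those two "desc" dumps can be automated. Below is a minimal Python sketch; `parse_descs` and `schema_diff` are hypothetical helper names, not anything from this thread, and they assume the dump format produced by the shell loop above ("TABLE: name" followed by the desc output):

```python
# Hypothetical helpers (not from the thread): split a dump produced by the
# "desc" loop above into {table_name: schema_text}, then report which tables
# exist in only one dump and which exist in both but differ.

def parse_descs(text):
    """Split 'TABLE: name' sections into a dict of table -> schema text."""
    tables, current = {}, None
    for line in text.splitlines():
        if line.startswith("TABLE: "):
            current = line[len("TABLE: "):].strip()
            tables[current] = []
        elif current is not None:
            tables[current].append(line)
    return {t: "\n".join(lines).strip() for t, lines in tables.items()}

def schema_diff(prod_text, test_text):
    """Return (tables only in prod, tables only in test, tables that differ)."""
    prod, test = parse_descs(prod_text), parse_descs(test_text)
    only_prod = sorted(set(prod) - set(test))
    only_test = sorted(set(test) - set(prod))
    changed = sorted(t for t in set(prod) & set(test) if prod[t] != test[t])
    return only_prod, only_test, changed
```

Fed the two termbin dumps, this would list exactly which tables the wallaby migration added or altered.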
Hi,

I don't really have more input than what I reported in the mentioned thread. I'm also not familiar with kolla-ansible, unfortunately. But I would check for files with the non-existent revision number on the control node(s). Something like this, depending on your distro, could reveal something:

# find /usr/lib/python3/dist-packages/neutron/db/migration/alembic_migrations/versions/ -name "*6135a7bd4425*"

If there are files that don't belong there, remove them and retry the neutron-db-manage commands with the correct revisions in the table. But if you're using containers I'd be curious how those files would get there... Does a release change in globals.yml also redeploy control containers? On the other hand, you seem to have deployed only a compute node with wallaby?

I hope someone with kolla-ansible experience can chime in here. Sorry if I can't be more helpful...

Eugen

Zitat von Gregory Orange <gregory.orange@pawsey.org.au>:
On 11/4/23 17:16, Eugen Block wrote:
I don't really have more input than what I reported in the mentioned thread. I'm also not familiar with kolla-ansible, unfortunately. But I would check for files with the non-existent revision number on the control node(s). Something like this, depending on your distro, could reveal something:
# find /usr/lib/python3/dist-packages/neutron/db/migration/alembic_migrations/versions/ -name "*6135a7bd4425*"
Two matches in the same layer of the same Docker volume:

/usr/lib/python3/dist-packages/neutron/db/migration/alembic_migrations/versions/wallaby/expand/__pycache__/6135a7bd4425_add_rbac_support_for_address_group.cpython-38.pyc
/usr/lib/python3/dist-packages/neutron/db/migration/alembic_migrations/versions/wallaby/expand/6135a7bd4425_add_rbac_support_for_address_group.py

I have no time to look further right now though.
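That find step can also be sketched in Python, for walking a container's filesystem layer from outside. The function name and directory layout here are illustrative, not kolla-specific:

```python
# Hedged sketch: walk a migration tree looking for filenames that contain a
# given alembic revision id, mirroring `find ... -name "*6135a7bd4425*"`.
import os

def find_revision_files(root, revision):
    """Return sorted paths under `root` whose basename contains `revision`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if revision in name:
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Pointing it at the versions/ directory of each container image would show which images carry the wallaby migration scripts.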
If there are files that don't belong there, remove them and retry the neutron-db-manage commands with the correct revisions in the table. But if you're using containers I'd be curious how those files would get there... Does a release change in globals.yml also redeploy control containers? On the other hand, you seem to have deployed only a compute node with wallaby? I hope someone with kolla-ansible experience can chime in here.
It only deployed wallaby containers to the compute node, but a neutron container on that node then appears to have run `neutron-db-manage upgrade` and changed the database schema.
Sorry if I can't be more helpful...
No problem, thank you for the attempt. We managed to revert the changes - I'll post separately.
Solution: Manually revert the database schema changes.

We found that `neutron-db-manage downgrade` has been unsupported for quite some time, so that avenue was closed to us. So, we restored backups from before and after to some temporary mariadb servers, to look at and compare the data. In the end this is how we reverted it (mysql < revert_wallaby_schema.sql):

use neutron
alter table address_scopes alter column shared drop default;
alter table meteringlabelrules drop column source_ip_prefix;
alter table meteringlabelrules drop column destination_ip_prefix;
alter table ml2_vlan_allocations drop constraint check_ml2_vlan_allocations0vlan_id;
alter table networkdhcpagentbindings drop index uniq_network_dhcp_agent_binding0network_id0binding_index0;
alter table networkdhcpagentbindings drop column binding_index;
alter table networks modify mtu int(11) null default null;
alter table portforwardings drop constraint portforwardings_ibfk_3;
alter table portforwardings drop index uniq_portforwardings0standard_attr_id;
alter table portforwardings drop column standard_attr_id;
alter table portuplinkstatuspropagation alter column propagate_uplink_status set default 0;
alter table quotas drop index uniq_quotas0project_id0resource;
alter table securitygroups drop constraint CONSTRAINT_1;
alter table securitygroups drop column stateful;
alter table securitygrouprules drop constraint securitygrouprules_ibfk_4;
alter table securitygrouprules drop column remote_address_group_id;
alter table subnetpools alter column shared drop default;
alter table subnets drop constraint CONSTRAINT_1;
alter table subnets drop column in_use;
update alembic_version set version_num='c613d0b82681' where version_num='6135a7bd4425';
drop table address_associations;
drop table addressgrouprbacs;
drop table address_groups;
drop table addressscoperbacs;
drop table dvr_fip_gateway_port_network;
drop table network_subnet_lock;
drop table ovn_hash_ring;
drop table ovn_revision_numbers;
drop table portdeviceprofiles;
drop table portnumaaffinitypolicies;
drop table subnet_dns_publish_fixed_ips;
drop table subnetpoolrbacs;

And everything is fine again.

Greg.
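The root cause of the ResolutionError is simple to state: alembic stores head revision ids in the alembic_version table, and a head that the installed migration scripts don't know about cannot be resolved. The final `update alembic_version` statement above restores consistency. A minimal illustrative sketch (not neutron code; the function name is hypothetical, the revision ids are the real ones from this thread):

```python
# Illustrative sketch of how the ResolutionError arises: a revision id in the
# database's alembic_version table that the local migration tree lacks.

def unknown_revisions(db_heads, repo_revisions):
    """Return db heads that the installed migration scripts do not know about."""
    return sorted(set(db_heads) - set(repo_revisions))
```

With train's scripts, the wallaby head '6135a7bd4425' is unknown, so alembic fails; after the update statement both heads resolve.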
Hi,

I don't think the .pyc is the problem; it's the same file, just byte-compiled.

1. .py: This is normally the input source code that you've written.
2. .pyc: This is the compiled bytecode. If you import a module, python will build a *.pyc file that contains the bytecode to make importing it again later easier (and faster).

Two matches in the same layer of the same Docker volume:

/usr/lib/python3/dist-packages/neutron/db/migration/alembic_migrations/versions/wallaby/expand/__pycache__/6135a7bd4425_add_rbac_support_for_address_group.cpython-38.pyc
/usr/lib/python3/dist-packages/neutron/db/migration/alembic_migrations/versions/wallaby/expand/6135a7bd4425_add_rbac_support_for_address_group.py

Michal Arbet
Openstack Engineer

Ultimum Technologies a.s.
Na Poříčí 1047/26, 11000 Praha 1
Czech Republic

+420 604 228 897
michal.arbet@ultimum.io
https://ultimum.io
LinkedIn: https://www.linkedin.com/company/ultimum-technologies | Twitter: https://twitter.com/ultimumtech | Facebook: https://www.facebook.com/ultimumtechnologies/timeline

On Mon, 17 Apr 2023 at 8:04, Gregory Orange <gregory.orange@pawsey.org.au> wrote:
Solution: Manually revert the database schema changes.
Hi,

I'm glad you worked that out. But one thing I'm wondering, since I'm not familiar with kolla (as I'm stating repeatedly): why does the compute node have the capability to manipulate the neutron database? Is that by design? We have our own deployment mechanism on baremetal and there's no "connection = ..." string available on compute nodes. Maybe I misunderstood something, but I find it strange that such a thing could happen that easily.

Thanks,
Eugen

Zitat von Gregory Orange <gregory.orange@pawsey.org.au>:
Solution: Manually revert the database schema changes.
Hi,

On Tuesday, 18 April 2023 13:37:12 CEST, Eugen Block wrote:
Hi, I'm glad you worked that out. But one thing I'm wondering since I'm not familiar with kolla (as I'm stating repeatedly), why does the compute node have the capabilities to manipulate the neutron database? Is that by design?
It shouldn't. Only neutron-server should have access to the DB.
We have our own deployment mechanism on baremetal and there's no "connection =..." string available on compute nodes. Maybe I misunderstood something, but I find it strange that such a thing could happen that easily.
Thanks, Eugen
--
Slawek Kaplonski
Principal Software Engineer
Red Hat
Yeah, thanks for the confirmation, I agree.

Zitat von Slawek Kaplonski <skaplons@redhat.com>:
It shouldn't. Only neutron-server should have access to the DB.
On 18/4/23 20:56, Slawek Kaplonski wrote:
On Tuesday, 18 April 2023 13:37:12 CEST, Eugen Block wrote:

Hi, I'm glad you worked that out. But one thing I'm wondering, since I'm not familiar with kolla (as I'm stating repeatedly): why does the compute node have the capability to manipulate the neutron database? Is that by design?

It shouldn't. Only neutron-server should have access to the DB.
A question for kolla devs then I guess. Thank you for identifying this concern.
We have our own deployment mechanism on baremetal and there's no "connection =..." string available on compute nodes. Maybe I misunderstood something, but I find it strange that such a thing could happen that easily.
The following files have the setting on our compute nodes. We haven't customised this setting, so its presence wasn't caused by us per se.

# ack -l "^connection =" /etc/kolla
/etc/kolla/neutron-openvswitch-agent/neutron.conf
/etc/kolla/neutron-metadata-agent/neutron.conf
/etc/kolla/neutron-l3-agent/neutron.conf
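The ack check above can be replicated without ack. A minimal Python sketch, assuming only that the files are INI-style configs where the DB URL appears on a line starting with "connection =" (the function name and any example connection string are illustrative):

```python
# Hedged sketch mirroring `ack -l "^connection =" /etc/kolla`: list files
# under a config root that contain a line starting with "connection =".
import os

def files_with_connection(root):
    """Return sorted paths of files under `root` with a connection = line."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    if any(line.startswith("connection =") for line in fh):
                        hits.append(path)
            except OSError:
                continue  # skip unreadable files
    return sorted(hits)
```

Running this across compute nodes would show exactly which agent configs carry DB credentials they shouldn't need.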
participants (4)
- Eugen Block
- Gregory Orange
- Michal Arbet
- Slawek Kaplonski