After rebuilding Queens clusters on Train, race condition causes Designate record creation to fail
Hello everyone. It's great to be back working on OpenStack again. I'm at Verisign now. I can hardly describe how happy I am to have an employer that does not attach nonsense to the bottom of my emails!

We are rebuilding our clusters from Queens to Train. On the new Train clusters, customers are complaining that deleting a VM and then immediately creating a new one with the same name (via Terraform, for example) intermittently results in a missing DNS record. We can duplicate the issue by building a VM with Terraform, tainting it, and applying.

Before applying the change, we see the DNS record in the recordset:

$ openstack recordset list dva3.vrsn.com. --all |grep openstack-terra
| f9aa73c1-84ba-4854-be71-cbb616de672c | 8d1c84082a044a53abe0d519ed9e8c60 | openstack-terra-test-host.dev-ostck.dva3.vrsn.com. | A | 10.220.4.89 | ACTIVE | NONE |
$

and we can pull it from the DNS server on the controllers:

$ for i in {1..3}; do dig @dva3-ctrl${i}.cloud.vrsn.com -t axfr dva3.vrsn.com. |grep openstack-terra; done
openstack-terra-test-host.dev-ostck.dva3.vrsn.com. 1 IN A 10.220.4.89
openstack-terra-test-host.dev-ostck.dva3.vrsn.com. 1 IN A 10.220.4.89
openstack-terra-test-host.dev-ostck.dva3.vrsn.com. 1 IN A 10.220.4.89

After applying the change, we don't see it:

$ openstack recordset list dva3.vrsn.com. --all |grep openstack-terra
| f9aa73c1-84ba-4854-be71-cbb616de672c | 8d1c84082a044a53abe0d519ed9e8c60 | openstack-terra-test-host.dev-ostck.dva3.vrsn.com. | A | 10.220.4.89 | ACTIVE | NONE |
$
$ for i in {1..3}; do dig @dva3-ctrl${i}.cloud.vrsn.com -t axfr dva3.vrsn.com. |grep openstack-terra; done
$ openstack recordset list dva3.vrsn.com. --all |grep openstack-terra
$

We see this in the logs:

2021-10-09 01:53:44.307 27 ERROR oslo_messaging.notify.dispatcher oslo_db.exception.DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, "Duplicate entry 'c70e693b4c47402db088c43a5a177134-openstack-terra-test-host.de...' for key 'unique_recordset'")
2021-10-09 01:53:44.307 27 ERROR oslo_messaging.notify.dispatcher [SQL: INSERT INTO recordsets (id, version, created_at, zone_shard, tenant_id, zone_id, name, type, ttl, reverse_name) VALUES (%(id)s, %(version)s, %(created_at)s, %(zone_shard)s, %(tenant_id)s, %(zone_id)s, %(name)s, %(type)s, %(ttl)s, %(reverse_name)s)]
2021-10-09 01:53:44.307 27 ERROR oslo_messaging.notify.dispatcher [parameters: {'id': 'dbbb904c347241a791aa01ca33a87b23', 'version': 1, 'created_at': datetime.datetime(2021, 10, 9, 1, 53, 44, 182652), 'zone_shard': 3184, 'tenant_id': '8d1c84082a044a53abe0d519ed9e8c60', 'zone_id': 'c70e693b4c47402db088c43a5a177134', 'name': 'openstack-terra-test-host.dev-ostck.dva3.vrsn.com.', 'type': 'A', 'ttl': None, 'reverse_name': '.moc.nsrv.3avd.kctso-ved.tsoh-tset-arret-kcatsnepo'}]

It appears that Designate is trying to create the new record before the deletion of the old one finishes.

Is anyone else seeing this on Train? The same set of actions doesn't cause this error in Queens. Do we need to change something in our Designate config to make it wait until the old records are finished deleting before attempting to create the new ones?
Hi Albert,

Have you configured your distributed lock manager for Designate?

[coordination]
backend_url = <DLM URL>

Michael
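For context, a filled-in [coordination] section along the lines Michael describes might look like the following in designate.conf; the address, port, and query options here are placeholders rather than values from this deployment, and a Sentinel-style or other Tooz backend URL would work equally well:

[coordination]
backend_url = redis://192.0.2.10:6379?db=0&socket_timeout=60

Without a backend_url, the Designate services can only fall back to process-local locking, which would explain why a delete and an immediate re-create handled by different controllers can race.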
I think so. I see this:

ansible/roles/designate/templates/designate.conf.j2:backend_url = {{ redis_connection_string }}

ansible/group_vars/all.yml:redis_connection_string: "redis://{% for host in groups['redis'] %}{% if host == groups['redis'][0] %}admin:{{ redis_master_password }}@{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ redis_sentinel_port }}?sentinel=kolla{% else %}&sentinel_fallback={{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ redis_sentinel_port }}{% endif %}{% endfor %}&db=0&socket_timeout=60&retry_on_timeout=yes"

Did anything change with the distributed lock manager between Queens and Train?
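On a kolla-ansible controller, one quick way to see what that template actually rendered is to check the generated per-service configs; the paths and service names below assume a stock kolla layout and may differ in other deployments:

$ for svc in designate-central designate-producer designate-worker; do echo ${svc}; grep -A1 '^\[coordination\]' /etc/kolla/${svc}/designate.conf; done

If the redis group in the inventory is empty, the Jinja loop in redis_connection_string has no hosts to emit, so the rendered URL would have no usable endpoint and the services would effectively run without a coordination backend.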
After investigating further, I realized that we're not running redis, and I think that means that redis_connection_string doesn't get set. Does this mean that we must run redis, or is there a workaround?
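If kolla-ansible is doing the deployment, the usual way to get a Tooz backend without standing anything up by hand is to let kolla deploy Redis itself; a minimal sketch, assuming the standard enable_redis flag and a [redis] inventory group (verify the exact variable names against your kolla-ansible release):

# /etc/kolla/globals.yml
enable_redis: "yes"

with the controllers listed under [redis] in the inventory, followed by a reconfigure of the Designate services so the rendered redis_connection_string ends up in backend_url. Any other Tooz backend reachable from all controllers would work just as well, as Michael notes in his reply below.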
You will need one of the Tooz-supported distributed lock managers: Consul, Memcached, Redis, or ZooKeeper.

Michael
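For reference, the Tooz backend_url formats for those options look roughly like the following; the hosts and ports are placeholders, and the exact query parameters each driver accepts are documented with Tooz:

backend_url = redis://192.0.2.10:6379
backend_url = memcached://192.0.2.10:11211
backend_url = zookeeper://192.0.2.10:2181

Whichever backend is chosen, every node running Designate services needs to point at the same one so that they actually share locks.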
Thank you Michael, this is very helpful. Do you have any insight into why we don't experience this in Queens clusters? We aren't running a lock manager there either, and I haven't been able to duplicate the problem there.
I don't have a good answer for you on that, as it pre-dates my history with Designate a bit. I suspect it has to do with the removal of the pool-manager and the restructuring of the controller code. Maybe someone else on the discuss list has more insight.

Michael