After rebuilding Queens clusters on Train, race condition causes Designate record creation to fail

Braden, Albert abraden at verisign.com
Mon Oct 11 18:48:07 UTC 2021


I think so. I see this:

ansible/roles/designate/templates/designate.conf.j2:backend_url = {{ redis_connection_string }}

ansible/group_vars/all.yml:redis_connection_string: "redis://{% for host in groups['redis'] %}{% if host == groups['redis'][0] %}admin:{{ redis_master_password }}@{{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ redis_sentinel_port }}?sentinel=kolla{% else %}&sentinel_fallback={{ 'api' | kolla_address(host) | put_address_in_context('url') }}:{{ redis_sentinel_port }}{% endif %}{% endfor %}&db=0&socket_timeout=60&retry_on_timeout=yes"
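
If I'm reading that template right, it should render the backend_url under [coordination] to something like this (the hosts, sentinel port and password here are made-up placeholders):

backend_url = redis://admin:<redis_master_password>@10.220.0.11:26379?sentinel=kolla&sentinel_fallback=10.220.0.12:26379&sentinel_fallback=10.220.0.13:26379&db=0&socket_timeout=60&retry_on_timeout=yes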

Did anything change with the distributed lock manager between Queens and Train?

-----Original Message-----
From: Michael Johnson <johnsomor at gmail.com> 
Sent: Monday, October 11, 2021 1:15 PM
To: Braden, Albert <abraden at verisign.com>
Cc: openstack-discuss at lists.openstack.org
Subject: [EXTERNAL] Re: After rebuilding Queens clusters on Train, race condition causes Designate record creation to fail

Hi Albert,

Have you configured your distributed lock manager for Designate?

[coordination]
backend_url = <DLM URL>
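
For example, with a single Redis instance (the hostname below is just a placeholder) it could look like:

[coordination]
backend_url = redis://dns-redis.example.com:6379?db=0

Any tooz-supported backend (Redis, etcd, Zookeeper, ...) should work, as long as all of the Designate services point at the same one.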

Michael

On Fri, Oct 8, 2021 at 7:38 PM Braden, Albert <abraden at verisign.com> wrote:
>
> Hello everyone. It’s great to be back working on OpenStack again. I’m at Verisign now. I can hardly describe how happy I am to have an employer that does not attach nonsense to the bottom of my emails!
>
>
>
> We are rebuilding our clusters from Queens to Train. On the new Train clusters, customers are complaining that deleting a VM and then immediately creating a new one with the same name (via Terraform, for example) intermittently results in a missing DNS record. We can duplicate the issue by building a VM with Terraform, tainting it, and applying again.
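>
> Concretely, the repro is something like this (the Terraform resource name is just a placeholder):
>
> $ terraform apply                                     # create the VM; the A record gets created
> $ terraform taint openstack_compute_instance_v2.test  # mark the VM for replacement
> $ terraform apply                                     # delete it and immediately recreate it with the same name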
>
>
>
> Before applying the change, we see the DNS record in the recordset:
>
>
>
> $ openstack recordset list dva3.vrsn.com.  --all |grep openstack-terra
>
> | f9aa73c1-84ba-4854-be71-cbb616de672c | 8d1c84082a044a53abe0d519ed9e8c60 | openstack-terra-test-host.dev-ostck.dva3.vrsn.com.        | A     | 10.220.4.89                                                           | ACTIVE | NONE   |
>
> $
>
>
>
> and we can pull it from the DNS server on the controllers:
>
>
>
> $ for i in {1..3}; do dig @dva3-ctrl${i}.cloud.vrsn.com -t axfr dva3.vrsn.com. |grep openstack-terra; done
>
> openstack-terra-test-host.dev-ostck.dva3.vrsn.com. 1 IN A 10.220.4.89
>
> openstack-terra-test-host.dev-ostck.dva3.vrsn.com. 1 IN A 10.220.4.89
>
> openstack-terra-test-host.dev-ostck.dva3.vrsn.com. 1 IN A 10.220.4.89
>
>
>
> After applying the change, we don’t see it:
>
>
>
> $ openstack recordset list dva3.vrsn.com.  --all |grep openstack-terra
>
> | f9aa73c1-84ba-4854-be71-cbb616de672c | 8d1c84082a044a53abe0d519ed9e8c60 | openstack-terra-test-host.dev-ostck.dva3.vrsn.com.        | A     | 10.220.4.89                                                           | ACTIVE | NONE   |
>
> $
>
> $ for i in {1..3}; do dig @dva3-ctrl${i}.cloud.vrsn.com -t axfr dva3.vrsn.com. |grep openstack-terra; done
>
> $ openstack recordset list dva3.vrsn.com.  --all |grep openstack-terra
>
> $
>
>
>
> We see this in the logs:
>
>
>
> 2021-10-09 01:53:44.307 27 ERROR oslo_messaging.notify.dispatcher oslo_db.exception.DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, "Duplicate entry 'c70e693b4c47402db088c43a5a177134-openstack-terra-test-host.de...' for key 'unique_recordset'")
>
> 2021-10-09 01:53:44.307 27 ERROR oslo_messaging.notify.dispatcher [SQL: INSERT INTO recordsets (id, version, created_at, zone_shard, tenant_id, zone_id, name, type, ttl, reverse_name) VALUES (%(id)s, %(version)s, %(created_at)s, %(zone_shard)s, %(tenant_id)s, %(zone_id)s, %(name)s, %(type)s, %(ttl)s, %(reverse_name)s)]
>
> 2021-10-09 01:53:44.307 27 ERROR oslo_messaging.notify.dispatcher [parameters: {'id': 'dbbb904c347241a791aa01ca33a87b23', 'version': 1, 'created_at': datetime.datetime(2021, 10, 9, 1, 53, 44, 182652), 'zone_shard': 3184, 'tenant_id': '8d1c84082a044a53abe0d519ed9e8c60', 'zone_id': 'c70e693b4c47402db088c43a5a177134', 'name': 'openstack-terra-test-host.dev-ostck.dva3.vrsn.com.', 'type': 'A', 'ttl': None, 'reverse_name': '.moc.nsrv.3avd.kctso-ved.tsoh-tset-arret-kcatsnepo'}]
>
>
>
> It appears that Designate is trying to create the new record before the deletion of the old one finishes.
>
>
>
> Is anyone else seeing this on Train? The same set of actions doesn’t cause this error in Queens. Do we need to change something in our Designate config, to make it wait until the old records are finished deleting before attempting to create the new ones?

