[kolla] [train] [designate] Terraform "repave" causes DNS records to become orphaned

Mohammed Naser mnaser at vexxhost.com
Thu Feb 16 18:22:47 UTC 2023


On Thu, Feb 16, 2023 at 12:57 PM Albert Braden <ozzzo at yahoo.com> wrote:

> We have customers who use Terraform to build their clusters. They do a
> thing that they call “repave”, where they run an Ansible playbook that
> calls “terraform destroy” and then immediately calls “terraform apply” to
> rebuild the cluster. It looks like Designate is not able to keep up, and
> it fails to delete one or more of the DNS records. We have 3 records per
> VM: IPv4 forward (A), IPv4 reverse (PTR), and IPv6 forward (AAAA).
>
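[For illustration, the heart of such a repave step reduces to something like
the following shell sketch; terraform's -auto-approve flag is real, but the
wrapper itself and any paths are hypothetical:

$ terraform destroy -auto-approve   # tear the cluster down; the DNS deletes happen asynchronously
$ terraform apply -auto-approve     # immediately rebuild with the same names and IPs
]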
> When Designate fails to delete a record, it becomes orphaned. On the next
> “repave” the record is not deleted, because it’s not associated with the
> new VM, and we see errors in designate-sink.log:
>
> 2023-02-13 02:49:40.824 27 ERROR oslo_messaging.notify.dispatcher
> [parameters: {'id': '1282a6780f2f493c81ed20bc62ef370f', 'version': 1,
> 'created_at': datetime.datetime(2023, 2, 13, 2, 49, 40, 814726),
> 'zone_shard': 97, 'tenant_id': '130b797392d24b408e73c2be545d0a20',
> 'zone_id': '0616b8e0852540e59fd383cfb678af32', 'recordset_id':
> '1fc5a9eaea824d0f8b53eb91ea9ff6e2', 'data': '10.22.0.210', 'hash':
> 'e3270256501fceb97a14d4133d394880', 'managed': 1, 'managed_plugin_type':
> 'handler', 'managed_plugin_name': 'our_nova_fixed',
> 'managed_resource_type': 'instance', 'managed_resource_id':
> '842833cb9410404bbd5009eb6e0bf90a', 'status': 'PENDING', 'action':
> 'UPDATE', 'serial': 1676256582}]
> 2023-02-13 02:49:40.824 27 ERROR oslo_messaging.notify.dispatcher
> designate.exceptions.DuplicateRecord: Duplicate Record
>
> The orphaned record causes a MariaDB collision, because a record with
> that name and IP already exists. When this happens with an IPv6 record,
> it looks like Designate tries to create the IPv6 record, fails, and then
> never tries to create the IPv4 record, which causes trouble because
> Terraform waits for name resolution to work.
>
> The obvious solution is to tell Terraform users to introduce a delay
> between “destroy” and “apply”, but that would be non-trivial for them,
> and we would prefer to fix it on our end. What can I do to make Designate
> gracefully handle cases where a cluster is deleted and then immediately
> rebuilt with the same names and IPs? Also, how can I clean up these
> orphaned records? So far I have been asking the customer to destroy, then
> deleting the record myself, and then asking them to rebuild, but that is
> a manual process for them. Is it possible to link the orphaned record to
> the new VM so that it will be deleted on the next “repave”?
>
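For the manual cleanup, the recordset commands in python-designateclient can
do it in two steps; a rough sketch, using the zone and recordset IDs from
the log excerpt above (records managed by a sink handler may additionally
need the --edit-managed option, depending on the deployment):

$ openstack recordset list 0616b8e0852540e59fd383cfb678af32        # spot the stale recordset
$ openstack recordset delete 0616b8e0852540e59fd383cfb678af32 \
    1fc5a9ea-ea82-4d0f-8b53-eb91ea9ff6e2                           # remove it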

Or perhaps the Terraform module should wait until the resource is fully
gone, in case the delete is actually asynchronous? The same way that a VM
delete is asynchronous.
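
A rough sketch of that wait, as a shell loop the repave wrapper could run
between “destroy” and “apply” (zone ID taken from the thread; the record
name is a placeholder):

$ while openstack recordset list 0616b8e0852540e59fd383cfb678af32 \
      -f value -c name | grep -qF 'node1.example.com.'; do
    sleep 5   # keep polling until Designate has really removed the recordset
  done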


> Example:
>
> This VM was built today:
> $ os server show f5e75688-5fa9-41b6-876f-289e0ebc04b9 | grep launched_at
> | OS-SRV-USG:launched_at              | 2023-02-16T02:48:49.000000
>
> The A record was created in January:
> $ os recordset show 0616b8e0852540e59fd383cfb678af32 1fc5a9ea-ea82-4d0f-8b53-eb91ea9ff6e2 | grep created_at
> | created_at  | 2023-01-25T02:48:52.000000           |
>
>
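That timestamp gap is itself a usable heuristic for hunting orphans: a
recordset noticeably older than the VM it belongs to is suspect. A rough
sketch for walking a zone (the -f/-c output options are standard cliff
flags; judging the gap is left to a script or to eyeballing):

$ openstack recordset list 0616b8e0852540e59fd383cfb678af32 -f value -c id -c name
$ openstack recordset show 0616b8e0852540e59fd383cfb678af32 <recordset-id> | grep created_at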

-- 
Mohammed Naser
VEXXHOST, Inc.