<div>                Yes, we have 3 controllers per region. Theoretically we could write some TF code that would wait for the deletions to finish before rebuilding; the hard part would be getting our customers to deploy it. For them, TF is just a thing that builds servers so that they can work, and asking them to change it would be a heavy burden. I'm hoping to find a way to fix it in OpenStack.<br>            </div>            <div class="yahoo_quoted" style="margin:10px 0px 0px 0.8ex;border-left:1px solid #ccc;padding-left:1ex;">                        <div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">                                <div>                    On Thursday, February 16, 2023, 03:14:30 PM EST, Eugen Block &lt;eblock@nde.ag&gt; wrote:                </div>                <div><br></div>                <div><br></div>                <div>I wonder if it’s the same (or similar) issue I asked about in November  <br>[1]. Do you have an HA cloud with multiple control nodes? One of our  <br>customers also uses Terraform to deploy clusters and they have to  <br>enable a sleep between destroy and create commands, otherwise a wrong  <br>(deleted) project ID will be applied. 
We figured out it was the  <br>Keystone role cache but still haven’t found a way to achieve both  <br>reasonable performance (tried different cache settings) and quicker  <br>Terraform redeployments.<br><br>[1]  <br><a href="https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031122.html" target="_blank">https://lists.openstack.org/pipermail/openstack-discuss/2022-November/031122.html</a><br><br><br>Quoting Mohammed Naser &lt;<a ymailto="mailto:mnaser@vexxhost.com" href="mailto:mnaser@vexxhost.com">mnaser@vexxhost.com</a>&gt;:<br><br>> On Thu, Feb 16, 2023 at 12:57 PM Albert Braden &lt;<a ymailto="mailto:ozzzo@yahoo.com" href="mailto:ozzzo@yahoo.com">ozzzo@yahoo.com</a>&gt; wrote:<br>><br>>> We have customers who use Terraform to build their clusters. They do a<br>>> thing that they call “repave” where they run an Ansible playbook that calls<br>>> “terraform destroy” and then immediately calls “terraform apply” to rebuild<br>>> the cluster. It looks like Designate is not able to keep up, and it fails<br>>> to delete one or more of the DNS records. We have 3 records, IPv4 forward<br>>> (A) and reverse (PTR) and IPv6 forward (AAAA).<br>>><br>>> When Designate fails to delete a record, it becomes orphaned. 
On the next<br>>> “repave” the record is not deleted, because it’s not associated with the<br>>> new VM, and we see errors in designate-sink.log:<br>>><br>>> 2023-02-13 02:49:40.824 27 ERROR oslo_messaging.notify.dispatcher<br>>> [parameters: {'id': '1282a6780f2f493c81ed20bc62ef370f', 'version': 1,<br>>> 'created_at': datetime.datetime(2023, 2, 13, 2, 49, 40, 814726),<br>>> 'zone_shard': 97, 'tenant_id': '130b797392d24b408e73c2be545d0a20',<br>>> 'zone_id': '0616b8e0852540e59fd383cfb678af32', 'recordset_id':<br>>> '1fc5a9eaea824d0f8b53eb91ea9ff6e2', 'data': '10.22.0.210', 'hash':<br>>> 'e3270256501fceb97a14d4133d394880', 'managed': 1, 'managed_plugin_type':<br>>> 'handler', 'managed_plugin_name': 'our_nova_fixed',<br>>> 'managed_resource_type': 'instance', 'managed_resource_id':<br>>> '842833cb9410404bbd5009eb6e0bf90a', 'status': 'PENDING', 'action':<br>>> 'UPDATE', 'serial': 1676256582}]<br>>> …<br>>> 2023-02-13 02:49:40.824 27 ERROR oslo_messaging.notify.dispatcher<br>>> designate.exceptions.DuplicateRecord: Duplicate Record<br>>><br>>> The orphaned record is causing a MariaDB collision because a record with<br>>> that name and IP already exists. When this happens with an IPv6 record, it<br>>> looks like Designate tries to create the IPv6 record, and fails, and then<br>>> does not try to create an IPv4 record, which causes trouble because<br>>> Terraform waits for the name resolution to work.<br>>><br>>> The obvious solution is to tell TF users to introduce a delay between<br>>> “destroy” and “apply” but that would be non-trivial for them, and we would<br>>> prefer to fix it on our end. What can I do to make Designate gracefully<br>>> manage cases where a cluster is deleted and then immediately rebuilt with<br>>> the same names and IPs? Also, how can I clean up these orphaned records?<br>>> I’ve been asking the customer to destroy, and then deleting the record, and<br>>> then asking them to rebuild, but that is a manual process for them. 
Is it<br>>> possible to link the orphaned record to the new VM so that it will be<br>>> deleted on the next “repave”?<br>>><br>><br>> Or perhaps the Terraform module should wait until the resource is fully<br>> gone in case the delete is actually asynchronous? Same way that a VM delete<br>> is asynchronous.<br>><br>><br>>> Example:<br>>><br>>> This VM was built today:<br>>> $ os server show f5e75688-5fa9-41b6-876f-289e0ebc04b9|grep launched_at<br>>> | OS-SRV-USG:launched_at              | 2023-02-16T02:48:49.000000<br>>><br>>> The A record was created in January:<br>>> $ os recordset show 0616b8e0852540e59fd383cfb678af32<br>>> 1fc5a9ea-ea82-4d0f-8b53-eb91ea9ff6e2|grep created_at<br>>> | created_at  | 2023-01-25T02:48:52.000000           |<br>>><br>>><br>><br>> --<br>> Mohammed Naser<br>> VEXXHOST, Inc.<br><br><br><br><br></div>            </div>                </div>
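<div>The "wait until the resource is fully gone" idea raised in the thread could be sketched roughly as below. This is only an illustration under assumptions, not something anyone in the thread actually ran: <code>wait_until_gone</code> and <code>recordset_exists</code> are hypothetical helper names, the timeout and interval values are arbitrary, and the check assumes the <code>openstack recordset show</code> command exits non-zero once the recordset no longer exists.</div>

```python
import subprocess
import time
from typing import Callable


def wait_until_gone(
    exists: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `exists` until it returns False or `timeout` seconds elapse.

    Returns True if the resource disappeared in time, False otherwise.
    `sleep` and `clock` are injectable so the loop can be tested offline.
    """
    deadline = clock() + timeout
    while True:
        if not exists():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def recordset_exists(zone_id: str, recordset_id: str) -> bool:
    """Hypothetical check: True while the recordset can still be shown.

    Assumption: the CLI returns a non-zero exit code for a missing recordset.
    """
    result = subprocess.run(
        ["openstack", "recordset", "show", zone_id, recordset_id],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

<div>A repave wrapper might call <code>wait_until_gone(lambda: recordset_exists(zone_id, recordset_id))</code> between "terraform destroy" and "terraform apply", and stop with an error if it returns False instead of rebuilding on top of a record that was never deleted.</div>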