[designate] How to avoid NXDOMAIN or stale data during cold start of a (new) machine
Hello openstack-discuss,
I have a designate setup using bind9 as the user-serving DNS server.
When starting a machine with either very old zones or no zones at all, NXDOMAIN or otherwise stale data is sent out to clients, because designate has not yet finished an initial full sync / reconciliation.
* What is the "proper" way to tackle this cold-start issue and to keep BIND from serving wrong data?
** Did I miss any options for handling this startup case?
* What is the usual runtime you observe for an initial sync when the backend DNS server has no zones at all anymore?
Regards
Christian
Hi Christian,
On startup, BIND9 will start sending SOA serial number queries for all of the zones it knows about. In the case of Designate, that means BIND9 will send requests to the miniDNS instances to check whether the serial number in Designate is newer than the one in BIND9. If it is, BIND9 will initiate a zone transfer from miniDNS.
BIND9, by default, will send up to 20 SOA serial number queries per second (fewer on older versions of BIND). See the serial-query-rate setting in the rate limiter knowledge base article[1].
The tuning knowledge base article[2] also discusses settings that can be adjusted for secondary servers and that may help speed up a cold start.
Off the top of my head, I don't know of a way to tell BIND9 not to answer queries via rndc or similar. I usually block network access to a new BIND9 instance until "rndc status" shows that "soa queries in progress" and "xfers running" have dropped to 0 or a low number. Maybe others have different approaches?
As for the runtime of a full resync in BIND9, that really depends on the number and size of the zones as well as on the configuration settings I mentioned above. The performance of the hosts running the miniDNS instances and the database will also have an impact.
Michael
[1] https://kb.isc.org/v1/docs/rate-limiters-for-authoritative-zone-propagation
[2] https://kb.isc.org/docs/aa-00726#options-for-tuning-secondary-servers
On Tue, May 10, 2022 at 2:02 AM Christian Rohmann <christian.rohmann@inovex.de> wrote:
[...]
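A minimal Python sketch of the readiness gate Michael describes: it parses the output of "rndc status" and reports ready only once the "soa queries in progress" and "xfers running" counters are at or below a threshold. The exact line format of "rndc status" is assumed from BIND 9 output and may differ between versions. The function names and the polling loop are hypothetical. As Christian points out later in the thread, both counters are already 0 on a BIND9 that knows no zones, so this gate only covers zones BIND9 already has configured.

```python
import re
import subprocess
import time

# Counters mentioned in the thread; their wording is assumed from BIND 9 "rndc status" output.
BUSY_COUNTERS = ("soa queries in progress", "xfers running")
COUNTER_LINE = re.compile(r"^\s*([a-z][a-z ]*?):\s*(\d+)\s*$")


def parse_rndc_status(text: str) -> dict:
    """Collect 'name: <integer>' lines from rndc status output."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def is_ready(text: str, threshold: int = 0) -> bool:
    """Ready once every busy counter is <= threshold; a missing counter means not ready."""
    counters = parse_rndc_status(text)
    return all(counters.get(name, threshold + 1) <= threshold for name in BUSY_COUNTERS)


def rndc_status() -> str:
    """Run 'rndc status' locally (assumes rndc is installed and configured)."""
    result = subprocess.run(["rndc", "status"], capture_output=True, text=True, check=True)
    return result.stdout


def wait_until_ready(threshold: int = 0, poll_seconds: float = 5.0, timeout: float = 3600.0) -> bool:
    """Poll rndc status until ready or until the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_ready(rndc_status(), threshold):
            return True
        time.sleep(poll_seconds)
    return False
```

The caller would open network access to the instance only after wait_until_ready() returns True. That last step depends on the environment (firewall, load balancer, anycast announcement) and is left out.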
Hey Michael, all,
sorry for the late reply, just got around to playing with Designate and BIND9 again ...
On 10/05/2022 19:04, Michael Johnson wrote:
[...]
Thanks for your thoughts and for laying out how existing zones are synced via AXFR/IXFR zone transfers. I was actually talking about an empty BIND9 that first needs to learn which zones exist. Maybe I was not clear about that in my initial post. When BIND9 does not have a zone configured, it requires input from Designate before it can ask mDNS for the latest zone data. Think of a freshly restored server, or one newly added to the pool, that needs ALL zones added: until then it does no IXFR / AXFR at all, but wrongly answers with NXDOMAIN.
For new zones there is https://github.com/openstack/designate/blob/5d5d83e511acbf5d6f34e9a998ff16d9... which creates the zone (via rndc addzone, see https://docs.openstack.org/designate/latest/admin/backends/bind9.html#bind9-...) and triggers mDNS.
But I don't quite get how Designate would recognize that a zone is "missing" on one of the BIND9 servers in the pool. Which task checks or determines that a server of the pool does not have some or all of the zones (anymore)? Is there no form of health check for backends to see whether they have all zones, like a simple count or comparison of the zone lists? I'd like to build a readiness check to e.g. block all external queries until all zones exist. There is even an open issue about this: https://gitlab.isc.org/isc-projects/bind9/-/issues/3081
Or is there any way / designate-manage command to trigger this, or to re-create all zones for a backend, or a way to remove and re-add a BIND9 backend so that Designate treats it as new and unconfigured and then issues all those "rndc addzone" commands?
Regards
Christian
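A simple count or comparison of the zone lists, as asked for above, could look like the following Python sketch. How the two lists are obtained is an assumption left open: the expected list would come from Designate for the pool, the served list from the BIND9 instance being checked. Names are normalized to lowercase with a trailing dot before comparing. ZoneDiff, compare_zone_lists and backend_ready are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ZoneDiff:
    missing: FrozenSet[str]  # known to Designate, absent on the backend
    unexpected: FrozenSet[str]  # present on the backend, unknown to Designate


def normalize(name: str) -> str:
    """Compare zone names case-insensitively and always with a trailing dot."""
    return name.lower().rstrip(".") + "."


def compare_zone_lists(expected: Iterable[str], served: Iterable[str]) -> ZoneDiff:
    exp = {normalize(zone) for zone in expected}
    srv = {normalize(zone) for zone in served}
    return ZoneDiff(missing=frozenset(exp - srv), unexpected=frozenset(srv - exp))


def backend_ready(expected: Iterable[str], served: Iterable[str]) -> bool:
    """A backend counts as ready only when no expected zone is missing."""
    return not compare_zone_lists(expected, served).missing
```

A zone created between taking the two snapshots shows up as missing until it is deployed, which is one form of the race conditions Michael warns about in his reply.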
Hi Christian,
Sorry, I misunderstood the scenario. There are two ways zones can be resynced:
1. Using the "designate-manage pool update" command. This will force an update/recreate of all of the zones.
2. When a zone is in ERROR or PENDING for too long, the WorkerPeriodicRecovery task in the producer will attempt to repair the zone.
I don't think there is a periodic task that checks the BIND instances for missing content at the moment. Adding one would have to be done very carefully, as it would be easy to get into race conditions when new zones are being created and deployed.
Michael
On Mon, Jun 6, 2022 at 9:59 AM Christian Rohmann <christian.rohmann@inovex.de> wrote:
[...]
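To illustrate the second mechanism, here is a simplified Python model of the selection step: pick the zones that have sat in ERROR or PENDING for longer than a threshold. This is not Designate's WorkerPeriodicRecovery code; the ZoneState fields, the status values as plain strings and the max_age parameter are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

# Statuses named in the thread as candidates for recovery.
RECOVERABLE = {"ERROR", "PENDING"}


@dataclass(frozen=True)
class ZoneState:
    name: str
    status: str  # e.g. "ACTIVE", "PENDING", "ERROR" (plain strings, an assumption)
    updated_at: datetime  # last status change, on the same clock as "now"


def zones_to_recover(zones: Iterable[ZoneState], now: datetime, max_age: timedelta) -> List[str]:
    """Return names of zones stuck in ERROR or PENDING for longer than max_age."""
    return [
        zone.name
        for zone in zones
        if zone.status in RECOVERABLE and now - zone.updated_at > max_age
    ]
```

The age threshold keeps such a pass from grabbing zones whose create or update is still in flight, which is the same kind of race a periodic missing-content check would have to avoid.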
On 07/06/2022 02:04, Michael Johnson wrote:
There are two ways zones can be resynced: 1. Using the "designate-manage pool update" command. This will force an update/recreate of all of the zones. 2. When a zone is in ERROR or PENDING for too long, the WorkerPeriodicRecovery task in producer will attempt to repair the zone.
Thanks again for these details! I suppose there is no option for a targeted run of this "pool update" command that hits just one backend server?
If you don't mind me asking yet another question ... what is the difference between running a regular backend and an agent backend? I.e. https://github.com/openstack/designate/blob/master/designate/backend/impl_bi... vs. https://github.com/openstack/designate/blob/master/designate/backend/agent_b...
Regards and thanks again,
Christian
The agent-based backends use an agent process to interact with the backend DNS server, as opposed to talking to it directly over a management protocol. I wasn't part of the project when the agent-based BIND9 backend was added, so I don't have the full context of why there is an agent for BIND9.
I will note that all of the agent-based backends are in the "untested" or "experimental" category: https://docs.openstack.org/designate/latest/admin/support-matrix.html
Since the agent-based drivers don't have many contributors now, I would expect to see more of them marked deprecated and removed in a future release. I would highly recommend using the native backend if possible.
Michael
On Mon, Jun 6, 2022 at 11:54 PM Christian Rohmann <christian.rohmann@inovex.de> wrote:
[...]
On 07/06/2022 02:04, Michael Johnson wrote:
There are two ways zones can be resynced: 1. Using the "designate-manage pool update" command. This will force an update/recreate of all of the zones. 2. When a zone is in ERROR or PENDING for too long, the WorkerPeriodicRecovery task in producer will attempt to repair the zone.
I don't think there is a periodic task that checks the BIND instances for missing content at the moment. Adding one would have to be done very carefully as it would be easy to get into race conditions when new zones are being created and deployed.
Just as an update: while investigating this cold start with no zones, where "designate-manage pool update" did not fix things, we found that somebody had recently run into the same issue (https://bugs.launchpad.net/designate/+bug/1958409/) and proposed a fix (rndc modzone -> rndc addzone). With this patch, "pool update" does cause all the missing zones to be created in a BIND instance that has either lost its zones or has just been added to the pool.
Regards
Christian
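The fix referenced above changes which rndc command is issued. The Python sketch below shows the underlying idea under stated assumptions: "rndc modzone" only changes a zone the server already has, so a server that lost its zones (or is new to the pool) needs "rndc addzone" instead. It is not the actual patch. The helper names, the miniDNS address, port 5354 and the file name are hypothetical, whether the zone exists is supplied by the caller, and the zone statement uses the "secondary"/"primaries" keywords (older BIND 9 releases spell them "slave"/"masters").

```python
from typing import Iterable, List


def zone_config(primaries: Iterable[str], port: int, zone_file: str) -> str:
    """Render the zone-configuration argument that rndc addzone/modzone take."""
    servers = " ".join(f"{address} port {port};" for address in primaries)
    return f'{{ type secondary; primaries {{ {servers} }}; file "{zone_file}"; }};'


def rndc_zone_command(zone: str, config: str, zone_exists: bool) -> List[str]:
    """Hypothetical 'ensure zone' step: modzone only works for zones BIND already has."""
    verb = "modzone" if zone_exists else "addzone"
    return ["rndc", verb, zone, config]
```

Both commands require "allow-new-zones yes;" in the BIND9 configuration. Passing the arguments as a list (e.g. to subprocess.run) avoids having to shell-quote the braces and quotes in the zone configuration.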
Hello again,
On 01/07/2022 09:10, Christian Rohmann wrote:
On 07/06/2022 02:04, Michael Johnson wrote:
There are two ways zones can be resynced: 1. Using the "designate-manage pool update" command. This will force an update/recreate of all of the zones. [...] While investigating this cold start with no zones, where "designate-manage pool update" did not fix things, we found that somebody had recently run into the same issue (https://bugs.launchpad.net/designate/+bug/1958409/) and proposed a fix (rndc modzone -> rndc addzone).
With this patch, "pool update" does cause all the missing zones to be created in a BIND instance that has either lost its zones or has just been added to the pool.
Yet another update on this topic of "cold start" and "resync" of secondary nameservers:
Since calling "rndc modzone" and "rndc addzone" for each and every zone of a pool, and for every pool member, did not scale in a way we liked, we looked around for other solutions. We then came across Catalog Zones (https://datatracker.ietf.org/doc/draft-ietf-dnsop-dns-catalog-zones/), supported by major DNS servers (BIND, NSD, Knot, PowerDNS, ...). A catalog zone simply provides the secondaries with a list of zones for their kind consideration, and the secondaries then provision those zones themselves.
Shameless pointer to the spec I proposed to add support for catalog zones to Designate: https://review.opendev.org/c/openstack/designate-specs/+/849109
Regards
Christian
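To make the catalog zone idea concrete, the Python sketch below renders the records of a catalog zone in the layout of the IETF draft (later published as RFC 9432): an SOA, an NS record pointing to "invalid.", a "version" TXT record with value "2", and one PTR record per member zone below the "zones" label. Using the SHA-1 hash of the member's wire-format name as the unique member label is one common convention and an assumption here; the catalog name, the SOA timers and the serial are illustrative.

```python
import hashlib
from typing import Iterable, List


def wire_format(name: str) -> bytes:
    """DNS wire format of a domain name: length-prefixed lowercase labels plus the root label."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"


def member_id(zone: str) -> str:
    """Unique member label derived from the zone name (SHA-1 of wire format, a convention)."""
    return hashlib.sha1(wire_format(zone)).hexdigest()


def render_catalog(catalog: str, members: Iterable[str], serial: int) -> List[str]:
    """Return the catalog zone's records, one per line, in presentation format."""
    origin = catalog.rstrip(".") + "."
    lines = [
        f"{origin} 0 IN SOA invalid. invalid. {serial} 3600 600 2147483646 0",
        f"{origin} 0 IN NS invalid.",
        f'version.{origin} 0 IN TXT "2"',
    ]
    # Deduplicate case and trailing-dot variants so each member zone appears once.
    for zone in sorted({member.lower().rstrip(".") + "." for member in members}):
        lines.append(f"{member_id(zone)}.zones.{origin} 0 IN PTR {zone}")
    return lines
```

A secondary that consumes this catalog adds or removes member zones as PTR records appear or disappear, so a new or restored server learns the full zone list from one zone transfer instead of one rndc call per zone.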
participants (2)
- Christian Rohmann
- Michael Johnson