One other thing the 30-second figure brings to mind is the spanning-tree port forwarding delay. PXE firmware often assumes that once carrier is up it can send and receive packets, but the switch may still be blocking traffic while it waits for spanning-tree packets. Just to rule out possible issues, it might be worth double-checking on the network side that "portfast" is the operating mode for the physical ports attached to that flat network. If this were the problem, the machine would appear to be doing DHCP, but its packets would never actually reach the DHCP server.
-Julia
On Tue, Oct 8, 2019 at 9:55 AM fsbiz@yahoo.com <fsbiz@yahoo.com> wrote:
Thanks, Julia. We have set port_setup_delay to 30.
# Delay value to wait for Neutron agents to setup sufficient
# DHCP configuration for port. (integer value)
# Minimum value: 0
port_setup_delay = 30
We're hoping that in the U cycle, we'll finally have things in place where neutron tells ironic that the port setup is done and that the machine can be powered-on, but not all the code made it during Train.
This would be perfect.
Fred.
On Tuesday, October 8, 2019, 09:32:44 AM PDT, Julia Kreger <juliaashleykreger@gmail.com> wrote:
While this isn't directly about scaling that subnet, you may want to look at the [neutron]port_setup_delay option in ironic.conf. The default value is zero seconds; increasing it causes the deployment process to pause a little longer, giving the neutron agent time to apply its configuration. Because there are multiple steps within neutron, the agent may not even know about the new configuration by the time the bare metal machine tries to PXE boot. We're hoping that in the U cycle we'll finally have things in place where neutron tells ironic that the port setup is done and that the machine can be powered on, but not all the code made it in during Train.
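A quick way to confirm which value ironic will actually pick up is to read the option back from the file. The helper below is a hypothetical sketch, not part of ironic: it assumes ironic.conf parses as a plain INI file with Python's standard configparser (true for simple files like the snippet above, but not guaranteed for every oslo.config feature), and it falls back to the default of 0 seconds when the option or the [neutron] section is absent.

```python
import configparser

# Default per this thread: port_setup_delay is 0 seconds unless set.
DEFAULT_PORT_SETUP_DELAY = 0


def port_setup_delay(conf_text: str) -> int:
    """Return [neutron]port_setup_delay from ironic.conf text (hypothetical helper)."""
    # interpolation=None: don't treat '%' in other options as substitutions.
    # strict=False: tolerate repeated keys/sections (last value wins).
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # fallback also covers a missing [neutron] section.
    value = parser.getint("neutron", "port_setup_delay",
                          fallback=DEFAULT_PORT_SETUP_DELAY)
    if value < 0:
        raise ValueError("port_setup_delay has a minimum value of 0")
    return value


if __name__ == "__main__":
    sample = """
[neutron]
# Delay value to wait for Neutron agents to setup sufficient
# DHCP configuration for port. (integer value)
# Minimum value: 0
port_setup_delay = 30
"""
    print(port_setup_delay(sample))  # prints 30
```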
-Julia
On Tue, Oct 8, 2019 at 9:15 AM fsbiz@yahoo.com <fsbiz@yahoo.com> wrote:
Hi folks,
We have a rather large flat network with over 300 ironic bare metal nodes, and the nodes constantly time out during PXE boot because the DHCP agent is not able to respond in time.
We're looking for input on successful DHCP scaling techniques that would help mitigate this.
Thanks,
Fred.
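Julia's spanning-tree explanation at the top of this thread can be made concrete with a small timeline calculation. This is a back-of-the-envelope sketch, not a description of any particular switch or NIC, and all names in it are hypothetical. It assumes classic IEEE 802.1D timers (a port spends one 15-second forward delay in listening and another in learning, so it starts forwarding about 30 seconds after link-up), a PXE client that sends its first DHCPDISCOVER the moment carrier comes up, and an assumed retry schedule with timeouts of 4, 8, 16 and 32 seconds.

```python
from typing import List

# Assumed values (not from the thread): classic IEEE 802.1D default
# forward delay, and a PXE DHCPDISCOVER retry schedule of 4, 8, 16, 32 s.
FORWARD_DELAY_S = 15.0
PXE_TIMEOUTS_S = [4.0, 8.0, 16.0, 32.0]


def stp_forwarding_after(forward_delay: float, portfast: bool) -> float:
    """Seconds after link-up until the switch port forwards frames."""
    if portfast:
        return 0.0  # PortFast skips the listening and learning states
    return 2 * forward_delay  # one forward delay listening, one learning


def discover_send_times(timeouts: List[float]) -> List[float]:
    """Send times (s after link-up): one DISCOVER per timeout, first at t=0.

    The last timeout is only the wait after the final attempt, so it does
    not produce another send.
    """
    times = [0.0]
    for timeout in timeouts[:-1]:
        times.append(times[-1] + timeout)
    return times


def dropped_attempts(send_times: List[float], forwarding_at: float) -> List[float]:
    """Attempts sent before the port forwards never reach the DHCP server."""
    return [t for t in send_times if t < forwarding_at]


if __name__ == "__main__":
    sends = discover_send_times(PXE_TIMEOUTS_S)
    for portfast in (False, True):
        fwd = stp_forwarding_after(FORWARD_DELAY_S, portfast)
        lost = dropped_attempts(sends, fwd)
        print(f"portfast={portfast}: forwarding at {fwd:.0f}s, "
              f"{len(lost)}/{len(sends)} DHCPDISCOVERs dropped")
```

Under these assumptions every DHCPDISCOVER (sent at 0, 4, 12 and 28 seconds) goes out before the port starts forwarding at 30 seconds, so the node looks like it is doing DHCP while the server sees nothing; with PortFast, none are dropped. Real firmware usually needs a few seconds to initialize after link-up, which shifts the send times later, so treat the numbers as illustrative.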