[ironic] FFE: PXE boot retry
Hi folks,
I would like to ask for a late FFE for https://storyboard.openstack.org/#!/story/2005167 - retry PXE boot on timeout. Random PXE failures have long haunted both our consumers and our CI, so this change may be a big relief for everyone. Since we're very late in the cycle and the change is on the critical path, I'm keeping it off by default (except for the CI), with the intent to reconsider it in Ussuri. The patch https://review.opendev.org/#/c/683127/ has been tested locally; I will finish the unit tests and a release note tomorrow morning (CEST).
Please let me know if you have any concerns or questions.
Dmitry
Hi Dmitry,
I'm fine with the FFE; it'll have minimal impact (and a low risk of failure when turned off). I think we need 2 cores to agree to 'sponsor' (i.e. review) the feature.
I am not sure we should turn it on by default, but we can discuss that in Ussuri. (Or maybe it'll depend on what the default timeout value is...)
Oh, and I'm OK with reviewing.
When's the cut-off date by which this needs to land?
--ruby
On Thu, Sep 19, 2019 at 9:04 AM Dmitry Tantsur <dtantsur@redhat.com> wrote:
I'm good with this FFE. I've looked through most of the submitted patch and it LGTM so far, minus the unit tests. As for Ruby's question, I think ASAP, given the CI issues.
-Julia
On Thu, Sep 19, 2019 at 7:17 AM Ruby Loo <opensrloo@gmail.com> wrote:
I won't risk promising a date given the state of the CI, but early next week at the latest.
On Thu, Sep 19, 2019 at 4:11 PM Ruby Loo <opensrloo@gmail.com> wrote:
participants (3)
- Dmitry Tantsur
- Julia Kreger
- Ruby Loo