[tripleo][ironic] fails to introspect: my fsm encountered an exception
Hi all,
I have a clean undercloud deployment, but when I add a node, it fails to introspect. In the ironic logs I see interesting lines about FSM (what is it?) [1]. In the same paste I have included what openstack overcloud node import instack.json looks like.
[1] http://paste.openstack.org/show/uPwWVYlO3UQbrF0WzDFH/
-- Ruslanas Gžibovskis +370 6030 7030
Hi all, Just in case. I have executed it with --debug [1]. [1] http://paste.openstack.org/show/zIHDZ4PS8d0Oi3fmB5AD/
Hi all,
I have a clean undercloud deployment, but when I add a node, it fails to introspect. In the ironic logs I see interesting lines about FSM (what is it?) [1]. In the same paste I have included what openstack overcloud node import instack.json looks like. The introspection --provide output with debug is in [2].
Nothing changes when I change the driver (idrac, ipmi, redfish) or when I specify the exact host for introspection. Here [3] are the images and containers with ironic, and here is my latest instack file [4].
Any ideas? What could I do?
[1] http://paste.openstack.org/show/uPwWVYlO3UQbrF0WzDFH/ [2] http://paste.openstack.org/show/zIHDZ4PS8d0Oi3fmB5AD/ [3] http://paste.openstack.org/show/87pn8i1QGJj2JQyPEJbl/ [4] http://paste.openstack.org/show/a8UqmlI6yT6sd0503kVn/
If the code is off of master, you may want to refresh the ironic-inspector code. We (ironic) merged a patch in an attempt to fix an issue where people were using the tripleo set of tools to force reinspection; however, that fix also apparently broke TripleO's ansible playbooks.
For the record, fsm is short for Finite State Machine. The ironic-inspector logging seems to indicate that an error is occurring in _do_inspection internally, which is consistent with what some of TripleO's CI was encountering.
In reply to: Ruslanas Gžibovskis <ruslanas@lpic.lt>, Tue, Sep 29, 2020 at 9:04 AM
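For readers unfamiliar with the term, here is a minimal sketch of a finite state machine and of how a "no defined transition" style of error can arise. It is a toy, not ironic-inspector's implementation: the class names, the transition table, and the event name "initialize" are assumptions made for illustration, and the message format is only modelled on the error in the thread subject.

```python
class TransitionError(Exception):
    """Raised when no transition is defined for (state, event)."""


class TinyFSM:
    """Toy finite state machine; NOT ironic-inspector's real code."""

    def __init__(self, initial, transitions):
        # transitions maps (current_state, event) -> next_state
        self.state = initial
        self._transitions = dict(transitions)

    def process_event(self, event):
        key = (self.state, event)
        if key not in self._transitions:
            # The state is left unchanged when the event is rejected.
            raise TransitionError(
                f"Can not transition from state {self.state!r} "
                f"on event {event!r} (no defined transition)"
            )
        self.state = self._transitions[key]
        return self.state


# Hypothetical transition table: 'sync' is only legal once initialized.
PXE_FILTER_TRANSITIONS = {
    ("uninitialized", "initialize"): "initialized",
    ("initialized", "sync"): "initialized",
    ("initialized", "reset"): "uninitialized",
}

if __name__ == "__main__":
    fsm = TinyFSM("uninitialized", PXE_FILTER_TRANSITIONS)
    try:
        fsm.process_event("sync")  # sync before initialize is rejected
    except TransitionError as exc:
        print(exc)
    fsm.process_event("initialize")
    print(fsm.process_event("sync"))
```

The point of the sketch is that the FSM error is usually a symptom: something earlier left the machine in a state where the requested event is not allowed.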
Hi Julia,
If you could share: should I do this inside the container or on the podman side? Which container? And is it a git pull? I ran podman image pull for all images, and they are 24 hours old now. If you could give me a file, a repo, or anything to pull, I would appreciate it.
Thank you
In reply to: Julia Kreger <juliaashleykreger@gmail.com>, Tue, 29 Sep 2020, 19:42
I am using the default sources from docker.io/tripleou, as can be seen in [1]. I still see the same errors [2], even though the image was updated. I have rebuilt the undercloud, and the images are 18 hours old.
If someone could help me refresh ironic-inspector some other way, I could try that. But right now, if I exec -it into the ironic_inspector container, I get the ironic user and cannot log in as root.
If someone could spare some time and paste some links on how I could update it from the glorious master, I could help to test and will help later on ;)
Thank you for your time, and for reading ;)
[1] http://paste.openstack.org/show/87pn8i1QGJj2JQyPEJbl/ [2] http://paste.openstack.org/show/LX2h9qSvyJDw6VxwUXt5/
So interestingly enough, I can't seem to access that docker repository to look at the images. Something seems off, but it doesn't look related to the change that was reverted a couple of days ago.
Looking at the additional logging you provided, it looks like something is occurring out of order. So a few questions:
1) Was the ironic-inspector container started before or after the ironic-api container? It needs to start after for the dnsmasq filter driver to sync and operate.
2) Can you manually trigger inspection without the extra tripleo tools? "openstack baremetal node inspect <uuid>"
I ask this specifically because I'm not seeing the POST request in the logging you're supplying, so I'm wondering if for some reason the wrong thing is happening in the tripleo tooling. When you manually inspect, you'll need to run "openstack baremetal node show <uuid>" to see if the node exits the inspect state. From there I'd check the logs again to see if the same error occurred or if things were successful.
-Julia
In reply to: Ruslanas Gžibovskis <ruslanas@lpic.lt>, Thu, Oct 1, 2020 at 2:59 AM
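The manual check described above (inspect, then repeatedly run node show until the node leaves the inspect state) could be scripted roughly as follows. This is a sketch under assumptions: the helper names, the polling interval, and the timeout are made up for illustration, and the set of "in progress" provision states is an assumption about this ironic release. The CLI call relies on the standard -f json formatter and the --fields option of openstack baremetal node show.

```python
import json
import subprocess
import time

# Provision states treated as "inspection still running" (assumed set).
IN_PROGRESS = {"inspecting", "inspect wait"}


def fetch_node(node):
    """Return provision_state and last_error for a node via the CLI."""
    out = subprocess.run(
        ["openstack", "baremetal", "node", "show", node,
         "-f", "json", "--fields", "provision_state", "last_error"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def wait_for_inspection(node, fetch=fetch_node, interval=10.0,
                        timeout=1800.0, sleep=time.sleep):
    """Poll until the node leaves the inspection states or timeout expires."""
    waited = 0.0
    while True:
        info = fetch(node)
        if info.get("provision_state") not in IN_PROGRESS:
            return info
        if waited >= timeout:
            raise TimeoutError(
                f"{node} still in {info.get('provision_state')!r}")
        sleep(interval)
        waited += interval
```

After running "openstack baremetal node inspect <uuid>", calling wait_for_inspection("<uuid>") returns the final provision_state together with last_error, which is the field to compare against the inspector logs.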
Hi Julia,
1) I think podman ps sorts by start time [1]. If we trust that, ironic is the first one (at the bottom) and the first that is still running (not a configuration run).
2.1) OK, it fails in the same place; see baremetal node show CPU2 [2].
2.2) The logs look the same too [3].
0) Regarding the image I have, I can podman save it (the first option from man podman-save: podman save --quiet -o alpine.tar ironic-inspector:current-tripleo).
P.S. baremetal is an alias: alias baremetal="openstack baremetal"
[1] http://paste.openstack.org/show/uejDzLWpPvMdLFAJTCam/ [2] http://paste.openstack.org/show/ryYv54g9XoWSKGdCOuqh/ [3] http://paste.openstack.org/show/syKp1MtkeOa1J5aglfNj/
You can access it here [1]. I have also run xz -9 on it ;) so it takes around 110 MB instead of 670 MB.
[1] https://proxy.qwq.lt/fun/centos-binary-ironic-inspector.current-tripleo.tar....
In reply to: Ruslanas Gžibovskis <ruslanas@lpic.lt>, Thu, 1 Oct 2020 at 19:37
If memory serves me correctly, TripleO shares a folder outside the container for the configuration, and logs are written out to the container console, so the container itself is not exactly helpful.
Interestingly, the container contents you supplied are labeled ironic-inspector, but contain the ironic release from Ussuri.
I think you're going to need someone with more context into how TripleO has assembled the container assets to provide more clarity than I can provide. My feeling is that it is likely some sort of configuration issue for inspector, since the single inspection fails and the supplied log data shows the request coming in.
In reply to: Ruslanas Gžibovskis <ruslanas@lpic.lt>, Thu, Oct 1, 2020 at 9:54 AM
Replying inline, which is not my favourite way, so I am not sure whether I am doing this correctly. I could try to arrange access to this undercloud host if you want.
On Thu, 1 Oct 2020, 20:36 Julia Kreger, <juliaashleykreger@gmail.com> wrote:
> If memory serves me correctly, TripleO shares a folder outside the container for the configuration and logs are written out to the container console so the container itself is not exactly helpful.
Would you like to see the exact configs? Which ones? I can grep/cat them. The same goes for all log files; if you need them, I can provide them to you.
> Interestingly the container contents you supplied is labeled ironic-inspector, but contains the ironic release from Ussuri.
Yes. I use the Ussuri release from the CentOS 8 repos, and all the scripts it provides.
> I think you're going to need someone with more context into how TripleO has assembled the container assets to provide more clarity than I can provide. My feeling is likely some sort of configuration issue for inspector, since the single inspection fails and the supplied log data shows the request coming in.
My earlier setup, which was deployed around 4 weeks ago, worked fine, and the one I deployed last Friday is not working. So if you reverted something, might the revert not have reached the CentOS flows? Could that be right?
I am curious: could I somehow use my last known working version? It was:
docker.io/tripleou/centos-binary-ironic-inspector@sha256:ad5d58c4cce48ed0c660a0be7fed69f53202a781e75b1037dcee96147e9b8c4b
In reply to: Ruslanas Gžibovskis <ruslanas@lpic.lt>, Thu, 1 Oct 2020 at 21:00
You're really in the territory of TripleO at this point. As such I'm replying with an altered subject to get their attention.
In reply to: Ruslanas Gžibovskis <ruslanas@lpic.lt>, Tue, Oct 6, 2020 at 7:57 AM
Hi all,
I have re-deployed the undercloud with older *ironic* docker images. Still the same issues. I am not sure how it is done and how all the things work, BUT when doing "baremetal inspect NODE" it gives me:
| last_error | ironic-inspector inspection failed: The PXE filter driver DnsmasqFilter, state=uninitialized: my fsm encountered an exception: Can not transition from state 'uninitialized' on event 'sync' (no defined transition)
Any hints?
This is what I see in /var/log/containers/ironic-inspector/dnsmasq.log:
Oct 14 11:01:15 dnsmasq[8]: started, version 2.79 DNS disabled
Oct 14 11:01:15 dnsmasq[8]: compile time options: IPv6 GNU-getopt DBus no-i18n IDN2 DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth DNSSEC loop-detect inotify
Oct 14 11:01:15 dnsmasq-dhcp[8]: DHCP, IP range 10.40.1.230 -- 10.40.1.249, lease time 10m
Oct 14 11:01:15 dnsmasq-dhcp[8]: read /var/lib/ironic-inspector/dhcp-hostsdir/unknown_hosts_filter
Oct 14 12:09:07 dnsmasq[8]: inotify, new or changed file /var/lib/ironic-inspector/dhcp-hostsdir/24:6e:96:66:34:2a
Oct 14 12:09:07 dnsmasq-dhcp[8]: read /var/lib/ironic-inspector/dhcp-hostsdir/24:6e:96:66:34:2a
Also, I think this part of the log might be interesting [1]. It fails in ironic_inspector.pxe_filter.dnsmasq with the message:
join() argument must be str or bytes, not 'NoneType'; resetting the filter: TypeError: join() argument must be str or bytes, not 'NoneType'
[1] http://paste.openstack.org/show/AnBXBP2p8frdsHqzsBse/
In reply to: Julia Kreger <juliaashleykreger@gmail.com>, Wed, 7 Oct 2020 at 17:24
On 10/14/20 11:37 AM, Ruslanas Gžibovskis wrote:
> Also, I think this part in log, might be interesting [1], fails on ironic_inspector.pxe_filter.dnsmasq with message:
> join() argument must be str or bytes, not 'NoneType'; resetting the filter: TypeError: join() argument must be str or bytes, not 'NoneType'
That is strange. Do you have a port where the MAC address is "None"? Is that even possible? [1] The address field is mandatory [2], and shouldn't be None for any baremetal port. How were the nodes enrolled?
[1] https://opendev.org/openstack/ironic/src/branch/master/ironic/api/controller...
[2] https://opendev.org/openstack/ironic/src/branch/master/ironic/api/controller...
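To make the TypeError above concrete, here is a small sketch of how os.path.join behaves when one of its arguments is None, plus a helper that looks for ports without a MAC address. It is not taken from ironic-inspector: the helper names and the 'address' key of the port dictionaries are assumptions for illustration, and the exact wording of the exception text varies between Python versions.

```python
import os

# Directory and MAC as they appear in the dnsmasq.log excerpt in this thread.
HOSTSDIR = "/var/lib/ironic-inspector/dhcp-hostsdir"
MAC = "24:6e:96:66:34:2a"


def join_error(*parts):
    """Return the TypeError text os.path.join raises, or None on success."""
    try:
        os.path.join(*parts)
    except TypeError as exc:
        return str(exc)
    return None


def ports_missing_address(ports):
    """Return the ports whose 'address' (MAC) is missing or empty.

    `ports` is a list of dicts; the 'address' key is an assumption about
    how the port data was loaded.
    """
    return [p for p in ports if not p.get("address")]


if __name__ == "__main__":
    print(join_error(HOSTSDIR, MAC))   # both parts are strings
    print(join_error(HOSTSDIR, None))  # the file name (MAC) is None
    print(join_error(None, MAC))       # the directory is None
    print(ports_missing_address([{"address": MAC}, {"address": None}]))
```

Comparing the second and third messages on the Python version inside the container may help narrow down whether the None value was the file name (for example a MAC address) or the directory (for example an unset configuration option).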
participants (3): Harald Jensas, Julia Kreger, Ruslanas Gžibovskis