Communication problem between ironic-python-agent and CI server.
Hi team,
I'm trying to use ironic, deployed with devstack, to manage a baremetal machine.
When it reaches the deploying stage, I open the BM server terminal and see that it successfully loads the ramdisk and boots into it. It gets the IP I assigned, and I can ping it from the CI server side. But the deploy then fails about 2 minutes later.
When I check the ironic-conductor log with the command "sudo journalctl -a --unit devstack@ir-cond", I find an error like this:
ERROR ironic.drivers.modules.agent_client [None req-de37bc21-8d62-41db-8983-c06789939818 None None] Failed to connect to the agent running on node ea88ba26-756d-4d32-89f4-7ff086fa8868 for invoking command iscsi.start_iscsi_target. Error: HTTPConnectionPool(host='10.0.0.25', port=9999): Max retries exceeded with url: /v1/commands?wait=true (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f5461027f10>, 'Connection to 10.0.0.25 timed out. (connect timeout=60)')): ConnectTimeout: HTTPConnectionPool(host='10.0.0.25', port=9999): Max retries exceeded with url: /v1/commands?wait=true (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f5461027f10>, 'Connection to 10.0.0.25 timed out. (connect timeout=60)'))
I can ping it from the CI server side, so it is strange that the connection times out between ironic-python-agent and the CI server. Has anyone met a similar problem, or does anyone have an idea about it?
Thank you!
Best Regards,
Guannan
On Nov 26, 2019, at 4:34 AM, Guannan GN2 Sun <sungn2@lenovo.com> wrote:
I can ping it from the CI server side, so it is strange that the connection times out between ironic-python-agent and the CI server. Has anyone met a similar problem, or does anyone have an idea about it?
You can ping it but can you make an HTTP request to port 9999 via something like curl?
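If curl is not handy, the same check can be sketched in Python. Ping only proves ICMP reachability, while the conductor needs a TCP connection to the agent's port. This is a minimal sketch: the function name `can_connect` and the 5-second timeout are illustrative assumptions, not part of ironic; the address and port are the ones from the conductor log.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake, unlike ping (ICMP).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False
```

Running `can_connect("10.0.0.25", 9999)` on the CI host while the ramdisk is up should show whether the agent port is reachable from where the conductor runs.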
Jay reached out to me, and from some discussion it seems like the following is occurring:
* The ramdisk is loading from tftp_server.
* The conductor is not able to reach the 10.0.0.0/24 subnet where the ironic-python-agent is running.
* There appears to be no route inside the CI host that the conductor is operating on telling the host kernel to direct packets for IPA to the neutron router.
Ramdisk loading would still work if egress traffic is being NAT translated, but ingress traffic would behave like this, with ironic unable to send packets, because the conductor is communicating from the context of the CI host and any namespaces created by neutron may not be directly reachable.
-Julia
On Tue, Nov 26, 2019 at 1:48 AM Mohammed Naser <mnaser@vexxhost.com> wrote:
You can ping it but can you make an HTTP request to port 9999 via something like curl?
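To make the missing-route point concrete, here is a rough Python sketch that checks whether any non-default route in a routing table covers the agent address. The helper name `matching_routes` and the sample table are hypothetical and only for illustration; the real input would be the lines printed by `ip route show` on the CI host.

```python
import ipaddress


def matching_routes(route_lines, dest):
    """Return the non-default routes whose prefix contains dest."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for line in route_lines:
        fields = line.split()
        if not fields or fields[0] == "default":
            # A default route alone may not lead to the neutron router.
            continue
        try:
            net = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            # Skip entries that do not start with a prefix.
            continue
        if addr in net:
            matches.append(line)
    return matches


# Hypothetical routing table for illustration; not taken from this thread.
SAMPLE_ROUTES = [
    "default via 192.168.1.1 dev ens3",
    "192.168.1.0/24 dev ens3 proto kernel scope link src 192.168.1.10",
]
```

An empty result for 10.0.0.25 would be consistent with the host kernel having no specific route toward the 10.0.0.0/24 subnet. The next hop to use for such a route depends on the neutron router's address, which is not given in this thread.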
Thank you Julia and Mohammed,
I guess there may be something wrong with my network configuration, because the CI server is not directly connected to the BM node. Our physical network is designed like this:
[inline image, cid:a7c29f59-740a-4fd7-8a84-c6a28a27a0c5]
So I used neutron to create "br-ens9" between ens9 and br-int when I deployed devstack, so that it can ping the IP I assigned to eno2 on the BM node when deploying. However, I don't know whether the ironic conductor can communicate with the ironic python agent. Could that be the root cause? I will look into it.
Thank you!
Best Regards,
Guannan
________________________________
From: Julia Kreger <juliaashleykreger@gmail.com>
Sent: November 27, 2019 4:05:31
To: Mohammed Naser
Cc: Guannan GN2 Sun; openstack-discuss@lists.openstack.org; Jay Bryant1
Subject: [External] Re: Communication problem between ironic-python-agent and CI server.
participants (3)
- Guannan GN2 Sun
- Julia Kreger
- Mohammed Naser