Kudos to everyone! With your valuable suggestions and feedback we were able to deploy the bare metal node successfully. A few things we did to make this run possible:

- Yes, "openstack hypervisor list" is showing details for the added bare metal node, with an IP allocated from the internal API network pool.
- The placement-related error highlighted earlier looks like a bug in the Train release, but it is not impacting our process.
- We included a modified ServiceNetMap for the composable-network approach (as suggested by Harald).
- We used the upstream OpenStack documentation as the primary reference (as suggested by Julia):
  https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/baremetal_overcloud.html
  This document clearly covers the host-aggregate concept and setting the "baremetal" flavor property to "true".

Thanks once again, it was really helpful.

Also, can you please share some information on this query:
http://lists.openstack.org/pipermail/openstack-discuss/2022-February/027315.html

Best Regards,
Lokendra

On Thu, Feb 17, 2022 at 5:26 AM Julia Kreger <juliaashleykreger@gmail.com> wrote:
On Mon, Feb 14, 2022 at 9:40 PM Laurent Dumont <laurentfdumont@gmail.com> wrote:
From what I understand of baremetal nodes, they will show up as hypervisors from the Nova perspective.
Can you try "openstack hypervisor list"?
+1. This is a good idea. This will tell us if Nova is at least syncing with Ironic. If it can't push the information to placement, that is obviously going to cause issues.

From the doc:
Each bare metal node becomes a separate hypervisor in Nova. The hypervisor host name always matches the associated node UUID.
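For illustration, a node that has synced correctly shows up along these lines (the ID, UUID, and IP values here are placeholders, not output from this deployment):

    $ openstack hypervisor list
    +----+---------------------+-----------------+-----------+-------+
    | ID | Hypervisor Hostname | Hypervisor Type | Host IP   | State |
    +----+---------------------+-----------------+-----------+-------+
    |  1 | <ironic node UUID>  | ironic          | <host IP> | up    |
    +----+---------------------+-----------------+-----------+-------+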
On Mon, Feb 14, 2022 at 10:03 AM Lokendra Rathour < lokendrarathour@gmail.com> wrote:
Hi Julia, thanks once again. We understood your point, but we are still facing the same issue on our TripleO Train HA setup, even with the settings done as per your recommendations.
The error that we are seeing is again *"No valid host was found"*.
So this error is a bit of a generic catch-all error indicating the scheduler just doesn't know how to schedule the node. But the next error you mentioned *is* telling, in that a node can't be scheduled if placement is not working.
[trim]
On further debugging, we found the following in the nova-scheduler logs:
2022-02-14 12:58:22.830 7 WARNING keystoneauth.discover [-] Failed to contact the endpoint at http://172.16.2.224:8778/placement for discovery. Fallback to using that endpoint as the base url.
2022-02-14 12:58:23.438 7 WARNING keystoneauth.discover [req-ad5801e4-efd7-4159-a601-68e72c0d651f - - - - -] Failed to contact the endpoint at http://172.16.2.224:8778/placement for discovery. Fallback to using that endpoint as the base url.
where 172.16.2.224 is the internal IP.
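As a cross-check, the placement endpoints that keystone advertises can be listed with the client (a sketch, assuming admin credentials are sourced):

    $ openstack endpoint list --service placement -f value -c Interface -c URL

This should show the internal, admin, and public URLs that services may be handed for discovery.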
Going by the document "Bare Metal Instances in Overcloud" <https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/baremetal_overcloud.html>, we ran the commands below:
(overcloud) [root@overcloud-controller-0 ~]# endpoint=http://172.16.2.224:8778/placement
(overcloud) [root@overcloud-controller-0 ~]# token=$(openstack token issue -f value -c id)
(overcloud) [root@overcloud-controller-0 ~]# curl -sH "X-Auth-Token: $token" $endpoint/resource_providers/<node id> | jq .inventories
null
The result is the same even if we run the curl command against the public endpoint.
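For comparison, a correctly registered node is expected to report a bare metal inventory rather than null, roughly of this shape (assuming the default resource class; field values vary per node):

    {
      "CUSTOM_BAREMETAL": {
        "allocation_ratio": 1.0,
        "max_unit": 1,
        "min_unit": 1,
        "reserved": 0,
        "step_size": 1,
        "total": 1
      }
    }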
Please advise.
So this sounds like placement is either not operating or is somehow misconfigured. I am not a placement expert, but I don't think a node id is used for resource providers.
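One way to see which resource provider UUIDs placement actually has is the osc-placement client plugin (a sketch; it assumes the plugin is installed wherever the client runs):

    $ openstack resource provider list
    $ openstack resource provider inventory list <provider uuid>

If the node's UUID does not appear in the first listing, nova has not created a resource provider for it yet, which would explain the null inventories above.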
Hopefully a placement expert can chime in here. That being said, the note about the service failing to contact the endpoint for discovery is somewhat telling. You *should* be able to curl the root of the API, without a token, and get back a basic JSON document with the information used for API discovery.

If that is not working, then there may be several things occurring. I would check to make sure the container(s) running placement are operating, not logging any errors, and responding properly. If they respond when queried directly, then I wonder if there is something going on with load balancing. Consider connecting to placement's port directly instead of going through any sort of load balancing such as what haproxy provides. I think placement's log indicates the port it starts on, so that would hopefully help; its configuration should share the information as well.
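A sketch of that unauthenticated check, using the internal VIP from the logs above (the exact version numbers in the response differ by release):

    $ curl -s http://172.16.2.224:8778/placement
    {"versions": [{"id": "v1.0", "min_version": "1.0", "max_version": "...", "status": "CURRENT", ...}]}

To rule out the load balancer, the same curl can be pointed at the backend address the placement container binds on, and the container itself can be checked with something like:

    $ sudo podman ps --filter name=placement    # or "docker ps" on hosts still using docker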
--