[magnum][nova] AZ or Host aggregates not working as expected
Folks,

I am running the kolla-ansible 2023.1 release of OpenStack and I have deployed Magnum with the Cluster API driver. Everything is working as expected except availability zones (AZs). I have two AZs and I have mapped them to flavors via flavor properties accordingly: 1. general, 2. sriov.

When I create a Kubernetes cluster from Horizon I select the "general" AZ so the cluster runs there, but somehow some nodes land in the general compute pool and some land in the SRIOV pool. That breaks things because the two pools use different networking. For testing, when I launch VMs manually they land in the desired AZ (host aggregate pool); only Magnum/Kubernetes seems to ignore the AZ. I am clueless as to what is going on here.

# openstack flavor show gen.c4-m8-d40
+----------------------------+-----------------------------------------------+
| Field                      | Value                                         |
+----------------------------+-----------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                         |
| OS-FLV-EXT-DATA:ephemeral  | 0                                             |
| access_project_ids         | None                                          |
| description                | None                                          |
| disk                       | 40                                            |
| id                         | c8088b3f-1e92-405d-b310-a50c25e7040d          |
| name                       | gen.c4-m8-d40                                 |
| os-flavor-access:is_public | True                                          |
| properties                 | aggregate_instance_extra_specs:general='true' |
| ram                        | 8000                                          |
| rxtx_factor                | 1.0                                           |
| swap                       | 0                                             |
| vcpus                      | 4                                             |
+----------------------------+-----------------------------------------------+

I also set the property general='true' on the host aggregate itself.

# openstack availability zone list
+-----------+-------------+
| Zone Name | Zone Status |
+-----------+-------------+
| general   | available   |
| internal  | available   |
| sriov     | available   |
+-----------+-------------+
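(For context, the aggregate_instance_extra_specs property shown above is matched by nova's AggregateInstanceExtraSpecsFilter against metadata set on the host aggregate, so that filter must be in nova-scheduler's enabled_filters. A minimal sketch of that wiring; the compute host name is purely illustrative:)

# create an aggregate that also defines the "general" AZ, tag it, and add hosts
openstack aggregate create --zone general general
openstack aggregate set --property general=true general
openstack aggregate add host general compute-general-01

# pin the flavor to hosts in that aggregate
openstack flavor set gen.c4-m8-d40 \
    --property aggregate_instance_extra_specs:general=true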
Update: after doing a bunch of testing, I found that only multi-master clusters fail to respect the host aggregate / AZ rules. Magnum tries to schedule the masters across two different AZs (how do I tell Magnum not to do that?). If I build with a single master, everything works and lands in the proper AZ.
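(A quick way to see where the masters actually landed, assuming the instance names contain the cluster name -- the name used below is illustrative:)

# list the cluster's servers together with their availability zone
openstack server list --long --name k8s-eng -c Name -c "Availability Zone"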
Hi Satish,

Right now this seems to be a small outstanding issue: https://github.com/vexxhost/magnum-cluster-api/issues/257

I think you've started discussing with other users of the driver who've faced a similar issue. I'll leave some more details in the issue right now.

Thanks,
Mohammed
Hi Satish,

I've pushed a PR with what should be a fix: https://github.com/vexxhost/magnum-cluster-api/pull/313

In the same patch we've also taken the opportunity to let you create node groups in specific availability zones, as well as fixing the control plane AZ.

Thanks!
Mohammed
Thank you Mohammed,

Can I just apply the patch [1] manually and give it a try? I assume that after the patch I can pass labels to the Magnum template using controlPlaneAvailabilityZone=foo1, right?

1. https://github.com/vexxhost/magnum-cluster-api/pull/313/files
Hi Satish,

I haven't tested this locally yet, so I'm waiting for CI to go through. Also, the label will be the one matching Magnum, so it will be availability_zone.

Thanks,
Mohammed
Hi Mohammed,

Thank you for the update. So I just pass the label availability_zone=foo1 in the template and it will be passed through to CAPI to make the magic happen. I am going to give it a try manually and see how it goes.
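(For readers following along: with the patched driver, availability_zone should be settable like any other Magnum label on the cluster template. A rough sketch only -- the template name, image, and network values are illustrative, and the label only takes effect once the patched magnum-cluster-api driver is in place:)

# create a template that pins the cluster to the "general" AZ
openstack coe cluster template create k8s-general \
    --coe kubernetes \
    --image ubuntu-2204-kube-v1.27.4 \
    --external-network public \
    --master-flavor gen.c4-m8-d40 \
    --flavor gen.c4-m8-d40 \
    --labels availability_zone=general

# build a multi-master cluster from it
openstack coe cluster create k8s-eng \
    --cluster-template k8s-general \
    --master-count 3 \
    --node-count 3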
Hi Satish,

The patch seems to have passed all testing. Can you try it on your side and see if it did the trick? 🙂

Thanks,
Mohammed
Thanks Mohammed,

How do I quickly apply this patch? I tried to grab the resource.py file and replace the existing one with it, but that didn't go well and threw some strange errors in the logs. Maybe I am running a little behind on the magnum-cluster-api version (v0.13.4).
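(Not specific to this driver, but a common way to test an unmerged GitHub PR is to install the PR branch into whatever environment runs magnum-conductor; in a kolla-ansible deployment that is typically the magnum_conductor container. A sketch only -- the checkout path is illustrative, it assumes git and pip plus the build requirements are available there, and since the running driver here is v0.13.4 the PR may not apply cleanly on top of it:)

# clone the driver and check out the PR head as a local branch
git clone https://github.com/vexxhost/magnum-cluster-api.git /tmp/mcapi
cd /tmp/mcapi
git fetch origin pull/313/head:pr-313
git checkout pr-313

# install over the existing driver, then restart magnum-conductor
pip install .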
Hi Mohammed,

After trying this:

root@k8s-eng-capi-01:~# kubectl -n magnum-system delete clusterclass/magnum-v0.13.4
clusterclass.cluster.x-k8s.io "magnum-v0.13.4" deleted

I am now stuck here: https://paste.opendev.org/show/bOxBY7tqeBmMxkoiYkV9/
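(When a rebuild gets stuck like this, the Cluster API objects that Magnum created can be inspected on the management cluster; a sketch, assuming the driver's default magnum-system namespace and using a placeholder for the cluster name:)

# see which objects exist and what state they are in
kubectl -n magnum-system get clusterclasses,clusters,machinedeployments,machines

# drill into the conditions of a specific cluster
clusterctl describe cluster <cluster-name> -n magnum-system --show-conditions all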
Can you look at your capo-controller-manager logs (and also the capi-controller-manager logs)? I suspect that we missed something in the formatting and it's unhappy about what we sent to it.

Mohammed
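(For reference, assuming the controllers run in the default namespaces used by clusterctl, those logs can be pulled with:)

# OpenStack provider (CAPO) and core Cluster API (CAPI) controller logs
kubectl -n capo-system logs deployment/capo-controller-manager --tail=200
kubectl -n capi-system logs deployment/capi-controller-manager --tail=200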
Hi Mohammed,

The capo logs are empty (no activity), but I am seeing something strange in the capi logs [1].

1. https://paste.opendev.org/show/bBJfu7h3IKF2sVUObj52/
FYI, as closure for those who are interested: we've merged the fix, which Satish has confirmed working. It'll be in the next driver release.
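(Once that release is tagged, picking up the fix should just be a matter of upgrading the driver wherever magnum-conductor runs and restarting it; a sketch, assuming the driver is installed from PyPI:)

# upgrade the Cluster API driver for Magnum, then restart magnum-conductor
pip install --upgrade magnum-cluster-api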
Thank you so much for helping out with this issue. I can't wait for the next release to push this into production.
participants (2)
- Mohammed Naser
- Satish Patel