[Magnum][Kayobe] Magnum Kubernetes clusters failing to be created (bugs?)
Hi guys, I hope you are all keeping safe and well at the moment.

I am trying to launch Kubernetes clusters into OpenStack Train, which has been deployed via Kayobe (Kayobe, as I understand it, is a wrapper for kolla-ansible). There have been a few strange issues here and I've struggled to isolate them. To give some context, these issues started after a fresh OpenStack deployment some months ago (around February 2020). This OpenStack is not "live", as I've been trying to get to the bottom of the issues:

Issue 1. When trying to launch a cluster we get the error: "Resource Create Failed: Forbidden: Resources.Kube Masters.Resources[0].Resources.Kube-Master: Only Volume-Backed Servers Are Allowed For Flavors With Zero Disk."

Issue 2. After successfully creating a cluster of a smaller node size, "resize cluster" is failing (however, "update cluster" is working).

Some background on this specific environment: deployed via Kayobe, with these components: Cinder, Designate, iscsid, Magnum, Multipathd, and Neutron provider networks.

The Cinder component integrates with iSCSI SAN storage using the Nimble driver. This is the only storage. In order to prevent OpenStack from allocating compute node local HDD as instance storage, I have all flavours configured with root disk / ephemeral disk / swap disk = "0MB". This results in all instance data being stored on the backend Cinder storage appliance.

I was able to get a cluster deployed by first creating the template as needed; then, when launching the cluster, Horizon prompts you for items already in the template, such as number of nodes, node flavour, labels etc. I re-supplied all of the info (duplicating the template values) and then tried creating the cluster. After many, many attempts over the course of a few weeks to a few months, it was successful. I was then able to work around issue #2 above to get it increased in size.

When looking at the logs for issue #2, it looks like some content is missing in the API, but I am not certain. I will include a link to the pastebin below [1]. When trying to resize the cluster, Horizon gives the error: "Error: Unable to resize given cluster id: 99693dbf-160a-40e0-9ed4-93f3370367ee". I then searched the controller node /var/log directory for this ID and found: "horizon.log [:error] [pid 25] Not Found: /api/container_infra/clusters/99693dbf-160a-40e0-9ed4-93f3370367ee/resize". Going to the Horizon "update cluster" menu allows you to increase the number of nodes and then save/apply the config, which does indeed resize the cluster.

Regarding issue #1, we've been unable to deploy a cluster in a new project, and the error hints that it relates to the flavours having 0MB disk specified. However, this error is new, and we've previously been successful deploying clusters (albeit with the hit-and-miss experience described above) using the flavour with 0MB disk. Again, I searched the controller logs for the (stack) ID after the failure and obtained little more than the error already seen in Horizon [2].

I was able to create new flavours with root disk = 15GB and then successfully deploy a cluster on the next immediate try. Updating the cluster from 3 nodes to 6 nodes was also immediately successful. However, I see the compute nodes' "used" disk space increasing after increasing the cluster size, which is an issue as the compute node has very limited HDD capacity (32GB SD card).

At this point I also checked 1) the previously installed cluster using the 0MB disk flavour and 2) new instances using the 0MB disk flavour. I noticed that the previous cluster has host storage allocated, while the new instance does not. So the successful cluster create is using the flavour with disk = 0MB, yet the result is compute node HDD storage being consumed.

So with the above, may I please clarify the following? 1. It seems that 0MB disk flavours may not be supported with Magnum now? Could the experts confirm? :) Is there another way that I should be configuring this so that compute node disk is not consumed (because it is slow and has limited capacity)? 2. Issue #1 looks like a bug to me; is it known? If not, is this mail enough to get it logged?

Pastebin links as mentioned:

[1] http://paste.openstack.org/show/797316/

[2] http://paste.openstack.org/show/797318/

Many thanks,

Regards,

Tony Pearce
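The Nova-side behaviour behind issue #1 can be checked outside of Magnum. The following is a minimal sketch, assuming illustrative flavour, image, and network names (not taken from the environment above): a zero-disk flavour is only accepted by Nova when the server is volume-backed, which the OpenStack CLI can request with `--boot-from-volume`.

```shell
# A zero-disk flavour like the ones described above (names/sizes illustrative):
openstack flavor create --vcpus 2 --ram 4096 --disk 0 --ephemeral 0 --swap 0 \
    k8s.medium.0disk

# Nova rejects a plain boot with this flavour ("Only volume-backed servers
# are allowed for flavors with zero disk"), but accepts it when asked to
# build a Cinder boot volume from the image:
openstack server create --flavor k8s.medium.0disk \
    --image fedora-atomic \
    --boot-from-volume 15 \
    --network my-net \
    zero-disk-test
```

If this standalone boot works while the Magnum-driven one fails, that narrows the problem to how Magnum's Heat templates create the servers rather than to the flavour itself.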
Hi Tony,

My comments about your two issues:

1. I'm not sure it's a Magnum issue. Did you try to draft a simple Heat template that uses that flavor and the same image to create an instance? Does it work?

2. When you say "resize cluster" failed, what error did you get from the magnum conductor log?

On 1/09/20 9:22 pm, Tony Pearce wrote:
--
Cheers & Best regards,
Feilong Wang (王飞龙)
Senior Cloud Software Engineer
Tel: +64-48032246
Email: flwang@catalyst.net.nz
Catalyst IT Limited
Level 6, Catalyst House, 150 Willis Street, Wellington
Hi Feilong, thank you for replying to my message.

"1. I'm not sure it's a Magnum issue. Did you try to draft a simple Heat template to use that flavor and same image to create instance? Does it work?"

No, I didn't try it, and I don't think I know enough about Heat to try. I am using Magnum, which in turn drives Heat, but I've never used Heat directly. When I use the new flavour with root disk = 15GB, I don't have any issue launching the cluster, but I have a future issue of consuming all available disk space on the compute node.

"2. When you say "resize cluster" failed, what's the error you got from magnum conductor log?"

I did not see any error in the conductor log, only the Magnum API and Horizon logs mentioned earlier. It looks like Horizon was calling bad URLs, so maybe that is why there was nothing in the conductor log? Just to mention again, though: the "update cluster" option is working fine to increase the size of the cluster.

However, my main issue here is the flavour being used. Can you or anyone confirm the behaviour with root disk = 0MB? Or can you or anyone share any information about how to utilise Magnum/Kubernetes without consuming compute node HDD storage? I've been unable to achieve this, and the docs do not cover it specifically (unless of course I have missed it?). The documentation says I can use any flavour [1].

[1] https://docs.openstack.org/magnum/latest/user/

Regards,

Tony Pearce

On Wed, 2 Sep 2020 at 15:44, feilong <feilong@catalyst.net.nz> wrote:
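For the "don't consume compute node disk" requirement, Magnum has its own boot-from-volume support via cluster template labels, which avoids zero-disk flavour tricks entirely. A sketch, assuming the `boot_volume_size`/`boot_volume_type` labels are available in this Train deployment and using illustrative names:

```shell
# Template that boots cluster nodes from Cinder volumes instead of
# compute node local disk (names, sizes, and the "nimble" volume type
# are illustrative assumptions):
openstack coe cluster template create k8s-volume-backed \
    --coe kubernetes \
    --image fedora-atomic \
    --external-network public \
    --master-flavor m1.medium \
    --flavor m1.medium \
    --labels boot_volume_size=15,boot_volume_type=nimble
```

With this approach the flavour can keep a normal root disk size, since Nova never allocates it locally for volume-backed servers.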
Hi Tony,

Let me answer #2 first. Did you try to use the CLI? Please make sure you are using the latest python-magnumclient version; it should work. As for the dashboard issue, please try to use the latest version of magnum-ui. I encourage using resize, because node update is not recommended.

As for #1, I probably missed something. If the root disk = 0MB, where will the operating system be installed? It would be nice if you could share your original requirement to help me understand the issue, e.g. why are you concerned about the node disk being used?

On 2/09/20 8:12 pm, Tony Pearce wrote:
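The CLI resize suggested above would look roughly like this, using the cluster ID from the earlier Horizon error (the target node count is illustrative):

```shell
# Resize the cluster to 6 nodes via the magnum plugin for the
# OpenStack CLI, bypassing the Horizon /resize endpoint that 404'd:
openstack coe cluster resize 99693dbf-160a-40e0-9ed4-93f3370367ee 6

# Watch progress / final status:
openstack coe cluster show 99693dbf-160a-40e0-9ed4-93f3370367ee -c status
```

If the CLI resize succeeds while Horizon fails, that points at a stale or mismatched magnum-ui version rather than at Magnum itself.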
Hi Feilong,

"Let me answer #2 first. Did you try to use CLI? Please make sure using the latest python-magnumclient version. It should work. As for the dashboard issue, please try to use the latest version of magnum-ui. I encourage using resize because the node update is not recommended to use."

I did not attempt the resize via CLI, but I can try it. Thank you for your guidance on this :)

"As for #1, I probably missed something. If the root disk=0MB, where will the operating system be installed? It would be nice if you can share your original requirement to help me understand the issue. e.g why do you have concern the node disk being used?"

Sure, although I'd also like to understand why you have no concern that the node disk is being used :) I may be missing something here. In this environment I have this setup:

- controller node
- compute node
- network storage appliance, integrated with Cinder iSCSI

All VM/instance data needs to be on the network storage appliance, for these reasons:

- it's faster than node storage (flash-backed array of disks, provides write cache and read cache)
- resilience is built into the array
- it has much higher storage capacity
- it is designed for multi-access (i.e. many connections from hosts)

There are other reasons as well, such as deploying compute nodes as disposable services. For example, if a compute node dies, a new node is deployed; instances are not locked to any node and can be started again on other nodes.

Going back to 2016 when I deployed OpenStack Pike, I noticed when running post-deployment tests that node storage was being consumed even though I have this network storage array. I did some research online and came to the understanding that the reason was the flavors having a positive "root disk" (and swap) value rather than 0MB. So since 2016 I have been using all flavors with disk = 0MB to force the network storage to be used for instance disks and storage. This has worked since Pike, through Queens and Train, for launching instances (but not Magnum).

The requirement is to utilise network storage (not node storage). Is there some other way this is achieved today? I don't understand the point of shared storage options in OpenStack if node storage is consumed for instances. Could you help me understand whether this specific environment is simply not considered by the OpenStack devs, or whether there is some other reason unknown to me? For example, in my (limited) experience with other virtualisation systems (VMware and oVirt, for example), they avoid consuming compute storage for a number of reasons similar to mine.

So to summarise on this one: I'm not stating that "I am right" here, but I am politely asking for more information so I can better understand what I may be doing wrong with this deployment, or the other reasons involved.

Lastly, thank you again for taking the time to reply to me; I really appreciate it.

Regards,

Tony Pearce

On Wed, 2 Sep 2020 at 16:54, feilong <feilong@catalyst.net.nz> wrote:
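Feilong's earlier suggestion of testing the flavour with a bare Heat template can be sketched as follows. This is a minimal, illustrative fragment (the image, network, and flavour names are placeholders, not values from this deployment) that creates a Cinder boot volume and attaches it as the root disk, which is what a zero-disk flavour requires:

```yaml
heat_template_version: train

resources:
  boot_volume:
    type: OS::Cinder::Volume
    properties:
      image: fedora-atomic        # placeholder image name
      size: 15                    # GB, on the Nimble-backed Cinder storage

  test_server:
    type: OS::Nova::Server
    properties:
      flavor: k8s.medium.0disk    # the zero-disk flavour under test
      networks:
        - network: my-net         # placeholder network name
      block_device_mapping_v2:
        - volume_id: { get_resource: boot_volume }
          boot_index: 0
          delete_on_termination: true
```

If `openstack stack create -t test.yaml test-stack` succeeds with this template, the flavour itself is fine and the question moves to why Magnum's generated templates don't create the servers as volume-backed.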