[tempest] Extending tempest for mixed-architecture stacks
Recently I have been running Tempest on my setup, testing a mixed-architecture deployment (x86_64 and ppc64le compute nodes at the same time). Some of the migration and affinity tests check whether there is more than one Compute node before they run. However, that appears to be as far as they check; they do not verify that the nodes are compatible, or even of the same architecture. (My test cluster is very small: it normally includes two ppc64le Compute nodes and sometimes one x86_64 Compute node. Currently one ppc64le machine is down for repair.)
Because the two compute nodes are different architectures, I am getting failures in various migration and affinity tests, and might see more if I tested a larger subset. Granted, my particular setup is a special case, but it does bring to mind some extensions that Tempest may need in the future. I could see x86_64 and ARM being mixed together in one stack, maybe even with RISC-V tossed in someday.
I'm thinking we need to start adding extra test images, flavors, etc. to the Tempest configuration (that is, defining multiple options so that each architecture can have its own test images, etc. assigned to it, rather than the current primary and alt image for just one architecture). Additionally, there should be test cases that take the architectures involved into account (for example, verifying that an instance on one arch cannot be migrated to the other). I know this involves a bit of refactoring; I didn't know whether it had even been considered yet.
--
James LaBarre
Software Engineer, OpenStack MultiArch
Red Hat <https://www.redhat.com>
jlabarre@redhat.com
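As a rough illustration of the two ideas in the message above (per-architecture images and flavors, and an architecture-aware check before migration tests), here is a minimal sketch. It assumes the host/architecture pairs have already been collected somehow; `IMAGES_BY_ARCH`, `hosts_by_arch`, and `archs_with_migration_targets` are hypothetical names and are not part of Tempest.

```python
from collections import defaultdict

# Hypothetical per-architecture settings. Tempest today only has a primary
# and an alt image, so this mapping is purely illustrative.
IMAGES_BY_ARCH = {
    "x86_64": {"image_ref": "<x86_64 image id>", "flavor_ref": "<x86_64 flavor id>"},
    "ppc64le": {"image_ref": "<ppc64le image id>", "flavor_ref": "<ppc64le flavor id>"},
}


def hosts_by_arch(hosts):
    """Group compute host names by CPU architecture.

    `hosts` is a list of (hostname, arch) pairs; how they are obtained
    is outside the scope of this sketch.
    """
    groups = defaultdict(list)
    for name, arch in hosts:
        groups[arch].append(name)
    return dict(groups)


def archs_with_migration_targets(hosts, min_hosts=2):
    """Return the architectures that have at least `min_hosts` hosts.

    Counting hosts per architecture, rather than in total, avoids trying
    to migrate an instance between incompatible nodes.
    """
    return sorted(
        arch for arch, names in hosts_by_arch(hosts).items()
        if len(names) >= min_hosts
    )


# The cluster described above while one ppc64le machine is down:
degraded = [("compute-0", "ppc64le"), ("compute-1", "x86_64")]
print(archs_with_migration_targets(degraded))  # []

# The same cluster with both ppc64le nodes available:
full = degraded + [("compute-2", "ppc64le")]
print(archs_with_migration_targets(full))  # ['ppc64le']
```

With a check like this, a migration test would run only for architectures that appear in the returned list, using the image and flavor configured for that architecture.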
Hi James,
thank you for the suggestion. During the Wallaby PTG we considered using different images (other than just the cirros ones); see the "Use different guest image for gate jobs to run tempest tests" topic in [1]. It wasn't pursued in the end (we had more pressing topics to deal with), and the action item was closed in the Xena cycle [2].
I think we could start by creating a new option that would allow us to skip the failing tests on a different architecture. If we had at least an experimental job in the gate running a different architecture, we could add a new test exercising that, as you suggested. Then let's see where that gets us.
[1] https://etherpad.opendev.org/p/qa-wallaby-ptg
[2] https://etherpad.opendev.org/p/qa-xena-priority
Regards,
On Mon, 13 Dec 2021 at 22:49, James LaBarre <jlabarre@redhat.com> wrote:
Recently I have been running Tempest on my setup, testing a mixed-architecture deployment (x86_64 and ppc64le compute nodes at the same time). Some of the migration and affinity tests check whether there is more than one Compute node before they run. However, that appears to be as far as they check; they do not verify that the nodes are compatible, or even of the same architecture. (My test cluster is very small: it normally includes two ppc64le Compute nodes and sometimes one x86_64 Compute node. Currently one ppc64le machine is down for repair.)
Because the two compute nodes are different architectures, I am getting failures in various migration and affinity tests, and might see more if I tested a larger subset. Granted, my particular setup is a special case, but it does bring to mind some extensions that Tempest may need in the future. I could see x86_64 and ARM being mixed together in one stack, maybe even with RISC-V tossed in someday.
I'm thinking we need to start adding extra test images, flavors, etc. to the Tempest configuration (that is, defining multiple options so that each architecture can have its own test images, etc. assigned to it, rather than the current primary and alt image for just one architecture). Additionally, there should be test cases that take the architectures involved into account (for example, verifying that an instance on one arch cannot be migrated to the other). I know this involves a bit of refactoring; I didn't know whether it had even been considered yet.
--
James LaBarre
Software Engineer, OpenStack MultiArch
Red Hat <https://www.redhat.com>
jlabarre@redhat.com
--
Martin Kopec
Senior Software Quality Engineer
Red Hat EMEA
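The idea of an option for skipping architecture-dependent tests, raised in the reply above, could look roughly like the following sketch. It uses only the standard `unittest` module; `SAME_ARCH_COMPUTE_NODES` is a hypothetical stand-in for a configuration option and does not exist in Tempest, and the test bodies are placeholders.

```python
import unittest

# Hypothetical stand-in for a configuration option. No such option exists
# in Tempest; the name is an assumption made for this sketch.
SAME_ARCH_COMPUTE_NODES = False


class MigrationSketch(unittest.TestCase):
    """Placeholder tests showing how an option could gate a test."""

    @unittest.skipUnless(SAME_ARCH_COMPUTE_NODES,
                         "compute nodes differ in architecture")
    def test_live_migration(self):
        # A real test would migrate a server between two hosts here.
        self.assertTrue(True)

    def test_list_servers(self):
        # Architecture-independent tests are unaffected by the option.
        self.assertTrue(True)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(MigrationSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)

# With the option set to False, the migration test is reported as skipped.
print(len(result.skipped))  # 1
```

A skipped test still shows up in the run results with its reason, which makes it visible that the deployment was not exercised for migration.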
---- On Mon, 03 Jan 2022 04:35:08 -0600 Martin Kopec <mkopec@redhat.com> wrote ----
Hi James, thank you for the suggestion. During the Wallaby PTG we considered using different images (other than just the cirros ones); see the "Use different guest image for gate jobs to run tempest tests" topic in [1]. It wasn't pursued in the end (we had more pressing topics to deal with), and the action item was closed in the Xena cycle [2]. I think we could start by creating a new option that would allow us to skip the failing tests on a different architecture. If we had at least an experimental job in the gate running a different architecture, we could add a new test exercising that, as you suggested. Then let's see where that gets us.
We can add more image tests in CI with separate jobs, which will be straightforward to configure in a Zuul job (how many tests fail is another thing to see). But about skipping tests, I am not sure. What is the actual result of the operation on such an arch? Does the real operation fail, or is there a different way to perform those operations on those archs than what the test is doing? If they fail in the real world, then the test failure is valid, and excluding such tests when running will be the right way instead of skipping the test.
-gmann
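To make the distinction in the reply above concrete: excluding happens when the run is assembled, so the test code itself stays unchanged. A minimal sketch of that selection step, assuming the test IDs are plain strings; the pattern and the IDs below are illustrative only and are not a list of the tests that actually failed.

```python
import re

# Illustrative exclusion pattern for a mixed-architecture deployment.
EXCLUDE = re.compile(r"live_migration|affinity")


def select_tests(test_ids, exclude=EXCLUDE):
    """Return the test IDs that do not match the exclusion pattern."""
    return [test_id for test_id in test_ids if not exclude.search(test_id)]


# Made-up test IDs for demonstration purposes.
test_ids = [
    "compute.test_live_migration",
    "compute.test_server_group_affinity",
    "compute.test_list_servers",
]
print(select_tests(test_ids))  # ['compute.test_list_servers']
```

The excluded tests never run on that deployment, while on a single-architecture deployment the same tests run and fail or pass on their own merits.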
[1] https://etherpad.opendev.org/p/qa-wallaby-ptg
[2] https://etherpad.opendev.org/p/qa-xena-priority
participants (3)
- Ghanshyam Mann
- James LaBarre
- Martin Kopec