[openstack-qa] In connection with speed-up and future design of tempest

Daryl Walleck daryl.walleck at RACKSPACE.COM
Tue Jan 22 06:17:21 UTC 2013


Hi Attila,
Lots of good questions here! I've been out of the loop for the last 6 weeks, so some of my opinions might be missing context from conversations that have been going on in the meantime.

1. testtools/speedup via parallelization

I haven't specifically been involved in this effort recently, and I know it's a goal many would like to see reached. I haven't been using testr/testtools in the prototypes I've worked on, but I believe there's some common ground to be reached here. By removing any dependencies on Nose and instead relying on capabilities provided by the base Python unittest library, I think we would open the door for groups to use any runner they want, as long as it runs standard unittest tests.

2. Limits

2.1 Machine size

I've run into similar problems when running tests against Devstack instances that I host on my RAX VMs. I'm not sure giving Tempest the intelligence to determine how many threads to run in parallel is the right place to solve the problem. Up till now I've been handling this by throttling the number of threads used by my tests in my Jenkins configuration, based on the job type and environment.

2.2 Quotas

For better or worse, I've solved this problem up till now by creating projects with very large quotas and using those for primary testing, except in cases where I am specifically testing quotas.

2.3 Isolation and side effects, test splitting

Again, this is the solution I've been using, so take this with a grain of salt. Given that I avoid issues with quotas by the method mentioned above, I haven't run into any issues with side effects, mostly due to test design. One of my primary test design concerns is to not rely on fixtures/data/state generated by other tests, which is why I've avoided module/package level fixtures; that's also where things get a bit strange when running tests in parallel. I rely only on class-level fixtures and run test classes in parallel, which is an approach that has worked well for us. Any test that relies heavily on current system state (say, tests for quotas/rate limits) is run in isolation in its own Jenkins job.

3. Resource reuse

I think the trick here depends on the threshold of pain/false positives a team is willing to deal with. For the OpenStack gating jobs this might be different, but for internal usage/gating of deployments, my team's tolerance for false positives and test failure noise is very low, which is why I've always been very vocal in this regard. Still, I think we can find a solution agreeable to everyone. The idea of a test-level resource manager has been tossed around before, and I think an implementation would let us be flexible about what test data is used based on needs (i.e., configure the pool to reuse resources as often as possible for speed, or configure it to destroy and regenerate resources when fresh ones are needed).

3.1 Challenges with parallel execution and resource reuse together

I've been tossing around some thread-safe implementations in my head and with some other people on my team. If this is a route we ever want to go, I think a thread-safe implementation would not add too much complexity.
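Just to make the idea concrete, here's a rough sketch of what I have in mind; the ServerPool name and the create/delete callables are made up for illustration and aren't existing Tempest code:

    import threading

    class ServerPool(object):
        """Hypothetical test-level resource manager (sketch only).

        create_fn/delete_fn are whatever callables actually build and
        tear down the resource; reuse=True hands back idle servers,
        while reuse=False deletes and regenerates for every checkout.
        """

        def __init__(self, create_fn, delete_fn, reuse=True):
            self._create = create_fn
            self._delete = delete_fn
            self._reuse = reuse
            self._idle = []
            self._lock = threading.Lock()

        def acquire(self):
            with self._lock:
                if self._reuse and self._idle:
                    return self._idle.pop()
            # Creation is slow I/O, so it happens outside the lock.
            return self._create()

        def release(self, server):
            if self._reuse:
                with self._lock:
                    self._idle.append(server)
            else:
                self._delete(server)

The lock only guards the in-memory bookkeeping, so the expensive API calls never serialize the test threads.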

3.3 Manual ordering

One of the interesting capabilities of the bare Python unittest implementation is the load_tests function that can be added to test modules. With it, you can enforce order within a test class. I'm not sure if this would solve everyone's issues, but we've found it useful in the rare cases where ordering is desirable/necessary.
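For example (the class and test names here are just placeholders), a module-level load_tests hook from the standard unittest protocol can return a class's tests in an explicit order instead of the default alphabetical one:

    import unittest


    class TestServerLifecycle(unittest.TestCase):
        # Placeholder tests; only the ordering mechanism matters here.
        def test_create(self):
            pass

        def test_resize(self):
            pass

        def test_delete(self):
            pass


    def load_tests(loader, standard_tests, pattern):
        # Standard unittest hook: build the suite in the exact order we
        # want rather than relying on alphabetical sorting.
        ordered = ['test_create', 'test_resize', 'test_delete']
        suite = unittest.TestSuite()
        suite.addTests(TestServerLifecycle(name) for name in ordered)
        return suite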

You had quite a few great points that I didn't have a chance to reply to, but I wanted to get a few responses in.

Daryl
________________________________________
From: Attila Fazekas [afazekas at redhat.com]
Sent: Monday, January 21, 2013 6:46 AM
To: All Things QA.
Subject: [openstack-qa] In connection with speed-up and future design of tempest

Hi All,

I have several things where I need some clarification.
I have heard a lot of different opinions on IRC related to the topics below, but I do not know which approaches are good for the majority and which are only good for a minority.
All of the statements below are just opinions, questions, or suggestions, even if they look otherwise.
I just want to discuss these topics.

I would like to hear everybody's opinion on these questions. It will help me to see the possible future steps.


1. testtools
    I have seen various attempts at refactoring Tempest to be compatible with testr (testrepository, testresources, testtools), but I have not seen any detailed plan for doing it, nor have I read anything about the longer-term goals.

    In https://blueprints.launchpad.net/tempest/+spec/speed-up-tempest the full specification link points to the blueprint edit page instead of a wiki page.
    Did I miss something?

    I have seen many cool features in these tools. I would probably be the first to say "do it yesterday" if I could see exactly how it would improve performance without additional side effects, resource starvation, or even deadlocks or synchronization issues.

    I have only seen testr mentioned in the context of parallelization, so I assume we are considering a major refactoring and a switch to testtools just because of parallel execution.
    Please correct me if I am wrong.

    I think that for parallel execution alone this is not the cheapest solution in terms of work hours.


2. Limits must be considered in any new design

2.1 Limited by machine size

    Correct me if I am wrong, but Tempest's primary function is the gate jobs; providing test tools for various other test environments, and even for production environments, is secondary.
    We should only make performance sacrifices for the higher-level goal if they bring significant benefits in the other cases and the side effect is not significant in the primary case. (Configurable things are good. :))

    The gate jobs nowadays run on small VMs with minimal resources; I do not know the exact numbers, but it could be about 1 vCPU, 4GB RAM, and 20GB storage.

    Tempest mostly just waits for I/O from the other services, but it shares CPU and I/O resources with other processes.
    In small environments, performance could be even worse when you try to run in parallel.

2.2 Quota limits (default 10 for most resources)

    The default quota limitation should not block a new design.

    Since Tempest knows the admin password, it can create tenants with higher/unlimited quotas.
    I do not see why we need to be limited by the default quotas.
    Tempest may create tenants with lower quotas for quota testing, or that can be done in a periodic test, even by a shell script.
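    For example, a rough sketch of how an admin could prepare such a tenant, assuming the keystone and nova client libraries of that era (constructor and parameter names are from memory and may differ slightly):

        # Rough sketch only; client constructors/parameters are assumptions.
        from keystoneclient.v2_0 import client as keystone_client
        from novaclient.v1_1 import client as nova_client

        AUTH_URL = 'http://127.0.0.1:5000/v2.0'

        keystone = keystone_client.Client(username='admin', password='secret',
                                          tenant_name='admin', auth_url=AUTH_URL)

        # Dedicated tenant for tests that should never hit the default quotas.
        tenant = keystone.tenants.create(tenant_name='tempest-large-quota',
                                         enabled=True)

        nova = nova_client.Client('admin', 'secret', 'admin', AUTH_URL)
        # Raise the limits well above anything the test suite will ask for.
        nova.quotas.update(tenant.id, instances=1000, cores=2000, ram=2048000)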

    On the gate, the maximum number of VMs is limited by the devstack VM size. Do not forget: even if we are using small-memory guests (64MB RAM), we should not run more than about 5 (non-idle) VMs per CPU thread.

2.3 Isolation and side effects, test splitting

    Real test isolation would mean installing all services on a separate machine, running just a single test case, then reinstalling everything and starting again.
    I think nobody wants to go in that direction.

    Even now, test cases can fail just because of heavy machine load.

    We should not isolate test cases in newly created tenants by default. We need to isolate only the test cases which would otherwise fail or cause others to fail.
    I am saying this because resource creation and deletion can be really expensive, and a resource can only be used within a single tenant.
    We are slowed down primarily by "real world" events; we can probably gain more performance with tricks that make those events cheaper.


3. Resource reuse

    I have heard many concerns about resource reuse, but I think that with proper logging and a reuse strategy we can point out which test case dirtied a resource.
    The OpenStack APIs provide basic and advanced information about resource state, so we can decide whether a resource is in good shape before starting a test.
    If that concept does not work, either we have found a real bug, or the API does not provide enough information, which is also a bug IMHO.

    I think we should try to go down the resource reuse path; it has great benefits even with a single Tempest thread, but we need to consider a lot of things if we also want to do it in parallel.

3.1 Challenges with parallel execution and resource reuse together

    Right now it is difficult to know how much resource will be used when we start a test case, but that is a very important question for scheduling.

    Probably we need to add attributes to the test functions and/or classes describing the planned resource usage.
    Do not forget that resource deallocation is not instant, in terms of both system resources and quota usage.
    We should issue the delete request once and then wait for termination (we might retry the delete request instead of listing; if the resource is not found, it is deleted), just before we need a new resource instance using the same quota/system resource.
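    Something like this rough sketch, using plain requests against a hypothetical resource URL, where a 404 on GET is taken to mean the quota and system resources are free again:

        import time

        import requests


        def wait_for_deletion(resource_url, token, timeout=300, interval=3):
            """Issue the delete once, then poll until the resource is gone."""
            headers = {'X-Auth-Token': token}
            requests.delete(resource_url, headers=headers)
            deadline = time.time() + timeout
            while time.time() < deadline:
                if requests.get(resource_url, headers=headers).status_code == 404:
                    return
                time.sleep(interval)
            raise RuntimeError('%s was not deleted within %ss' % (resource_url,
                                                                  timeout))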

    Many test cases allocate resources (like a server) with certain attributes and verify the operation.
    Many other cases just need any server, or are sensitive to only a few properties.
    Some of the test-sensitive parameters can be changed during the resource's lifetime, but others are permanent.

    Just labelling the test cases with the type of resources they need, for example "2 active servers and 2 active volumes from the same tenant", is not enough in all cases.

    Servers can be allocated by XML, JSON, and EC2 API calls; however, while XML and JSON see the same server id, EC2 will see a different one.
    Now the OS API can show the server's EC2 id as well, but in other cases (images) we might need to use a "whitebox" DB query.

    Just saying in a test fixture that it "needs a server" is not enough; sometimes we require a special server. And all servers use the same RAM and CPU pools, which are limited by the hardware.

    Test fixtures with multiple resource needs can cause deadlocks or unexpected failures if we let them start before we can guarantee the necessary resources.

    In a multi-threaded environment, all threads should see the same _consistent_ resource information at the same time, so we might need locking or IPC (consider at least threading/multiprocessing if we are only speaking about CPython).

    Test case ordering has side effects too: if the test executor decides to start a test case which uses resource "A", and only the last test case will use it again, it might occupy the resource for a long time.
    Wrong ordering can prevent better parallel resource utilization.

    An example corner case: our server limit is 3, we have a test case which needs 3 servers with a special attribute that can only be set at creation time, and the already allocated server "A" has a different value.

    The actual ordering can even depend on the test runner version.

3.2 All or nothing

    If we adopt any system-wide, general parallel resource reuse solution, we probably need to do it everywhere at once.
    However, for "one active server" tests we can probably use an ad-hoc solution without significant impact.
      We start one server when it is first needed and kill it when nothing else needs it anymore. We sacrifice just one server slot, but after boot it will not eat too much CPU.
      We might be able to have the XML and JSON tests use the same setUpClass (I am not speaking about just a slightly better OOP-style refactoring, although that is possible too); see the sketch below.
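    A rough sketch of the shared-server idea; the boot_server/get_servers_client helpers and the class names are placeholders, not existing Tempest code, and teardown/reference counting is left out for brevity:

        import unittest

        _shared_server = None  # one cached server for both wire formats


        def _get_shared_server(client):
            # Boot lazily the first time any test class asks for it.
            # Not thread-safe; fine for a single test runner process.
            global _shared_server
            if _shared_server is None:
                _shared_server = boot_server(client)       # assumed helper
            return _shared_server


        class ServerActionsJSONTest(unittest.TestCase):
            interface = 'json'

            @classmethod
            def setUpClass(cls):
                cls.client = get_servers_client(cls.interface)  # assumed helper
                cls.server = _get_shared_server(cls.client)

            def test_reboot(self):
                self.client.reboot_server(self.server['id'], 'SOFT')


        class ServerActionsXMLTest(ServerActionsJSONTest):
            # Same test logic and the same booted server, different wire format.
            interface = 'xml'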

    If we just pick a good resource reuse solution and do not consider how it behaves in a multi-threaded environment, we might be in big trouble.
    If the gate VM cannot get significantly more CPU power (cores), we will probably see only minimal benefits from parallel execution.

3.3 Manual ordering

    As you can see, the problem set is big, and I have probably missed a lot of other things.
    I would not be surprised if we could achieve better performance more easily by "manual" performance tuning,
    i.e. manual test case ordering (across multiple threads) while considering resource reuse.

    I can even live without a unittest framework, if it has significant benefits and someone can show me a really good plan for how to do it.
    Minimal requirements:
        - Report whether everything was OK or not
        - On failure, tell exactly and very verbosely what was not OK (the first failure might be enough)
        - Ability to skip the failed part and test the rest of the system

4. Fear, uncertainty and doubt

    It looks like this is not clear to everyone: are we rejecting patches because they are not testr/testtools ready, or because they are as nose-dependent as the others?

    As you can see, a really good solution has a lot of challenges, and I would like to see that we are going in the right direction before doing a major refactoring.
    In my opinion we should try to follow the "old rule" that "new test cases should be similar to the existing ones", until otherwise announced on this mailing list.
    The announcement should happen 7 days before the new rules are enforced.
    I do not expect major changes within a month, but who knows :)
    For now we should spend more time on clean-up, in order to help any possible major change.

5. Test Images

5.1 cli_hlt VM

    Most servers are never connected to in test cases; we perform operations which do not need a working VM, just an ACTIVE VM. We probably have ~3 non-skipped cases which are sensitive to having a working VM.
    We should consider using a very, very small test image which halts the VM in a state where it does not consume CPU resources (and the VM will not try to reboot in order to fix the problem).
    It would be great if we could find code which works on most architectures, but x86 alone can be enough at first.

5.2 buildroot and cirros
    AFAIK the cirros images do not support all the features we need to test; we should create a list of the minimum VM requirements.
    cirros is built with buildroot; in the distant past I used buildroot, and I liked it.
    We could probably create a faster-booting VM (I guess just changing the image compression method to lzo could help) with the additional necessary features.


6. Client Library or RestClient

   I see both client library and REST API test cases in Tempest.

   Using a client library is good because:
     - We can cover the client's code as well
     - We can reuse existing code

   Using a client library is not good because:
     - We are not verifying API correctness, just functionality.
     - We might not notice unwanted API changes.

   Which direction should we take on this question?

   We could test the very same feature in many ways:
    - CLI tools (multiple API versions)
    - Client library (JSON) (multiple API versions)
    - XML API (multiple API versions)
    - JSON API (multiple API versions)
    - boto library / EC2
    - our own reinvented EC2 client (does not exist at the moment)

   The client libraries are used by the CLI tools, so we probably cover more code if we exercise them via the CLI tools.
   AFAIK the devstack exercises are the recommended location for CLI testing.
   We should do some minimal smoke tests with the client libraries anyway.

   Another possible combination is doing the XML tests with RestClient and doing the json tests with the client libraries.



I need a lot of feedback on the above items in order to know what to add to or remove from Tempest.

Best Regards,
Attila


PS: I hope I have fewer typos than usual this time. I can rephrase any unclear part; feel free to ask, even on IRC.
I hope at least I will know what I intended to say :)

_______________________________________________
openstack-qa mailing list
openstack-qa at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-qa

