[openstack-dev] [Openstack] [Swift] LFS patch (Ia32c9c34)

Monty Taylor mordred at inaugust.com
Tue Jul 24 18:35:40 UTC 2012


Hi!

On 07/24/2012 12:24 PM, Caitlin Bestler wrote:
> 
> Stefano Maffulli wrote:
> 
>> John raises an important point below that I think deserves
>> attention not only from Swift developers but also from the INFRA
>> team: testing patches for specific systems raises the need for
>> testing environments, such as for Solaris and ZFS.
> 
>> INFRA-Team: what's your take on this?
> 
> An open source project needs to be extensible beyond universally
> available environments.
> 
> How else could device drivers ever be developed?
> 
> How could new processors be supported by any operating system? Do you
> limit an OS to using at most n cores because you cannot ask testers
> to acquire test machines with more than n cores?

Totally agree.

> People do test that environment-specific enhancements have no impact
> on routine environments. They also *read* the code to protect against
> it becoming too riddled with #ifdefs (or even ifs) dealing with
> special environments.
> 
> But ultimately, testing of the patch in the environment that it was
> intended for is limited to those who work with that environment. Open
> source operating systems deal with that reality all the time. I don't
> see why OpenStack should not as well.

This is, in fact, EXACTLY what we're trying to do.

> The LFS patches have from the beginning sought to accommodate
> variable performance from local file systems in a way that allows
> enhanced results without penalizing the performance of the general
> case or the readability of the core code.
> 
> The first approach was to define a low level plugin. In response to
> reviewer feedback we shifted to a middleware approach instead. We
> remain open to suggestions on an improved interface.
> 
> But it is important to recognize that this is not just about
> Solaris/Illumos or even ZFS. The LFS patches enable deployment of
> enhanced storage servers that can offer customers better solutions to
> how data integrity is implemented. And rather than merely replacing
> Swift as a whole, we have tried to work within Swift to preserve as
> much of the basic Swift operation as possible.
> 
> The concept introduced into the ring builder is that a local file
> system can have a feature where it provides self-healing data
> protection that is the equivalent of an extra network replica. This
> is as technology-neutral as possible. Insisting that this be
> invisible to the ring builder amounts to refusing to give credit to
> a local file system that does provide some form of enhanced local
> data protection. It says that no matter what you do, Swift will
> deploy the same amount of network-based data protection, which would
> make providing extra data protection independently of Swift-driven
> network replication a wasted effort. It would be like adding a
> second health insurance policy for your data, without even
> increasing the deductible on your existing health coverage.
> 
> The feature proposed is not specific to ZFS; it covers *any* form of
> local file system that provides data protection.
> 
> So the patch should be evaluated on how well it accomplishes the goal
> of *enabling* third-party extensions without compromising the core
> code in performance or readability. That is the basis on which any
> extension-enabling code should be evaluated.

Yeah. There are two things to be tested here. You need to test that
the core works without any of the extension functionality present.
And then you need to test that the extension works.

The core project is easy - we do that constantly. If you propose this
patch and it is accepted, the normal testing already done should verify
that your changes do not break core.

The extension is the part that you, as the hardware/environment-specific
extension provider, are on the hook to ensure works. As the central
infrastructure team, we can't test it, because we don't have access to
the stuff to test it - and as you said, it would be massively
unscalable to attempt to do so as a barrier for entry.

The approach we've put together for dealing with this is to have a
decentralized approach to all non-gating tests. You can see an example
of this on Nova with the SmokeStack project. It's not a set of gating
tests, but it does run tests on commits, it does report results back,
and developers generally care about those results.

We have some documentation on how you can go about setting up a testing
environment that hooks in to the central one at:

http://ci.openstack.org/third_party.html

We're also always on #openstack-infra if you want more immediate or
higher-bandwidth help getting connected in.
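
To make that concrete, here is a rough sketch (in Python) of the
listening half: consuming the gerrit event stream over SSH and picking
out new swift patchsets. It assumes you have an SSH key registered
with the review server; the "my-ci-bot" account name is a placeholder:

    import json
    import subprocess

    GERRIT = "my-ci-bot@review.openstack.org"

    def watch_swift_patchsets():
        # "gerrit stream-events" emits one JSON object per line for
        # every event on the server (patchset-created, comment-added,
        # and so on).
        proc = subprocess.Popen(
            ["ssh", "-p", "29418", GERRIT, "gerrit", "stream-events"],
            stdout=subprocess.PIPE, text=True)
        for line in proc.stdout:
            event = json.loads(line)
            if (event.get("type") == "patchset-created"
                    and event["change"]["project"] == "openstack/swift"):
                # Hand the patchset ref off to whatever runs your tests.
                yield (event["patchSet"]["ref"],
                       event["patchSet"]["revision"])

Nothing about that requires jenkins specifically - any process that can
hold an SSH connection open will do.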

There are several lovely things about you running your own testing lab
and reporting the results back:

- you have a vested interest in support for your environment being
solid, so you'll likely do a good job
- it's purely informative, so if you decide you do not have the
resources to run tests on every proposed commit, you can run them
whenever you like and send in those results whenever you have them

The only real _requirement_ from the project's perspective is that if
you are going to run a test somewhere and write the results back into
the gerrit review, the build records and results of that test are
published somewhere publicly readable. (After all, if your jenkins is
behind a firewall and posts a link to http://10.31.4.4/ - it's not very
useful.)
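
To illustrate, writing the result back once a run finishes is a single
gerrit command over SSH. A minimal sketch, with the same placeholder
bot account as above, and with log_url standing in for wherever you
publish your build records:

    import shlex
    import subprocess

    GERRIT = "my-ci-bot@review.openstack.org"

    def report(revision, passed, log_url):
        verdict = "SUCCESS" if passed else "FAILURE"
        message = "ZFS extension tests: %s - logs at %s" % (verdict,
                                                            log_url)
        # The remote shell parses the command line a second time, so
        # quote the message before handing it to ssh.
        subprocess.check_call(
            ["ssh", "-p", "29418", GERRIT, "gerrit", "review",
             "--message", shlex.quote(message), revision])

The whole point of the log URL is the requirement above: reviewers need
to be able to click through and actually read the build output.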

Dan Prince runs SmokeStack, and Canonical has also been running
bare-metal integration testing in their lab, which will hopefully
start reporting results back.

> There is a major procedural issue as to how all associated products
> will be kept as current as possible with the core, without making the
> core QA teams responsible for testing each extension.

The hardest part is getting the extension folks to actually take the
first step - and it seems an even harder part is for them to find
someone in their org who can give them a location outside of their
corporate VPN where they can run tests. Of all the things involved,
that one is probably out of our hands - but if you can get a jenkins
server with a public IP, then I think we're in business.


> I believe the following guidelines are applicable:
> 
> * Anyone offering an extension SHOULD enable others to test it. Even
> if that means making proprietary hardware available at cost. Nexenta
> does not have to provide hardware because our product is deployable 
> as a Virtual Machine. We will provide a VM image suitable for testing
> that does not require registration.

That would be stellar. I think SHOULD is the right word. Another thing
an extension provider can offer is an assurance that sufficient
manpower will be put on the project if the environment is too hard to
provide copies of. Producing trust that you'll stand behind such an
offer will just take time. (People come to OpenStack and say things
all the time - some of them actually show up and do what they say.)

> * Those offering an extension
> should provide regular integration testing of new core software to
> guard against new code inadvertently breaking existing extensions. I
> think this is a major area for discussion. How often should an
> extension sponsor be expected to do this testing, and with what
> expected deadlines? On one hand, those making new patches reasonably
> expect those who are negatively impacted by those patches to provide
> somewhat prompt feedback. But no third party can be expected to
> provide full-time QA teams to run tests on demand. Getting QA
> testing scheduled for in-house development is enough of a challenge
> for most of us already. Ultimately there will have to be a window,
> near the end of each cycle, where we expect sponsors of associated
> projects to run QA on new core patches. How long does this window
> need to be? How many cycles of testing would be adequate for each
> general release?

The project runs fully automated testing 24x7, on every single patch.
Our systems are fully replicable, down to re-usable puppet modules
you can use to completely re-create our
entire environment. We're also more than happy to assist in getting an
automated testing environment set up that you run - basically all you
need is a few machines that run your environment.

So from my perspective, if you want to get your extension code into
core, you need to commit to actually regularly testing that code.
Otherwise, I will not believe that you are serious about wanting the
code to exist in the first place. We already had this experience in
Nova with Hyper-V, which resulted in us deleting the code. Now, we
finally have a group who are spinning up a testing rig to actually test
Hyper-V support, and once that's up and running, I think we'll all be
thrilled to let that code back in the tree, because we'll have
assurances that it's being tested.

In any case, work with us to get some sort of jenkins or something else
spun up locally and connected to the gerrit event stream. Even if it's
only running _one_ integration test, the hurdle is getting the pieces in
place so that the process works end to end. At that point, there are
things to point at and a clear place to put additional tests. Or do
you see a better way?
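
Concretely, the "one integration test" half can be as small as this
sketch - fetch the proposed ref, run a single test script, and feed
the verdict back through something like the report() sketch above. The
fetch URL and "run_one_test.sh" are placeholders for your own setup:

    import subprocess

    SWIFT_REPO = "https://review.openstack.org/openstack/swift"

    def test_patchset(ref):
        # Check out exactly the code under review.
        subprocess.check_call(["git", "fetch", SWIFT_REPO, ref])
        subprocess.check_call(["git", "checkout", "FETCH_HEAD"])
        # The exit status of the single test decides the verdict.
        return subprocess.call(["./run_one_test.sh"]) == 0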

Monty


