[Openstack-operators] A Hypervisor supporting containers

Narayan Desai narayan.desai at gmail.com
Fri May 2 12:47:04 UTC 2014


<This ended up being a bit of a primal scream -- I am writing this from a
position of concern about the strategy that openstack is taking>

tl;dr: openstack is starting to feel like a tv show called "when developers
attack"

The openstack dev community is starting to feel more and more like software
engineering fundamentalists. Testing is important. But frankly, there are a
bunch of things that are as important or more so, and that matter much more
to people who want to run, not just develop, the software.

We've seen features proposed for removal (not just in nova) because of lack
of testing coverage. Features that have been integrated for years, that
we've been using in production *for years without any problems*.

Getting new code integrated is a nightmare. Take a look at this:
https://review.openstack.org/#/c/65113/
for an example. You have a well established member of the openstack
community, proposing code for a set of features that everyone wants
(uniform integration of storage across glance, ephemeral storage, and
cinder using ceph), and it gets blocked because devstack isn't up to snuff.
Talk about cutting off your nose to spite your face.

Feedback from operators is regularly ignored in favor of clean (though
clearly flawed) software architecture. There was a large discussion
recently here about the fundamental flaws in the quota system, as currently
designed. We chimed in, along with Tim Bell, Jay Pipes, and a few other
people. It was one of the more detailed discussions that we've had here,
and I thought did a good job of capturing issues. When these issues were
brought up on IRC with nova devs, we got the response that they couldn't be
bothered to read the whole thread on operators, and several people
continued to argue that we didn't need what we said we needed for quite
some time. I'm not saying that operators should be deferred to in all areas
here, but we do understand how the system works in practice and at scale
quite a bit better than the developers.

The feedback loops from users/ops continue to be broken. Tim's efforts on
behalf of the user committee are important steps in the right direction,
but the developer culture is openstack culture in a deep way. Operators
continue to be on the outside.

As another illustration of this, I was contacted a few months ago by
developers interested in scheduling. Now, I have a lot of experience in
scheduling, and have done research in the area for the last 10 years, so
this is a good start. So, they are interested in breaking scheduling out to
its own project. This may or may not be a good idea; taking that approach
makes some things easier, like coordination of strategies, but comes at a
higher coordination cost. Having worked through this transition with a
different scheduler, I don't think this is a decision you make lightly. At
any rate, they were looking for a person to push the effort forward, which
would consist of 3-6 months of refactoring to get the code into a better
state. This might sound basically reasonable, but any discussion of gap
analysis was completely missing. The state of the scheduling (placement,
actually, not scheduling really) is pretty underwhelming, and causes us
operational problems all of the time, but that isn't on the radar. These
guys had the best of intentions, but are operating with a different set of
incentives and experiences that cause them to prioritize things in a way
that unintentionally clashes with ops folks. I understand why this happens,
but it is unclear how to fix it.

To be clear, I don't think that there is any bad intent here, but the
differences in goals, experiences, and incentives mean this problem isn't
going to fix itself. Devs need to make sure they maintain code quality, and
have a reasonable immune system to protect from bad code and ideas. We just
need to make sure we don't develop the process equivalent of lupus.

Case in point. In the absence of a budget, unit testing is better than not,
but integration testing ends up being more important in my experience. The
thing that trumps both of them is real experience in actual large scale
systems. Problems there will never be adequately captured by either of
those processes. Making huge investments in the first two venues as gating
criteria while doing the third informally seems like an overemphasis of the
wrong things to me.
 -nld




On Fri, May 2, 2014 at 1:13 AM, Michael Still <mikal at stillhq.com> wrote:

> On Fri, May 2, 2014 at 2:32 PM, matt <matt at nycresistor.com> wrote:
>
> > I am all for enforcing CI.  But, my understanding of the workflow is code
> > doesn't go in without unit tests.  Frankly you guys removing sections of
> > code for not having proper unit testing is downright terrifying.  Doubly
> > so when it's major feature sets.
>
> You are misunderstanding what we mean by CI in this case. We have unit
> tests (although the coverage isn't always great, but it's pretty much
> on par with every other software project), but what we're talking
> about here is tempest tests -- which are scenario tests. Things like
> does booting a virtual machine actually work. Does getconsolelog()
> actually return a console. etc etc.
>
> We're raising the bar on testing. That's always a good thing. Worrying
> about what we had in the past doesn't really help, because there's
> nothing I can do about that.
>
> Michael
>
> --
> Rackspace Australia