[Openstack-operators] A Hypervisor supporting containers

Sean Dague sean at dague.net
Mon May 5 11:32:01 UTC 2014


On 05/03/2014 05:48 PM, Narayan Desai wrote:
> Hey Sean :)
> 
> 
> On Fri, May 2, 2014 at 4:15 PM, Sean Dague <sean at dague.net
> <mailto:sean at dague.net>> wrote:
> 
> 
>     <snip>
> 
>     The immune response is there for a reason.
> 
> 
> Agreed. There is good reason. And big downside risk to the project. The
> cost of it is a suppression of contributions from people that aren't
> incentivised to go the extra mile, operators being a prime example. 
>
> Tactically, though, I'm not sure that functionally gating on deployment
> capability in devstack, which is billed as opinionated, is the best
> idea.

Maybe. I guess the question is: what are the alternatives? We need a
tool that can install and configure an OpenStack cloud from 28 git
trees, on a basically blank server environment, in < 10 minutes.
Installing from git is not entirely straightforward, as you can see with
the latest devstack.

Realistically, we are also opinionated about what we run integration
tests on, because even if you just look at Nova, there are over 600
config options. The matrix of all options is enormous right now. We're
setting aside some time to talk about it in Atlanta -
http://junodesignsummit.sched.org/event/fd84ec7ddc3252270fb73e8e9e09cfba#.U2dkzXWx17Q

So even if Ceph were added to devstack, that doesn't mean it would be
considered default enough to be part of integration testing, especially
as we're still exposing bugs in libvirt and the kernel during normal
runs sometimes. When you run 25,000 Tempest runs a week, and have
600,000 guest starts a week, you see a lot of interesting things at the
edges.

>     At the same time I understand that people do want these things. So how
>     do we find a way to keep the upstream code something that's maintained
>     and working for people? Plenty of Fedora folks complain devstack is
>     always broken on Fedora, and it is, because nothing automatically checks
>     that code.
> 
>     > Case in point. In the absence of a budget, unit testing is better than
>     > not, but integration testing ends up being more important in my
>     > experience. The thing that trumps both of them is real experience in
>     > actual large scale systems.
> 
>     I agree, with a caveat. The real experience captures the state of
>     what's working today, which is great. It doesn't, however, help us
>     keep things working tomorrow.
> 
>     There are currently 391 patches up for review in Nova. Any of those
>     are capable of breaking OpenStack for everyone. Human eyes are
>     good, but fallible. Human eyes + integration tests are much better.
> 
> 
> That is fair.
>  
> 
>     We should *definitely* also figure out how to get more large scale
>     experience injected back in. I think it's clear Summit is not that
>     venue, so the next question is where might that venue exist, if it's a
>     physical place, or a virtual one. The Linux Foundation addressed this
>     sort of issue around Linux with the End Users Summit as a completely
>     different kind of gathering event mostly to bridge these divides.
> 
>     But maybe a real world event doesn't work well here. What about some
>     better format to get operator stories back into the hands of the
>     development community. I'd love nothing more than a regular voice/video
>     presentation by various operators discussing their installations and
>     major pain points, in a level of detail where we could start to figure
>     out parts / pieces that can be tackled in the near term (current cycle).
> 
> 
> I'm a bit jaded about user stories; I'm not sure it is possible to
> extract the right information without more of a discussion sort of format. 
> 
> Lorin Hochstein had a great idea a while ago. He proposed a shadowing
> program, where devs spend time with ops folks in person:
> http://lorinhochstein.wordpress.com/2013/11/30/adopt-an-op/

Yep, I thought that was a great idea. More communication is good.

I agree that one-way information sharing on user stories is pretty much
useless, because it doesn't allow for the dialog to dive into the issues.

A really great demonstration of this was XML and the User Survey. I
looked through that data, and through the toolkits out there, and found
it impossible to believe that the figure of 35% of users using the
OpenStack XML API could in any way be real (especially when orgs like
Rackspace have measured numbers of only about 1%).

I asked multiple times, through various channels, if I could have a
conversation with anyone who checked that box, so I could understand
what they meant. Nothing. So we gave up trying to get feedback and we
just made the policy decision to deprecate all of that without it.

> Through this discussion, I've started to appreciate the depth of the
> (social) scalability challenges you guys are facing. It is pretty likely
> that we're hitting Amdahl's law one way or another here. What do you
> think the limiting factor is?

Honestly, non-core eyes on reviews are a big missing piece right now.

But consider this: we are getting 10k new patches proposed every ~40
days. I think the average number of patch iterations before landing is
~4 (there are tons of trivial ones that fly through, and some
complicated ones that go to 70 revisions). So we're looking at ~1000
pieces of code that need to be reviewed daily, somewhere across the
project(s).
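
As a rough sketch of that math in Python (the numbers are my own
eyeballed estimates, not measured figures):

    # back-of-the-envelope review load, using the estimates above
    patches_per_period = 10000   # new patches proposed
    period_days = 40             # roughly every 40 days
    revisions_per_patch = 4      # assumed average iterations before landing

    patches_per_day = patches_per_period / period_days        # ~250
    reviews_per_day = patches_per_day * revisions_per_patch   # ~1000

    print("~%d new patches/day, ~%d revisions needing review/day"
          % (patches_per_day, reviews_per_day))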

That's a lot.

Our inbound code rate is increasing much faster than our inbound
reviewer rate, which is why the review queues keep growing over time.

This imposes a "cost" on a lot of other parts of the project.

> People keep bringing up the Linux kernel "mainline isn't for users"
> approach. I think that one of the sticking points for us is that there
> isn't any appropriate downstream integration point. We run the Ubuntu
> releases that we patch at our site. But I wouldn't consider trying to
> get code integrated through that path. The reasonable limit for distros
> is probably filing bugs, and probably only for bugs as opposed to
> feature requests or patches, particularly if you aren't a paying customer.

The Linux kernel model of "mainline isn't for users" doesn't mean that
you integrate at the distro level, though. Most of the distros kick you
back to contributing upstream before they'll pull in a kernel feature.

> This has actually been a long-term issue. We started running the anso
> packages, back in the day, and have had this difficult process of
> picking which bits to run for a long time. I wonder if it would help to
> have some set of releases that are intended for users, with some ability
> (and effort allocated from the project side) to triage issues more
> closely, or maybe work operator relevant patches through the system. It
> might be worth a shot to try building some activities explicitly with
> hybrid goals.

I'm honestly not sure what this means, and I feel like you'd need to
really map it out in detail to figure out where the secondary effects
would be. Complexity is the enemy of scale, not just for compute code,
but for people interactions as well.

One of the reasons the -specs process was carved out was to make it
easier for non-developers to have a say in the design. That, I think, is
a step in the right direction.

I also think we need to realize there are escape valves for important
fixes that people feel are getting lost. Every core project team has a
regular meeting (https://wiki.openstack.org/wiki/Meetings). Putting
something on the agenda there and showing up for discussion puts you
into the top 5% of patches. Definitely worth the effort.

If there are a larger set of important issues to the operator community
at large, I'd love some way of collecting those together. The user
survey is largely not useful for this at all because it's anonymized, so
no follow-up is possible. The ability to follow up is key.

Anyway, I think we're way off topic for the original thread. :)

Narayan, hope you'll be in Atlanta. Missed you in HK, and would be great
to chat through more of these ideas face to face.

	-Sean

-- 
Sean Dague
http://dague.net
