[openstack-dev] Nova PTL candidate questionnaire.

Michael Still mikal at stillhq.com
Thu Mar 7 12:54:11 UTC 2013


On Thu, Mar 7, 2013 at 12:41 AM, Joshua Harlow <harlowja at yahoo-inc.com> wrote:
> Just some initial questions that might help me formulate who is a good
> candidate (in my opinion of good).

Josh, thanks for this. I think there are some interesting questions here.

> What do you think during your term are 3 things that you would like to make
> better in nova?

I think I covered this a little in my candidacy email, but to reiterate:

 - security is a big deal, we need to get better at it
 - our bug backlog is too big
 - we need to get better at doing timely code reviews

> How will you use customer feedback (people deploying openstack and/or users
> using openstack) to influence those decisions?

It seems to me that one of our best sources of feedback from users at
the moment is the bugs they file. For whatever reason we seem to be
very good at having people report issues they're having, which I think
is something that many other projects would envy. However, we often
don't triage those bugs in a timely manner, and even worse they often
sit around for a really long time before they get fixed. If we don't
intend to fix a bug we should be more honest about it and explain to
the user why we think we're on the right path and their bug is
invalid. No one likes to just be ignored.

> What are 3 things that you believe nova is doing right and wrong?

3 things that are awesome:

 - the flexibility of choice in hypervisor vendor
 - we scale well from small installations up to very large ones
 - we're getting a lot better at making it easier for operators to
tell what's wrong (instance UUIDs are consistently in log messages now
for example)

> How would you make those 3 things better?

To cover each of those points in turn:

 - on the hypervisor front, there has been a new vmware driver in
review for a long time. We can do a better job of working with vmware
to get that code landed.

 - scaling as well as we do comes at the cost of complexity. We need a
lot of tunable flags for those large installations, but we need to
make sure the defaults are reasonable for first time users. In general
we do an ok job at that, but we don't do a great job of documenting
what things you might want to tune as your installation grows. Users
shouldn't have to wait to experience a problem to know what needs
tuning next.

 - the operators part is hard -- I spent the last seven years as an
Ops guy at a couple of companies, and I feel for OpenStack operators.
However, without bug reports and discussions on the ops mailing list
it's hard for developers to tell what the pain points are. Now, we need
to get better at listening to our bugs, but we also need to encourage
operators to not give up on us and to keep filing them.

> Nova adds a lot of new features each release, do you think that is good,
> bad, or in between?

I think it's both good and unavoidable. I know it causes pain for
people with environments that have to remain stable and understood,
but that's why we support older releases for so long. I personally
spent several days last week backporting patches to Essex, which isn't
fun as a developer, but is our way of supporting people who need a
stable world.

Adding features is good for a few reasons -- we're learning about our
users and their deployments as we go along. If we can't add features
we can't address the mistakes we made in the past. Additionally,
adding features is how we're going to become the default cloud choice
for the world -- we need to have a coherent and complete set of
features that meet new users' needs.

Finally, most of the developers are funded by companies that are
focussed on adding features they need. If we block features, we stand
a good chance of losing a bunch of developers from the project.

> What are your thoughts on new features vs. stability and how will you
> address that during your term? Do you believe this is a problem to begin
> with?

Any change to a complex system has risks. I think having a feature
freeze is a good thing, and I'm glad we're in it now. Additionally,
having companies like RackSpace that are willing to run very close to
trunk is super important. Developers generally have small test
environments (devstack or a few lab machines). We need those bigger
deployments to help out with the testing, because it's unlikely that
every dev will ever have access to test environments with a few
hundred nodes.

I don't have a specific plan to address how to improve stability
though... I'm going to have to think on that a bit more.

> Scale is a big question that is always interesting, how have you used
> openstack at scale, and what lessons have you learned that can make nova
> better during your term

Oh, this is interesting. I have been personally responsible for a
couple of OpenStack deployments in the order of tens of machines with
high hundreds of instances. That's not really scale though. My
employer has installations with much larger machine counts (I'd have to
check if I'm allowed to give order of magnitude numbers!). While I
work in the group that runs the public cloud, I am actually focussed
on private cloud at the moment, which tends to be smaller
installations. I do get feedback from the public cloud team though,
and they occasionally let me play with their shiny toys.

On the flip side, I've worked directly on some very large proprietary
clustering systems. While at Google I worked on two relevant teams:
the cluster turnup team used a python automation system to
automagically build the software installation for new Google compute
clusters; I also worked on Mobile Search, which was an application
which ran on hundreds of machines inside these compute clusters. While
neither of these was OpenStack, I have seen clustering at scale and
feel I have relevant experience in that field.

Michael
