[openstack-dev] [nova] Proposal for an Experiment
harlowja at outlook.com
Wed Jul 15 15:31:40 UTC 2015
I do like experiments!
What about going even further and trying to integrate somehow with Mesos?
Replace the Hadoop executor and MPI executor with a 'VM executor', and
perhaps we could eliminate a large part of the scheduler code (just a
I think a bunch of other ideas were also written down at
https://review.openstack.org/#/c/191914/; maybe you can try some of those too :)
Ed Leafe wrote:
> Changing the architecture of a complex system such as Nova is never
> easy, even when we know that the design isn't working as well as we
> need it to. And it's even more frustrating because when the change is
> complete, it's hard to know if the improvement, if any, was worth it.
> So I had an idea: what if we ran a test of that architecture change
> out-of-tree? In other words, create a separate deployment, and rip out
> the parts that don't work well, replacing them with an alternative
> design. There would be no Gerrit reviews or anything that would slow
> down the work or add load to the already overloaded reviewers. Then we
> could see if this modified system is a significant-enough improvement
> to justify investing the time in implementing it in-tree. And, of
> course, if the test doesn't show what was hoped for, it is scrapped
> and we start thinking anew.
> The important part of this process is defining up front what level of
> improvement would be needed to make such a change worth considering,
> and what sort of tests would demonstrate whether that level was met.
> I'd like to discuss such an experiment next week at the Nova
> mid-cycle.
> What I'd like to investigate is replacing the current design, in which
> the compute nodes communicate with the scheduler via message queues.
> This design is overly complex and has several known scalability
> issues. My thought is to replace this with a Cassandra backend.
> Compute nodes would update their state to Cassandra whenever they
> change, and that data would be read by the scheduler to make its host
> selection. When the scheduler chooses a host, it would post the claim
> to Cassandra wrapped in a lightweight transaction, which would ensure
> that no other scheduler has tried to claim those resources. When the
> host has built the requested VM, it will delete the claim and update
> Cassandra with its current state.
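The claim step described above amounts to an optimistic compare-and-set: read a host's row, then write the claim only if nobody has written that row since. The sketch below is a minimal illustration under stated assumptions, not the proposed implementation. It assumes a hypothetical `hosts` table keyed by host name with a `generation` counter, and all names (`HostRow`, `FakeHostTable`, `report_state`, `schedule`) are made up for illustration. The in-memory table only stands in for Cassandra; the comments indicate roughly which CQL conditional write (`UPDATE ... IF generation = ?` / `INSERT ... IF NOT EXISTS`) each call would correspond to.

```python
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HostRow:
    """Hypothetical per-host row: reported state plus outstanding claims."""
    host: str
    free_ram_mb: int
    claims: tuple = ()   # (claim_id, ram_mb) pairs the host has not consumed yet
    generation: int = 0  # bumped on every write; checked by the conditional write

    def available_mb(self):
        # RAM a scheduler may still claim: reported free RAM minus pending claims.
        return self.free_ram_mb - sum(ram for _, ram in self.claims)


class FakeHostTable:
    """In-memory stand-in for a Cassandra table (illustration only).

    The lock plays the role of the per-partition serialization that a
    Cassandra lightweight transaction provides; it is not Cassandra itself.
    """

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def get(self, host):
        with self._lock:
            return self._rows.get(host)

    def read_all(self):
        with self._lock:
            return list(self._rows.values())

    def write_if(self, host, expected_generation, new_row):
        # Roughly: UPDATE hosts SET ... WHERE host = ? IF generation = ?
        # (expected_generation == 0 plays the role of INSERT ... IF NOT EXISTS)
        with self._lock:
            current = self._rows.get(host)
            current_gen = 0 if current is None else current.generation
            if current_gen != expected_generation:
                return False  # another writer got there first; caller re-reads
            self._rows[host] = replace(new_row, generation=current_gen + 1)
            return True


def report_state(table, host, free_ram_mb, finished_claim=None):
    """Compute node: publish current free RAM, dropping a claim it has consumed."""
    while True:  # retry until our conditional write wins
        current = table.get(host)
        gen = 0 if current is None else current.generation
        claims = () if current is None else tuple(
            c for c in current.claims if c[0] != finished_claim)
        if table.write_if(host, gen, HostRow(host, free_ram_mb, claims)):
            return


def schedule(table, claim_id, ram_mb, max_attempts=5):
    """Scheduler: claim RAM on the host with the most available RAM, or None."""
    for _ in range(max_attempts):
        candidates = [r for r in table.read_all() if r.available_mb() >= ram_mb]
        if not candidates:
            return None
        best = max(candidates, key=HostRow.available_mb)
        claimed = replace(best, claims=best.claims + ((claim_id, ram_mb),))
        # Succeeds only if nobody (scheduler or compute node) wrote since we read.
        if table.write_if(best.host, best.generation, claimed):
            return best.host
    return None


if __name__ == "__main__":
    table = FakeHostTable()
    report_state(table, "node1", free_ram_mb=4096)
    seen = table.get("node1")  # two schedulers read the same row...
    a = table.write_if("node1", seen.generation, replace(seen, claims=(("a", 3072),)))
    b = table.write_if("node1", seen.generation, replace(seen, claims=(("b", 3072),)))
    print(a, b)  # True False: the second, stale claim is rejected
```

Because the second scheduler's write fails its condition, it has to re-read; at that point `schedule` sees only 1024 MB available on `node1` and declines rather than double-booking the host. When the compute node finishes the build, `report_state(..., finished_claim="a")` removes the claim and publishes the new free RAM in the same conditional write.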
> One main motivation for using Cassandra over the current design is
> that it will enable us to run multiple schedulers without increasing
> the raciness of the system. Another is that it will greatly simplify a
> lot of the internal plumbing we've set up to implement in Nova what we
> would get out of the box with Cassandra. A third is that if this
> proves to be a success, it could also be used further down the road
> to simplify inter-cell communication (but this is getting ahead of
> ourselves...). I've worked with Cassandra before and it has been
> rock-solid to run and simple to set up. I've also had preliminary
> technical reviews with the engineers at DataStax, the company behind
> Cassandra, and they agreed that this was a good fit.
> At this point I'm sure that most of you are filled with thoughts on
> how this won't work, or how much trouble it will be to switch, or how
> much more of a pain it will be, or how you hate non-relational DBs, or
> any of a zillion other negative thoughts. FWIW, I have them too. But
> instead of ranting, I would ask that we acknowledge for now that:
> a) it will be disruptive and painful to switch something like this at
> this point in Nova's development
> b) it would have to provide *significant* improvement to make such a
> change worthwhile
> So what I'm asking from all of you is to help define the second part:
> what we would want improved, and how to measure those benefits. In
> other words, what results would you have to see in order to make you
> reconsider your initial "nah, this'll never work" reaction, and start
> to think that this will be a worthwhile change to make to Nova.
> For now, I'm also asking that you refrain from talking about why this
> can't work. I know it'll be difficult to do that, since nobody likes
> ranting about stuff more than I do, but right now it won't be helpful.
> There will be plenty of time for that later, assuming that this
> experiment yields anything worthwhile. Instead, think of the current
> pain points in the scheduler design, and what sort of improvement you
> would have to see in order to seriously consider undertaking this
> change to Nova.
> I've gotten the OK from my management to pursue this, and several
> people in the community have expressed support for both the approach
> and the experiment, even though most don't have spare cycles to
> contribute. I'd love to have anyone who is interested become involved.
> I hope that this will be a positive discussion at the Nova mid-cycle
> next week. I know it will be a lively one. :)
>  http://cassandra.apache.org/
>  http://www.datastax.com/
> -- Ed Leafe