[openstack-dev] [nova] Proposal for an Experiment

Joshua Harlow harlowja at outlook.com
Wed Jul 15 15:31:40 UTC 2015


I do like experiments!

What about going even further and trying to integrate this into Mesos somehow?

https://mesos.apache.org/documentation/latest/mesos-architecture/

Replace the Hadoop executor and MPI executor with a 'VM executor' and 
perhaps we could eliminate a large part of the scheduler code (just a 
thought)...

I think a bunch of other ideas were also written down at 
https://review.openstack.org/#/c/191914/ so maybe you can try some of those too :)

Ed Leafe wrote:
> Changing the architecture of a complex system such as Nova is never
> easy, even when we know that the design isn't working as well as we
> need it to. And it's even more frustrating because when the change is
> complete, it's hard to know if the improvement, if any, was worth it.
>
> So I had an idea: what if we ran a test of that architecture change
> out-of-tree? In other words, create a separate deployment, and rip out
> the parts that don't work well, replacing them with an alternative
> design. There would be no Gerrit reviews or anything that would slow
> down the work or add load to the already overloaded reviewers. Then we
> could see if this modified system is a significant-enough improvement
> to justify investing the time in implementing it in-tree. And, of
> course, if the test doesn't show what was hoped for, it is scrapped
> and we start thinking anew.
>
> The important part in this process is defining up front what level of
> improvement would be needed to make such a change worth considering,
> and what sort of tests would demonstrate whether or not this level
> was met. I'd like to discuss such an experiment next week at the Nova
> mid-cycle.
>
> What I'd like to investigate is replacing the current design of having
> the compute nodes communicating with the scheduler via message queues.
> This design is overly complex and has several known scalability
> issues. My thought is to replace this with a Cassandra [1] backend.
> Compute nodes would write their state to Cassandra whenever it
> changes, and that data would be read by the scheduler to make its host
> selection. When the scheduler chooses a host, it would post the claim
> to Cassandra wrapped in a lightweight transaction, which would ensure
> that no other scheduler has tried to claim those resources. When the
> host has built the requested VM, it will delete the claim and update
> Cassandra with its current state.
>
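
To make the claim step above concrete, here is a minimal sketch in
Python. It does not talk to Cassandra: InMemoryStore is a hypothetical
stand-in whose lock mimics the atomicity of a conditional write, and the
table, column, and method names are assumptions, not anything from Nova
or from the proposal. It also folds the claim into the host row itself
rather than keeping a separate claim that the host deletes later.

```python
import threading
from dataclasses import dataclass


@dataclass
class HostState:
    free_ram_mb: int
    version: int = 0


class InMemoryStore:
    """Hypothetical stand-in for the backend; each call is atomic."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def report(self, host, free_ram_mb):
        # A compute node publishes its current state, bumping the version.
        with self._lock:
            old = self._rows.get(host)
            version = old.version + 1 if old else 0
            self._rows[host] = HostState(free_ram_mb, version)

    def read(self, host):
        # A scheduler reads a snapshot of the host's state.
        with self._lock:
            row = self._rows[host]
            return HostState(row.free_ram_mb, row.version)

    def claim_if_version(self, host, ram_mb, expected_version):
        # Compare-and-set, roughly what a conditional CQL update such as
        #   UPDATE hosts SET free_ram_mb = ?, version = ?
        #   WHERE host = ? IF version = ?
        # would provide (the schema here is assumed for illustration).
        with self._lock:
            row = self._rows[host]
            if row.version != expected_version or row.free_ram_mb < ram_mb:
                return False
            row.free_ram_mb -= ram_mb
            row.version += 1
            return True


def schedule(store, host, ram_mb):
    # Read the host state, then try to claim against the version we saw.
    seen = store.read(host)
    if seen.free_ram_mb < ram_mb:
        return False
    return store.claim_if_version(host, ram_mb, seen.version)
```

If two schedulers read the same version of a host and both try to
claim, only the first conditional write succeeds; the loser gets False
and would presumably re-read the state and retry or pick another host.
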
> One main motivation for using Cassandra over the current design is
> that it will enable us to run multiple schedulers without increasing
> the raciness of the system. Another is that it will greatly simplify a
> lot of the internal plumbing we've set up to implement in Nova what we
> would get out of the box with Cassandra. A third is that if this
> proves to be a success, it could also be used further down the road
> to simplify inter-cell communication (but this is getting ahead of
> ourselves...). I've worked with Cassandra before and it has
> been rock-solid to run and simple to set up. I've also had preliminary
> technical reviews with the engineers at DataStax [2], the company
> behind Cassandra, and they agreed that this was a good fit.
>
> At this point I'm sure that most of you are filled with thoughts on
> how this won't work, or how much trouble it will be to switch, or how
> much more of a pain it will be, or how you hate non-relational DBs, or
> any of a zillion other negative thoughts. FWIW, I have them too. But
> instead of ranting, I would ask that we acknowledge for now that:
>
> a) it will be disruptive and painful to switch something like this at
> this point in Nova's development
> b) it would have to provide *significant* improvement to make such a
> change worthwhile
>
> So what I'm asking from all of you is to help define the second part:
> what we would want improved, and how to measure those benefits. In
> other words, what results would you have to see in order to make you
> reconsider your initial "nah, this'll never work" reaction, and start
> to think that this will be a worthwhile change to make to Nova.
>
> I'm also asking that, for now, you refrain from talking about why this
> can't work. I know it'll be difficult to do that, since nobody likes
> ranting about stuff more than I do, but right now it won't be helpful.
> There will be plenty of time for that later, assuming that this
> experiment yields anything worthwhile. Instead, think of the current
> pain points in the scheduler design, and what sort of improvement you
> would have to see in order to seriously consider undertaking this
> change to Nova.
>
> I've gotten the OK from my management to pursue this, and several
> people in the community have expressed support for both the approach
> and the experiment, even though most don't have spare cycles to
> contribute. I'd love to have anyone who is interested become involved.
>
> I hope that this will be a positive discussion at the Nova mid-cycle
> next week. I know it will be a lively one. :)
>
> [1] http://cassandra.apache.org/
> [2] http://www.datastax.com/
> -- Ed Leafe
