Open Stack

Mon Jul 20 14:48:46 UTC 2015

On 7/15/15, 9:18 AM, "Ed Leafe" <ed at leafe.com> wrote:

>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA512
>
>Changing the architecture of a complex system such as Nova is never
>easy, even when we know that the design isn't working as well as we
>need it to. And it's even more frustrating because when the change is
>complete, it's hard to know if the improvement, if any, was worth it.
>
>So I had an idea: what if we ran a test of that architecture change
>out-of-tree? In other words, create a separate deployment, and rip out
>the parts that don't work well, replacing them with an alternative
>design. There would be no Gerrit reviews or anything that would slow
>down the work or add load to the already overloaded reviewers. Then we
>could see if this modified system is a significant-enough improvement
>to justify investing the time in implementing it in-tree. And, of
>course, if the test doesn't show what was hoped for, it is scrapped
>and we start thinking anew.

+1
>
>The important part in this process is defining up front what level of
>improvement would be needed to make considering actually making such a
>change worthwhile, and what sort of tests would demonstrate whether or
>not this level was met. I'd like to discuss such an experiment
>next week at the Nova mid-cycle.
>
>What I'd like to investigate is replacing the current design of having
>the compute nodes communicating with the scheduler via message queues.
>This design is overly complex and has several known scalability
>issues. My thought is to replace this with a Cassandra [1] backend.
>Compute nodes would update their state to Cassandra whenever they
>change, and that data would be read by the scheduler to make its host
>selection. When the scheduler chooses a host, it would post the claim
>to Cassandra wrapped in a lightweight transaction, which would ensure
>that no other scheduler has tried to claim those resources. When the
>host has built the requested VM, it will delete the claim and update
>Cassandra with its current state.
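
To make the claim flow above concrete, here is a minimal sketch in Python. It does not talk to Cassandra: an in-memory dict guarded by a lock stands in for the compare-and-set behaviour that a lightweight transaction would provide. The names `ClaimStore`, `try_claim`, `build_finished`, and `race`, and the one-slot-per-request model, are assumptions for illustration only, not anything that exists in Nova.

```python
import threading


class ClaimStore:
    """In-memory stand-in for the shared store (hypothetical, not Cassandra)."""

    def __init__(self, reported_free):
        # reported_free: host name -> free slots as last reported by the host
        self._reported_free = dict(reported_free)
        self._claims = {}  # request id -> host holding the pending claim
        self._lock = threading.Lock()  # plays the role of the conditional write

    def _pending(self, host):
        return sum(1 for h in self._claims.values() if h == host)

    def available(self, host):
        """Free slots minus claims that have not been built yet."""
        with self._lock:
            return self._reported_free.get(host, 0) - self._pending(host)

    def claims(self):
        with self._lock:
            return dict(self._claims)

    def try_claim(self, host, request_id):
        """Record a claim only if the host still has an unclaimed slot."""
        with self._lock:
            if request_id in self._claims:
                return False
            if self._reported_free.get(host, 0) - self._pending(host) <= 0:
                return False  # another scheduler claimed the slot first
            self._claims[request_id] = host
            return True

    def build_finished(self, request_id):
        """Host side: delete the claim and report the reduced free count."""
        with self._lock:
            host = self._claims.pop(request_id)
            self._reported_free[host] -= 1


def race(store, host, n_schedulers):
    """Let several scheduler threads try to claim the same host at once."""
    results = []

    def worker(i):
        results.append(store.try_claim(host, f"req-{i}"))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_schedulers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    store = ClaimStore({"host-a": 1})
    print(race(store, "host-a", 3))
```

With one free slot and three schedulers racing, exactly one claim should succeed. That is the property the lightweight transaction is meant to give us across multiple schedulers.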
>
>One main motivation for using Cassandra over the current design is
>that it will enable us to run multiple schedulers without increasing
>the raciness of the system. Another is that it will greatly simplify a
>lot of the internal plumbing we've set up to implement in Nova what we
>would get out of the box with Cassandra. A third is that if this
>proves to be a success, it would also be able to be used further down
>the road to simplify inter-cell communication (but this is getting
>ahead of ourselves...). I've worked with Cassandra before and it has
>been rock-solid to run and simple to set up. I've also had preliminary
>technical reviews with the engineers at DataStax [2], the company
>behind Cassandra, and they agreed that this was a good fit.
>
>At this point I'm sure that most of you are filled with thoughts on
>how this won't work, or how much trouble it will be to switch, or how
>much more of a pain it will be, or how you hate non-relational DBs, or
>any of a zillion other negative thoughts. FWIW, I have them too. But
>instead of ranting, I would ask that we acknowledge for now that:

Call me an optimist, I think this can work :)

I would prefer a solution that avoids state management altogether and
instead depends on each individual making rule-based decisions using their
limited observations of their perceived environment. Of course, this has
certain emergent behaviors you have to learn from, but on the upside, no
more braiding state throughout the system. I don't like the assumption
that it has to be a global state management problem when it doesn't have
to be. That being said, I'm not opposed to trying a solution like you
described using Cassandra or something similar. I generally support
improvements :)

>
>a) it will be disruptive and painful to switch something like this at
>this point in Nova's development
>b) it would have to provide *significant* improvement to make such a
>change worthwhile
>
>So what I'm asking from all of you is to help define the second part:
>what we would want improved, and how to measure those benefits. In
>other words, what results would you have to see in order to make you
>reconsider your initial "nah, this'll never work" reaction, and start
>to think that this will be a worthwhile change to make to Nova.

I'd like to see n build requests, submitted within 1 second, each be
successfully scheduled to a host that has spare capacity, with a total
system capacity of only about n * 1.10, where n >= 10000, each cell has
~100 hosts, the number of hosts is >= n * 0.10 and <= n * 0.90, and the
number of schedulers is >= 2.

For example:

Build requests: 10000 in 1 second
Slots for flavor requested: 11000
Hosts that can build flavor: 7500
Number of schedulers: 3
Number of cells: 75 (each with 100 hosts)
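
As a sanity check on the numbers above, here is a small Python sketch that tests a scenario against those criteria. The `Scenario` class and its field names are made up for illustration, and I'm reading "total system capacity of n * 1.10" as "at least n and at most n * 1.10 slots", which is an assumption. The arithmetic uses integers so the 1.10, 0.10, and 0.90 factors don't pick up floating-point error.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    build_requests: int      # n, all submitted within 1 second
    slots: int               # total slots for the requested flavor
    hosts: int               # hosts that can build the flavor
    schedulers: int
    hosts_per_cell: int = 100

    def cells(self):
        # Round up so a partially filled cell still counts as a cell.
        return -(-self.hosts // self.hosts_per_cell)

    def violations(self):
        """Return a list of criteria this scenario fails (empty if none)."""
        n = self.build_requests
        problems = []
        if n < 10000:
            problems.append("n must be >= 10000")
        if self.slots < n:
            problems.append("not enough slots for every request")
        if self.slots * 100 > n * 110:
            problems.append("capacity exceeds n * 1.10")
        if self.hosts * 10 < n:
            problems.append("hosts must be >= n * 0.10")
        if self.hosts * 10 > n * 9:
            problems.append("hosts must be <= n * 0.90")
        if self.schedulers < 2:
            problems.append("need at least 2 schedulers")
        return problems


if __name__ == "__main__":
    example = Scenario(build_requests=10000, slots=11000,
                       hosts=7500, schedulers=3)
    print(example.cells(), example.violations())
```

The example scenario works out to 75 cells and fails none of the checks.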

>
>I'm also asking that you refrain from talking about why this can't
>work for now. I know it'll be difficult to do that, since nobody likes
>ranting about stuff more than I do, but right now it won't be helpful.
>There will be plenty of time for that later, assuming that this
>experiment yields anything worthwhile. Instead, think of the current
>pain points in the scheduler design, and what sort of improvement you
>would have to see in order to seriously consider undertaking this
>change to Nova.
>
>I've gotten the OK from my management to pursue this, and several
>people in the community have expressed support for both the approach
>and the experiment, even though most don't have spare cycles to
>contribute. I'd love to have anyone who is interested become involved.

Thanks for working on this. I'll try to help when and where I can. I'm
very interested in improving scheduling.

>
>I hope that this will be a positive discussion at the Nova mid-cycle
>next week. I know it will be a lively one. :)
>
>[1] http://cassandra.apache.org/
>[2] http://www.datastax.com/
>- -- 
>
>- -- Ed Leafe
>-----BEGIN PGP SIGNATURE-----
>Version: GnuPG v2
>Comment: GPGTools - https://gpgtools.org
>Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
>iQIcBAEBCgAGBQJVpmvCAAoJEKMgtcocwZqLSNYP/0b8s7pZnXaF3tTYF+WtNppr
>lHyQMHSXLQ3CESoS4961ZWOCMtV2hCxvcioXem+PJzOdZED143XMJ3LR6+dZ012q
>RGSp43Co+vUfuTtaTg030sLyDlXZKEenkPXy0202WpPaK4RYSonrnrxs0kmv+ZpH
>yamsZP2/gReZseBsKiww0FkqWGkIJxD7bi1r8DdLa/HLvwYUD+U2zrcUvT4cMXMR
>uHocf+Rs76lNHsMd/aOeCHcCvqcXJBjVVgmu6jnBFsAdgzfyfzsF6NKxC5Fnb5sH
>0yabUU/mPVn+JZRbp4QXtBqkEoJFME3qyPQOKDfGjzy7lzRJhKsZwJPrQB19NWmO
>iXIIIg7WouEzyYw21yij0VUDCAOwzjPlJe4hG3SwSvUPOaLTUTkQGHPs2MjsJ3aw
>YtuD81phsnEZ8jOw9FlMHvDToffNFr8QooJAMVZwSkGypOcQ81BJbutla95BXhO9
>x032EOJMUX3lte873Rt5qABgi0SDHzVom7wmuxek0GMkOupB+OxyGwfGB6qYw/Tq
>w3A2Zo779bUKJ2vZtP1tp3Z1aZ2NRuaDm6ukwYYD5tAax3S9dYGm1OvnlWv9seh8
>aEvsrGfJZrXSt/LX03MgKm01QHS+pxVPoPFaKq8rQW2z2WCHX32qJwolh8++RiYl
>MOOs5vT8LWiPIEel+3Cs
>=br/i
>-----END PGP SIGNATURE-----
>
>__________________________________________________________________________
>OpenStack Development Mailing List (not for usage questions)
>Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [nova] Proposal for an Experiment