[openstack-dev] [tc] [all] OpenStack moving both too fast and too slow at the same time
Joshua Harlow harlowja at fastmail.com
Wed May 10 16:00:38 UTC 2017
Dmitry Tantsur wrote:
> On 05/09/2017 07:59 PM, Joshua Harlow wrote:
>> Matt Riedemann wrote:
>>> On 5/8/2017 1:10 PM, Octave J. Orgeron wrote:
>>>> I do agree that scalability and high-availability are definitely issues
>>>> for OpenStack when you dig deeper into the sub-components. There is a
>>>> lot of re-inventing of the wheel when you look at how distributed
>>>> services are implemented inside of OpenStack, and the deficiencies that
>>>> result. For some services you have a scheduler that can scale out, but
>>>> the conductor or worker process doesn't. A good example is cinder, where
>>>> cinder-volume doesn't scale out in a distributed manner and doesn't have
>>>> a good mechanism for recovering when an instance fails. All across the
>>>> board you see different methods for coordinating requests and tasks,
>>>> such as rabbitmq, redis, memcached, tooz, mysql, etc. So as an operator,
>>>> you have to sift through those choices and configure the prerequisite
>>>> infrastructure. This is a good example of a problem that should be
>>>> solved with a single architecturally sound solution that all services
>>>> can standardize on.
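
(As an aside: tooz, one of the coordination layers named above, already
abstracts most of those backends behind a single URL-driven API, so the
sketch below shows what standardizing on one layer could look like. It
assumes a reachable redis and the tooz library installed; the member id
and lock name are made up for the example.)

    from tooz import coordination

    # The backend is chosen purely by URL; swapping redis for
    # zookeeper, memcached or etcd does not change the calling code.
    coordinator = coordination.get_coordinator(
        'redis://localhost:6379', b'cinder-volume-host-1')
    coordinator.start(start_heart=True)

    # A distributed lock any service could standardize on, instead of
    # each project hand-rolling its own via mysql/rabbitmq/etc.
    lock = coordinator.get_lock(b'volume-12345')
    with lock:
        pass  # work that must not run on two hosts at once

    coordinator.stop()
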
>>> There was an architecture workgroup specifically designed to understand
>>> past architectural decisions in OpenStack, and what the differences are
>>> in the projects, and how to address some of those issues, but due to
>>> lack of participation the group dissolved shortly after the Barcelona
>>> summit. This is, again, another example of how, if you want to make
>>> these kinds of massive changes, it's going to take massive involvement
>>> and leadership.
>> I agree on the 'massive changes, it's going to take massive
>> involvement and leadership.' though I am not sure how such changes and
>> involvement actually happens; especially nowadays when companies
>> with such leadership are moving on to something else (k8s, mesos, or
>> ...). So knowing that, what are the options to actually make some kind
>> of change occur? IMHO it must be driven by PTLs (yes I know they are
>> always busy, too bad, so sad, lol). I'd like all the PTLs to get
>> together and restart the arch-wg and make it a *requirement* that PTLs
>> actually show up (and participate) in that group/meeting vs it just
>> being a bunch of senior(ish) folks, such as myself, who showed up.
>> Then if PTLs do not show up, I would start to say that the next time
>> around, when they are running for PTL, said lack of participation in
>> the wider OpenStack vision should be made known and potentially cause
>> them to get kicked out (voted out?) of being a PTL in the future.
> Now we have whom to blame. Problem solved?
Not likely problem solved just yet, but sometimes tough love (IMHO) is
needed. I believe it is; you may disagree and that's cool, but then I
might give you some tough love also, lol.
>>>> The problem in a lot of those cases comes down to development being
>>>> detached from the actual use cases that customers and operators will
>>>> run in the real world. Having a distributed control plane with multiple
>>>> instances of the api, scheduler, coordinator, and other processes is
>>>> typically not testable without a larger hardware setup. When you get to
>>>> large scale deployments, you need an active/active setup for the control
>>>> plane. It's definitely not something you could develop for or test
>>>> against on a single laptop with devstack. Especially if you want to run
>>>> more than a handful of the OpenStack services.
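
(To make the testability point concrete: about the closest a laptop
gets is faking several control-plane instances as local processes
contending over one coordination backend. A minimal sketch, assuming a
local redis and tooz installed, with made-up member names; it
approximates active/active lock contention but says nothing about real
multi-node failure modes, which is exactly the gap being described.)

    import multiprocessing
    import time

    from tooz import coordination

    def scheduler_instance(member_id):
        # Each simulated scheduler joins the same coordination backend.
        coord = coordination.get_coordinator(
            'redis://localhost:6379', member_id.encode())
        coord.start(start_heart=True)
        # Only one instance at a time gets to make the "decision".
        with coord.get_lock(b'placement-decision'):
            print('%s holds the scheduling lock' % member_id)
            time.sleep(0.1)
        coord.stop()

    if __name__ == '__main__':
        procs = [multiprocessing.Process(target=scheduler_instance,
                                         args=('scheduler-%d' % i,))
                 for i in range(3)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
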
>> I've heard *crazy* things about actual use cases customers and
>> operators are doing because of the scaling limits that projects have
>> (i.e. nova has a limit of 300 compute nodes, so ABC customer will then
>> set up X * 300 clouds to reach Y compute nodes because of that limit).
>> IMHO I'm not even sure I would want to target said use-cases in the
>> first place, because they feel messed up (and it seems bad/dumb? to go
>> down the rabbit hole of targeting use-cases that were deployed to
>> band-aid over the initial problems that created them).
>>> I think we can all agree with this. Developers don't have a lab with
>>> 1000 nodes lying around to hack on. There was OSIC but that's gone. I've
>>> been requesting help in Nova from companies to do scale testing and help
>>> us out with knowing what the major issues are, and report those back in
>>> a form so we can work on those issues. People will report there are
>>> issues, but not do the profiling, or at least not report the results of
>>> profiling, upstream to help us out. So again, this is really up to
>>> companies that have the resources to do this kind of scale testing and
>>> report back and help fix the issues upstream in the community. That
>>> doesn't require OpenStack 2.0.
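
(For what it's worth, even a minimal profile attached to an upstream
bug would help close that loop. A sketch using only the stdlib; the
profiled function here is a stand-in for whatever slow control-plane
operation is being reported.)

    import cProfile
    import io
    import pstats

    def slow_operation():
        # Stand-in for the operation being reported, e.g. a burst of
        # API calls against a loaded cloud.
        sum(i * i for i in range(10 ** 6))

    profiler = cProfile.Profile()
    profiler.enable()
    slow_operation()
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats('cumulative').print_stats(20)
    # This text is what is worth pasting into the upstream bug report.
    print(out.getvalue())
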
>> So how do we close that gap? The only way I really know is by having
>> people who can see the problems from the get-go, instead of having to
>> discover them at some later point (when it falls over and ABC customer
>> ends up running Y clouds just to reach the number of compute nodes
>> they want). Now maybe the skill level in openstack (especially in
>> regards to distributed systems) is just too low and the only real way
>> to gather data is by having companies do scale testing (i.e. some kind
>> of architecting things to work after they are deployed); if so that's
>> sad...