[openstack-dev] [tc] [all] OpenStack moving both too fast and too slow at the same time
    Dmitry Tantsur 
    dtantsur at redhat.com
       
    Wed May 10 11:40:39 UTC 2017
    
    
  
On 05/09/2017 07:59 PM, Joshua Harlow wrote:
> Matt Riedemann wrote:
>> On 5/8/2017 1:10 PM, Octave J. Orgeron wrote:
>>> I do agree that scalability and high-availability are definitely issues
>>> for OpenStack when you dig deeper into the sub-components. There is a
>>> lot of re-inventing of the wheel when you look at how distributed
>>> services are implemented inside of OpenStack and deficiencies. For some
>>> services you have a scheduler that can scale-out, but the conductor or
>>> worker process doesn't. A good example is cinder, where cinder-volume
>>> doesn't scale-out in a distributed manner and doesn't have a good
>>> mechanism for recovering when an instance fails. All across the services
>>> you see different methods for coordinating requests and tasks such as
>>> rabbitmq, redis, memcached, tooz, mysql, etc. So for an operator, you
>>> have to sift through those choices and configure the prerequisite
>>> infrastructure. This is a good example of a problem that should be
>>> solved with a single architecturally sound solution that all services
>>> can standardize on.
>>
>> There was an architecture workgroup specifically designed to understand
>> past architectural decisions in OpenStack, and what the differences are
>> in the projects, and how to address some of those issues, but from lack
>> of participation the group dissolved shortly after the Barcelona summit.
>> This is, again, another example of if you want to make these kinds of
>> massive changes, it's going to take massive involvement and leadership.
> 
> I agree on the 'massive changes, it's going to take massive involvement and 
> leadership.' though I am not sure how such changes and involvement actually 
> happen, especially nowadays when companies with such leadership are moving on 
> to something else (k8s, mesos, or other...)
> 
> So, knowing that, what are the options to actually make some kind of change occur? 
> IMHO it must be driven by PTLs (yes I know they are always busy, too bad, so sad, 
> lol). I'd like all the PTLs to get together and restart the arch-wg and make it 
> a *requirement* that PTLs actually show up (and participate) in that 
> group/meeting vs it just being a bunch of senior(ish) folks, such as myself, 
> that showed up. Then if PTLs do not show up, I would start to say that the next 
> time around they are running for PTL said lack of participation in the wider 
> openstack vision should be known and potentially cause them to get kicked out 
> (voted out?) of being a PTL in the future.
Now we have someone to blame. Problem solved?
> 
>>>
>>> The problem in a lot of those cases comes down to development being
>>> detached from the actual use cases customers and operators are going to
>>> use in the real world. Having a distributed control plane with multiple
>>> instances of the api, scheduler, coordinator, and other processes is
>>> typically not testable without a larger hardware setup. When you get to
>>> large scale deployments, you need an active/active setup for the control
>>> plane. It's definitely not something you could develop for or test
>>> against on a single laptop with devstack. Especially, if you want to use
>>> more than a handful of the OpenStack services.
> 
> I've heard *crazy* things about actual use cases customers and operators are 
> doing because of the scaling limits that projects have (i.e. nova has a limit of 
> 300 compute nodes so ABC customer will then setup X * 300 clouds to reach Y 
> compute nodes because of that limit).
> 
> IMHO I'm not even sure I would want to target said use-cases in the first place, 
> because they feel messed up in the first place (and it seems bad/dumb? to go 
> down the rabbit hole of targeting use-cases that were deployed to band-aid over 
> the initial problems that created those use-cases/deployments in the first place).
> 
>>
>> I think we can all agree with this. Developers don't have a lab with
>> 1000 nodes lying around to hack on. There was OSIC but that's gone. I've
>> been requesting help in Nova from companies to do scale testing and help
>> us out with knowing what the major issues are, and report those back in
>> a form so we can work on those issues. People will report there are
>> issues, but not do the profiling, or at least not report the results of
>> profiling, upstream to help us out. So again, this is really up to
>> companies that have the resources to do this kind of scale testing and
>> report back and help fix the issues upstream in the community. That
>> doesn't require OpenStack 2.0.
>>
> 
> So how do we close that gap? The only way I really know is by having people that 
> can see the problems from the get-go, instead of having to discover it at some 
> later point (when it falls over and ABC customer starts having Y clouds 
> just to reach the target number of compute nodes they want to reach). Now maybe 
> the skill level in openstack (especially in regards to distributed systems) is 
> just too low and the only real way to gather data is by having companies do scale 
> testing (i.e. some kind of architecting things to work after they are deployed); 
> if so that's sad...
> 
> -Josh
> 