[openstack-dev] [tc] [all] OpenStack moving both too fast and too slow at the same time

Octave J. Orgeron octave.orgeron at oracle.com
Mon May 8 17:10:34 UTC 2017


Hi Kevin,

I agree that OpenStack may need to re-architect and rethink certain 
design choices. That eventually has to happen to any open source or 
commercial product that becomes too bloated and complex. k8s is a 
good example of something that has grown rapidly and is still simple at 
this point. But it's inevitable that k8s will become more complicated as 
features such as networking (firewalls, load balancing, SDN, etc.) get 
thrown into the mix. Keeping the bloat and complexity down takes good 
architecture and governance, just like anything else in the IT world.

I do agree that scalability and high availability are definitely issues 
for OpenStack when you dig deeper into the sub-components. There is a 
lot of re-inventing of the wheel, and plenty of deficiencies, when you 
look at how distributed services are implemented inside of OpenStack. 
For some services you have a scheduler that can scale out, but the 
conductor or worker process can't. A good example is cinder, where 
cinder-volume doesn't scale out in a distributed manner and doesn't have 
a good mechanism for recovering when an instance fails. All across the 
services you see different methods for coordinating requests and tasks: 
rabbitmq, redis, memcached, tooz, mysql, etc. So as an operator, you 
have to sift through those choices and configure the prerequisite 
infrastructure. This is a good example of a problem that should be 
solved with a single, architecturally sound solution that all services 
can standardize on.
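
For illustration, here is a rough sketch of what standardizing on something
like tooz for coordination could look like from a service's point of view.
This is not actual OpenStack code; the backend URL, member id, and lock name
below are made-up placeholders:

  # Hypothetical worker using tooz rather than a service-specific mechanism.
  from tooz import coordination

  # tooz hides etcd/zookeeper/redis/memcached/etc. behind one API, so the
  # operator only has to deploy and tune a single coordination backend.
  coord = coordination.get_coordinator(
      'etcd3+http://coordination.example.com:2379',  # placeholder backend
      b'cinder-volume-host-1')                       # placeholder member id
  coord.start(start_heart=True)

  # A distributed lock so only one worker acts on a given resource at a time.
  with coord.get_lock(b'volume-example'):
      pass  # do the work that must not run concurrently

  coord.stop()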

The problem in a lot of those cases comes down to development being 
detached from the actual use cases customers and operators face in the 
real world. Having a distributed control plane with multiple instances 
of the API, scheduler, coordinator, and other processes is typically not 
testable without a larger hardware setup. When you get to large-scale 
deployments, you need an active/active setup for the control plane. 
That's definitely not something you can develop for or test against on a 
single laptop with devstack, especially if you want to use more than a 
handful of the OpenStack services.

An OpenStack v2.0 may be the right way to address those issues and do 
the architecture work to get OpenStack to scale, reduce complexity, and 
make things like upgrades easier.

Octave

On 5/5/2017 5:44 PM, Fox, Kevin M wrote:
> Note, when I say OpenStack below, I'm talking about nova/glance/cinder/neutron/horizon/heat/octavia/designate. No offence to the other projects intended; just trying to constrain the conversation a bit... Those parts are fairly comparable to what k8s provides.
>
> I think part of your point is valid, that k8s isn't as feature-rich in some ways (networking, for example) and will get more complex in time. But it has a huge amount of functionality for significantly less effort compared to an OpenStack deployment with similar functionality today.
>
> I think there are some major things that differ between the two projects and are really paying off for k8s over OpenStack right now. We can use those as learning opportunities moving forward, or the gap will continue to widen, as will the user migrations away from OpenStack. These are mostly architectural things.
>
> Versions:
>   * The real core of OpenStack is essentially version 1 + iterative changes.
>   * k8s is essentially the third version of Borg. Plenty of room to ditch bad ideas/decisions.
>
> That means OpenStack's architecture has essentially grown organically rather than being as carefully thought out. Backwards compatibility has been a good goal, but it's so hard to upgrade that most places burn it down and stand up something new anyway, so it's a lot of work with a lot less payoff than you would think. Maybe it is time to consider OpenStack version 2...
>
> I think OpenStack's greatest strength is its standardized APIs. Thus far we've been changing the APIs over time and keeping the implementation mostly the same... maybe we should consider keeping the APIs the same and switching some of the implementations out... It might take a while to get back to where we are now, but I suspect the overall solution would be much better now that we have so much experience from building the first one.
>
> k8s and OpenStack do largely the same thing: take in a user request, schedule the resource onto some machines, and allow management/lifecycle of the thing.
>
> Why then does k8s's scalability goal target 5000 nodes while OpenStack really struggles with more than 300 nodes without a huge amount of extra work? I think it's architecture. OpenStack really abuses rabbit, does a lot with relational databases that maybe would be better done elsewhere, and forces isolation between projects that maybe is not the best solution.
>
> Part of it, I think, is combined services. They don't have separate services for cinder-api/nova-api/neutron-api/heat-api/etc., just kube-apiserver; same with the *-schedulers, just kube-scheduler. This means many fewer things for ops to manage, and allows for faster communication (lower latency). In theory OpenStack could scale out much better with the finer-grained services, but I'm not sure that's ever really shown true in practice.
>
> Layered/eating own dogfood:
>   * OpenStack assumes the operator will install "all the things".
>   * K8s uses k8s to deploy lots of itself.
>
> You use kubelet, with the same YAML file format normally used to deploy workloads, to deploy etcd/kube-apiserver/kube-scheduler/kube-controller-manager and get a working base system.
> You then use the base system to launch the SDN, ingress, service discovery, the UI, etc.
>
> This means the learning required is substantially less when it comes to debugging problems, performing upgrades, etc., because it's mostly the same for k8s as it is for any other app running on it. The learning cost is way lower.
>
> Versioning:
>   * With OpenStack, upgrades are hard, and mismatched-version servers/agents are hard/impossible.
>   * With k8s, they support the controllers being 2 versions ahead of the clients.
>
> It's hard to bolt this on after the fact, but it's also harder when you have multiple communication channels to do it with. Having to do it in HTTP, SQL, and rabbit messages makes it so much harder. Having only one place talk to the single datastore (etcd) makes that easier, as does having only one place everything interacts with the servers, kube-apiserver.
>
> Some amount of distribution:
>   * OpenStack components are generally expected to come from distros.
>   * With k8s, core pieces like kube-apiserver are distributed prebuilt and ready to go in container images if you choose to use them.
>
> Minimal silos:
>   * The various OpenStack projects are very siloed.
>   * Most of the k8s subsystems currently are all tightly integrated with each other and are managed by the same teams.
>
> This has led to architectural decisions that take more of the bigger picture into account. Under OpenStack's current model, each project does its own thing without much design for how it all comes together in the end. A lot of problems spring from this. I'm sure k8s will get more and more siloed as it grows and matures. But right now, the lack of silos really is speeding its development.
>
> Anyway, I think I'm done rambling for now... I do hope we can reflect on some of the differences between the projects and see if we can figure out how to pull in some of the good ideas from k8s.
>
> Thanks,
> Kevin
>
> ________________________________________
> From: Michał Jastrzębski [inc007 at gmail.com]
> Sent: Friday, May 05, 2017 3:20 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [tc] [all] OpenStack moving both too fast and too slow at the same time
>
> You are talking about OpenStack being hard because it's complex, and at
> the same time you're talking about using "non-linux-mainstream" tools
> around it. It's either flexibility or ease, guys... Prescriptive is easy,
> flexible is hard. When you want to learn about Linux you don't start by
> compiling Gentoo; you install Ubuntu, click "next" until it's finished,
> and just trust that it's working. After some time you grow skills in
> Linux and customize it to your needs.
>
> We are talking about software that runs physics experiments, HPC-like
> clusters, mobile phone communication, and WordPress sites across thousands
> of companies. It won't ever be simple and prescriptive. The best we can do
> is, as you said, hide its complexity and allow a smoother entry until
> someone learns that complexity. No tooling will ever replace an experienced
> operator, but tooling can make it easier to gain that experience.
>
> You mentioned Kubernetes as a good example, but Kubernetes is still a
> relatively young project and doesn't support some of the things that you
> yourself said you need. As it grows and as options become available, it
> too will become more and more complex.
>
> On 5 May 2017 at 14:52, Octave J. Orgeron <octave.orgeron at oracle.com> wrote:
>> +1
>>
>> On 5/5/2017 3:46 PM, Alex Schultz wrote:
>>>
>>>> Sooo... I always get a little triggered when I hear that OpenStack is
>>>> hard to deploy. We've spent the last few years fixing it and I think it's
>>>> pretty well fixed now. Even as we speak I'm deploying 500+ VMs on an
>>>> OpenStack cluster I deployed last week within one day.
>>>>
>>> No, you've written a complex tool (that you understand) to do it.
>>> That's not the same as someone who is not familiar with OpenStack trying
>>> to deploy OpenStack. I too could quickly deploy a decently scaled
>>> infrastructure with some of the tools (fuel/tripleo/puppet/etc), but
>>> the reality is that each one of these tools is inherently hiding the
>>> complexities of OpenStack.  Each (including yours) has its own
>>> flavor of assumptions baked in to make it work.  That is also
>>> confusing for the end user who tries to switch between them and only
>>> gets some of the flexibility of each but then runs face-first into
>>> each tool's shortcomings.  Rather than assuming a tool has solved it
>>> (which it hasn't, or we'd all be using the same one by now), how about
>>> we take some time to understand why we've had to write these tools in
>>> the first place and see if there's something we can improve on?  Learning
>>> the tool to deploy OpenStack is not the same as deploying OpenStack,
>>> managing it, and turning it around for the true cloud end user to
>>> consume.
>>>
>>



