[openstack-dev] [neutron] Neutron scaling datapoints?

joehuang joehuang at huawei.com
Fri Apr 17 07:46:12 UTC 2015


Hi, Attila,

Addressing only agent status/liveness management is not enough for Neutron scalability. The impact of concurrent dynamic load at large scale (for example, 100k managed nodes with dynamic load such as security group rule updates, routers_updated notifications, etc.) must also be taken into account. So even if agent status/liveness management is improved in Neutron, that does not mean the scalability issue is fully addressed.
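To put rough numbers on the fan-out (a back-of-envelope sketch; the rates below are assumptions for illustration, not measurements):

    # Each security group / router update fans out to every affected agent
    # over AMQP, so message volume grows with the managed node count.
    managed_nodes = 100000       # L2/L3 agents under one Neutron
    updates_per_second = 10      # assumed tenant-driven update rate
    affected_fraction = 0.01     # assumed share of nodes touched per update

    messages_per_second = updates_per_second * managed_nodes * affected_fraction
    print(int(messages_per_second))   # 10000 control-plane messages/s

Even under these conservative assumptions, a single broker and server pool must sustain on the order of 10k control-plane messages per second.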

On the other hand, Nova already supports several segregation concepts, for example Cells and Availability Zones. If 100k nodes are to be managed by one OpenStack instance, it is impossible to operate without hardware resource segregation. It would be strange to put the agent liveness manager in availability zone (AZ) 1 while all managed agents are in AZ 2: if AZ 1 is powered off, all agents in AZ 2 lose management.

A benchmark for scalability is already available: "Test report for million ports scalability of Neutron":
http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers

Cascading may not be perfect, but at least it provides a feasible way forward if we really want scalability.
========================
I am also working to evolve OpenStack, based on cascading, toward a world where there is no need to worry about the "OpenStack scalability issue":

"Tenant level virtual OpenStack service over hybrid or federated or multiple OpenStack based clouds":

There are lots of OpenStack-based clouds. Each tenant will be allocated one cascading OpenStack as its virtual OpenStack service, with a single OpenStack API endpoint served for that tenant. The tenant's resources can be distributed or dynamically scaled across multiple OpenStack-based clouds; these clouds may be federated with Keystone, use a shared Keystone, or even be OpenStack clouds built on AWS, Azure, or VMware vSphere.

Under this deployment scenario, unlimited scalability in a cloud can be achieved: there is no unified cascading layer, and tenant-level resource orchestration among multiple OpenStack clouds is fully distributed (even geographically). The database and load for one cascading OpenStack are very small, making disaster recovery and backup easy. Multiple tenants may share one cascading OpenStack to reduce resource waste, but the principle is to keep the cascading OpenStack as thin as possible.
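A minimal sketch of the dispatch idea (the class and mapping below are illustrative only, not the actual cascading implementation):

    # Toy model of a per-tenant cascading layer: it owns the single API
    # endpoint and forwards each request to one cascaded OpenStack cloud.
    class CascadingNeutron(object):
        def __init__(self, cascaded_clouds):
            # e.g. {"az1": "http://dc1.example.org:9696", ...} (made-up endpoints)
            self.cascaded_clouds = cascaded_clouds

        def create_port(self, availability_zone, port_spec):
            endpoint = self.cascaded_clouds[availability_zone]
            # A real implementation would call the cascaded Neutron's REST
            # API here; the cascading layer stores only the mapping, not
            # the full resource state, so its database stays small.
            return {"endpoint": endpoint, "port": port_spec}

The point is that the cascading layer holds only placement mappings; the heavy per-port state lives in the cascaded clouds.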

You can find the information here:
https://wiki.openstack.org/wiki/OpenStack_cascading_solution#Use_Case

Best Regards
Chaoyi Huang ( joehuang )

-----Original Message-----
From: Attila Fazekas [mailto:afazekas at redhat.com] 
Sent: Thursday, April 16, 2015 3:06 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?





----- Original Message -----
> From: "joehuang" <joehuang at huawei.com>
> To: "OpenStack Development Mailing List (not for usage questions)" 
> <openstack-dev at lists.openstack.org>
> Sent: Sunday, April 12, 2015 3:46:24 AM
> Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> 
> 
> As Kevin was talking about agents, I want to point out that in the
> TCP/IP stack, a port (not a Neutron port) is a two-byte field, i.e.
> ports range from 0 to 65535, supporting at most 64k port numbers.
> 
> 
> 
> " above 100k managed node " means more than 100k L2 agents/L3 
> agents... will be alive under Neutron.
> 
> 
> 
> I want to know the detailed design for how Neutron can be scaled this
> way with 99.9% confidence; a PoC and tests would be good support for this idea.
> 

Would you consider as a PoC something that uses the technology in a similar way, with a similar port-security problem, but with a lower-level API than Neutron currently uses?

Is this an acceptable flaw:
If you kill -9 the q-svc 10000 times at the `right` millisecond, RabbitMQ memory usage increases by ~1 MiB? (Rabbit usually eats ~10 GiB under pressure.) The memory can be freed without a broker restart; it also gets freed on agent restart.
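Roughly, the agent-side logic I have in mind is the following (a minimal sketch; the class and method names are made up for illustration, not existing Neutron code):

    # Push-based agent: details arrive with the notification itself, and a
    # full resync happens only if the AMQP connection/queue was lost and
    # events may have been missed.
    class PushBasedAgent(object):
        def __init__(self, backend):
            self.backend = backend

        def on_notification(self, payload):
            # No follow-up AMQP -> SQL round trip: the server already
            # pushed the full details with the event.
            self.backend.apply(payload)

        def on_connection_recovered(self, queue_survived):
            if not queue_survived:
                # Our queue is gone, so notifications may have been
                # dropped; fall back to a full sync.
                self.backend.full_sync()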


> 
> 
> "I'm 99.9% sure, for scaling above 100k managed node, we do not really 
> need to split the openstack to multiple smaller openstack, or use 
> significant number of extra controller machine."
> 
> 
> 
> Best Regards
> 
> 
> 
> Chaoyi Huang ( joehuang )
> 
> 
> 
> From: Kevin Benton [blak111 at gmail.com]
> Sent: 11 April 2015 12:34
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> Which periodic updates did you have in mind to eliminate? One of the 
> few remaining ones I can think of is sync_routers but it would be 
> great if you can enumerate the ones you observed because eliminating 
> overhead in agents is something I've been working on as well.
> 
> One of the most common is the heartbeat from each agent. However, I
> don't think we can eliminate them, because they are used to determine
> if the agents are still alive for scheduling purposes. Did you have
> something else in mind to determine if an agent is alive?
> 
> On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas <afazekas at redhat.com> wrote:
> 
> 
> I'm 99.9% sure, for scaling above 100k managed node, we do not really 
> need to split the openstack to multiple smaller openstack, or use 
> significant number of extra controller machine.
> 
> The problem is that OpenStack uses the right tools (SQL/AMQP/zk), but
> in the wrong way.
> 
> For example:
> Periodic updates can be avoided in almost all cases.
> 
> New data can be pushed to the agent only when it is needed.
> The agent can know when the AMQP connection becomes unreliable (queue
> or connection loss), and then it needs to do a full sync.
> https://bugs.launchpad.net/neutron/+bug/1438159
> 
> Also, when the agents get a notification, they start asking for
> details via AMQP -> SQL. Why do they not already know the details, or
> get them with the notification?
> 
> 
> ----- Original Message -----
> > From: "Neil Jerram" < Neil.Jerram at metaswitch.com >
> > To: "OpenStack Development Mailing List (not for usage questions)" < 
> > openstack-dev at lists.openstack.org >
> > Sent: Thursday, April 9, 2015 5:01:45 PM
> > Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> > 
> > Hi Joe,
> > 
> > Many thanks for your reply!
> > 
> > On 09/04/15 03:34, joehuang wrote:
> > > Hi, Neil,
> > > 
> > > In theory, Neutron is like a "broadcast" domain: for example,
> > > enforcement of DVR and security groups has to touch each host
> > > where a VM of the project resides. Even using an SDN controller,
> > > the "touch" of each such host is inevitable. If there are many
> > > physical hosts, for example 10k, inside one Neutron, it's very
> > > hard to overcome the "broadcast storm" issue under concurrent
> > > operation; that's the bottleneck for Neutron's scalability.
> > 
> > I think I understand that in general terms - but can you be more 
> > specific about the broadcast storm? Is there one particular message 
> > exchange that involves broadcasting? Is it only from the server to 
> > agents, or are there 'broadcasts' in other directions as well?
> > 
> > (I presume you are talking about control plane messages here, i.e.
> > between Neutron components. Is that right? Obviously there can also 
> > be broadcast storm problems in the data plane - but I don't think 
> > that's what you are talking about here.)
> > 
> > > We need a layered architecture in Neutron to solve the "broadcast
> > > domain" bottleneck of scalability. The test report from OpenStack
> > > cascading shows that through the layered architecture "Neutron
> > > cascading", Neutron can support up to a million ports and on the
> > > order of 100k physical hosts. You can find the report here:
> > > http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers
> > 
> > Many thanks, I will take a look at this.
> > 
> > > "Neutron cascading" also brings extra benefit: One cascading 
> > > Neutron can have many cascaded Neutrons, and different cascaded 
> > > Neutron can leverage different SDN controller, maybe one is ODL, 
> > > the other one is OpenContrail.
> > > 
> > >   ----------------Cascading Neutron-------------------
> > >          /                             \
> > >   --cascaded Neutron--        --cascaded Neutron-----
> > >           |                            |
> > >   ---------ODL------          ----OpenContrail--------
> > > 
> > > 
> > > Furthermore, if Neutron cascading is used across multiple data
> > > centers, a DCI controller (data center interconnection
> > > controller) can also be used under the cascading Neutron to
> > > provide NaaS (network as a service) across data centers.
> > > 
> > > ---------------------------Cascading Neutron--------------------------
> > >          /                      |                       \
> > > --cascaded Neutron--     -DCI controller-      --cascaded Neutron-----
> > >          |                      |                       |
> > > ---------ODL------              |              ----OpenContrail--------
> > >                                 |
> > > --(Data center 1)--    --(DCI networking)--    --(Data center 2)--
> > > 
> > > Is it possible for us to discuss this at the OpenStack Vancouver summit?
> > 
> > Most certainly, yes. I will be there from mid Monday afternoon
> > through to the end of Friday. But it will be my first summit, so I
> > have no idea yet how I might run into you - please suggest how!
> > 
> > > Best Regards
> > > Chaoyi Huang ( Joe Huang )
> > 
> > Regards,
> > Neil
> > 
> > 
> 
> 
> 
> 
> --
> Kevin Benton
> 
> 

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


