[openstack-dev] [neutron] Neutron scaling datapoints?

Attila Fazekas afazekas at redhat.com
Fri Apr 17 15:42:33 UTC 2015

----- Original Message -----
> From: "joehuang" <joehuang at huawei.com>
> To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
> Sent: Friday, April 17, 2015 9:46:12 AM
> Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> Hi, Attila,
> 
> addressing only agent status/liveness management is not enough for
> Neutron scalability. The concurrent dynamic load at large scale (for
> example, 100k managed nodes with dynamic load such as security group rule
> updates, routers_updated, etc.) must also be taken into account. So
> even if agent status/liveness management is improved in Neutron, that
> does not mean the scalability issue is fully addressed.
> 

This story is not about the heartbeat.
https://bugs.launchpad.net/neutron/+bug/1438159

What I am looking for is managing a lot of nodes with minimal `controller` resources.

The rate of actually required system changes per second (for example, those
related to VM boots) is relatively low, even with many nodes and VMs -
consider the average instance lifetime.
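To make that concrete (purely illustrative assumptions, not measured data):
100k nodes hosting 20 VMs each is 2 million VMs; with an average instance
lifetime of one week, steady-state churn is about 2,000,000 / 604,800 s
~= 3.3 boots (and as many deletes) per second for the entire cloud - a
modest event rate for a control plane of that size.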

The `bug` is about the resources the agents depend on and query many times.
BTW: I am thinking about several alternatives and other variants.

In Neutron's case a `system change`, such as a security group rule change,
can affect multiple agents.

It seems possible to have all agents `query` a resource only once,
and be notified of any subsequent change `for free` (IP, sec group rule, new neighbor).

This is the scenario where message brokers can shine and scale,
and it also offloads a lot of work from the DB.
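A minimal sketch of that pattern (plain pika against RabbitMQ; the exchange
name and the apply_rule_change() handler are made up for illustration - this
is not Neutron's actual RPC layer): each agent binds its own exclusive queue
to a fanout exchange, so one published update reaches every live agent with
no follow-up DB query.

    import pika

    # Connect to the broker (assumed local RabbitMQ; adjust as needed).
    connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
    channel = connection.channel()

    # One fanout exchange per resource class; the name is illustrative only.
    channel.exchange_declare(exchange='secgroup-updates', exchange_type='fanout')

    # Each agent binds its own exclusive, auto-deleted queue to the exchange,
    # so the broker copies every published update to every live agent.
    result = channel.queue_declare(queue='', exclusive=True)
    channel.queue_bind(exchange='secgroup-updates', queue=result.method.queue)

    def apply_rule_change(body):
        # Hypothetical local handler: apply the pushed state directly.
        print('applying pushed rule change:', body)

    def on_update(ch, method, properties, body):
        # The notification carries the full new state, so no extra
        # AMQP -> SQL round trip is needed.
        apply_rule_change(body)

    channel.basic_consume(queue=result.method.queue,
                          on_message_callback=on_update, auto_ack=True)
    channel.start_consuming()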


> And on the other hand, Nova already supports several segregation concepts,
> for example Cells, Availability Zones... If 100k nodes are to be
> managed by one OpenStack instance, it is impossible to work without hardware
> resource segregation. It would be weird to put the agent liveness manager in
> availability zone (AZ for short) 1 but all the managed agents in AZ 2: if
> AZ 1 is powered off, all agents in AZ 2 lose management.
> 
>
> The scalability benchmark is already here: "test report for million-port
> scalability of Neutron"
> http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers
> 
> Cascading may not be perfect, but it at least provides a feasible way
> if we really want scalability.
> ========================
> I am also working to evolve OpenStack, based on cascading, into a world
> with no need to worry about the "OpenStack Scalability Issue":
> 
> "Tenant level virtual OpenStack service over hybrid or federated or multiple
> OpenStack based clouds":
> 
> There are lots of OpenStack-based clouds. Each tenant will be allocated
> one cascading OpenStack as its virtual OpenStack service, with a single
> OpenStack API endpoint served for that tenant. The tenant's resources can be
> distributed or dynamically scaled across multiple OpenStack-based clouds;
> these clouds may be federated with KeyStone, use a shared KeyStone, or even
> be OpenStack clouds built on AWS, Azure, or VMware vSphere.
>
> 
> Under this deployment scenario, unlimited scalability in a cloud can be
> achieved: there is no unified cascading layer, and tenant-level resource
> orchestration among the multiple OpenStack clouds is fully distributed (even
> geographically). The database and load for one cascading OpenStack are very
> small, which makes disaster recovery and backup easy. Multiple tenants may
> share one cascading OpenStack to reduce resource waste, but the principle is
> to keep the cascading OpenStack as thin as possible.
>
> You can find the information here:
> https://wiki.openstack.org/wiki/OpenStack_cascading_solution#Use_Case
> 
> Best Regards
> Chaoyi Huang ( joehuang )
> 
> -----Original Message-----
> From: Attila Fazekas [mailto:afazekas at redhat.com]
> Sent: Thursday, April 16, 2015 3:06 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> 
> 
> 
> 
> 
> ----- Original Message -----
> > From: "joehuang" <joehuang at huawei.com>
> > To: "OpenStack Development Mailing List (not for usage questions)"
> > <openstack-dev at lists.openstack.org>
> > Sent: Sunday, April 12, 2015 3:46:24 AM
> > Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> > 
> > 
> > 
> > As Kevin is talking about agents, I want to point out that in the TCP/IP
> > stack a port (not a Neutron port) is a two-byte field, i.e. ports range
> > from 0 to 65535, supporting a maximum of 64k port numbers.
> > 
> > 
> > 
> > " above 100k managed node " means more than 100k L2 agents/L3
> > agents... will be alive under Neutron.
> > 
> > 
> > 
> > I want to know the detailed design for how to support, with 99.9%
> > probability, scaling Neutron in this way; a PoC and tests would be
> > good support for this idea.
> > 
> 
> Would you consider as a PoC something that uses the technology in a similar
> way, with a similar port-security problem, but with a lower-level API than
> Neutron currently uses?
> 
> Is this an acceptable flaw:
> if you kill -9 the q-svc 10000 times at the `right` millisecond, the rabbitmq
> memory usage increases by ~1MiB? (Rabbit usually eats ~10GiB under
> pressure.) The memory can be freed without a broker restart; it also gets
> freed on agent restart.
> 
> 
> > 
> > 
> > "I'm 99.9% sure, for scaling above 100k managed node, we do not really
> > need to split the openstack to multiple smaller openstack, or use
> > significant number of extra controller machine."
> > 
> > 
> > 
> > Best Regards
> > 
> > 
> > 
> > Chaoyi Huang ( joehuang )
> > 
> > 
> > 
> > From: Kevin Benton [blak111 at gmail.com]
> > Sent: 11 April 2015 12:34
> > To: OpenStack Development Mailing List (not for usage questions)
> > Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> > 
> > Which periodic updates did you have in mind to eliminate? One of the
> > few remaining ones I can think of is sync_routers, but it would be
> > great if you could enumerate the ones you observed, because eliminating
> > overhead in agents is something I've been working on as well.
> > 
> > One of the most common is the heartbeat from each agent. However, I
> > don't think we can eliminate them, because they are used to determine
> > whether the agents are still alive for scheduling purposes. Did you have
> > something else in mind to determine if an agent is alive?
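For reference, a minimal sketch of the heartbeat pattern being discussed
(pure illustration - the RpcClient stub and the agent id are hypothetical,
not Neutron's actual report_state plumbing): each agent periodically pushes
a state report, and the server declares the agent dead when reports stop
arriving for some multiple of the interval.

    import time
    import threading

    REPORT_INTERVAL = 30  # seconds between heartbeats (illustrative value)

    class RpcClient:
        """Hypothetical stand-in for an AMQP RPC proxy to the server."""
        def cast(self, method, payload):
            print('->', method, payload)  # a real client would publish to AMQP

    def heartbeat_loop(rpc_client, agent_id):
        # Periodically push a state report; the server timestamps each one
        # and reschedules the agent's resources if reports stop.
        while True:
            rpc_client.cast('report_state',
                            {'agent_id': agent_id, 'timestamp': time.time()})
            time.sleep(REPORT_INTERVAL)

    threading.Thread(target=heartbeat_loop,
                     args=(RpcClient(), 'l2-agent-host-42'),  # hypothetical id
                     daemon=True).start()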
> > 
> > On Fri, Apr 10, 2015 at 2:18 AM, Attila Fazekas < afazekas at redhat.com
> > >
> > wrote:
> > 
> > 
> > I'm 99.9% sure that, for scaling above 100k managed nodes, we do not
> > really need to split OpenStack into multiple smaller OpenStacks, or use
> > a significant number of extra controller machines.
> > 
> > The problem is that OpenStack uses the right tools (SQL/AMQP/(zk)), but
> > in a wrong way.
> > 
> > For example: periodic updates can be avoided in almost all cases.
> > 
> > The new data can be pushed to the agent just when it is needed.
> > The agent can know when the AMQP connection becomes unreliable (queue
> > or connection loss) and then do a full sync.
> > https://bugs.launchpad.net/neutron/+bug/1438159
> > 
> > Also, when the agents get a notification, they start asking for
> > details via AMQP -> SQL. Why do they not know it already, or get it
> > with the notification?
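A minimal sketch of that push-plus-resync idea (pure illustration with pika;
the queue name and the full_sync()/apply_change() helpers are hypothetical):
notifications carry the full details so the agent never needs a follow-up
AMQP -> SQL query, and a lost connection or queue triggers exactly one full
sync instead of periodic polling.

    import pika

    def full_sync():
        # Hypothetical: fetch the complete desired state from the server
        # once, instead of polling for it periodically.
        print('resyncing everything')

    def apply_change(body):
        # Hypothetical: the pushed message already carries the full details.
        print('applying pushed change:', body)

    def on_update(ch, method, properties, body):
        apply_change(body)

    while True:
        try:
            conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
            ch = conn.channel()
            ch.queue_declare(queue='agent-updates', durable=True)  # name illustrative
            # (Re)connecting means notifications may have been missed, so do
            # one full sync before going back to push-only operation.
            full_sync()
            ch.basic_consume(queue='agent-updates',
                             on_message_callback=on_update, auto_ack=True)
            ch.start_consuming()
        except pika.exceptions.AMQPConnectionError:
            continue  # connection/queue lost: reconnect and full sync again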
> > 
> > 
> > ----- Original Message -----
> > > From: "Neil Jerram" < Neil.Jerram at metaswitch.com >
> > > To: "OpenStack Development Mailing List (not for usage questions)" <
> > > openstack-dev at lists.openstack.org >
> > > Sent: Thursday, April 9, 2015 5:01:45 PM
> > > Subject: Re: [openstack-dev] [neutron] Neutron scaling datapoints?
> > > 
> > > Hi Joe,
> > > 
> > > Many thanks for your reply!
> > > 
> > > On 09/04/15 03:34, joehuang wrote:
> > > > Hi, Neil,
> > > > 
> > > > In theory, Neutron is like a "broadcast" domain: for example,
> > > > enforcement of DVR and security groups has to touch each host
> > > > where a VM of the project resides. Even using an SDN
> > > > controller, the "touch" to each such host is inevitable. If there
> > > > are plenty of physical hosts, for example 10k, inside one
> > > > Neutron, it is very hard to overcome the "broadcast storm" issue
> > > > under concurrent operation; that is the bottleneck for the
> > > > scalability of Neutron.
> > > 
> > > I think I understand that in general terms - but can you be more
> > > specific about the broadcast storm? Is there one particular message
> > > exchange that involves broadcasting? Is it only from the server to
> > > agents, or are there 'broadcasts' in other directions as well?
> > > 
> > > (I presume you are talking about control plane messages here, i.e.
> > > between Neutron components. Is that right? Obviously there can also
> > > be broadcast storm problems in the data plane - but I don't think
> > > that's what you are talking about here.)
> > > 
> > > > We need a layered architecture in Neutron to solve the "broadcast
> > > > domain" bottleneck of scalability. The test report from OpenStack
> > > > cascading shows that, through the layered architecture "Neutron
> > > > cascading", Neutron can support up to million-level ports and
> > > > 100k-level physical hosts. You can find the report here:
> > > > http://www.slideshare.net/JoeHuang7/test-report-for-open-stack-cascading-solution-to-support-1-million-v-ms-in-100-data-centers
> > > 
> > > Many thanks, I will take a look at this.
> > > 
> > > > "Neutron cascading" also brings extra benefit: One cascading
> > > > Neutron can have many cascaded Neutrons, and different cascaded
> > > > Neutron can leverage different SDN controller, maybe one is ODL,
> > > > the other one is OpenContrail.
> > > > 
> > > >      ----------------Cascading Neutron-------------------
> > > >               /                            \
> > > >      --cascaded Neutron--          --cascaded Neutron-----
> > > >               |                            |
> > > >      ---------ODL------           ----OpenContrail--------
> > > > 
> > > > 
> > > > And furthermore, if Neutron cascading is used across multiple data
> > > > centers, a DCI controller (data center inter-connection
> > > > controller) can also be used under the cascading Neutron to provide
> > > > NaaS (network as a service) across data centers.
> > > > 
> > > >  ---------------------------Cascading Neutron--------------------------
> > > >           /                       |                        \
> > > >  --cascaded Neutron--      -DCI controller-        --cascaded Neutron-----
> > > >           |                       |                        |
> > > >  ---------ODL------               |                ----OpenContrail--------
> > > >                                   |
> > > >  --(Data center 1)--     --(DCI networking)--      --(Data center 2)--
> > > > 
> > > > Is it possible for us to discuss this at the OpenStack Vancouver summit?
> > > 
> > > Most certainly, yes. I will be there from mid-Monday afternoon
> > > through to the end of Friday. But it will be my first summit, so I
> > > have no idea yet how I might run into you - please suggest!
> > > 
> > > > Best Regards
> > > > Chaoyi Huang ( Joe Huang )
> > > 
> > > Regards,
> > > Neil
> > > 
> > > 
> > 
> > 
> > 
> > 
> > --
> > Kevin Benton
> > 
> > 
> 
> 


