[openstack-dev] [all] Million level scalability test report from cascading

Robert Collins robertc at robertcollins.net
Wed Apr 1 03:18:52 UTC 2015

On 31 March 2015 at 22:05, joehuang <joehuang at huawei.com> wrote:
> Hi, all,
> During the last cross-project meeting[1][2] on the next step for the OpenStack cascading solution[3], the conclusion was "OpenStack isn't ready for the project, and if he want's it ready sooner than later, joehuang needs to help make it ready by working on scaling being coded now", and scaling is the first priority for the OpenStack community.
> We just finished the 1 million VM semi-simulation test report[4] for the OpenStack cascading solution. The most interesting finding from the test is that the cascading architecture can support million-level ports in Neutron and million-level VMs in Nova. The test report also shows that the OpenStack cascading solution can manage up to 100k physical hosts without challenge. Some scaling issues were found during the test and are listed in the report.
> The conclusion of the report is:
> "According to the Phase I and Phase II test data analysis, due to the hardware resources limitation, the OpenStack cascading solution with current configuration can supports a maximum of 1 million virtual machines and is capable of handling 500 concurrent API request if L3 (DVR) mode is included or, 1000 concurrent API request if only L2 networking needed. It's up to deployment policy to use OpenStack cascading solution inside one site ( one data center) or multi-sites (multi-data centers), the maximal sites (data centers) supported are 100, i.e., 100 cascaded OpenStack instances."
> The test report is shared first; let's discuss the next step later.

Wow, that's beautiful stuff.

The next time someone does a report like this, I'd like to suggest
some extra metrics to capture:
API failure rate: what % of API calls result in errors.
VM failure rate: what % of operations lead to a failed VM (e.g. not
deleted on delete, not started on create, or didn't boot correctly).
Block device failure rate: similarly.
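
As a rough sketch of how those rates could be computed from a log of
test operations: the record layout (OpResult), the failure_rates helper
and the sample data below are all hypothetical, not taken from the
report or from any OpenStack tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class OpResult:
    """One operation observed during a test run (hypothetical record layout)."""
    kind: str        # "api", "vm", or "block_device"
    succeeded: bool  # False if the call errored or the resource ended up broken


def failure_rates(results):
    """Return the failure percentage for each operation kind seen in results."""
    totals = Counter()
    failures = Counter()
    for r in results:
        totals[r.kind] += 1
        if not r.succeeded:
            failures[r.kind] += 1
    return {kind: 100.0 * failures[kind] / totals[kind] for kind in totals}


# Made-up sample data, purely for illustration.
sample = (
    [OpResult("api", True)] * 98 + [OpResult("api", False)] * 2
    + [OpResult("vm", True)] * 45 + [OpResult("vm", False)] * 5
    + [OpResult("block_device", True)] * 20
)
rates = failure_rates(sample)
for kind, pct in sorted(rates.items()):
    print(f"{kind}: {pct:.1f}% failed")
```

What counts as "failed" for a VM or a block device (stuck in an error
state, not deleted on delete, and so on) would have to be decided by
whoever runs the test; the sketch only does the counting.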

Looking at your results, I observe significant load in steady-state
mode for most of the DBs. That's a little worrying if, as I assume,
steady-state means 'no new API calls being made'.


Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud
