[openstack-dev] [tripleo][ironic] Hardware provisioning testing for Ocata

Joe Talerico jtaleric at redhat.com
Tue Jun 13 22:48:18 UTC 2017


On Fri, Jun 9, 2017 at 5:25 AM, Dmitry Tantsur <dtantsur at redhat.com> wrote:
> On 06/08/2017 02:21 PM, Justin Kilpatrick wrote:
>>
>> Morning everyone,
>>
>> I've been working on a performance testing tool for TripleO hardware
>> provisioning operations off and on for about a year now and I've been
>> using it to try and collect more detailed data about how TripleO
>> performs in scale and production use cases. Perhaps more importantly,
>> YODA (Yet Openstack Deployment Tool, Another) automates the task
>> enough that days of deployment testing become a set-it-and-forget-it
>> operation.
>> You can find my testing tool here [0] and the test report [1] has
>> links to raw data and visualization. Just scroll down, click the
>> captcha and click "go to kibana". I still need to port that machine
>> from my own solution over to Search Guard.
>>
>> If you have too much email to consider clicking links I'll copy the
>> results summary here.
>>
>> TripleO inspection workflows have seen massive improvements since
>> Newton, with the failure rate for 50 nodes with the default workflow
>> falling from 100% to <15%. With patches slated for Pike, that
>> spurious failure rate reaches zero.
>
>
> \o/
>
>>
>> Overcloud deployments show a significant improvement in deployment
>> speed in HA and stack update tests.
>>
>> Ironic deployments in the overcloud allow the use of Ironic for bare
>> metal scale-out alongside more traditional VM compute. Considering
>> that a single conductor starts to struggle around 300 nodes, it will
>> be difficult to push a multi-conductor setup to its limits.
>
>
> Does this number of "300" come from your testing or from other sources?

Dmitry - The "300" comes from my testing across different environments.

Most recently, here is what I saw at CNCF -
https://snapshot.raintank.io/dashboard/snapshot/Sp2wuk2M5adTpqfXMJenMXcSlCav2PiZ

The undercloud was "idle" during this period.
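
For what it's worth, the way I watch a scale-up like that is just to
poll the provision state counts until they stop moving. Roughly this
(a minimal sketch with python-ironicclient, nothing
environment-specific in it):

    import collections
    import time

    def watch_provision_states(ironic, interval=30):
        # Tally provision states across all nodes; a stalled
        # scale-up shows up as nodes stuck in 'deploying' or
        # 'deploy wait' with counts no longer moving between samples.
        while True:
            states = collections.Counter(
                node.provision_state
                for node in ironic.node.list(fields=['provision_state'],
                                             limit=0))
            print(dict(states))
            time.sleep(interval)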

> If the former, which driver were you using?

pxe_ipmitool.
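
Enrollment was nothing special, roughly this per node (a minimal
sketch with python-ironicclient; the addresses and credentials are
placeholders, and the exact get_client() arguments shift a bit
between ironicclient releases):

    from ironicclient import client

    # Placeholder credentials/endpoint, not the actual test
    # environment.
    ironic = client.get_client(
        1,
        os_username='admin',
        os_password='password',
        os_tenant_name='admin',
        os_auth_url='http://192.0.2.1:5000/v2.0',
    )

    # Enroll one node with the pxe_ipmitool driver used in these
    # tests.
    node = ironic.node.create(
        driver='pxe_ipmitool',
        driver_info={
            'ipmi_address': '192.0.2.10',
            'ipmi_username': 'root',
            'ipmi_password': 'calvin',
        },
    )
    print(node.uuid)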

> What problems exactly have you seen when approaching this number?

I would have to restart ironic-conductor before every scale-up. Here
is what ironic-conductor looks like after a restart:
https://snapshot.raintank.io/dashboard/snapshot/Im3AxP6qUfMnTeB97kryUcQV6otY0bHP
Without restarting ironic, the scale-up would fail due to ironic (I
do not have the exact error we encountered documented).
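
The workaround itself was entirely mechanical, something along these
lines between batches (a sketch assuming a systemd-managed Ocata
undercloud; the service name and timings here are illustrative, not
the exact script we ran):

    import subprocess
    import time

    def restart_conductor_and_wait(ironic, tries=30, delay=10):
        # Bounce the conductor, then poll the API until drivers are
        # registered again before kicking off the next batch.
        subprocess.check_call(
            ['sudo', 'systemctl', 'restart',
             'openstack-ironic-conductor'])
        for _ in range(tries):
            try:
                # Empty until the restarted conductor re-registers.
                if ironic.driver.list():
                    return
            except Exception:
                pass
            time.sleep(delay)
        raise RuntimeError('ironic-conductor did not come back')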

>
>>
>> Finally, Ironic node cleaning shows a similar failure rate to
>> inspection and will require similar attention in TripleO workflows
>> to become painless.
>
>
> Could you please elaborate? (a bug report could also help) What
> exactly were you doing?
>
>>
>> [0] https://review.openstack.org/#/c/384530/
>> [1]
>> https://docs.google.com/document/d/194ww0Pi2J-dRG3-X75mphzwUZVPC2S1Gsy1V0K0PqBo/
>>
>> Thanks for your time!
>
>
> Thanks for YOUR time, this work is extremely valuable!
>
>
>>
>> - Justin