[openstack-dev] [tripleo][ironic] Hardware provisioning testing for Ocata

Joe Talerico jtaleric at redhat.com
Tue Jun 13 23:12:12 UTC 2017


On Fri, Jun 9, 2017 at 7:28 AM, Justin Kilpatrick <jkilpatr at redhat.com> wrote:
> On Fri, Jun 9, 2017 at 5:25 AM, Dmitry Tantsur <dtantsur at redhat.com> wrote:
>> Does this number of "300" come from your testing or from other sources?
>> If the former, which driver were you using? What exactly are the problems
>> you have seen approaching this number?
>
> I haven't encountered this issue personally, but from talking to Joe
> Talerico and some operators at summit, around this number a single
> conductor begins to fall behind on polling the out-of-band
> interfaces of the machines it's responsible for. You start to
> see what you would expect from polling running behind: incorrect
> power states listed for machines and a general inability to perform
> machine operations in a timely manner.
>
> Having spent some time at the Ironic operators forum, this is pretty
> normal, and the correct response is just to scale out conductors. This
> is a problem with TripleO because we don't really have a scale-out
> option with a single-machine design. Fortunately, just increasing the
> time between interface polling acts as a pretty good stopgap for this
> and lets Ironic catch up.
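
To make the polling arithmetic concrete, here is a rough
back-of-the-envelope sketch in Python. It is not Ironic code: the
function names, the per-poll latency and the headroom factor are all
assumptions for illustration. The knob being discussed is, to my
knowledge, sync_power_state_interval in the [conductor] section of
ironic.conf.

```python
import math


def sync_pass_seconds(nodes, seconds_per_poll, parallel_polls=1):
    """Estimate wall-clock time for one full power-state sync pass.

    All inputs are assumed values, not measurements.
    """
    return math.ceil(nodes / parallel_polls) * seconds_per_poll


def min_interval(nodes, seconds_per_poll, parallel_polls=1, headroom=2.0):
    """Smallest polling interval that leaves `headroom` times the pass time."""
    return sync_pass_seconds(nodes, seconds_per_poll, parallel_polls) * headroom


if __name__ == "__main__":
    # Hypothetical: 300 nodes, 0.5 s per out-of-band poll, polled serially.
    print(sync_pass_seconds(300, 0.5))
    print(min_interval(300, 0.5))
```

Under those assumed numbers a single pass takes longer than a 60
second interval, which is the "polling running behind" situation
described above. Raising the interval gives the conductor room to
finish a pass before the next one starts.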
>
> I may get some time on a cloud of that scale in the future, at which
> point I will have hard numbers to give you. One of the reasons I made
> YODA was the frustrating prevalence of anecdotes instead of hard data
> when it came to one of the most important parts of the user
> experience. If it doesn't deploy, people don't use it, full stop.
>
>> Could you please elaborate? (a bug could also help). What exactly were you
>> doing?
>
> https://bugs.launchpad.net/ironic/+bug/1680725

Additionally, I would like to see more verbose output from
cleaning: https://bugs.launchpad.net/ironic/+bug/1670893

>
> Describes exactly what I'm experiencing. Essentially the problem is
> that nodes can and do fail to PXE boot, then cleaning fails and you just
> lose the nodes. Users have to spend time going back and babysitting
> these nodes, and there are no good instructions on what to do with failed
> nodes anyway. The answer is to move them to manageable and then to
> available, at which point they go back into cleaning until it finally
> works.
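
The manual recovery loop described above (clean failed -> manageable
-> available, repeated until cleaning succeeds) could be sketched
roughly as follows. This is only an illustration under stated
assumptions: get_state and set_target are hypothetical callables that
you would wire up to whatever Ironic client you use, and the state and
verb names are the ones I understand the Ironic state machine to use.

```python
import time


def recover_node(get_state, set_target, max_attempts=3,
                 poll_seconds=0.0, max_polls=100):
    """Retry cleaning on a node; return True once it is 'available'.

    get_state() -> str and set_target(verb) are hypothetical hooks,
    not real client calls.
    """
    for _ in range(max_attempts):
        state = get_state()
        if state == "available":
            return True
        if state == "clean failed":
            set_target("manage")    # clean failed -> manageable
            state = get_state()
        if state == "manageable":
            set_target("provide")   # manageable -> cleaning -> available
        # Wait for cleaning to finish one way or the other.
        for _ in range(max_polls):
            state = get_state()
            if state in ("available", "clean failed"):
                break
            time.sleep(poll_seconds)
    return get_state() == "available"
```

Capping the number of attempts matters here: a node that can never
PXE boot would otherwise cycle through cleaning forever, and a False
return gives the operator a clear point at which to step in.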
>
> Like introspection a year ago, this is a cavalcade of documentation
> problems and software issues. I mean, really everything *works*
> technically, but the documentation acts like cleaning will work all the
> time and so does the software, leaving the user to figure out how to
> accommodate the realities of the situation without so much as a
> warning that it might happen.
>
> This comes out as more of a UX issue than a software one, but we can't
> just ignore these.
>
> - Justin
