[openstack-dev] [TripleO] Tuskar CLI after architecture changes

Jay Pipes jaypipes at gmail.com
Thu Dec 19 23:17:31 UTC 2013


On 12/19/2013 04:55 AM, Radomir Dopieralski wrote:
> On 14/12/13 16:51, Jay Pipes wrote:
>
> [snip]
>
>> Instead of focusing on locking issues -- which I agree are very
>> important in the virtualized side of things where resources are
>> "thinner" -- I believe that in the bare-metal world, a more useful focus
>> would be to ensure that the Tuskar API service treats related group
>> operations (like "deploy an undercloud on these nodes") in a way that
>> can handle failures in a graceful and/or atomic way.
>
> Atomicity of operations can be achieved by introducing critical sections.
> You basically have two ways of doing that: optimistic and pessimistic.
> A pessimistic critical section is implemented with a locking mechanism
> that prevents all other processes from entering the critical section
> until it is finished.

I'm familiar with the traditional non-distributed software concept of a 
mutex (or, in the Windows world, a critical section). But we aren't dealing 
with traditional non-distributed software here. We're dealing with 
highly distributed software where components involved in the 
"transaction" may not be running on the same host or have much awareness 
of each other at all.
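
For what it's worth, when this kind of guard *is* actually needed in 
distributed software, the usual pattern is optimistic: read a version, 
then do a compare-and-swap rather than holding a lock across hosts. A 
rough sketch in Python (the nodes table and column names here are 
hypothetical, not actual Tuskar schema):

    import sqlite3

    def assign_node(conn, node_id, rack_id, expected_version):
        # Compare-and-swap: the UPDATE succeeds only if no other
        # process has bumped the row's version since we read it.
        cur = conn.execute(
            "UPDATE nodes SET rack_id = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (rack_id, node_id, expected_version))
        conn.commit()
        if cur.rowcount == 0:
            # Lost the race; the caller re-reads the row and retries
            # (or reports the conflict), rather than blocking.
            raise RuntimeError("node %s changed concurrently" % node_id)

No process ever waits on another; the loser of a race simply re-reads 
and retries.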

And, in any case (see below), I don't think that this is a problem that 
needs to be solved in Tuskar.

> Perhaps you have some other way of making them atomic that I can't think of?

I should not have used the term atomic above. I actually do not think 
that the things Tuskar/Ironic do should be viewed as a single atomic 
operation. More below.

>> For example, if the construction or installation of one compute worker
>> failed, adding some retry or retry-after-wait-for-event logic would be
>> more useful than trying to put locks in a bunch of places to prevent
>> multiple sysadmins from trying to deploy on the same bare-metal nodes
>> (since it's just not gonna happen in the real world, and IMO, if it did
>> happen, the sysadmins/deployers should be punished and have to clean up
>> their own mess ;)
>
> I don't see why they should be punished, if the UI was assuring them
> that they are doing exactly the thing that they wanted to do, at every
> step, and in the end it did something completely different, without any
> warning. If anyone deserves punishment in such a situation, it's the
> programmers who wrote the UI in such a way.

The issue I am getting at is that, in the real world, the problem of 
multiple users of Tuskar attempting to deploy an undercloud on the exact 
same set of bare-metal machines is just not going to happen. If you 
think this is actually a real-world problem, and have seen two sysadmins 
actively trying to deploy an undercloud on bare-metal machines at the 
same time, unbeknownst to each other, then I feel bad for the sysadmins 
that found themselves in such a situation, but I feel it's their own 
fault for not knowing what the other was doing.

Trying to make a complex series of related but distributed actions -- 
like the underlying actions of the Tuskar -> Ironic API calls -- into an 
atomic operation is just not a good use of programming effort, IMO. 
Instead, I'm advocating that programming effort should instead be spent 
coding a workflow/taskflow pipeline that can gracefully retry failed 
operations and report the state of the total taskflow back to the user.
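
To make that concrete, here is roughly the shape I have in mind, 
sketched with the taskflow library (the DeployNode task and its body 
are placeholders for the real Tuskar -> Ironic calls, and I'm assuming 
taskflow's retry controller here, not quoting any actual Tuskar code):

    from taskflow import engines, retry, task
    from taskflow.patterns import linear_flow

    class DeployNode(task.Task):
        def execute(self, node_id):
            # Placeholder for the real Ironic call; an exception
            # raised here is what triggers the flow's retry policy.
            print("deploying undercloud bits on %s" % node_id)

    # Retry the flow a few times on failure, then report the final
    # state back to the user -- no locks taken anywhere.
    flow = linear_flow.Flow('deploy-undercloud',
                            retry=retry.Times(attempts=3))
    flow.add(DeployNode())

    engines.run(flow, store={'node_id': 'node-1'})

The point is that the failure handling lives in the flow definition, 
where it can be retried and reported on, not in a lock manager.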

Hope that makes more sense,
-jay


