[openstack-dev] [TripleO] Kicking TripleO up a notch

Dan Prince dprince at redhat.com
Thu Oct 3 22:00:05 UTC 2013


Hi Robert,

In general I buy the vision laid out in this email. I think "Starting with the customer story" will keep us on the right track as to what features to implement, working on the most important stuff, etc. A CD TripleO setup sounds just grand. For me though, most of what you've laid out here is like kicking it up two notches (not just one). Not saying I don't think we should work towards this... just wondering if it is the most important thing at the moment. What I mean is... with all the instability in keeping TripleO working on a weekly basis, would we be better served putting our resources on the CI front first? And once we have stability... then we kick things up another notch or two? Or perhaps we do both of these in parallel?

I like the idea of multiple lines of defense... but given limited resources I wonder if some simpler CI doesn't trump CD at this point.

Dan

----- Original Message -----
> From: "Robert Collins" <robertc at robertcollins.net>
> To: "OpenStack Development Mailing List" <openstack-dev at lists.openstack.org>
> Sent: Tuesday, October 1, 2013 4:37:16 AM
> Subject: [openstack-dev] [TripleO] Kicking TripleO up a notch
> 
> Warning, this is a little long, but it's the distillation of a
> 2.mumble hour call I had late last week with Devananda and Clint. It's
> a proposal: please do comment and critique it.
> 
> The tl;dr read is:
>  - we've been doing good work
>  - but most of us are currently focused on the tech rather than the
> customer stories
>  - let's fix that
>  - Start with customer story and work back to minimum work needed
>    - and we can actually be *delivering* that story [we have hardware
> for it, thanks to HP]
>  - Focus on efficiency and reducing firefighting before new features
>  - https://trello.com/tripleo as an experimental kanban for this [1]
> 
> Now for a less condensed, and hopefully more useful version :)
> 
> The night before the call I finished reading
> http://www.amazon.com/The-Phoenix-Project-Business-ebook/dp/B00AZRBLHO/ref=sr_1_9?s=digital-text&ie=UTF8&qid=1380182909&sr=1-9&keywords=the+goal
> which is a devops casting of 'The Goal', a seminal work in the Lean
> manufacturing space. (It's terrible writing in a lot of ways, but it
> also does do a pretty good job IMO of highlighting the
> systems-thinking aspects of CD... but it doesn't drill into the
> detailed analysis of each aspect so some follow-up reading is required to
> get chapter and verse on e.g. 'single item flow is ideal').
> 
> It reminded me very strongly of things I used to hold as very
> important, but I've been sidetracked into playing with the tech -
> which I love - and not focusing on ... 'The goal'. I grabbed Clint,
> and Deva, and tried to grab Joe - to get a cross section of focus
> areas : Heat, Ironic/NovaBM/Nova - to sanity check what was in my head
> :).
> 
> Our goal is to deliver a continuously deployed version of OpenStack.
> Right now, we're working on plumbing to build a /good/ version of
> that. Note the difference: 'deliver an X', 'building stuff to let us
> deliver a good X'.
> 
> This is key: we've managed to end up focusing on bottom-up work,
> rather than optimising our ability to deliver the thing, and
> iteratively improving it. The former is necessary but not sufficient.
> Tuskar has been working top down, and (as usual) this results in very
> fast progress; the rest of TripleO has provided a really solid
> foundation, but with many gaps and super rough spots...
> 
> So, I'd like to invert our priorities and start with the deliverable,
> however slipshod, and then iterate to make it better and better along
> the design paths we've already thought about. This could extend all
> the way to Tuskar, or we could start with the closest thing within
> reach, which is the existing 'no-state-for-tripleo' style CLI + API
> based tooling.
> 
> In the call we had, we agreed that this approach makes a lot of sense,
> and spent a bunch of time talking through the ramifications on TripleO
> and Ironic, and sketched out one way to slice and dice things;
> https://docs.google.com/drawings/d/1kgBlHvkW8Kj_ynCA5oCILg4sPqCUvmlytY5p1p9AjW0/edit?usp=sharing
> is the diagram we came up with.
> 
> The basic approach is to actually deliver the thing we want to deliver
> - a live working CD overcloud *ourselves* and iterate on that to make
> upgrades of that preserve state, then start tackling CD of its
> infrastructure, then remove the seed.
> 
> Ramifications:
>  - long term a much better project health and responsiveness to
> changing user needs.
>  - may cause disruption in the short term as we do what's needed to get
> /something/ working.
>  - will need community buy-in and support to make it work : two of the
> key things about working Lean are keeping WIP - inventory - low and
> ensuring that bottlenecks are not used for anything other than
> bottleneck tasks. Both of these things impact what can be done at any
> point in time within the project: we may need to say 'no' to proposed
> work to permit driving more momentum as a whole... at least in the
> short term.
>  - highlights that we'll need much better communication about what
> work is suitable to tackle now vs what work is a distraction at this
> point
>    - Which implies much more work from someone in the group on
> surfacing work to do and where it's blocked. {I'll happily take this
> bullet, for now, for TripleO}
>  - May need more hardware :)
>  - We'll need to change how we pick work to work on, how we decide
> whether to accept new work or not, and how we prioritise things.
> 
> Basic principles:
>  - unblock bottlenecks first, then unblock everyone else.
>  - folk are still self directed - it's open source - but clear
> visibility of work needed and its relevance to the product *right
> now* is crucial info for people to make good choices. (Similarly,
> Mark McLoughlin was asking about what bugs to work on, which is a
> symptom of me/us failing to provide clear visibility.)
>  - clear communication about TripleO and plans / strategy and priority
> (Datacentre ops? Continuous deployment story?)
> 
> Earlier this year, within HP, we set up a rack using the TripleO of the
> time, for a customer, and the experience was fantastic: we made much
> more forward progress towards what's needed than we had in the month or
> two leading up to it... but then we went back to business as usual,
> and things went back to the prior pace.
> 
> Implementing this:
> For TripleO we've broken down the long term vision into a few phases
> that *start* with an end user deliverable and then backfill to add
> sophistication and polish.
> 
> We're suggesting that at any point in time the following should be the
> heuristics for TripleO contributors for what to work on:
> 1) Firedrill 'something we've delivered broke': Aim to avoid this, but
> if it happens it takes priority.
> 2) Things to make things we've delivered and are maintaining more
> reliable / less likely to break: Things that reduce category 1 work.
> 3) Things to make the things we've delivered better *or* things to
> make something new exist/get delivered.
> 
> Our long term steady state should be a small amount of category 2 work
> and a lot of category 3 with no category 1; but to get there we have
> to go through a crucible where it will be all category 1 and category
> 2: we should expect all forward momentum to stop while we get our
> stuff lined up and live. After that though we'll have a small stable
> *end product* base, and we can expand that out feature-wise and
> depth-wise (reliability/performance/fewer firedrills...).
> 
> To surface WIP + current planned work, I find Kanban works super well.
> So I am proposing the following structure:
>  - Current work the team is focused on will be represented as Kanban cards
>  - Those cards can be standalone, or link to an etherpad, or a bug, or
> a blueprint as appropriate
>    - standalone cards should be those that don't fit as bugs or
> blueprints; we shouldn't replace those other tracking systems
>  - As a team we all commit to picking up work based on the heuristics above
>  - The kanban exposes the category of work directly, making it easy to choose
>  - if there is someone working on a higher category of work than us,
> we should bias to *helping them* rather than continuing on our own way
> or picking up a new lower category card: it's better to unblock the
> system as a whole than push forward something we can't use yet.
> 
> Clint and I have set up a draft Kanban, so we can concretely discuss
> how this looks and feels.
> 
> Seeking-yr-thoughts-ly,
> Rob
> 
> Notes:
> 1 - Trello is not a super good or bad Kanban; it has the significant
> advantages for an experiment that it's free and already operational.
> Should we decide this works, we'd want to work with -infra to get a
> Kanban suitable for dealing with much or all of OpenStack lined up
> sooner rather than later. In particular, with the large number of
> developers OpenStack has, any outages or defects in a system can have
> a huge negative multiplier when they crop up.
> 
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


