[openstack-dev] [Fuel][Fuel-Library] Fuel CI issues

Andrew Woodward xarses at gmail.com
Wed Mar 9 22:56:04 UTC 2016


Today we had a sync-up call and discussed this. To summarize:

Attendees:
Aleksandr Didenko
Alex Schultz
Andrew Woodward
Alexey Shtokolov
Bartek Kupidura
Bogdan Dobrelya
Denis Egorenko
Ivan Berezovskiy
Kyrylo Galanov
Maksim Malchuk
Matthew Mosesohn
Max Yatsenko
Oleg Gelbukh
Oleksiy Molchanov
Petr Zhurba
Sergey Kolekonov
Sergey Vasilenko
Sergii Golovatiuk
Stanislav Makar
Stanislaw Bogatkin
Vladimir Eremin
Vladimir Kuklin

Issue: moving to puppet-openstack on master has exposed fuel-library to
breakage and there are many concerns about changes landing that can break
it.

Alex S. proposed that we stay the course and finish setting up check
voting on the relevant puppet-openstack modules. The participants agreed
with this.

Action: Sergii G & Aleksandra Fedorova will propose needed changes to
project-config to add tests

Issue: closing the regressions gap until fuel-ci votes on puppet-openstack
check

It was proposed that we build a system that holds the module versions back
each night and, once automated testing has completed successfully,
automatically moves them forward.

Action: There was no consensus on this; it should be discussed further on
this thread.
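
To make the hold-back proposal concrete for the discussion, here is a
minimal sketch of the promotion step. It is an illustration only, not an
agreed design: the function name, the module names, and the revisions are
all hypothetical, and the test run is reduced to a callable that returns
pass or fail for a whole set of pins.

```python
from typing import Callable, Dict

# Module name -> pinned git revision. All names and revisions below are
# hypothetical placeholders, not real Fuel or puppet-openstack data.
Pins = Dict[str, str]


def nightly_promote(current: Pins, upstream_head: Pins,
                    tests_pass: Callable[[Pins], bool]) -> Pins:
    """Return the pins to use after one nightly run.

    The candidate set (upstream HEAD revisions) is adopted only if the
    automated tests pass against it; otherwise the last known-good pins
    are kept and the versions stay held back.
    """
    if upstream_head == current:
        return dict(current)          # nothing new to test
    if tests_pass(upstream_head):
        return dict(upstream_head)    # move the pins forward
    return dict(current)              # hold back on failure


known_good = {"puppet-nova": "aaa111", "puppet-neutron": "bbb222"}
head = {"puppet-nova": "ccc333", "puppet-neutron": "bbb222"}

promoted = nightly_promote(known_good, head, lambda pins: True)
held = nightly_promote(known_good, head, lambda pins: False)
```

Note that this sketch promotes or holds back all modules together. Working
out which single module caused a failure would need extra logic that is
not shown here.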



On Sun, Mar 6, 2016 at 11:33 PM Dmitry Borodaenko <dborodaenko at mirantis.com>
wrote:

> Aleksandra,
>
> Very good point on separating the concerns about integration tests for
> Fuel as a whole and verifying commits to a single component such as
> fuel-library. In theory, it could support the right balance between
> stable CI and up-to-date code, but only if we resolve the two remaining
> problems: one small and technical and the other large and social.
>
> You've already pointed out the first problem: update of fuel-library CI
> environment is not yet fully automated, and so the environment is liable
> to lag behind all involved components for days if not weeks.
>
> This by itself is simple enough, if laborious, to work around (update it
> manually every day, or after every successful BVT), but still leaves us
> with the problem of motivation.
>
> We've been discussing the CI duty for fuel-library integration with
> puppet-openstack for more than a month now [0], and it has
> continuously failed to materialize. Within days of getting an action
> item in that IRC meeting to arrange it, Andrew Maksimov has responded
> privately that nobody in his team has time for this. And we all know
> what "I don't have time" actually means [1]. Two weeks later, we were
> ready to launch the integration and the question of CI duty came up
> again [2], with the same result.
>
> [0]
> http://eavesdrop.openstack.org/meetings/fuel/2016/fuel.2016-02-04-16.02.log.html#l-66
> [1]
> http://lifehacker.com/5892948/instead-of-saying-i-dont-have-time-say-its-not-a-priority
> [2]
> http://eavesdrop.openstack.org/meetings/fuel/2016/fuel.2016-02-18-16.00.log.html#l-190
>
> Here we are two more weeks later, the integration is on, and the first
> reaction from fuel-library core reviewers is "we don't have time to deal
> with this, turn it back off right now". And I'm not just summarizing
> Vladimir's email: on Friday we had a long thread on an internal mailing
> list with exactly this in the subject line (my apologies, but my disgust
> at the fact that it was started behind closed doors drowns any qualms
> about dragging it back into the open).
>
> After we change Fuel CI to use fixed revisions of puppet-openstack
> modules (the most recent ones to have passed BVT), the first thing that
> will happen is that BVT on Fuel ISO will start failing again, while
> fuel-library CI will continue to work. Without the pressure of failing commit
> verification CI, fuel-library developers will have even less incentive
> to keep fuel-library up to date with puppet-openstack (not to mention
> pro-actively reviewing puppet-openstack commits to catch potential
> regressions before they happen), and very soon Fuel QA team will get fed
> up with not having a stable ISO for the swarm test, and will demand that
> we go back to using fixed puppet-openstack revisions for the ISO, too.
>
> Both here and on the internal thread, many technical and organizational
> concerns were raised, and I'll get to them in a bit, but a concern
> without the will to resolve it is only an excuse; we won't get far if we
> don't want to make it work.
>
> So why don't fuel-library developers want to spend time on
> puppet-openstack integration?
>
> I see two dimensions to this problem. On one axis, there's the
> cost/benefit balance: how much work does it take, and what do we gain
> from doing it? On the other is the question of who benefits and who
> carries the costs?
>
> Without tracking HEAD of puppet-openstack in fuel-library, the primary
> cost is carried by puppet-openstack developers who maintain the upstream
> modules in the first place, and a small fraction of fuel-library
> contributors (5+ out of 50+ [3][4]) who periodically have to spend
> a significant amount of effort to bring fuel-library up to date with the
> current state of puppet-openstack. Even though the conversion to
> librarian has made the upstream sync simpler and safer, preparing the
> update to Mitaka still took a full month of work for 5-7 people.
>
> [3]
> http://stackalytics.com/?module=puppet%20openstack-group&company=mirantis&metric=commits
> [4]
> http://stackalytics.com/?module=fuel-library&company=mirantis&metric=commits
>
> Secondary costs are carried by Fuel Infra and QA teams who have to
> support CI based on two OpenStack releases in parallel during that
> month, fuel-library and puppet-openstack developers who have to deal
> with a spike in code churn, all Fuel contributors who are blocked by
> merge freeze during transition, and once again Fuel QA team who
> occasionally get blocked by bugs that were fixed in upstream and not yet
> pulled into fuel-library.
>
> In short, under that model, most fuel-library developers don't have to
> do much to gain the benefit of being up to date with upstream, such as
> getting support of the next OpenStack release. The integration cost,
> around 7-10 man-months per release, is carried mostly by other people.
>
> Transition to full integration with upstream via tracking HEAD of
> puppet-openstack in fuel-library dramatically alters this balance.
> Massive upstream sync is gone, and so are the associated costs of
> parallel CI, transition merge freeze, and missing upstream bugfixes. The
> code churn is still there, but more evenly spread over time.
>
> Instead, the primary cost becomes the CI duty that requires a
> fuel-library developer to watch upstream commits for Fuel CI failures
> and prevent those from impacting fuel-library. According to the same
> internal thread, that's "over 50% of one developer's time every day", so
> 3-5 man-months per release, or roughly half of the cost of the periodic
> sync.
>
> The secondary cost is the risk of upstream commits causing regressions
> that block the whole fuel-library team for several hours at a time. Is
> this risk a good excuse to revert the change that reduces the cost of
> supporting a new OpenStack release by half and reduces Fuel's lag behind
> puppet-openstack by a month? Only if we can't mitigate it.
>
> The problem is, most fuel-library developers don't stand to gain
> anything from this change: they now have to participate in something
> that was previously taken care of, however inefficiently, by other
> people. And that is why, instead of constructive proposals about
> mitigating the risk of regressions, we see demands to go back to the
> time when they didn't need to bother.
>
> As promised, moving on to specific concerns and questions.
>
> On Tue, Mar 01, 2016 at 02:21:48PM +0300, Vladimir Kuklin wrote:
> > Dmitry, could you please point me at the person who will be strictly
> > responsible for creating this 'ketchup' commit? Do you know that this
> > may take up the whole day (couple of hours to do RCA, couple of hours
> > on writing and debugging and couple of hours for FUEL CI tests run)
> > and block the entire Fuel project from having ANY code merged?
>
> It's not reasonable to expect a single person, or even a small team, to
> do this every day all year around. That's why we've been discussing CI
> duty. Even if it takes all day every day, between 50+ fuel-library
> developers that's just one week per person per year, not that much of a
> burden.
>
> And it doesn't have to block anyone from merging code to Fuel
> repositories, there are many ways to mitigate that, like the ones that
> Sergey and Aleksandra have proposed in this thread. We just need to
> start discussing these ways instead of arguing about why we shouldn't
> bother.
>
> > I have always thought that building software is about verification
> > being more important than 'trust'. There should not be any
> > humanitarian stuff involved - we are not in a relationship with
> > Puppet-OpenStack folks,
>
> I have explained above why motivation is the blocking issue here, and
> not the technical concerns. Of course we are in a relationship with
> Puppet OpenStack: both projects are part of OpenStack Big Tent, we have
> the same six-month release cycle, and on the code level their modules
> are so tightly coupled into fuel-library that we can't treat them as a
> third-party library. The fact that we've started to pull them from
> separate git repositories shouldn't have stopped us from treating them
> as a part of our codebase. Like it or not, our relationship with them is
> more "in the same boat" than it is a "zero-sum game".
>
> > although I really admire their work very much.
>
>   lip service
>       n 1: an expression of agreement that is not supported by real
>            conviction [syn: {hypocrisy}, {lip service}]
>
> > We should not follow sliding git references without being 100% sure
> > that we have mutual gating of the code.
>
> Setting up mutual gating is impossible without the mutual trust that you
> have so easily dismissed. Sliding git references and the CI duty to
> support them are all parts of establishing that mutual trust, it won't
> just appear out of thin air and empty promises.
>
> Even at the level of trust we already have, I'm sure puppet-openstack
> core reviewers can agree to hold off merging a commit if a fuel-library
> developer votes -1 with a comment like "Fuel CI failed for this one,
> please give me a couple of hours to figure out why". A poor man's
> substitute of mutual gating, but serviceable nonetheless.
>
> > Moreover, having such git ref as a source in our Puppetfile will lead
> > to the situation when we have UNREPRODUCIBLE build of Fuel project.
>
> Easily mitigated with tooling, same as the undeservedly maligned removal
> of version.yaml.
>
> On Fri, Mar 04, 2016 at 04:51:34PM +0300, Dmitry Pyzhov wrote:
> > 1) It takes more than 50% time of a senior engineer;
>
> As explained above, even at 100% time it's less than the time we've been
> spending on periodic upstream syncs.
>
> > 2) There is a lot of noise in tests results because of broken CI
> > and/or broken Fuel master;
>
> Can be fixed by Aleksandra's proposal.
>
> > 3) There is a lot of noise in tests results because of big number of
> > WIP commits that nobody is going to merge;
>
> Once we make Fuel CI votes visible (I see no reason to delay that any
> longer), it's going to be trivial to filter out commits with WIP flag or
> with a -1 from a voting gate job (why investigate Fuel CI failure if the
> commit can't pass a beaker test).
>
> > 4) There is no quick way to understand if the test failure is caused by
> > commit or by other reasons;
>
> Is this a duplicate of #2 or a general observation about how difficult
> it is to investigate Fuel CI failures? If the latter, this problem is
> not limited to puppet-openstack and is causing us pain in all our repos,
> we should either fix it soon or give up on Fuel CI altogether.
>
> > 5) There is no quick way to understand if the issue should be fixed in
> > the commit or in Fuel;
>
> Yes there is: simply pick the side where it's easier to fix.
>
> > 6) Most important. Our monitoring doesn't protect us. Our master will
> > be broken by upstream manifests again sooner or later. And nobody
> > knows how much time it will take to fix it.
>
> Our master gets broken by our own mistakes at least as often as by
> upstream manifests, anything we can do to protect ourselves from that is
> applicable to puppet-openstack just the same.
>
> --
> Dmitry Borodaenko
>
--

Andrew Woodward

Mirantis

Fuel Community Ambassador

Ceph Community