<div dir="ltr"><div>Hi Sean,</div><div><br></div><div>On 21 July 2014 22:53, Collins, Sean <span dir="ltr"><<a href="mailto:Sean_Collins2@cable.comcast.com" target="_blank">Sean_Collins2@cable.comcast.com</a>></span> wrote:<br>
</div>
<div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word;color:rgb(0,0,0);font-size:14px;font-family:Calibri,sans-serif">
<span>
<div style="font-family:Calibri;font-size:11pt;text-align:left;color:black;BORDER-BOTTOM:medium none;BORDER-LEFT:medium none;PADDING-BOTTOM:0in;PADDING-LEFT:0in;PADDING-RIGHT:0in;BORDER-TOP:#b5c4df 1pt solid;BORDER-RIGHT:medium none;PADDING-TOP:3pt">
<span style="font-family:Calibri,sans-serif;font-size:14px">The fact that I tried to reach out to the person who was listed as the contact back in November to try and resolve the –1 that this CI system gave, and never received a response until the public mailing list thread about revoking voting rights for Tail-F,
makes me believe that the Tail-F CI system is still not ready to have that kind of privilege. Especially if the account was idle from around February, until June – that is a huge gap, if I understand correctly?</span></div>
</span></div></blockquote><div><br></div><div>I understand your frustration. It seems like the experience of bringing up our CI has been miserable for all concerned. I am sad about that. It does not seem that it should have worked out this way, since everybody concerned is a competent person and acting in good faith.</div>
<div><br></div><div>I hope we can finally clear this up and then continue contributing to OpenStack on good terms with everybody.</div><div><br></div><div>Back in November we were feeling eager to be good citizens and we wanted to be amongst the first to set up a 3rd party CI for Neutron. We were trying to be proactive: our driver was already in Havana and the deadlines for us to set up the CI were far in the future. My colleague Tobbe was also planning to take over the lead on development of our OpenStack code from me and we thought the perfect first step would be to set up our CI system, since that would get him familiar with the code and since neither of us had prior experience operating an OpenStack CI.</div>
<div><br></div><div>We read through the 3rd Party CI setup instructions and created a CI. Our initial setup ran Jenkins and would use a custom script to create a one-shot VM, and inside that it would run the Neutron unit tests together with a patch that made our driver talk to our real external system. This got quite good test coverage because the unit tests exercise the ML2 interface quite well. (Likely we should have used Tempest instead, as everybody does nowadays, including us, but we didn't know that back then.)</div>
<div><br></div><div>This seemed to work well and so we let it run. Honestly, we did not really know what would happen with our results after they were posted, and we did not have a definite goal for what service level we should uphold. That was surely naive, but I think understandable. We were relatively new and minor contributors to OpenStack and we were amongst the first wave of Neutron people to set up a CI. We hadn't yet had the opportunity to learn from the mistakes of others or see how reviews are used by the upstream people and systems. We were also perhaps a little too relaxed because our total contribution was around 150 lines of code that only run when explicitly enabled, and we had our own test procedure in place separately from OpenStack CI that we had been using since Havana, so it did not feel like we had much potential to impact other OpenStack users and developers with our code.</div>
<div><br></div><div>Anyway. The test runs started to fail unexpectedly, for a boring kind of reason: something like OpenStack needing a newer version of a library while our CI script lacked a "pip upgrade" command that would pick it up, so all tests would fail until manual intervention.</div>
<div><br></div><div>So what happens when the CI falls down and needs help to come back up? First of all, it creates a big problem for upstream developers and slows down work on OpenStack (ouch). Second, you poor guys who are having problems try to contact the person responsible, but all you have is one work email address and IRC nick. In that case, you guys did not get a response. I think that was for the very pedestrian reason that my colleague who was responsible was on vacation and didn't appreciate that an operational issue with our CI would create an urgent problem for other people and must be attended to at all times.</div>
<div><br></div><div>This must have been bad for you guys since you were stuck waiting on us and couldn't fix the problem on your side. I was also contacted by email, as the previous contact person for that driver, but the message simply asked me to confirm my colleague's email address and did not tell me that there was a problem that we had to resolve. So eventually the problem boiled over, and when we started getting publicly flamed on the mailing list I finally saw that there was an issue and called up my colleague directly, who *then* jumped into action to sort it out (logging into gerrit and reversing old negative votes, and so on).</div>
<div><br></div><div>So what do we take away from this first experience? To me it just looks like processes to fix: people operating 3rd party CIs need to better understand the required service level, there should be multiple contact points to deal with mundane stuff like vacations and illness, and people should operate their CI successfully for a while before voting is enabled. It sucks that work was interrupted and people got mad, but at the end of the day this happened with everybody acting in good faith, and it shows us what kind of problems to prevent in the future.</div>
<div><br></div><div>This is where it became a bit sad on our side. The reaction we got from the community was that the problem is not with the process but with the people. That is, that we are lazy, incompetent, don't respect the community, don't understand open source, and so on. My colleague got a really gut-wrenching epic reprimand to this effect on IRC, and understandably decided to stop contributing to OpenStack as a result. So then responsibility for the CI was transferred back to me.</div>
<div><br></div><div>I decided to change priority: instead of getting CI running *early* I wanted to get it set up *reliably* and still within the required timeframes. So I waited a while to see how other people set up their CIs, with the hope of learning from their experiences and not making new mistakes.</div>
<div><br></div><div>End of Part One.</div><div><br></div><div>If you want to hear all of the details of my adventures with devstack-gate, of how I have been operating and supervising our CI for Juno these past 6 weeks, and of the development work that I am doing in the hope of making CI more robust both for myself and other members of the community, then I will be happy to explain about that after I have a chance to catch my breath.</div>
<div><br></div><div>TL;DR: Driver developers are people too!</div><div><br></div><div><br></div></div></div></div>