<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jul 17, 2017 at 10:13 AM, Emilien Macchi <span dir="ltr"><<a href="mailto:emilien@redhat.com" target="_blank">emilien@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5">On Mon, Jul 17, 2017 at 5:32 AM, Flavio Percoco <<a href="mailto:flavio@redhat.com">flavio@redhat.com</a>> wrote:<br>
> On 14/07/17 08:08 -0700, Emilien Macchi wrote:<br>
>><br>
>> On Fri, Jul 14, 2017 at 2:17 AM, Flavio Percoco <<a href="mailto:flavio@redhat.com">flavio@redhat.com</a>> wrote:<br>
>>><br>
>>><br>
>>> Greetings,<br>
>>><br>
>>> As some of you know, I've been working on the second phase of TripleO's<br>
>>> containerization effort. This phase is about migrating the docker-based<br>
>>> deployment onto Kubernetes.<br>
>>><br>
>>> This phase requires work in several areas: Kubernetes deployment, OpenStack<br>
>>> deployment on Kubernetes, configuration management, etc. While I've been<br>
>>> diving into all of these areas, this email is about the second point,<br>
>>> OpenStack deployment on Kubernetes.<br>
>>><br>
>>> There are several tools we could use for this task: kolla-kubernetes,<br>
>>> openstack-helm, ansible roles, among others. I've looked into these tools<br>
>>> and I've come to the conclusion that TripleO would be better off having<br>
>>> ansible roles that would allow for deploying OpenStack services on<br>
>>> Kubernetes.<br>
>>><br>
>>> The existing solutions in the OpenStack community require using Helm. While<br>
>>> I like Helm and both the kolla-kubernetes and openstack-helm OpenStack<br>
>>> projects, I believe using any of them would add an extra layer of<br>
>>> complexity to TripleO, which is something the team has been fighting for<br>
>>> years - especially now that the snowball is being chopped off.<br>
>>><br>
>>> Adopting any of the existing projects in the OpenStack community would<br>
>>> require TripleO to also write the logic to manage those projects. For<br>
>>> example, in the case of openstack-helm, the TripleO team would have to<br>
>>> write either ansible roles or heat templates to manage - install, remove,<br>
>>> upgrade - the charts (I'm happy to discuss this point further but I'm<br>
>>> keeping it at a high level on purpose for the sake of not writing a<br>
>>> 10k-words-long email).<br>
>>><br>
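>>> Just to make that concrete: even the install/upgrade half of that logic is<br>
>>> something we'd have to wrap and error-handle ourselves. A minimal sketch,<br>
>>> assuming Helm is already deployed (the release and chart names below are<br>
>>> made up):<br>
>>><br>
>>>   # Illustrative ansible task; a real role would add failure handling,<br>
>>>   # rollbacks, values files, etc.<br>
>>>   - name: Install or upgrade the keystone chart<br>
>>>     command: helm upgrade --install keystone ./keystone-chart<br>
>>><br>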
>>> James Slagle sent an email[0], a couple of days ago, to form TripleO plans<br>
>>> around ansible. One take-away from this thread is that TripleO is adopting<br>
>>> ansible more and more, which is great and it fits perfectly with the<br>
>>> conclusion I reached.<br>
>>><br>
>>> Now, what this work means is that we would have to write an ansible role<br>
>>> for each service that will deploy the service on a Kubernetes cluster.<br>
>>> Ideally these roles will also generate the configuration files (removing<br>
>>> the need for puppet entirely) and they would manage the lifecycle. The<br>
>>> roles would be isolated and this will reduce the need for TripleO Heat<br>
>>> templates. Doing this would give TripleO full control over the deployment<br>
>>> process too.<br>
>>><br>
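>>> As a rough idea of the shape such a role could take, the tasks could be as<br>
>>> simple as rendering manifests and applying them (the service name,<br>
>>> templates and paths below are illustrative only):<br>
>>><br>
>>>   # Hypothetical tasks/main.yml for a per-service role.<br>
>>>   - name: Render the keystone ConfigMap and Deployment manifests<br>
>>>     template:<br>
>>>       src: "{{ item }}.yaml.j2"<br>
>>>       dest: "/tmp/keystone-{{ item }}.yaml"<br>
>>>     with_items:<br>
>>>       - configmap<br>
>>>       - deployment<br>
>>><br>
>>>   - name: Apply the rendered manifests to the Kubernetes cluster<br>
>>>     command: kubectl apply -f /tmp/keystone-{{ item }}.yaml<br>
>>>     with_items:<br>
>>>       - configmap<br>
>>>       - deployment<br>
>>><br>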
>>> In addition, we could also write Ansible Playbook Bundles to contain these<br>
>>> roles and run them using the existing docker-cmd implementation that is<br>
>>> coming out in Pike (you can find a PoC/example of this in this repo[1]).<br>
>>><br>
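>>> (For those unfamiliar with APBs: the bundle is essentially the role plus a<br>
>>> small apb.yml descriptor, roughly along these lines - I'm writing this<br>
>>> from memory, so treat the fields as illustrative:)<br>
>>><br>
>>>   version: 1.0<br>
>>>   name: keystone-apb<br>
>>>   description: Deploys keystone on Kubernetes<br>
>>>   bindable: false<br>
>>>   async: optional<br>
>>>   plans:<br>
>>>     - name: default<br>
>>>       description: Default deployment plan<br>
>>>       free: true<br>
>>>       parameters: []<br>
>>><br>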
>>> Now, I do realize the amount of work this implies and that this is my<br>
>>> opinion/conclusion. I'm sending this email out to kick off the discussion<br>
>>> and gather thoughts and opinions from the rest of the community.<br>
>>><br>
>>> Finally, what I really like about writing pure ansible roles is that<br>
>>> ansible is a known, powerful tool that has been adopted by many operators<br>
>>> already. It'll provide the flexibility needed and, if structured<br>
>>> correctly, it'll allow operators (and other teams) to just use the parts<br>
>>> they need/want without depending on the full stack. I like the idea of<br>
>>> being able to separate concerns in the deployment workflow and the idea of<br>
>>> making it simple for users of TripleO to do the same at runtime.<br>
>>> Unfortunately, going down this road means that my hope of creating a field<br>
>>> where we could collaborate even more with other deployment tools will be a<br>
>>> bit limited, but I'm confident the result would also be useful for others<br>
>>> and that we all will benefit from it... My hopes might be a bit naive<br>
>>> *shrugs*<br>
>><br>
>><br>
>> Of course I'm biased since I've been (a little) involved in that work,<br>
>> but I like the idea of:<br>
>><br>
>> - Moving forward with our containerization. docker-cmd will help us for<br>
>> sure with this transition (I insist on the fact that TripleO is a product<br>
>> that you can upgrade, and we try to make it smooth for our operators), so<br>
>> we can't just trash everything and switch to a new tool. I think the<br>
>> approach that we're taking is great and made of baby steps where we try to<br>
>> solve different problems.<br>
>> - Using more Ansible - the right way - when it makes sense: with the<br>
>> TripleO containerization, we only use Puppet for Configuration Management<br>
>> (managing a few resources, but not for orchestration or all the other<br>
>> features that Puppet provides) and for Data Binding (Hiera). To me, it<br>
>> doesn't make sense for us to keep investing much in Puppet modules if we<br>
>> go k8s & Ansible. That said, see the next point.<br>
>> - Having a transition path between TripleO with Puppet and TripleO with<br>
>> apbs, with some sort of binding between the hieradata previously generated<br>
>> by TripleO and a similar data binding within Ansible playbooks, would help.<br>
>> I saw your PoC, Flavio, I found it great, and I think we should make<br>
>> <a href="https://github.com/tripleo-apb/ansible-role-k8s-keystone/blob/331f405bd3f7ad346d99e964538b5b27447a0ebf/provision-keystone-apb/tasks/hiera.yaml" rel="noreferrer" target="_blank">https://github.com/tripleo-apb/ansible-role-k8s-keystone/blob/331f405bd3f7ad346d99e964538b5b27447a0ebf/provision-keystone-apb/tasks/hiera.yaml</a><br>
>> optional when running apbs, and allow providing another format (more<br>
>> Ansiblish, sketched below) to let folks not using TripleO use it. We should<br>
>> also target this new format and switch service by service in TripleO to<br>
>> use it, as long as apbs support both. I think that way we can migrate step<br>
>> by step to using Ansible for configuration management.<br>
>><br>
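>> To illustrate "more Ansiblish": instead of feeding hieradata into the<br>
>> role, the same settings could be plain role variables (the keys and values<br>
>> below are made up for the sake of the example):<br>
>><br>
>>   # Hypothetical defaults/main.yml for the keystone role.<br>
>>   keystone_db_host: 192.168.24.10<br>
>>   keystone_db_password: secret<br>
>>   keystone_debug: false<br>
>><br>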
>> There are some things to figure out:<br>
>> - We kind of found solutions for OpenStack services - great - but now<br>
>> what do we do for services like MySQL, Apache, etc.? Do we have<br>
>> "standard" and "community-supported" apbs? Do we need to create some?<br>
><br>
><br>
> The question should be whether we have community-maintained roles that<br>
> deploy third-party services on k8s. I'm sorry for nitpicking, but I just<br>
> want to make sure we all keep in mind that the APB wrap is optional<br>
> (although convenient for us).<br>
><br>
> The answer is: not that I'm aware of. There are roles to deploy some of<br>
> these services on baremetal.<br>
<br>
</div></div>*If* we're going to use Helm, I found the answer to my own question:<br>
<a href="https://github.com/kubernetes/charts/tree/master/stable/mariadb" rel="noreferrer" target="_blank">https://github.com/kubernetes/charts/tree/master/stable/mariadb</a><br>
(though it's missing Galera support AFAIK).<br>
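<br>
(For the record, consuming that chart would be a one-liner along these<br>
lines - the exact value names may have changed, so treat it as illustrative:)<br>
<br>
  helm install stable/mariadb --name tripleo-db --set mariadbRootPassword=secret<br>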
<br>
And some other services that we currently use in TripleO:<br>
<a href="https://github.com/kubernetes/charts/tree/master/stable" rel="noreferrer" target="_blank">https://github.com/kubernetes/charts/tree/master/stable</a><br>
<br>
I tried to find out whether or not kolla-kubernetes was using these<br>
Helm charts, and unless I failed to find them, I think it doesn't. I would be<br>
curious to know the reason if that's the case. (Please correct me with<br>
links.)<br>
<div class="gmail-HOEnZb"><div class="gmail-h5"><br></div></div></blockquote><div><br></div><div>Emilien,</div><div><br></div><div>I think others have offered explanations as to why Kolla is not using the Hem upstream charts. I'll offer my unique perspective.</div><div><br></div><div>In the years of development of Kolla, we have tried to use images built for one distro on another (in this case, I suspect that mariadb image is based upon debian, however I didn't verify).</div><div><br></div><div>We have seen various real world failures when using a cross-distro approach to images vs what base operating system they are deployed on. Even *some* Docker engineers will acknowledge that such an action is dangerous.</div><div><br></div><div>It works - maybe 99% of the time - maybe more. Then some _very_ subtle kernel/libc incompatibility with RTNETLINK/IOCTLS/syscall interface/etc breaks one service or another in a fundamental way that is essentially impossible to debug or diagnose without a kernel debugger/printk and a reliable reproducer. There are thousands of interfaces into the kernel via just those 3 interfaces I mentioned above. </div><div>As much as the container ecosystem would like you to believe those interfaces are standardized in Linux, we have had real-world experiences where things broke in the past and will not function on another base operating system.</div><div><br></div><div>Its a dreadful shame this is the case, but it demands Kolla must fundamentally take the step of containerizing and building images for each unique distribution for which we offer support. This is why we have centos/ubuntu/oracle linux images and not just "here is the nova image". This work is and was never meant to keep people busy nor churn gate cycles in OpenStack CI it was meant to solve the fact that subtle incompatibilities exist between various distributions variance in kernels and system libraries.</div><div><br></div><div>Regards</div><div>-steve</div><div><br></div><div> </div></div></div></div>