<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 21, 2019 at 6:47 PM Graham Hayes <<a href="mailto:gr@ham.ie">gr@ham.ie</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
<br>
On 21/02/2019 17:28, Sylvain Bauza wrote:<br>
> <br>
> <br>
> On Thu, Feb 21, 2019 at 6:14 PM Graham Hayes <<a href="mailto:gr@ham.ie" target="_blank">gr@ham.ie</a><br>
> <mailto:<a href="mailto:gr@ham.ie" target="_blank">gr@ham.ie</a>>> wrote:<br>
> <br>
<br>
<snip><br>
<br>
> <br>
> > * If you had a magic wand and could inspire and make a single<br>
> > sweeping architectural or software change across the services,<br>
> > what would it be? For now, ignore legacy or upgrade concerns.<br>
> > What role should the TC have in inspiring and driving such<br>
> > changes?<br>
> <br>
> 1: A single agent on each compute node that allows plugins to do<br>
>     all the work required (Nova / Neutron / Vitrage / Watcher / etc.)<br>
> <br>
> 2: Remove RMQ where it makes sense - e.g. nova-api -> nova-compute<br>
>     over something like HTTP(S) would be a good fit.<br>
> <br>
> 3: Unified error codes, with a central registry - but at the very least,<br>
>     each time we raise an error and it gets returned, a user can see<br>
>     where in the code base it failed, e.g. a header that has<br>
>     OS-ERROR-COMPUTE-3142, which means someone can google for<br>
>     something more informative than "the VM failed scheduling".<br>
> <br>
> 4: OpenTracing support in all projects.<br>
> <br>
> 5: Possibly something with pub/sub where each project can listen for<br>
>     events, instead of each project building something ad hoc the way<br>
>     designate did with notifications.<br>
> <br>
> <br>
> That's the exact reason why I tried to avoid answering about the<br>
> architectural changes I'd like to see done. When I read the lines<br>
> above, I'm far from any consensus on those.<br>
> To answer 1. and 2. with my Nova developer's hat on, I'd just say that<br>
> we invented Cells v2 and Placement.<br>
<br>
Sure - this was if *I* had a magic wand - I have a completely different<br>
viewpoint from others. No community ever really reaches full agreement.<br>
<br></blockquote><div><br></div><div>Fair point, we work by consensus, not full agreement. It's always good to keep that distinction in mind.</div>
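<div><br></div><div>As an aside, to make #3 above a bit more concrete, here is a rough sketch of the kind of thing I imagine - the registry, the header name and the error number are all invented for illustration, nothing like this exists today:</div><div><br></div><div><pre>
# Hypothetical sketch of a unified error-code registry: map a raised
# exception to a stable, searchable code and expose it in a header.
# The registry contents, code numbers and header name are made up.
ERROR_REGISTRY = {
    "NoValidHost": "OS-ERROR-COMPUTE-3142",
}

def error_header_for(exc):
    """Return a (header, value) pair telling the user where the failure came from."""
    code = ERROR_REGISTRY.get(type(exc).__name__, "OS-ERROR-UNKNOWN-0000")
    return ("OpenStack-Error-Code", code)
</pre></div><div>That way a user who sees OS-ERROR-COMPUTE-3142 has something to google, rather than just "the VM failed scheduling".</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">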
From a TC perspective we have to look at these things from an<br>
overall view. My suggestions above were for *all* projects, specifically<br>
for #2 - I used a well-known pattern as an example, but it can apply to<br>
Trove talking to DB instances, Octavia to LBaaS nodes (they already do<br>
this, and it is a good pattern), Zun, possibly Magnum (this is not an<br>
exhaustive list, and may not suit all listed projects - I am naming them<br>
off the top of my head).<br>
<br></blockquote><div><br></div><div>I'd be interested in discussing the use cases that require such major architectural splits.<br></div><div>The main reason Cells v2 was implemented was to address the MQ/DB scalability issues of deployments with 1000+ compute nodes. The Edge use case came later, so it wasn't the main driver for the change.</div><div>If the projects you mention have the same footprint at scale, then yeah, I'm supportive of any redesign discussion that comes up.</div><div><br></div><div>That said, before stepping into major redesigns, I'd wonder: could the inter-service communication be improved simply by reducing the payload?</div>
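<div><br></div><div>Just to illustrate what I mean by reducing payload - a rough sketch only, the field names and payload shape below are made up for the example:</div><div><br></div><div><pre>
# Illustration only: instead of shipping a whole object over the wire on
# every RPC call or notification, send just the fields the remote service
# actually consumes. The field names here are invented for the example.
def slim_payload(instance, wanted=("uuid", "host", "vm_state", "task_state")):
    """Return a copy of the payload containing only the consumed fields."""
    return {key: instance[key] for key in wanted if key in instance}
</pre></div><div>That kind of trimming is the sort of incremental improvement I'd want to explore before a full transport redesign.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">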
From what I understand there was even talk of doing it for Nova, so that<br>
a central control plane could manage remote edge compute nodes without<br>
having to keep an RMQ connection alive across the WAN, but I am not sure<br>
where that got to.<br>
<br></blockquote><div><br></div><div>That's a separate use case (Edge), which wasn't the initial reason we started implementing Cells v2. I haven't heard any request from the Edge WG during the PTGs about changing our messaging interface because of $WAN, but I'm open to ideas.</div><div><br></div><div>-Sylvain</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
> To be clear, the redesign didn't come from any source other than our<br>
> users, who were complaining about scale. IMHO, if we really want some<br>
> committee driving us on feature requests, it should be the UC and<br>
> not the TC.<br>
<br>
It should be a combination - the UC and TC should be communicating about<br>
these requests - the UC for the feedback, and the TC to see how they fit<br>
with the TC's vision for the direction of OpenStack.<br>
<br>
> Whatever it is, at the end of the day, we're all paid by our sponsors.<br>
> That means any architectural redesign always hits the reality wall<br>
> where you need to convince your respective Product Managers of the great<br>
> benefit of the redesign. Maybe I'm too pragmatic, but I remember so many<br>
> discussions we had about redesigns that I now feel we just need hands,<br>
> not ideas.<br>
<br>
I fully agree, and it has been an issue in the community for as long as<br>
I can remember. That doesn't mean we should stop pushing the project<br>
forward. We have already moved the needle with the cycle goals, so we<br>
can influence what features are added to projects. Let's continue to do<br>
so.<br>
<br>
<br>
<snip><br>
<br>
</blockquote></div></div>