<div dir="ltr">Hi,<div>just to clarify,</div><div><br></div><div>CERN runs the superconductor.</div><div>Yes, the affinity check is an issue. We plan to work on it in the next cycle.</div><div>The metadata API runs per cell. The main reason is that we still run nova-network in a few cells.</div><div><br></div><div>cheers,</div><div>Belmiro</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Sep 30, 2019 at 8:56 PM Matt Riedemann <<a href="mailto:mriedemos@gmail.com">mriedemos@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">On 9/30/2019 12:27 PM, Dan Smith wrote:<br>
>> 4. Does the cell conductor need access to the API DB?<br>
> Technically it should not be allowed to talk to the API DB for<br>
> "separation of concerns" reasons. However, there are a couple of<br>
> features that still rely on the cell conductor being able to upcall to<br>
> the API database, such as the late affinity check.<br>
<br>
In case you haven't seen this yet, we have a list of operations <br>
requiring "up-calls" from compute/cell-conductor to the API DB in the <br>
docs here:<br>
<br>
<a href="https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#operations-requiring-upcalls" rel="noreferrer" target="_blank">https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#operations-requiring-upcalls</a><br>
<br>
Some have been fixed for a while and some are still open because they are <br>
not a default configuration we normally deal with (cross_az_attach=False) <br>
or are not normally hit in CI* runs (reschedules).<br>
<br>
I think the biggest/hardest problem there to solve is the late affinity <br>
check, which long-term should be solved with placement, but no one is <br>
working on that. The reschedule stuff related to getting AZ/aggregate <br>
info is simpler but involves some RPC changes, so it's not trivial, and <br>
again no one is working on fixing that.<br>
<br>
I think for those reasons CERN is running without a superconductor mode <br>
and can hit the API DB from the cells. Devstack superconductor mode is <br>
the ideal, though, for the separation of concerns Dan pointed out.<br>
<br>
*Note we do hit the reschedule issue sometimes in multi-cell jobs:<br>
<br>
<a href="http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22CantStartEngineError%3A%20No%20sql_connection%20parameter%20is%20established%5C%22%20AND%20tags%3A%5C%22screen-n-cond-cell1.txt%5C%22&from=7d" rel="noreferrer" target="_blank">http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22CantStartEngineError%3A%20No%20sql_connection%20parameter%20is%20established%5C%22%20AND%20tags%3A%5C%22screen-n-cond-cell1.txt%5C%22&from=7d</a><br>
<br>
-- <br>
<br>
Thanks,<br>
<br>
Matt<br>
<br>
</blockquote></div>