Hi,
just to clarify,

CERN runs the superconductor.
Yes, the affinity check is an issue. We plan to work on it in the next cycle.
The metadata API runs per cell. The main reason is that we still run nova-network in a few cells.

cheers,
Belmiro

On Mon, Sep 30, 2019 at 8:56 PM Matt Riedemann <mriedemos@gmail.com> wrote:
On 9/30/2019 12:27 PM, Dan Smith wrote:
>> 4. Does the cell conductor need access to the API DB?
> Technically it should not be allowed to talk to the API DB for
> "separation of concerns" reasons. However, there are a couple of
> features that still rely on the cell conductor being able to upcall to
> the API database, such as the late affinity check.

In case you haven't seen this yet, we have a list of operations
requiring "up-calls" from compute/cell-conductor to the API DB in the
docs here:

https://docs.openstack.org/nova/latest/user/cellsv2-layout.html#operations-requiring-upcalls

Some have been fixed for a while and some are still open because they
require a non-default configuration that we do not normally deal with
(cross_az_attach=False) or are not normally hit in CI* runs (reschedules).

I think the biggest/hardest problem to solve there is the late affinity
check, which long-term should be solved with placement, but no one is
working on that. The reschedule stuff related to getting AZ/aggregate
info is simpler but involves some RPC changes, so it's not trivial, and
again no one is working on fixing that.

I think for those reasons CERN is running without superconductor mode
and can hit the API DB from the cells. Devstack's superconductor mode is
the ideal, though, for the separation of concerns Dan pointed out.

*Note we do hit the reschedule issue sometimes in multi-cell jobs:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22CantStartEngineError%3A%20No%20sql_connection%20parameter%20is%20established%5C%22%20AND%20tags%3A%5C%22screen-n-cond-cell1.txt%5C%22&from=7d

--

Thanks,

Matt