1. Is there any benefit to not having a superconductor? Presumably it's a little more efficient in the single cell case? Also IIUC it only requires a single message queue so is a little simpler?
In a multi-cell case you need it, but you're asking about the case where there's only one (real) cell, yeah? If the deployment is really small, then the overhead of having one is probably measurable and undesirable. I can't tell you where that cut-off is, unfortunately. However, once you're over a certain number of nodes, that overhead probably stops mattering much. The superconductor does things that the cell-specific ones then don't have to do, so there's about the same amount of total load, just a potentially larger memory footprint for running extra services, which would be measurable at small scales. For a tiny deployment there's also overhead just in the complexity, but one of the goals of v2 has always been to get everyone on the same architecture, and having a "small mode" and a "large mode" brings its own complexity.
2. Do console proxies need to live in the cells? This is what devstack does in superconductor mode. I did some digging through nova code, and it looks that way. Testing with novncproxy agrees. This suggests we need to expose a unique proxy endpoint for each cell, and configure all computes to use the right one via e.g. novncproxy_base_url, correct?
I'll punt this to Melanie, as she's the console expert at this point, but I imagine you're right.
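If it does work out that way, the per-cell setup might look something like the sketch below. It only renders the [vnc] fragment that computes in each cell would carry; the cell names and proxy URLs are made up for illustration, and Melanie may well correct the premise.

```python
import configparser
import io

# Hypothetical cell names and public proxy endpoints, one per cell.
CELL_PROXIES = {
    "cell1": "https://vnc-cell1.example.com:6080/vnc_auto.html",
    "cell2": "https://vnc-cell2.example.com:6080/vnc_auto.html",
}


def compute_vnc_conf(cell):
    """Render the [vnc] nova.conf fragment for computes in the given cell."""
    cfg = configparser.ConfigParser()
    cfg["vnc"] = {"novncproxy_base_url": CELL_PROXIES[cell]}
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


# Show the fragment that computes in cell1 would get.
print(compute_vnc_conf("cell1"))
```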
3. Should I upgrade the superconductor or conductor service first?
Superconductor first, although they all kinda have to go around the same time. Superconductor, like the regular conductors, needs to look at the cell database directly, so if you were to upgrade superconductor before the cell database schema you'd likely have issues. I think the ideal would be to upgrade the DB schema everywhere (which you can do without rolling code), then upgrade the top-level services (conductor, scheduler, api), and then you could probably get away with doing conductor in the cell along with computes, or whatever. If possible, rolling the cell conductors with the top-level services would be ideal.
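Written out as a rough plan, that ordering looks like the sketch below. The phase labels are just descriptive names I picked, not commands; what each phase actually runs depends on your deployment tooling.

```python
# Upgrade phases in the order described above. Labels are descriptive only.
PHASES = [
    ("1. DB schema (no code rolled yet)", ["API DB", "every cell DB"]),
    ("2. Top-level services", ["superconductor", "scheduler", "API"]),
    # Ideally the cell conductors roll together with phase 2 instead.
    ("3. Per-cell services", ["cell conductor", "computes"]),
]


def render_plan(phases):
    """Return the plan as printable lines, one per phase."""
    return [f"{name}: {', '.join(targets)}" for name, targets in phases]


print("\n".join(render_plan(PHASES)))
```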
4. Does the cell conductor need access to the API DB?
Technically it should not be allowed to talk to the API DB for "separation of concerns" reasons. However, there are a couple of features that still rely on the cell conductor being able to upcall to the API database, such as the late affinity check. If you can only choose one, then I'd say configure the cell conductors to talk to the API DB, but if there's a knob for "isolate them" it'd be better.
5. What DB configuration should be used in nova.conf when running online data migrations? I can see some migrations that seem to need the API DB, and others that need a cell DB. If I just give it the API DB, will it use the cell mappings to get to each cell DB, or do I need to run it once for each cell?
The API DB has its own set of migrations, so you obviously need API DB connection info to make that happen. There is no fanout to all the rest of the cells (currently), so you need to run it with a conf file pointing to the cell, for each cell you have. The latest attempt at making this fan out was abandoned in July with no explanation, so it dropped off my radar at least.
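In practice that means one invocation per cell. A minimal sketch, assuming you keep a separate conf file per cell whose [database] connection points at that cell's DB (the paths here are hypothetical):

```python
import shlex

# Hypothetical per-cell config files, each pointing at a different cell DB.
CELL_CONFIGS = [
    "/etc/nova/nova-cell1.conf",
    "/etc/nova/nova-cell2.conf",
]


def migration_commands(cell_configs):
    """Build one online_data_migrations invocation per cell config file."""
    return [
        ["nova-manage", "--config-file", conf, "db", "online_data_migrations"]
        for conf in cell_configs
    ]


for cmd in migration_commands(CELL_CONFIGS):
    # Printed rather than executed; hand cmd to subprocess.run() to run it.
    print(shlex.join(cmd))
```

The script only prints the commands, so you can eyeball them before running anything against a real database.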
6. After an upgrade, when can we restart services to unpin the compute RPC version? Looking at the compute RPC API, it looks like the superconductor will remain pinned until all computes have been upgraded. For a cell conductor, it looks like I could restart it to unpin after upgrading all computes in that cell, correct?
Yeah.
7. Which services require policy.{yaml,json}? I can see policy referenced in API, conductor and compute.
That's a good question. I would have thought it was just API, so maybe someone else can chime in here, although it's not specific to cells. --Dan