On Wed, 2024-03-20 at 13:06 +0100, Tobias Urdin wrote:
Hello,
This sounds familiar.
If no availability zone was selected when the instance was spawned, the “request spec” (saved in the database) does not contain an availability zone, and the scheduler will allow that instance to be scheduled to another availability zone because the original request did not include a specific one.
Correct. Live and cold migration are fully supported between availability zones, provided the operator exchanged ssh keys across all nodes when installing nova and has not placed a firewall or similar between them. As you said, if an instance did not request an AZ when created, and one was not added by the scheduler or a volume (with cross_az_attach=false), then the request_spec will not have an AZ. Scheduling by design does not consider the AZ the instance is currently in, only the one in the request spec. Cross-AZ migration is a core feature of nova, not a bug, and is expected to work by default in any deployment unless the operator has taken measures to prevent it. AZs in OpenStack are not fault domains and are not comparable to AWS availability zones; an AWS availability zone is closer to a keystone region than it is to a nova AZ.
If you search for “request spec” on the mailing list you’ll see that there have been multiple threads about that with a lot of details that will help you out.
In this cycle we added the ability to view the pinned AZ from the request spec to make understanding this easier. Going forward, if you use the latest microversion, 2.96, instance list and instance show will contain an additional field detailing the requested AZ if one is set in the request spec. https://docs.openstack.org/nova/latest/reference/api-microversion-history.ht...
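For example, with a client that supports that microversion, something along these lines should include the pinned AZ field for a given instance (the instance uuid is of course a placeholder):

  openstack --os-compute-api-version 2.96 server show <instance-uuid>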
When we “migrated” to using availability zones we specifically populated this data in the database (note that it’s an unsupported change so be careful).
Yes it is, but if done correctly it should not directly break anything. It may be unexpected from a user point of view, and they can now use shelve to do a cross-AZ unshelve, so they still have a way to force the AZ to change if they need to. The main danger in doing that is that this is stored in a JSON blob in the db; it's easy to mess up the formatting of that blob and leave nova unable to read it. If you do do this (and I'm not encouraging people to do this), then if your MySQL or PostgreSQL is new enough they now have functions for working with JSON, and those can be safer for updating the blob in the db than was previously possible. Just be sure to take a db backup before making changes like this if you do try.
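Purely as an illustration of the kind of JSON-function update meant here (not a recommendation), against the nova_api database that would look roughly like the statement below; the exact layout of the blob can vary by release, so verify the JSON path against a real row in your deployment first, and take that backup:

  -- illustrative only: pin one instance's request spec to az1
  UPDATE request_specs
  SET spec = JSON_SET(spec, '$."nova_object.data".availability_zone', 'az1')
  WHERE instance_uuid = '<instance-uuid>';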
Best regards Tobias
On 20 Mar 2024, at 12:51, Marc Vorwerk <marc+openstack@marc-vorwerk.de> wrote:
Dear OpenStack Community,
I am reaching out for support with an issue that is specifically affecting a single instance during migration or resize operations in our OpenStack environment. I want to emphasize that this problem is isolated and does not reflect a broader issue within our cloud setup.
The issue arises when attempting a resize of the instance's flavor, which only differs in RAM+CPU specification. Unexpectedly, the instance attempts to switch its availability zone from az1 to az2, which is not the intended behavior.
The instance entered an error state during the resize or migration process, with a fault message indicating 'ImageNotFound', because after the availability zone change the volume can't be reached. We use separate ceph clusters per AZ.
If this was a cinder volume, then that indicates you have not correctly configured your cluster. By default nova expects that all cinder backends are accessible by all hosts; incidentally, nova also expects the same to be true of all neutron networks by default. Where that is not the case for cinder volumes you need to set [cinder]cross_az_attach=false https://docs.openstack.org/nova/latest/configuration/config.html#cinder.cros...
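That is just the standard nova.conf option from the link above, i.e. something like:

  [cinder]
  # only attach volumes from the same AZ as the instance
  cross_az_attach = false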
As noted above, this is a feature, not a bug. The ability for an unpinned instance to change AZ at scheduling time is intended behaviour that is often not expected by people coming from AWS but is expected by long-time OpenStack users and operators. cross_az_attach defaults to true because there is no expectation in general that nova availability zones align in any way with cinder availability zones. If you choose to make them align then you can use that option to enforce affinity, but that is not expected to be the case in general. There is no AZ-affinity config option for neutron, as again neutron networks are expected to span all hosts. If you use the l3 routed network feature in neutron you can create an affinity between l3 segments and hosts via physnets, however that has no relationship to AZs. AZs in nova, cinder and neutron are not modeling the same thing, and while they can align they are not required to, as I said above.

For images_type=rbd there is no native scheduling support to prevent you moving between backends. We have discussed ways to do that in the past but never implemented it. If you want to prevent instances changing ceph cluster via the scheduler today when using images_type=rbd you have 2 options:

1.) You can model each ceph cluster as a separate nova cell. By default we do not allow instances to change cell, so if you align your ceph clusters to cell boundaries then instances will never be scheduled to a host connected to a different ceph cluster.

2.) The other option is to manually configure the scheduler to enforce this. There are several ways to do this via a scheduler filter and host-aggregate metadata to map a flavor/image/tenant to a host aggregate. Alternatively you can use the required-traits functionality of placement via the isolating aggregates feature https://docs.openstack.org/nova/latest/reference/isolate-aggregates.html Effectively you can advertise CUSTOM_CEPH_CLUSTER_1 or CUSTOM_CEPH_CLUSTER_2 on the relevant hosts via provider.yaml https://docs.openstack.org/nova/latest/admin/managing-resource-providers.htm... then create a host aggregate per set of hosts to enforce the required custom trait and modify your flavors/images to request the relevant trait (rough sketches of both options are at the end of this reply).

Unfortunately both approaches are hard to implement for existing deployments; this is really something best planned for and executed when you are commissioning a cloud for the first time. What I have wanted to do for a long time, but have not had the time to propose or implement, is have nova model the ceph cluster and storage backend in use in placement so we can automatically schedule on it. I more or less know what would be required to do that, but while this is an occasional pain point for operators it is a long-understood limitation and not one that has been prioritised. If this is something that people are interested in seeing addressed for images_type=rbd specifically, then feedback from operators that they care about this would be appreciated, but I cannot commit to addressing it in the short to medium term. For now my recommendation is: if you are deploying a new cloud with images_type=rbd and you plan to have multiple ceph clusters that are not accessible by all hosts, then you should create one nova cell per ceph cluster. Cells are relatively cheap to create in nova; you can share the same database server/rabbitmq instance between cells if you are not using cells for scaling, and you can change that after the fact if you later find you need to scale.
You can also colocate multiple conductors on the same host for different cells, provided your installation tool can accommodate that. We do that in devstack and it is perfectly fine to do in production. Cells are primarily a scaling/sharding mechanism in nova, but they can be helpful for this use case too. If you do have one cell per ceph cluster you can also create one AZ per cell if you want to allow end users to choose the cluster, but that is optional; AZs can span cells and cells can contain multiple different AZs, the two concepts are entirely unrelated in nova. Cells are an architectural choice for scaling nova to 1000s of compute nodes; AZs are just a label on a host aggregate with no other meaning. Neither is a fault domain, but both are often incorrectly assumed to be. Cells should not be required for this use case to work out of the box, but no one in the community has ever had time to work on the correct long-term solution of modeling the storage backend in placement. That is sad, as that was one of the original primary use cases that placement was created to solve.
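To make the two options mentioned above a bit more concrete, here are rough sketches only; the trait name CUSTOM_CEPH_CLUSTER_1, the aggregate/flavor/cell names, hosts and the rabbit/DB URLs are placeholders to adapt, and the linked isolate-aggregates and managing-resource-providers docs have the exact file locations and options.

Option 2.), isolating aggregates driven by a custom trait advertised via provider.yaml:

  # nova.conf on the scheduler (per the isolate-aggregates doc)
  [scheduler]
  enable_isolated_aggregate_filtering = true

  # provider.yaml on the compute hosts attached to ceph cluster 1
  meta:
    schema_version: '1.0'
  providers:
    - identification:
        uuid: $COMPUTE_NODE
      traits:
        additional:
          - CUSTOM_CEPH_CLUSTER_1

  # host aggregate that requires the trait, and a flavor that requests it
  openstack aggregate create --property trait:CUSTOM_CEPH_CLUSTER_1=required ceph-cluster-1
  openstack aggregate add host ceph-cluster-1 <compute-host>
  openstack flavor set --property trait:CUSTOM_CEPH_CLUSTER_1=required <flavor>

Option 1.), one cell per ceph cluster, is just the normal cell_v2 workflow:

  # create a cell for the hosts attached to the second ceph cluster
  nova-manage cell_v2 create_cell --name cell-ceph2 \
      --transport-url rabbit://user:pass@rabbit-host:5672/cell-ceph2 \
      --database_connection mysql+pymysql://user:pass@db-host/nova_cell_ceph2
  # then map the relevant computes into it
  nova-manage cell_v2 discover_hosts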
To debug this issue we enabled debug logs for the nova-scheduler and found that the scheduler does not filter out any node with any of our enabled filter plugins. As a quick test we disabled a couple of compute nodes and could verify that the ComputeFilter of the nova-scheduler still returns the full list of all nodes in the cluster. So it seems to us that, for some reason, the nova-scheduler just ignores all enabled filter plugins for the mentioned instance. It's worth noting that other instances with the same flavor on the same compute node do not exhibit these issues, highlighting the unique nature of this problem.
Furthermore, we checked all relevant database tables to see if something strange was saved for this instance, but it seems to us that the instance has exactly the same attributes as other instances on this node.
We are seeking insights or suggestions from anyone who might have experienced similar issues or has knowledge of potential causes and solutions. What specific logs or configuration details would be helpful for us to provide to facilitate further diagnosis?
We greatly appreciate any guidance or assistance the community can offer.
Best regards, Marc Vorwerk + Maximilian Stinsky