Sean;
Thank you for all your help. I was able to force reboot the migrated server. For reference, I had the physical network configuration slightly off.
Thank you,
Dominic L. Hilsbos, MBA
Director – Information Technology
Perform Air International Inc.
DHilsbos@PerformAir.com
www.PerformAir.com
-----Original Message-----
From: Sean Mooney [mailto:smooney@redhat.com]
Sent: Monday, April 19, 2021 10:52 AM
To: Dominic Hilsbos; openstack-discuss@lists.openstack.org
Subject: Re: [ops][nova][victoria] Migrate cross CPU?
On 19/04/2021 18:27, DHilsbos@performair.com wrote:
All;
I think I've worked through the issue with ssh, however I now have another issue. I've attached an extract from the Nova Compute instance on the new server.
If I'm reading this correctly, is it having trouble accessing Ceph? Also, the machine I used here can be thrown away, but is there a way to recover it?

The error is "Unexpected vif_type=binding_failed", which indicates an issue with Neutron. Specifically, in this case the exception was raised as part of _finish_resize, when we update the Neutron port's host field to the destination before we generate the domain XML.
In this case the Neutron ML2 driver refused to bind the port to the new host. req-228b5f98-e3a4-4c22-8c90-eacce6efb091 is the Nova request ID, but we might also use the same one when calling Neutron, so you could try to see if there is a message with that request ID in the neutron-server log. If that returns nothing, you can check with the port UUID, 2e7d818a-43e1-48fb-a4d3-9e36034a46bf.
If you correct the Neutron issue, perhaps by manually unsetting and setting the binding_host on the port as an admin to retrigger port binding, you can hard reboot the VM to fix it. You should see an issue in the Neutron logs, however.
I don't see anything related to Ceph in those logs, so I think your storage is likely fine.
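
The checks described above can be sketched as a few shell functions. This is only a sketch under assumptions: the log path /var/log/neutron/server.log and the function names are hypothetical, and the thread does not confirm that re-setting the host is enough to retrigger binding in every deployment. Only the re-set step is shown here; the unset step mentioned above is left out. The functions are defined but nothing is run until you call them.

```shell
#!/bin/sh
# Assumed log location (not stated in the thread); override via NEUTRON_LOG.
NEUTRON_LOG=${NEUTRON_LOG:-/var/log/neutron/server.log}

# Print neutron-server log lines that mention either the Nova request ID
# or the port UUID.
find_binding_errors() {
    req_id=$1
    port_id=$2
    grep -F -e "$req_id" -e "$port_id" "$NEUTRON_LOG"
}

# Show the current binding state of the port (needs admin credentials).
show_binding() {
    openstack port show -c binding_host_id -c binding_vif_type "$1"
}

# Re-set the binding host on the port, then hard reboot the server.
rebind_and_reboot() {
    port_id=$1
    host=$2
    server_id=$3
    openstack port set --host "$host" "$port_id" &&
        openstack server reboot --hard "$server_id"
}

# Example calls (not run automatically):
#   find_binding_errors req-228b5f98-e3a4-4c22-8c90-eacce6efb091 \
#       2e7d818a-43e1-48fb-a4d3-9e36034a46bf
#   show_binding 2e7d818a-43e1-48fb-a4d3-9e36034a46bf
```

If show_binding still reports binding_failed after the underlying Neutron problem is fixed, the neutron-server and agent logs on the destination host are the next place to look.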
Thank you,
-----Original Message-----
From: Sean Mooney [mailto:smooney@redhat.com]
Sent: Monday, April 19, 2021 3:38 AM
To: Dominic Hilsbos; openstack-discuss@lists.openstack.org
Subject: Re: [ops][nova][victoria] Migrate cross CPU?
On 19/04/2021 03:51, DHilsbos@performair.com wrote:
Sean;
Thank you, your suggestion led me to a problem with ssh. I was a little surprised by this, as live migration works.

That's a pretty common issue. Live migration does not use ssh or rsync to copy the VM disk data; that is done by QEMU. For cold migration, the data is copied by Nova using one of two drivers, either ssh/scp or rsync.
I reviewed: https://docs.openstack.org/nova/victoria/admin/ssh-configuration.html#cli-os... and found that I had a problem with the authorized keys file. I took care of that, and it still didn't work.
Here's what came out of the nova compute log:
2021-04-18 19:24:27.201 10808 ERROR oslo_messaging.rpc.server [req-225e7beb-f186-4235-abce-efcf4924d505 d7c514813e5d4fe6815f5f59e8e35f2f a008ad02d16f436a9e320882ca497055 - default default] Exception during message handling: nova.exception.ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
Command: ssh -o BatchMode=yes 10.0.128.20 mkdir -p /var/lib/nova/instances/64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
Exit code: 255
Stdout: ''
Stderr: 'Host key verification failed.\r\n'
When I do su - nova on the origin server, as per the above, then ssh to the receiving server, I get this:
Load key "/etc/nova/migration/identity": invalid format
/etc/nova/migration/identity isn't mentioned anywhere in the documentation above.
I tried:
cat id_rsa > /etc/nova/migration/identity
and
cat id_rsa.pub >> /etc/nova/migration/authorized_keys
Using the keys copied in the documentation above; still no go. Same 'Host key verification failed.\r\n' result.
What am I missing?

You will need to su to the nova user and make sure the key has the correct permissions set, typically 600, and is owned by nova. Then you need to do the key exchange and ensure the host is added to known_hosts. I normally do that by manually sshing as the nova user to the destination hosts.
Obviously, if it's more than a couple of hosts you will want to use Ansible or something similar to automate the process. There are basically three things you need to do:
1.) Copy a key without a password to the nova user on all hosts and set its permissions to 600.
2.) Add the public key to authorized_keys on all hosts.
3.) Pre-populate known_hosts on all hosts for all other hosts (you can use ssh-keyscan for this). If you have more than about 20 hosts, do this on one host and copy the result to all the others, because the quadratic number of scans takes a while with a large number of hosts.
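
The three steps above could be checked on one host with something like the following. It is a sketch, not a tested procedure: the function names are made up for illustration, the key and known_hosts paths are passed in as arguments because they vary between deployments, and stat -c assumes GNU coreutils. Note that ssh-keyscan trusts whatever answers on the network, so the collected host keys should be verified out of band.

```shell
#!/bin/sh
# Return 0 if the private key file has mode 600, non-zero otherwise.
# Uses GNU stat (coreutils).
key_mode_ok() {
    [ "$(stat -c '%a' "$1")" = "600" ]
}

# Append hashed host keys for the given hosts to a known_hosts file.
# Arguments: <known_hosts file> <host> [<host> ...]
populate_known_hosts() {
    known_hosts=$1
    shift
    ssh-keyscan -H "$@" >> "$known_hosts"
}

# Reproduce the non-interactive check from the Nova log as the nova user.
# Run as root; $1 is the destination compute host.
test_migration_ssh() {
    su - nova -c "ssh -o BatchMode=yes $1 true"
}

# Example calls (not run automatically):
#   key_mode_ok /etc/nova/migration/identity || echo "fix key permissions"
#   test_migration_ssh 10.0.128.20
```

If test_migration_ssh exits with 255 and "Host key verification failed", the known_hosts step is the one still missing for that source and destination pair.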
Thank you,
-----Original Message-----
From: Sean Mooney [mailto:smooney@redhat.com]
Sent: Friday, April 16, 2021 9:58 AM
To: Dominic Hilsbos; openstack-discuss@lists.openstack.org
Subject: Re: [ops][nova][victoria] Migrate cross CPU?
Hmm, OK. The best way to debug this is to list the server events and get the request ID for the migration. It may be req-ff109e53-74e0-40de-8ec7-29aff600b5f7 based on the logs you posted, but you should see more info in the API, conductor and compute logs for that request ID.
Given the state has not changed, I suspect it failed rather early.
It's possible that you are experiencing an issue with the RabbitMQ service and RPC calls are being lost, but I would not expect to see logs related to this in the scheduler while the VM is still in the SHUTOFF status.
Can you run "openstack server event list 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3", then get the most recent resize event's request ID and see if there are any other logs?
Regards,
Sean.
(Note: I think it will be listed as a resize, not a migrate, since internally migrate is implemented as a resize to the same flavour.)
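
A small sketch of the request-ID tracing described above. The helper names are hypothetical, and /var/log/nova is simply the log directory that appears elsewhere in this thread. The logs have to be searched on each host involved (API, conductor, scheduler, and both compute nodes).

```shell
#!/bin/sh
# Extract the unique request IDs (req-<uuid>) from log text on stdin.
extract_request_ids() {
    grep -oE 'req-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' | sort -u
}

# Search the Nova logs on this host for a single request ID.
# NOVA_LOG_DIR defaults to the directory seen in the thread.
grep_request() {
    grep -rF -- "$1" "${NOVA_LOG_DIR:-/var/log/nova}"
}

# Example calls (not run automatically):
#   openstack server event list 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
#   openstack server event show 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3 <request-id>
#   grep_request req-ff109e53-74e0-40de-8ec7-29aff600b5f7
```

The event list gives the request ID of each action on the server. Grepping for that ID on every node then shows how far the request got before it failed.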
On 16/04/2021 17:04, DHilsbos@performair.com wrote:
Sean;
Thank you very much for your response. I wasn't aware of the state change to resize_verify, that's useful.
Unfortunately, at present, the state change is not occurring.
Here's a series of commands, with output:
#openstack server show 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
+-------------------------------------+----------------------------------------------------------+
| Field                               | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                   |
| OS-EXT-AZ:availability_zone         | az-elcom-1                                               |
| OS-EXT-SRV-ATTR:host                | s700030.463.os.mcgown.enterprises                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname | s700030.463.os.mcgown.enterprises                        |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000037                                        |
| OS-EXT-STS:power_state              | Shutdown                                                 |
| OS-EXT-STS:task_state               | None                                                     |
| OS-EXT-STS:vm_state                 | stopped                                                  |
| OS-SRV-USG:launched_at              | 2021-03-06T04:36:07.000000                               |
| OS-SRV-USG:terminated_at            | None                                                     |
| accessIPv4                          |                                                          |
| accessIPv6                          |                                                          |
| addresses                           | it-network=10.255.127.208, 10.0.160.35                   |
| config_drive                        |                                                          |
| created                             | 2021-03-06T04:35:51Z                                     |
| flavor                              | m4.large (8)                                             |
| hostId                              | 174a83351ac674a25a2bf5131b931fc7a9e16be48b62f37925a66676 |
| id                                  | 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3                     |
| image                               | N/A (booted from volume)                                 |
| key_name                            | None                                                     |
| name                                | Java Dev                                                 |
| project_id                          | 10dfdfadb7374ea1ba37bee1435d87ad                         |
| properties                          |                                                          |
| security_groups                     | name='allow-ping'                                        |
|                                     | name='allow-ssh'                                         |
|                                     | name='default'                                           |
| status                              | SHUTOFF                                                  |
| updated                             | 2021-04-16T15:52:07Z                                     |
| user_id                             | 69b73ea8f55c46a99021e77ebf70b62a                         |
| volumes_attached                    | id='ae69c924-60e5-431e-9572-c41a153e720b'                |
+-------------------------------------+----------------------------------------------------------+
#openstack server migrate --host s700066.463.os.mcgown.enterprises --os-compute-api-version 2.56 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
#openstack server show 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
+-------------------------------------+----------------------------------------------------------+
| Field                               | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                   |
| OS-EXT-AZ:availability_zone         | az-elcom-1                                               |
| OS-EXT-SRV-ATTR:host                | s700030.463.os.mcgown.enterprises                        |
| OS-EXT-SRV-ATTR:hypervisor_hostname | s700030.463.os.mcgown.enterprises                        |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000037                                        |
| OS-EXT-STS:power_state              | Shutdown                                                 |
| OS-EXT-STS:task_state               | None                                                     |
| OS-EXT-STS:vm_state                 | stopped                                                  |
| OS-SRV-USG:launched_at              | 2021-03-06T04:36:07.000000                               |
| OS-SRV-USG:terminated_at            | None                                                     |
| accessIPv4                          |                                                          |
| accessIPv6                          |                                                          |
| addresses                           | it-network=10.255.127.208, 10.0.160.35                   |
| config_drive                        |                                                          |
| created                             | 2021-03-06T04:35:51Z                                     |
| flavor                              | m4.large (8)                                             |
| hostId                              | 174a83351ac674a25a2bf5131b931fc7a9e16be48b62f37925a66676 |
| id                                  | 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3                     |
| image                               | N/A (booted from volume)                                 |
| key_name                            | None                                                     |
| name                                | Java Dev                                                 |
| project_id                          | 10dfdfadb7374ea1ba37bee1435d87ad                         |
| properties                          |                                                          |
| security_groups                     | name='allow-ping'                                        |
|                                     | name='allow-ssh'                                         |
|                                     | name='default'                                           |
| status                              | SHUTOFF                                                  |
| updated                             | 2021-04-16T15:53:32Z                                     |
| user_id                             | 69b73ea8f55c46a99021e77ebf70b62a                         |
| volumes_attached                    | id='ae69c924-60e5-431e-9572-c41a153e720b'                |
+-------------------------------------+----------------------------------------------------------+
#tail /var/log/nova/nova-conductor.log
#tail /var/log/nova/nova-scheduler.log
2021-04-16 08:53:24.870 3773 INFO nova.scheduler.host_manager [req-ff109e53-74e0-40de-8ec7-29aff600b5f7 d7c514813e5d4fe6815f5f59e8e35f2f a008ad02d16f436a9e320882ca497055 - default default] Host filter only checking host s700066.463.os.mcgown.enterprises and node s700066.463.os.mcgown.enterprises
2021-04-16 08:53:24.871 3773 INFO nova.scheduler.host_manager [req-ff109e53-74e0-40de-8ec7-29aff600b5f7 d7c514813e5d4fe6815f5f59e8e35f2f a008ad02d16f436a9e320882ca497055 - default default] Host filter ignoring hosts:
Both Cinder volume storage, and ephemeral storage are being handled by Ceph.
Thank you,
-----Original Message-----
From: Sean Mooney [mailto:smooney@redhat.com]
Sent: Friday, April 16, 2021 6:28 AM
To: openstack-discuss@lists.openstack.org
Subject: Re: [ops][nova][victoria] Migrate cross CPU?
On 15/04/2021 19:05, DHilsbos@performair.com wrote:
All;
I seem to have generated another issue for myself...
I built our Victoria cloud initially on Intel Atom servers. We recently received the first of our AMD Epyc (7002 series) servers, which are intended to take over the Nova Compute responsibilities.
I've had success in the past doing live migrates, but live migrating from one of the Atom servers to the new server fails, with an error indicating CPU compatibility problems. Ok, I can understand that.
My problem is that I don't seem to understand the openstack server migrate command (non-live). It doesn't seem to do anything, whether the instance is Running or Shut Down. I can't find errors in the logs from the API / conductor / scheduler host.
I also can't find an option to pass to the openstack server start command which requests a specific host.
Can I get these existing instances moved from the Atom servers to the Epyc server(s), or do I need to recreate them to do this?

You should be able to cold migrate them using the migrate command, but that should put the servers into resize_verify, and then you need to confirm the migration to complete it. We will not clean up the VM on the source node until you do that last step.
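
The migrate-then-confirm flow described above might look like this in shell. It is a sketch with assumptions: the function names, retry count and poll interval are hypothetical; the server status reported by the API for this state is VERIFY_RESIZE; and the confirm subcommand differs between client versions (recent clients accept "openstack server resize confirm", older ones use "openstack server resize --confirm").

```shell
#!/bin/sh
# Print the current status of a server (for example SHUTOFF, VERIFY_RESIZE).
get_status() {
    openstack server show -f value -c status "$1"
}

# Poll until the server reaches VERIFY_RESIZE.
# Returns 0 on VERIFY_RESIZE, 1 on ERROR, 2 if it never got there.
# Arguments: <server> [<max tries, default 30>]
wait_for_verify_resize() {
    server=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        status=$(get_status "$server")
        case "$status" in
            VERIFY_RESIZE) return 0 ;;
            ERROR) return 1 ;;
        esac
        sleep "${POLL_INTERVAL:-10}"
        tries=$((tries - 1))
    done
    return 2
}

# Example flow (not run automatically):
#   openstack server migrate --host <destination-host> \
#       --os-compute-api-version 2.56 <server>
#   wait_for_verify_resize <server> && openstack server resize confirm <server>
```

A return code of 2 with the server still in its original state suggests the request failed early, in which case the server event list and the request ID in the logs are the place to look.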
Thank you,