[ops][nova][victoria] Migrate cross CPU?

Sean Mooney smooney at redhat.com
Mon Apr 19 10:38:20 UTC 2021



On 19/04/2021 03:51, DHilsbos at performair.com wrote:
> Sean;
>
> Thank you, your suggestion led me to a problem with ssh.  I was a little surprised by this, as live migration works.
That's a pretty common issue.
Live migration does not use ssh or rsync to copy the VM disk data;
that is done by QEMU.
For cold migration the data is copied by nova using one of two drivers,
either ssh/scp or rsync.
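
A rough way to see which of the two drivers a compute node is set to use is sketched below. It rests on assumptions that should be checked against the configuration reference for your release: that the choice is the `remote_filesystem_transport` option in the `[libvirt]` section of nova.conf, and that ssh is the default when the option is unset. The helper name `nova_fs_transport` is made up for illustration.

```shell
# Print the copy transport configured in a nova.conf-style file.
# ASSUMPTION: the option is [libvirt]/remote_filesystem_transport and the
# default is "ssh"; verify both for your release.
nova_fs_transport() {
    conf="${1:-/etc/nova/nova.conf}"
    awk -F= '
        # Track whether the current section is [libvirt].
        /^\[/ { in_libvirt = ($0 ~ /^\[libvirt\]/) }
        # Match only an uncommented option line inside [libvirt].
        in_libvirt && $1 ~ /^[ \t]*remote_filesystem_transport[ \t]*$/ {
            gsub(/[ \t]/, "", $2); value = $2
        }
        END { print (value == "" ? "ssh" : value) }
    ' "$conf"
}

# Usage on a compute node:
#   nova_fs_transport /etc/nova/nova.conf
```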
>
> I reviewed:
> https://docs.openstack.org/nova/victoria/admin/ssh-configuration.html#cli-os-migrate-cfg-ssh
> and found that I had a problem with the authorized keys file.  I took care of that, and it still didn't work.
>
> Here's what came out of the nova compute log:
> 2021-04-18 19:24:27.201 10808 ERROR oslo_messaging.rpc.server [req-225e7beb-f186-4235-abce-efcf4924d505 d7c514813e5d4fe6815f5f59e8e35f2f a008ad02d16f436a9e320882ca497055 - default default] Exception during message handling: nova.exception.ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
> Command: ssh -o BatchMode=yes 10.0.128.20 mkdir -p /var/lib/nova/instances/64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
> Exit code: 255
> Stdout: ''
> Stderr: 'Host key verification failed.\r\n'
>
> When I do su - nova on the origin server, as per the above, then ssh to the receiving server, I get this:
> Load key "/etc/nova/migration/identity": invalid format
>
> /etc/nova/migration/identity isn't mentioned anywhere in the documentation above.
>
> I tried:
> cat id_rsa > /etc/nova/migration/identity
> and
> cat id_rsa.pub >> /etc/nova/migration/authorized_keys
>
> Using the keys copied in the documentation above; still no go.  Same 'Host key verification failed.\r\n' result.
>
> What am I missing?
You will need to su to the nova user and make sure the key has the
correct permissions set (typically 600)
and is owned by nova. Then you need to do the key exchange and ensure
the destination's host key is added to known_hosts.
I normally do that by manually sshing as the nova user to the
destination hosts.
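
The permission and ownership check can be scripted before trying the manual ssh. This is a minimal sketch, not a definitive procedure: the function name `check_migration_key` is hypothetical, the key path in the usage comment is the one from the error message quoted above, and `stat -c` assumes GNU coreutils.

```shell
# Check that a private key is mode 600 and owned by the expected user.
# Prints "ok" on success, or a short reason and a non-zero status on failure.
check_migration_key() {
    key="$1"
    owner="$2"
    [ -f "$key" ] || { echo "missing: $key"; return 1; }
    mode=$(stat -c '%a' "$key")   # octal permissions, e.g. 600
    user=$(stat -c '%U' "$key")   # owning user name
    [ "$mode" = "600" ] || { echo "mode is $mode, expected 600"; return 1; }
    [ "$user" = "$owner" ] || { echo "owner is $user, expected $owner"; return 1; }
    echo "ok"
}

# Usage, then repeat the exact command nova runs, as the nova user:
#   check_migration_key /etc/nova/migration/identity nova
#   ssh -o BatchMode=yes 10.0.128.20 true
```

With BatchMode=yes, ssh cannot prompt to accept an unknown host key, which is consistent with the "Host key verification failed" error in the log.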

Obviously, if it's more than a couple of hosts you will want to use ansible
or something similar to automate the process.
There are basically 3 things you need to do:
1.) copy a key without a passphrase to the nova user on all hosts and set
    its permissions to 600
2.) add the public key to authorized_keys on all hosts
3.) pre-populate known_hosts on all hosts with the keys of all other hosts (you
    can use ssh-keyscan for this).
    If you have more than about 20 hosts, do the scan on one host and copy the
    result to all the others, because scanning every host from every host is
    quadratic and takes a while with a large number of hosts.
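
Step 3 can be sketched as "scan once, merge everywhere". The function names and file locations below are illustrative assumptions, not anything nova ships. The host list should probably contain the names or addresses nova actually connects to; the log quoted above shows it using an IP address.

```shell
# Scan every host in a list (one name or address per line) a single time
# and write a de-duplicated known_hosts-style file.
build_known_hosts() {
    hosts_file="$1"
    out="$2"
    ssh-keyscan -f "$hosts_file" 2>/dev/null | sort -u > "$out"
}

# Merge a prebuilt file into an existing known_hosts without duplicating
# lines. Writing back with cat keeps the destination's owner and mode.
merge_known_hosts() {
    src="$1"
    dest="$2"
    touch "$dest"
    sort -u "$src" "$dest" > "$dest.new" &&
        cat "$dest.new" > "$dest" &&
        rm -f "$dest.new"
}

# Usage: run the scan on one host, copy the result to the others, then on
# each host merge it into the nova user's known_hosts (path is an assumption):
#   build_known_hosts compute-hosts.txt scanned_hosts
#   merge_known_hosts scanned_hosts ~nova/.ssh/known_hosts
```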

>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director – Information Technology
> Perform Air International Inc.
> DHilsbos at PerformAir.com
> www.PerformAir.com
>
> -----Original Message-----
> From: Sean Mooney [mailto:smooney at redhat.com]
> Sent: Friday, April 16, 2021 9:58 AM
> To: Dominic Hilsbos; openstack-discuss at lists.openstack.org
> Subject: Re: [ops][nova][victoria] Migrate cross CPU?
>
> Hmm, OK. The best way to debug this is to list the server events and get
> the request ID for the migration.
> It may be req-ff109e53-74e0-40de-8ec7-29aff600b5f7 based on the logs you
> posted, but you should see more info
> in the API, conductor and compute logs for that request ID.
>
> Given that the state has not changed, I suspect it failed rather early.
>
> It's possible that you are experiencing an issue with the rabbitmq service
> and RPC calls are being lost, but
> I would not expect to see logs related to this in the scheduler while the
> VM is still in the SHUTOFF status.
>
> Can you do "openstack server event list
> 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3", then get the most recent
> resize event's request ID and see if there are any other logs?
>
> Regards,
> Sean.
>
> (Note: I think it will be listed as a resize, not a migrate, since
> internally migrate is implemented as a resize to the same flavour.)
>
> On 16/04/2021 17:04, DHilsbos at performair.com wrote:
>> Sean;
>>
>> Thank you very much for your response.  I wasn't aware of the state change to resize_verify, that's useful.
>>
>> Unfortunately, at present, the state change is not occurring.
>>
>> Here's a series of commands, with output:
>>
>> #openstack server show 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
>> +-------------------------------------+----------------------------------------------------------+
>> | Field                               | Value                                                    |
>> +-------------------------------------+----------------------------------------------------------+
>> | OS-DCF:diskConfig                   | MANUAL                                                   |
>> | OS-EXT-AZ:availability_zone         | az-elcom-1                                               |
>> | OS-EXT-SRV-ATTR:host                | s700030.463.os.mcgown.enterprises                        |
>> | OS-EXT-SRV-ATTR:hypervisor_hostname | s700030.463.os.mcgown.enterprises                        |
>> | OS-EXT-SRV-ATTR:instance_name       | instance-00000037                                        |
>> | OS-EXT-STS:power_state              | Shutdown                                                 |
>> | OS-EXT-STS:task_state               | None                                                     |
>> | OS-EXT-STS:vm_state                 | stopped                                                  |
>> | OS-SRV-USG:launched_at              | 2021-03-06T04:36:07.000000                               |
>> | OS-SRV-USG:terminated_at            | None                                                     |
>> | accessIPv4                          |                                                          |
>> | accessIPv6                          |                                                          |
>> | addresses                           | it-network=10.255.127.208, 10.0.160.35                   |
>> | config_drive                        |                                                          |
>> | created                             | 2021-03-06T04:35:51Z                                     |
>> | flavor                              | m4.large (8)                                             |
>> | hostId                              | 174a83351ac674a25a2bf5131b931fc7a9e16be48b62f37925a66676 |
>> | id                                  | 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3                     |
>> | image                               | N/A (booted from volume)                                 |
>> | key_name                            | None                                                     |
>> | name                                | Java Dev                                                 |
>> | project_id                          | 10dfdfadb7374ea1ba37bee1435d87ad                         |
>> | properties                          |                                                          |
>> | security_groups                     | name='allow-ping'                                        |
>> |                                     | name='allow-ssh'                                         |
>> |                                     | name='default'                                           |
>> | status                              | SHUTOFF                                                  |
>> | updated                             | 2021-04-16T15:52:07Z                                     |
>> | user_id                             | 69b73ea8f55c46a99021e77ebf70b62a                         |
>> | volumes_attached                    | id='ae69c924-60e5-431e-9572-c41a153e720b'                |
>> +-------------------------------------+----------------------------------------------------------+
>> #openstack server migrate --host s700066.463.os.mcgown.enterprises --os-compute-api-version 2.56 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
>> #openstack server show 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
>> +-------------------------------------+----------------------------------------------------------+
>> | Field                               | Value                                                    |
>> +-------------------------------------+----------------------------------------------------------+
>> | OS-DCF:diskConfig                   | MANUAL                                                   |
>> | OS-EXT-AZ:availability_zone         | az-elcom-1                                               |
>> | OS-EXT-SRV-ATTR:host                | s700030.463.os.mcgown.enterprises                        |
>> | OS-EXT-SRV-ATTR:hypervisor_hostname | s700030.463.os.mcgown.enterprises                        |
>> | OS-EXT-SRV-ATTR:instance_name       | instance-00000037                                        |
>> | OS-EXT-STS:power_state              | Shutdown                                                 |
>> | OS-EXT-STS:task_state               | None                                                     |
>> | OS-EXT-STS:vm_state                 | stopped                                                  |
>> | OS-SRV-USG:launched_at              | 2021-03-06T04:36:07.000000                               |
>> | OS-SRV-USG:terminated_at            | None                                                     |
>> | accessIPv4                          |                                                          |
>> | accessIPv6                          |                                                          |
>> | addresses                           | it-network=10.255.127.208, 10.0.160.35                   |
>> | config_drive                        |                                                          |
>> | created                             | 2021-03-06T04:35:51Z                                     |
>> | flavor                              | m4.large (8)                                             |
>> | hostId                              | 174a83351ac674a25a2bf5131b931fc7a9e16be48b62f37925a66676 |
>> | id                                  | 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3                     |
>> | image                               | N/A (booted from volume)                                 |
>> | key_name                            | None                                                     |
>> | name                                | Java Dev                                                 |
>> | project_id                          | 10dfdfadb7374ea1ba37bee1435d87ad                         |
>> | properties                          |                                                          |
>> | security_groups                     | name='allow-ping'                                        |
>> |                                     | name='allow-ssh'                                         |
>> |                                     | name='default'                                           |
>> | status                              | SHUTOFF                                                  |
>> | updated                             | 2021-04-16T15:53:32Z                                     |
>> | user_id                             | 69b73ea8f55c46a99021e77ebf70b62a                         |
>> | volumes_attached                    | id='ae69c924-60e5-431e-9572-c41a153e720b'                |
>> +-------------------------------------+----------------------------------------------------------+
>> #tail /var/log/nova/nova-conductor.log
>> #tail /var/log/nova/nova-scheduler.log
>> 2021-04-16 08:53:24.870 3773 INFO nova.scheduler.host_manager [req-ff109e53-74e0-40de-8ec7-29aff600b5f7 d7c514813e5d4fe6815f5f59e8e35f2f a008ad02d16f436a9e320882ca497055 - default default] Host filter only checking host s700066.463.os.mcgown.enterprises and node s700066.463.os.mcgown.enterprises
>> 2021-04-16 08:53:24.871 3773 INFO nova.scheduler.host_manager [req-ff109e53-74e0-40de-8ec7-29aff600b5f7 d7c514813e5d4fe6815f5f59e8e35f2f a008ad02d16f436a9e320882ca497055 - default default] Host filter ignoring hosts:
>>
>> Both Cinder volume storage, and ephemeral storage are being handled by Ceph.
>>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director – Information Technology
>> Perform Air International Inc.
>> DHilsbos at PerformAir.com
>> www.PerformAir.com
>>
>>
>> -----Original Message-----
>> From: Sean Mooney [mailto:smooney at redhat.com]
>> Sent: Friday, April 16, 2021 6:28 AM
>> To: openstack-discuss at lists.openstack.org
>> Subject: Re: [ops][nova][victoria] Migrate cross CPU?
>>
>>
>>
>> On 15/04/2021 19:05, DHilsbos at performair.com wrote:
>>> All;
>>>
>>> I seem to have generated another issue for myself...
>>>
>>> I built our Victoria cloud initially on Intel Atom servers.  We recently received the first of our AMD Epyc (7002 series) servers, which are intended to take over the Nova Compute responsibilities.
>>>
>>> I've had success in the past doing live migrates, but live migrating from one of the Atom servers to the new server fails, with an error indicating CPU compatibility problems.  Ok, I can understand that.
>>>
>>> My problem is that I don't seem to understand the openstack server migrate command (non-live).  It doesn't seem to do anything, whether the instance is Running or Shut Down.  I can't find errors in the logs from the API / conductor / scheduler host.
>>>
>>> I also can't find an option to pass to the openstack server start command which requests a specific host.
>>>
>>> Can I get these existing instances moved from the Atom servers to the Epyc server(s), or do I need to recreate them to do this?
>> You should be able to cold migrate them using the migrate command, but
>> that should put the servers into resize_verify and then you need
>> to confirm the migration to complete it. We will not clean up the VM on
>> the source node until you do that last step.
>>
>>> Thank you,
>>>
>>> Dominic L. Hilsbos, MBA
>>> Director - Information Technology
>>> Perform Air International Inc.
>>> DHilsbos at PerformAir.com
>>> www.PerformAir.com
>>>
>>>
>>>



