[ops][nova][victoria] Migrate cross CPU?

DHilsbos at performair.com DHilsbos at performair.com
Mon Apr 19 18:22:58 UTC 2021


Sean;

Thank you for all your help. I was able to force reboot the migrated server.

For reference, I had the physical network configuration slightly off.

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
DHilsbos at PerformAir.com 
www.PerformAir.com


-----Original Message-----
From: Sean Mooney [mailto:smooney at redhat.com] 
Sent: Monday, April 19, 2021 10:52 AM
To: Dominic Hilsbos; openstack-discuss at lists.openstack.org
Subject: Re: [ops][nova][victoria] Migrate cross CPU?



On 19/04/2021 18:27, DHilsbos at performair.com wrote:
> All;
>
> I think I've worked through the issue with ssh, however I now have another issue.  I've attached an extract from the Nova Compute instance on the new server.
>
> If I'm reading this correctly, is it having trouble accessing Ceph?  Also, this machine I used here can be thrown away, but is there a way to recover it?
Unexpected vif_type=binding_failed is the error, and that indicates an
issue with neutron.
Specifically, in this case the exception was raised as part of _finish_resize,
when we update the neutron port host field to the destination before we
generate the domain XML.

In this case the neutron ml2 driver refused to bind the port to the new
host.

req-228b5f98-e3a4-4c22-8c90-eacce6efb091 is the nova request ID, but we
might also use the same one when calling neutron, so you could try to
see if there is a message with that request ID in the neutron-server log.
If that returns nothing, then you can check with the port UUID
2e7d818a-43e1-48fb-a4d3-9e36034a46bf.

If you correct the neutron issue, perhaps by manually unsetting and
setting the binding_host on the port as an admin to retrigger port
binding, you can hard reboot the VM to fix it.
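A minimal sketch of that unset / set / hard reboot sequence, assuming admin
credentials are already loaded in the shell. The function name rebind_port and
the OPENSTACK override variable are made up for illustration, and
"openstack port unset --host" is an assumption about the installed
python-openstackclient (check "openstack port unset --help" before relying on it).

```shell
# Sketch only: re-trigger neutron port binding, then hard reboot the instance.
# Usage: rebind_port PORT_UUID DESTINATION_HOST SERVER_UUID
# OPENSTACK is a hypothetical override so the commands can be previewed
# (OPENSTACK=echo) instead of executed.
rebind_port() {
    port_id="$1"; dest_host="$2"; server_id="$3"
    os="${OPENSTACK:-openstack}"
    # Clear the stale binding, bind to the destination, then rebuild the domain.
    $os port unset --host "$port_id" &&
    $os port set --host "$dest_host" "$port_id" &&
    $os server reboot --hard "$server_id"
}
```

With the IDs from this thread that would be something like
rebind_port 2e7d818a-43e1-48fb-a4d3-9e36034a46bf s700066.463.os.mcgown.enterprises 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3,
assuming the host name matches what the neutron agent on the new server reports.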

You should see an issue in the neutron logs, however.
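One way to look for it, as a rough sketch: search the neutron-server log for
either the request ID or the port UUID. The helper name find_binding_logs is
illustrative, and the log path in the usage line is an assumption that depends
on how neutron was deployed.

```shell
# Print every log line that mentions the request ID or the port UUID.
# Usage: find_binding_logs REQUEST_ID PORT_UUID LOGFILE...
find_binding_logs() {
    req_id="$1"; port_id="$2"
    shift 2
    # -F treats both IDs as fixed strings rather than regular expressions.
    grep -F -e "$req_id" -e "$port_id" -- "$@"
}
```

For example:
find_binding_logs req-228b5f98-e3a4-4c22-8c90-eacce6efb091 2e7d818a-43e1-48fb-a4d3-9e36034a46bf /var/log/neutron/server.log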

I don't see anything related to ceph in those logs, so I think your
storage is likely fine.


>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director – Information Technology
> Perform Air International Inc.
> DHilsbos at PerformAir.com
> www.PerformAir.com
>
>
> -----Original Message-----
> From: Sean Mooney [mailto:smooney at redhat.com]
> Sent: Monday, April 19, 2021 3:38 AM
> To: Dominic Hilsbos; openstack-discuss at lists.openstack.org
> Subject: Re: [ops][nova][victoria] Migrate cross CPU?
>
>
>
> On 19/04/2021 03:51, DHilsbos at performair.com wrote:
>> Sean;
>>
>> Thank you, your suggestion led me to a problem with ssh.  I was a little surprised by this, as live migration works.
> That's a pretty common issue.
> Live migration does not use ssh or rsync to copy the VM disk data;
> that is done by qemu.
> For cold migration the data is copied by nova using one of 2 drivers,
> either ssh/scp or rsync.
>> I reviewed:
>> https://docs.openstack.org/nova/victoria/admin/ssh-configuration.html#cli-os-migrate-cfg-ssh
>> and found that I had a problem with the authorized keys file.  I took care of that, and it still didn't work.
>>
>> Here's what came out of the nova compute log:
>> 2021-04-18 19:24:27.201 10808 ERROR oslo_messaging.rpc.server [req-225e7beb-f186-4235-abce-efcf4924d505 d7c514813e5d4fe6815f5f59e8e35f2f a008ad02d16f436a9e320882ca497055 - default default] Exception during message handling: nova.exception.ResizeError: Resize error: not able to execute ssh command: Unexpected error while running command.
>> Command: ssh -o BatchMode=yes 10.0.128.20 mkdir -p /var/lib/nova/instances/64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
>> Exit code: 255
>> Stdout: ''
>> Stderr: 'Host key verification failed.\r\n'
>>
>> When I do su - nova on the origin server, as per the above, then ssh to the receiving server, I get this:
>> Load key "/etc/nova/migration/identity": invalid format
>>
>> /etc/nova/migration/identity isn't mentioned anywhere in the documentation above.
>>
>> I tried:
>> cat id_rsa > /etc/nova/migration/identity
>> and
>> cat id_rsa.pub >> /etc/nova/migration/authorized_keys
>>
>> Using the keys copied in the documentation above; still no go.  Same 'Host key verification failed.\r\n' result.
>>
>> What am I missing?
> You will need to su to the nova user and make sure the key has the
> correct permissions set, typically 600,
> and is owned by nova. Then you need to do the key exchange and ensure
> it's added to the known hosts.
> I normally do that by manually sshing as the nova user to the
> destination hosts.
>
> Obviously, if it's more than a couple of hosts you will want to use ansible
> or something to automate the process.
> There are basically 3 things you need to do:
> 1.) copy a key without a password to the nova user on all hosts and set
> permissions to 600
> 2.) add the public key to authorized_keys on all hosts
> 3.) pre-populate the known_hosts on all hosts for all other hosts (you
> can use ssh-keyscan for this)
>        if you have more than about 20 hosts, do this on one host and copy
> it to all the others, because scanning every host from every host is
> quadratic and takes a while with a large number of hosts...
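The three steps above can be sketched roughly as follows. This is an
illustration under assumptions, not a tested procedure: NOVA_SSH_DIR,
NOVA_OWNER and both function names are made up here, and the default path
assumes the nova user's home is /var/lib/nova. A packaged layout such as the
/etc/nova/migration/identity file mentioned earlier would need different paths.

```shell
# Sketch only. Run as root on each compute host.
# NOVA_SSH_DIR and NOVA_OWNER are illustrative variables, not nova settings.
NOVA_SSH_DIR="${NOVA_SSH_DIR:-/var/lib/nova/.ssh}"
NOVA_OWNER="${NOVA_OWNER:-nova}"

# Steps 1 and 2: install a passwordless private key with mode 600 and
# append its public half to authorized_keys.
install_nova_key() {
    private_key="$1"; public_key="$2"
    mkdir -p "$NOVA_SSH_DIR" && chmod 700 "$NOVA_SSH_DIR" &&
    cp "$private_key" "$NOVA_SSH_DIR/id_rsa" &&
    cat "$public_key" >> "$NOVA_SSH_DIR/authorized_keys" &&
    chmod 600 "$NOVA_SSH_DIR/id_rsa" "$NOVA_SSH_DIR/authorized_keys" &&
    chown -R "$NOVA_OWNER" "$NOVA_SSH_DIR"
}

# Step 3: record the host keys of the other compute hosts.
# Usage: populate_known_hosts HOST...
populate_known_hosts() {
    ssh-keyscan "$@" >> "$NOVA_SSH_DIR/known_hosts" &&
    chown "$NOVA_OWNER" "$NOVA_SSH_DIR/known_hosts"
}
```

After that, "su - nova" followed by "ssh -o BatchMode=yes <destination>" should
log in without prompting, which is the same form of command nova runs in the
log above.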
>
>> Thank you,
>>
>> Dominic L. Hilsbos, MBA
>> Director – Information Technology
>> Perform Air International Inc.
>> DHilsbos at PerformAir.com
>> www.PerformAir.com
>>
>> -----Original Message-----
>> From: Sean Mooney [mailto:smooney at redhat.com]
>> Sent: Friday, April 16, 2021 9:58 AM
>> To: Dominic Hilsbos; openstack-discuss at lists.openstack.org
>> Subject: Re: [ops][nova][victoria] Migrate cross CPU?
>>
>> Hmm, OK. The best way to debug this is to list the server events and get
>> the request ID for the migration.
>> It may be req-ff109e53-74e0-40de-8ec7-29aff600b5f7 based on the logs you
>> posted, but you should see more info
>> in the api, conductor and compute logs for that request ID.
>>
>> Given the state has not changed, I suspect it failed rather early.
>>
>> It's possible that you are experiencing an issue with the rabbitmq service
>> and rpc calls are being lost, but
>> I would not expect to see logs related to this in the scheduler while the
>> VM is still in the SHUTOFF status.
>>
>> Can you do "openstack server event list
>> 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3", then get the most recent
>> resize event's request ID and see if there are any other logs?
>>
>> regards
>> sean.
>>
>> (Note: I think it will be listed as a resize, not a migrate, since
>> internally migrate is implemented as a resize but to the same flavour.)
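A small sketch for pulling that request ID out of the event list. It assumes
the default table output with the columns Request ID | Server ID | Action |
Start Time; the function name latest_event_request is illustrative. The action
is a parameter, so "migrate" can be tried if nothing is listed as "resize".

```shell
# Reads "openstack server event list <server>" table output on stdin and
# prints the request ID of the most recent event with the given action.
# Usage: openstack server event list SERVER | latest_event_request resize
latest_event_request() {
    awk -F'|' -v action="$1" '
        # Strip the padding spaces from the columns we compare.
        { gsub(/ /, "", $2); gsub(/ /, "", $4); gsub(/ /, "", $5) }
        # ISO 8601 timestamps sort correctly as plain strings.
        $4 == action && $5 > latest { latest = $5; req = $2 }
        END { if (req != "") print req }
    '
}
```

The printed ID can then be searched for in the api, conductor and compute logs
with grep.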
>>
>> On 16/04/2021 17:04, DHilsbos at performair.com wrote:
>>> Sean;
>>>
>>> Thank you very much for your response.  I wasn't aware of the state change to resize_verify, that's useful.
>>>
>>> Unfortunately, at present, the state change is not occurring.
>>>
>>> Here's a series of commands, with output:
>>>
>>> #openstack server show 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
>>> +-------------------------------------+----------------------------------------------------------+
>>> | Field                               | Value                                                    |
>>> +-------------------------------------+----------------------------------------------------------+
>>> | OS-DCF:diskConfig                   | MANUAL                                                   |
>>> | OS-EXT-AZ:availability_zone         | az-elcom-1                                               |
>>> | OS-EXT-SRV-ATTR:host                | s700030.463.os.mcgown.enterprises                        |
>>> | OS-EXT-SRV-ATTR:hypervisor_hostname | s700030.463.os.mcgown.enterprises                        |
>>> | OS-EXT-SRV-ATTR:instance_name       | instance-00000037                                        |
>>> | OS-EXT-STS:power_state              | Shutdown                                                 |
>>> | OS-EXT-STS:task_state               | None                                                     |
>>> | OS-EXT-STS:vm_state                 | stopped                                                  |
>>> | OS-SRV-USG:launched_at              | 2021-03-06T04:36:07.000000                               |
>>> | OS-SRV-USG:terminated_at            | None                                                     |
>>> | accessIPv4                          |                                                          |
>>> | accessIPv6                          |                                                          |
>>> | addresses                           | it-network=10.255.127.208, 10.0.160.35                   |
>>> | config_drive                        |                                                          |
>>> | created                             | 2021-03-06T04:35:51Z                                     |
>>> | flavor                              | m4.large (8)                                             |
>>> | hostId                              | 174a83351ac674a25a2bf5131b931fc7a9e16be48b62f37925a66676 |
>>> | id                                  | 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3                     |
>>> | image                               | N/A (booted from volume)                                 |
>>> | key_name                            | None                                                     |
>>> | name                                | Java Dev                                                 |
>>> | project_id                          | 10dfdfadb7374ea1ba37bee1435d87ad                         |
>>> | properties                          |                                                          |
>>> | security_groups                     | name='allow-ping'                                        |
>>> |                                     | name='allow-ssh'                                         |
>>> |                                     | name='default'                                           |
>>> | status                              | SHUTOFF                                                  |
>>> | updated                             | 2021-04-16T15:52:07Z                                     |
>>> | user_id                             | 69b73ea8f55c46a99021e77ebf70b62a                         |
>>> | volumes_attached                    | id='ae69c924-60e5-431e-9572-c41a153e720b'                |
>>> +-------------------------------------+----------------------------------------------------------+
>>> #openstack server migrate --host s700066.463.os.mcgown.enterprises --os-compute-api-version 2.56 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
>>> #openstack server show 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3
>>> +-------------------------------------+----------------------------------------------------------+
>>> | Field                               | Value                                                    |
>>> +-------------------------------------+----------------------------------------------------------+
>>> | OS-DCF:diskConfig                   | MANUAL                                                   |
>>> | OS-EXT-AZ:availability_zone         | az-elcom-1                                               |
>>> | OS-EXT-SRV-ATTR:host                | s700030.463.os.mcgown.enterprises                        |
>>> | OS-EXT-SRV-ATTR:hypervisor_hostname | s700030.463.os.mcgown.enterprises                        |
>>> | OS-EXT-SRV-ATTR:instance_name       | instance-00000037                                        |
>>> | OS-EXT-STS:power_state              | Shutdown                                                 |
>>> | OS-EXT-STS:task_state               | None                                                     |
>>> | OS-EXT-STS:vm_state                 | stopped                                                  |
>>> | OS-SRV-USG:launched_at              | 2021-03-06T04:36:07.000000                               |
>>> | OS-SRV-USG:terminated_at            | None                                                     |
>>> | accessIPv4                          |                                                          |
>>> | accessIPv6                          |                                                          |
>>> | addresses                           | it-network=10.255.127.208, 10.0.160.35                   |
>>> | config_drive                        |                                                          |
>>> | created                             | 2021-03-06T04:35:51Z                                     |
>>> | flavor                              | m4.large (8)                                             |
>>> | hostId                              | 174a83351ac674a25a2bf5131b931fc7a9e16be48b62f37925a66676 |
>>> | id                                  | 64229d87-4cbb-44d1-ba8a-5fe63c9c40f3                     |
>>> | image                               | N/A (booted from volume)                                 |
>>> | key_name                            | None                                                     |
>>> | name                                | Java Dev                                                 |
>>> | project_id                          | 10dfdfadb7374ea1ba37bee1435d87ad                         |
>>> | properties                          |                                                          |
>>> | security_groups                     | name='allow-ping'                                        |
>>> |                                     | name='allow-ssh'                                         |
>>> |                                     | name='default'                                           |
>>> | status                              | SHUTOFF                                                  |
>>> | updated                             | 2021-04-16T15:53:32Z                                     |
>>> | user_id                             | 69b73ea8f55c46a99021e77ebf70b62a                         |
>>> | volumes_attached                    | id='ae69c924-60e5-431e-9572-c41a153e720b'                |
>>> +-------------------------------------+----------------------------------------------------------+
>>> #tail /var/log/nova/nova-conductor.log
>>> #tail /var/log/nova/nova-scheduler.log
>>> 2021-04-16 08:53:24.870 3773 INFO nova.scheduler.host_manager [req-ff109e53-74e0-40de-8ec7-29aff600b5f7 d7c514813e5d4fe6815f5f59e8e35f2f a008ad02d16f436a9e320882ca497055 - default default] Host filter only checking host s700066.463.os.mcgown.enterprises and node s700066.463.os.mcgown.enterprises
>>> 2021-04-16 08:53:24.871 3773 INFO nova.scheduler.host_manager [req-ff109e53-74e0-40de-8ec7-29aff600b5f7 d7c514813e5d4fe6815f5f59e8e35f2f a008ad02d16f436a9e320882ca497055 - default default] Host filter ignoring hosts:
>>>
>>> Both Cinder volume storage, and ephemeral storage are being handled by Ceph.
>>>
>>> Thank you,
>>>
>>> Dominic L. Hilsbos, MBA
>>> Director – Information Technology
>>> Perform Air International Inc.
>>> DHilsbos at PerformAir.com
>>> www.PerformAir.com
>>>
>>>
>>> -----Original Message-----
>>> From: Sean Mooney [mailto:smooney at redhat.com]
>>> Sent: Friday, April 16, 2021 6:28 AM
>>> To: openstack-discuss at lists.openstack.org
>>> Subject: Re: [ops][nova][victoria] Migrate cross CPU?
>>>
>>>
>>>
>>> On 15/04/2021 19:05, DHilsbos at performair.com wrote:
>>>> All;
>>>>
>>>> I seem to have generated another issue for myself...
>>>>
>>>> I built our Victoria cloud initially on Intel Atom servers.  We recently received the first of our AMD Epyc (7002 series) servers, which are intended to take over the Nova Compute responsibilities.
>>>>
>>>> I've had success in the past doing live migrates, but live migrating from one of the Atom servers to the new server fails, with an error indicating CPU compatibility problems.  Ok, I can understand that.
>>>>
>>>> My problem is that I don't seem to understand the openstack server migrate command (non-live).  It doesn't seem to do anything, whether the instance is Running or Shut Down.  I can't find errors in the logs from the API / conductor / scheduler host.
>>>>
>>>> I also can't find an option to pass to the openstack server start command which requests a specific host.
>>>>
>>>> Can I get these existing instances moved from the Atom servers to the Epyc server(s), or do I need to recreate them to do this?
>>> You should be able to cold migrate them using the migrate command, but
>>> that should put the servers into resize_verify, and then you need
>>> to confirm the migration to complete it. We will not clean up the VM on
>>> the source node until you do that last step.
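As a rough sketch of that flow: start the cold migration, wait for the server
status to reach VERIFY_RESIZE (the API status that corresponds to
resize_verify), then confirm. The function name, the OPENSTACK override
variable and the 60 x 5 second polling limit are made up for illustration, and
"openstack server resize confirm" is an assumption about the client version
(older clients use "openstack server resize --confirm").

```shell
# Sketch only: cold migrate a server to a named host and confirm the move.
# Usage: cold_migrate_and_confirm SERVER_UUID DESTINATION_HOST
# OPENSTACK is a hypothetical override used to preview or stub the CLI.
cold_migrate_and_confirm() {
    server="$1"; dest_host="$2"
    os="${OPENSTACK:-openstack}"
    $os server migrate --os-compute-api-version 2.56 \
        --host "$dest_host" "$server" || return 1
    tries=0
    while [ "$tries" -lt 60 ]; do
        status="$($os server show -f value -c status "$server")"
        case "$status" in
            # Confirming removes the copy left on the source node.
            VERIFY_RESIZE) $os server resize confirm "$server"; return ;;
            ERROR) echo "migration failed: $server is in ERROR" >&2; return 1 ;;
        esac
        tries=$((tries + 1))
        sleep 5
    done
    echo "timed out waiting for VERIFY_RESIZE" >&2
    return 1
}
```

This confirms as soon as the status changes; in practice it may be worth
checking the instance on the new host before running the confirm step by hand.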
>>>
>>>> Thank you,
>>>>
>>>> Dominic L. Hilsbos, MBA
>>>> Director - Information Technology
>>>> Perform Air International Inc.
>>>> DHilsbos at PerformAir.com
>>>> www.PerformAir.com
>>>>
>>>>
>>>>


