Rebooted VM disappears when HV lacks memory
Albert Braden
Albert.Braden at synopsys.com
Fri Dec 6 20:59:45 UTC 2019
Hi Sean,
I didn't want to mess with prod configs but shelve/unshelve works. Thanks for your advice!
-----Original Message-----
From: Sean Mooney <smooney at redhat.com>
Sent: Friday, December 6, 2019 3:53 AM
To: Albert Braden <albertb at synopsys.com>; openstack-discuss at lists.openstack.org
Subject: Re: Rebooted VM disappears when HV lacks memory
On Fri, 2019-12-06 at 02:22 +0000, Albert Braden wrote:
> Update: It looks like attempting to migrate any VM, alive or dead, from an oversubscribed hypervisor throws the same
> error. Maybe I should have done the migrations before setting cpu_allocation_ratio and ram_allocation_ratio.
>
> Is there a way to recover from this besides deleting VMs from the oversubscribed hypervisors?
you could temporarily set the allocation ratio higher to allow the vm to boot, then try migrating it again.
you could also try shelving the vm, waiting for it to be shelve offloaded, then unshelving it.
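
The shelve route could be scripted roughly as follows. This is a minimal sketch, assuming the openstack CLI is installed and credentials are already loaded; shelve_cycle, POLL_SECONDS and MAX_POLLS are hypothetical names introduced here, not part of any OpenStack tool.

```shell
# Sketch: shelve a server, wait until it is offloaded, then unshelve it
# so the scheduler picks a host again.
# shelve_cycle, POLL_SECONDS and MAX_POLLS are hypothetical names.
shelve_cycle() {
    local server poll_seconds max_polls status i
    server="$1"
    poll_seconds="${POLL_SECONDS:-10}"
    max_polls="${MAX_POLLS:-60}"

    openstack server shelve "$server" || return 1

    i=0
    while [ "$i" -lt "$max_polls" ]; do
        # -f value -c status prints only the status field
        status=$(openstack server show "$server" -f value -c status)
        if [ "$status" = "SHELVED_OFFLOADED" ]; then
            openstack server unshelve "$server"
            return $?
        fi
        sleep "$poll_seconds"
        i=$((i + 1))
    done

    echo "timed out waiting for $server to reach SHELVED_OFFLOADED" >&2
    return 1
}

# Usage (UUID taken from the migrate attempt in this thread):
#   shelve_cycle fc250e76-4ca5-44b1-a6a0-8a7c7ad48c24
```

How long the server stays in SHELVED before it is offloaded depends on the deployment's shelved_offload_time setting, so the polling limits above may need adjusting.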
>
> -----Original Message-----
> From: Albert Braden
> Sent: Thursday, December 5, 2019 6:08 PM
> To: openstack-discuss at lists.openstack.org
> Subject: RE: Rebooted VM disappears when HV lacks memory
>
> We changed cpu_allocation_ratio and ram_allocation_ratio to 1 on the hypervisors. Now I'm trying to clean up the mess.
>
> I'm trying to cold-migrate dead VMs from an oversubscribed hypervisor, but they are failing to migrate:
>
> root at us01odc-p02-ctrl1:~# os hypervisor list --long|grep hv199
> | 218 | us01-p02-hv199.internal.synopsys.com | QEMU | 10.195.54.226 | up | 138 | 80 | 1122304 | 772691 |
>
> root at us01odc-p02-ctrl1:~# openstack server migrate fc250e76-4ca5-44b1-a6a0-8a7c7ad48c24
> No valid host was found. No valid host found for cold migrate (HTTP 400) (Request-ID: req-cc048221-abaf-4be0-bc90-f28eecec3df5)
>
> But we have lots of hosts with space, and if I build a new VM with the same flavor it works. When I check the logs, it
> appears that the migration fails because the hypervisor we are moving from is oversubscribed:
>
> /var/log/nova/nova-conductor.log:2019-12-05 17:30:17.163 48989 WARNING nova.scheduler.client.report [req-cc048221-abaf-4be0-bc90-f28eecec3df5 5b288691527245bda715ab7744a193e9 deea2d8541f741eda6fb0d242d16bb23 - default 824941b2ad6a43d7984fab9390290f18] Unable to post allocations for instance 87608240-1ad2-45fc-b4f0-c118e3bf0262 (409 {"errors": [{"status": 409, "request_id": "req-e56d29f8-c97e-41ac-8481-0598f3c681b2", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'MEMORY_MB' on resource provider 'fde477cd-fd3d-4296-838c-f9c54b3e97ef'. The requested amount would exceed the capacity. ", "title": "Conflict"}]})
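
The 409 above is consistent with the hypervisor list row for hv199 earlier in this message. Placement computes capacity for a resource class as (total - reserved) * allocation_ratio. The sketch below applies that formula to the row, assuming its last two columns are "Memory MB Used" and "Memory MB". The reserved value of 512 MB is an assumption (Nova's default reserved_host_memory_mb), and placement_capacity_mb is a hypothetical helper name. The hypervisor list figures come from Nova's own accounting, so this is only a rough check; the provider's inventories and usages in Placement are the authoritative numbers.

```shell
# Sketch: compare used memory against Placement-style capacity.
# placement_capacity_mb is a hypothetical helper name.
# capacity = (total - reserved) * allocation_ratio, truncated to an integer
placement_capacity_mb() {
    awk -v total="$1" -v reserved="$2" -v ratio="$3" \
        'BEGIN { printf "%d\n", (total - reserved) * ratio }'
}

total_mb=772691     # assumed to be the "Memory MB" column for hv199
used_mb=1122304     # assumed to be the "Memory MB Used" column for hv199
reserved_mb=512     # assumption: Nova's default reserved_host_memory_mb
ratio=1.0           # ram_allocation_ratio after the change described above

capacity_mb=$(placement_capacity_mb "$total_mb" "$reserved_mb" "$ratio")
if [ "$used_mb" -gt "$capacity_mb" ]; then
    echo "over capacity by $((used_mb - capacity_mb)) MB"
else
    echo "free: $((capacity_mb - used_mb)) MB"
fi
```

With a ratio of 1.0 the host is already far past its capacity under these assumptions, which would explain why any new allocation against it is rejected.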
>
> 'fde477cd-fd3d-4296-838c-f9c54b3e97ef' is the UUID of hv199:
>
> root at us01odc-p02-ctrl1:~# curl -s -X GET -H "X-Auth-Token: $token" -H "Content-Type: application/json" "http://us01odc-p02-lb.internal.synopsys.com:28778/resource_providers/fde477cd-fd3d-4296-838c-f9c54b3e97ef";echo
> {"generation": 40, "uuid": "fde477cd-fd3d-4296-838c-f9c54b3e97ef", "links": [{"href": "/resource_providers/fde477cd-fd3d-4296-838c-f9c54b3e97ef", "rel": "self"}, {"href": "/resource_providers/fde477cd-fd3d-4296-838c-f9c54b3e97ef/inventories", "rel": "inventories"}, {"href": "/resource_providers/fde477cd-fd3d-4296-838c-f9c54b3e97ef/usages", "rel": "usages"}], "name": "us01-p02-hv199.internal.synopsys.com"}
>
> What am I doing wrong? Do I need to move the live VMs away from hv199 to make room for the dead ones?
>
>
>
> -----Original Message-----
> From: Sean Mooney <smooney at redhat.com>
> Sent: Wednesday, December 4, 2019 4:30 AM
> To: Albert Braden <albertb at synopsys.com>; openstack-discuss at lists.openstack.org
> Subject: Re: Rebooted VM disappears when HV lacks memory
>
> On Wed, 2019-12-04 at 00:51 +0000, Albert Braden wrote:
> > We are running Rocky and we have the allocation bug because we set cpu_allocation_ratio and ram_allocation_ratio to 1
> > on the controllers but not on the hypervisors, so the hypervisors are oversubscribed. When we try to reboot a VM when
> > the HV is short on memory, the VM is deleted from the HV. It still exists in OpenStack but not on the HV. Is it
> > possible to recover a VM from this state?
>
> so the vm is likely not deleted; the hard disk and the libvirt domain will still be defined. my guess is that starting
> the vm caused the oom reaper to kill the qemu process when it was relaunched.
>
> if you are deploying with memory oversubscription enabled you should configure enough swap space, equal to total
> memory * ram_allocation_ratio.
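
As a worked example of that sizing rule, the sketch below multiplies the memory reported for hv199 in this thread by an allocation ratio. The ratio of 1.5 is an illustrative assumption, since the thread does not state what the hypervisors were using, and swap_size_mb is a hypothetical helper name.

```shell
# Sketch: swap size following the rule "total memory * ram_allocation_ratio".
# swap_size_mb is a hypothetical helper; it rounds up to a whole MB.
swap_size_mb() {
    awk -v total="$1" -v ratio="$2" 'BEGIN {
        size = total * ratio
        whole = int(size)
        if (whole < size) whole++
        printf "%d\n", whole
    }'
}

total_mb=772691   # memory reported for hv199 in this thread
ratio=1.5         # assumption for illustration; not stated in the thread
echo "swap needed: $(swap_size_mb "$total_mb" "$ratio") MB"
```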
>
> if you do that you should be able to start the vm. the other option is to cold migrate the vm to another host
> while it is stopped and then start it there.