To me this does not look like a problem in OpenStack; based on
your explanation it is more likely an issue
with your Ceph cluster.
The nova log you posted shows that the VM goes down for some
reason and nova updated
its power state because the VM shut off. That part is
expected behavior. You have to
find out why the VM goes down. According to your update there
seems to be some issue
with the Ceph storage cluster, and that could be the cause, so I'd
recommend you first check the Ceph logs
and the cluster's current status.
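A minimal triage sketch for the Ceph side (assumes admin access on a Ceph node; these are standard ceph CLI commands, not from the original message):

```shell
# Quick health triage of the Ceph cluster backing Nova/Cinder.
ceph -s              # overall status: HEALTH_OK / HEALTH_WARN / HEALTH_ERR
ceph health detail   # expand any warnings or errors
ceph osd tree        # spot down/out OSDs that could cause VM I/O errors
```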
If you look at the problem from the VM side, you can
check the console log file
on the hypervisor, located at
/var/lib/nova/instances/<instance id>/console.log,
after an instance goes down. The file might contain messages
sent to the console before
the guest crashed. The domain log at
/var/log/libvirt/qemu/instance-XXXXXX.log might also explain
the exit status of the instance before it went down (e.g. a crash).
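For example (a sketch; the instance UUID is the one from the logs quoted later in this thread, and the exact domain log name must be taken from the hypervisor):

```shell
# Guest console output captured by nova before the instance stopped
tail -n 100 /var/lib/nova/instances/4b04d3f1-1fbd-4b63-b693-a0ef316ecff3/console.log

# QEMU domain logs: look for the exit status or a crash/signal message
grep -iE 'shutting down|crashed|signal|error' /var/log/libvirt/qemu/instance-*.log
```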
Finally, you can leverage this mailing list to get help, but
keep in mind that
people respond on a best-effort basis. The list's subscribers are geographically
distributed, and early Monday is usually quiet. If you
need urgent support to resolve your
problem asap to keep your business running, then I'd personally
recommend looking for product support
rather than asking for updates after a few hours on this list.
Hi team,
Any update on this?
Thanks & Regards
Arihant Jain
On Mon, 27 Nov, 2023, 8:07 am AJ_ sunny, <jains8550@gmail.com> wrote:
++ adding ceph-users-confirm+4555fdc6282a38c849f4d27a40339f1b7e4bde74@ceph.io
++ adding dev@ceph.io
Thanks & Regards
Arihant Jain
On Mon, 27 Nov, 2023, 7:48 am AJ_ sunny, <jains8550@gmail.com> wrote:
Hi team,
After making the above changes I am still seeing the issue where the machine repeatedly shuts down by itself.
In the nova-compute logs I see only this trace:
Logs:-
2023-10-16 08:48:10.971 7 WARNING nova.compute.manager [req-c7b731db-2b61-400e-917f-8645c9984696 f226d81a45dd46488fb2e19515848 316d215042914de190f5f9e1c8466bf0 default default] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Received unexpected event network-vif-plugged-f191f6c8-dff5-4c1b-94b3-8d91aa6ff5ac for instance with vm_state active and task_state None.
2023-10-21 22:42:44.589 7 INFO nova.compute.manager [-] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] VM Stopped (Lifecycle Event)
2023-10-21 22:42:44.683 7 INFO nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_state in the DB to match the hypervisor.
2023-10-21 22:42:44.811 7 WARNING nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d ----] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
2023-10-21 22:42:44.977 7 INFO nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance is already powered off in the hypervisor when stop is called.
In this architecture Ceph is the backend storage for Nova, Glance, and Cinder. When the machine goes down on its own and I try to start it, it goes into an error state, i.e. the VM console shows I/O ERROR during boot. So first I have to rebuild the volume's object map on the Ceph side, then start the machine:

rbd object-map rebuild <volume-id>
openstack server start <server-id>

So this issue shows two faces, one on the Ceph side and another in the nova-compute log. Can someone please help me fix this issue asap?
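The recovery steps above can be sketched as follows (the `rbd info` and `rbd object-map check` commands are my addition for verifying the object-map state before rebuilding; pool and image names are placeholders):

```shell
rbd info <pool>/<volume-id>                # check features/flags; an "object map invalid" flag confirms the problem
rbd object-map check <pool>/<volume-id>    # verify the object map against the image's actual objects
rbd object-map rebuild <pool>/<volume-id>  # rebuild the object map
openstack server start <server-id>         # then start the instance again
```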
Thanks & Regards
Arihant Jain
On Tue, 24 Oct, 2023, 4:56 pm , <smooney@redhat.com> wrote:
On Tue, 2023-10-24 at 10:11 +0530, AJ_ sunny wrote:
> Hi team,
>
> The VM is not being shut off by the owner from inside; it automatically goes
> down, i.e. the libvirt lifecycle stop event is triggering.
> In my nova.conf configuration I am using ram_allocation_ratio = 1.5,
> and previously I tried setting
> sync_power_state_interval = -1 in nova.conf, but I am still facing the same problem.
> OOM might be causing this issue.
> Can you please give me some idea how to fix this issue if OOM is the cause?
The general answer is swap.
Nova should always be deployed with swap even if you do not have overcommit enabled.
There are a few reasons for this, the first being that Python allocates memory differently if
any swap is available; even 1G is enough to keep it from trying to commit all memory. So
when swap is available the nova/neutron agents will use much less resident memory even
without using any of the swap space.
we have some docs about this downstream
https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/17.1/html/configuring_the_compute_service_for_instance_creation/assembly_configuring-the-compute-service_osp#ref_calculating-swap-size_configuring-compute
If you are being ultra conservative we recommend allocating (ram * allocation ratio) in swap, so in your case allocate
1.5 times your RAM as swap. We would expect the actual usage of swap to be a small fraction of that, however, so we
also provide a formula:
overcommit_ratio = NovaRAMAllocationRatio - 1
Minimum swap size (MB) = (total_RAM * overcommit_ratio) + RHEL_min_swap
Recommended swap size (MB) = total_RAM * (overcommit_ratio + percentage_of_RAM_to_use_for_swap)
So say your host had 64G of RAM with an allocation ratio of 1.5 and a minimum swap percentage of 25%;
the conservative swap recommendation would be
(64 * (0.5 + 0.25)) + distro_min_swap
= (64 * 0.75) + 4G = 52G of recommended swap.
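The arithmetic above can be sketched in a few lines (function and variable names are mine; the calculation follows the worked example in this thread, which adds the distro minimum swap to the recommended size):

```python
def swap_sizes_gb(total_ram_gb, ram_allocation_ratio,
                  pct_ram_for_swap, distro_min_swap_gb):
    """Return (minimum, recommended) swap sizes in GB."""
    overcommit_ratio = ram_allocation_ratio - 1  # e.g. 1.5 -> 0.5
    minimum = total_ram_gb * overcommit_ratio + distro_min_swap_gb
    recommended = (total_ram_gb * (overcommit_ratio + pct_ram_for_swap)
                   + distro_min_swap_gb)
    return minimum, recommended

# 64G host, allocation ratio 1.5, 25% swap percentage, 4G distro minimum
print(swap_sizes_gb(64, 1.5, 0.25, 4))  # (36.0, 52.0)
```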
If you're wondering why we add a minimum swap percentage and a distro minimum swap, it's basically to account for the
QEMU and host OS memory overhead as well as the memory used by the nova/neutron agents and libvirt/OVS.
If you're not using memory overcommit, my general recommendation is: if you have less than 64G of RAM, allocate 16G; if you
have more than 256G of RAM, allocate 64G, and you should be fine. When you do use memory overcommit you must
have at least enough swap to account for the QEMU overhead of all instances + the overcommitted memory.
The other common cause of OOM errors is using NUMA affinity when the guests don't request
hw:mem_page_size=<something>; without a mem_page_size request we don't do NUMA-aware memory placement. The kernel
OOM killer works
on a per-NUMA-node basis. NUMA affinity does not support memory overcommit either, so that is likely not your issue;
I just mention it to cover all bases.
regards
sean
>
>
> Thanks & Regards
> Arihant Jain
>
> On Mon, 23 Oct, 2023, 11:29 pm , <smooney@redhat.com> wrote:
>
> > On Mon, 2023-10-23 at 13:19 -0400, Jonathan Proulx wrote:
> > >
> > > I've seen similar log traces with overcommitted memory, when the
> > > hypervisor runs out of physical memory and the OOM killer kills the VM
> > > process.
> > >
> > > This is an unusual configuration (I think), but if the VM owner claims
> > > they didn't power down the VM internally, you might look at the local
> > > hypervisor logs to see if the VM process crashed or was killed for some
> > > other reason.
> > yep, OOM events are one common cause of this.
> >
> > nova is basically just saying "hey, you said this VM should be active but it's
> > not, I'm going to update the DB to reflect
> > reality." you can turn that off with
> >
> > https://docs.openstack.org/nova/latest/configuration/config.html#workarounds.handle_virt_lifecycle_events
> > or
> >
> > https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.sync_power_state_interval
> > either disable the sync by setting the interval to -1,
> > or disable handling of the virt lifecycle events.
> >
> > i would recommend the sync_power_state_interval approach, but again, if VMs
> > are stopping
> > and you don't know why, you likely should discover why rather than just
> > turning off the update of the nova DB to reflect
> > the actual state.
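A minimal nova.conf sketch of the two options just mentioned (both are real nova options; pick one, not both):

```ini
[DEFAULT]
# -1 disables the periodic power-state sync task
sync_power_state_interval = -1

[workarounds]
# alternatively, stop reacting to libvirt lifecycle events
handle_virt_lifecycle_events = False
```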
> >
> > >
> > > -Jon
> > >
> > > On Mon, Oct 23, 2023 at 02:02:26PM +0100, smooney@redhat.com wrote:
> > > :On Mon, 2023-10-23 at 17:45 +0530, AJ_ sunny wrote:
> > > :> Hi team,
> > > :>
> > > :> I am using OpenStack kolla-ansible on the Wallaby release and currently I
> > > :> am facing an issue with a virtual machine: the VM shuts off by itself, and
> > > :> from the log it seems the libvirt lifecycle stop event is triggering again
> > > :> and again
> > > :>
> > > :> Logs:-
> > > :> 2023-10-16 08:48:10.971 7 WARNING nova.compute.manager
> > > :> [req-c7b731db-2b61-400e-917f-8645c9984696 f226d81a45dd46488fb2e19515848
> > > :> 316d215042914de190f5f9e1c8466bf0 default default] [instance:
> > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Received unexpected event
> > > :> network-vif-plugged-f191f6c8-dff5-4c1b-94b3-8d91aa6ff5ac for instance with
> > > :> vm_state active and task_state None.
> > > :> 2023-10-21 22:42:44.589 7 INFO
> > > :> nova.compute.manager [-] [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3]
> > > :> VM Stopped (Lifecycle Event)
> > > :>
> > > :> 2023-10-21 22:42:44.683 7 INFO nova.compute.manager
> > > :> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -] [instance:
> > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] During _sync_instance_power_state the DB
> > > :> power_state (1) does not match the vm_power_state from the hypervisor (4).
> > > :> Updating power_state in the DB to match the hypervisor.
> > > :>
> > > :> 2023-10-21 22:42:44.811 7 WARNING nova.compute.manager
> > > :> [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d ----] [instance:
> > > :> 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance shutdown by itself. Calling the
> > > :> stop API. Current vm_state: active, current task_state: None, original DB
> > > :> power_state: 1, current VM power_state: 4
> > > :> 2023-10-21 22:42:44.977 7 INFO
> > > :> nova.compute.manager [req-1d99b87b-7ff7-462d-ab18-fbdec6bda71d -]
> > > :> [instance: 4b04d3f1-1fbd-4b63-b693-a0ef316ecff3] Instance is already powered
> > > :> off in the hypervisor when stop is called.
> > > :
> > > :that sounds like the guest OS shut down the VM,
> > > :i.e. something in the guest ran sudo poweroff;
> > > :then nova detected the VM was stopped by the user and updated its DB to
> > > :match that.
> > > :
> > > :that is the expected behavior when you have the power sync enabled.
> > > :it is enabled by default.
> > > :>
> > > :>
> > > :> Thanks & Regards
> > > :> Arihant Jain
> > > :> +91 8299719369
> > > :
> > >
> >
> >