On Mon, 2020-02-03 at 21:36 +0000, Albert Braden wrote:
When we build a Centos 7 VM with 1.4T RAM it fails with "[ 17.797177] BUG: unable to handle kernel paging request at ffff988b19478000"
I asked in #centos and they asked me to show a list of devices from a working VM (if I use 720G RAM it works). This is the list:
[root@alberttest1 ~]# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:06.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:07.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon

[root@alberttest1 ~]# lsusb
Bus 001 Device 002: ID 0627:0001 Adomax Technology Co., Ltd
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
They suspect that the "Virtio memory balloon" driver is causing the problem, and that we should disable it. I googled around and found this:
http://www.linux-kvm.org/page/Projects/auto-ballooning
It looks like memory ballooning is deprecated. How can I get rid of the driver?

http://www.linux-kvm.org/page/Projects/auto-ballooning states that no qemu that exists today implements that feature, but the fact that you see the device in lspci seems to conflict with that. There are several references to the feature in later releases of qemu, and it is documented in libvirt: https://libvirt.org/formatdomain.html#elementsMemBalloon
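To confirm what libvirt actually configured for a guest, one option is to look for the memballoon element in the domain XML. The following is only a minimal sketch: the sample XML is a made-up, trimmed-down document (a real one, e.g. from `virsh dumpxml`, is much larger), and `memballoon_models` is a hypothetical helper name, not part of libvirt or nova.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain XML for illustration only; a real dump
# (for example from `virsh dumpxml <domain>`) contains many more elements.
SAMPLE_DOMAIN_XML = """
<domain type='kvm'>
  <name>example-guest</name>
  <devices>
    <memballoon model='virtio'>
      <stats period='10'/>
    </memballoon>
  </devices>
</domain>
"""


def memballoon_models(domain_xml):
    """Return the model attribute of each <memballoon> under <devices>."""
    root = ET.fromstring(domain_xml)
    return [el.get("model") for el in root.findall("./devices/memballoon")]


if __name__ == "__main__":
    print(memballoon_models(SAMPLE_DOMAIN_XML))
```

An empty list would mean the XML you inspected has no memballoon element at all, while a model of `virtio` matches the "Virtio memory balloon" entry seen in lspci.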
There is currently no way to turn it off specifically, and I'm not aware of it being deprecated. The guest will not interact with the virtio memory balloon by default. It is there to allow the guest to free memory and return it to the host, so that the guests and the host can cooperate to enable memory oversubscription. I believe this normally needs the qemu guest agent to be deployed to work fully.

With a 1.4TB VM, how much memory have you reserved on the host? qemu will need memory to implement the VM emulation, and this tends to increase as the guest uses more resources. My first inclination would be to check if the VM was killed as a result of an OOM event on the host.
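One rough way to check for an OOM event is to search the host's kernel log for OOM-killer messages that name a qemu process. The sketch below assumes the log text has already been saved (for example from `dmesg` or the journal) and that the kernel uses the common "Out of memory: Kill process ..." / "Out of memory: Killed process ..." wording, which varies between kernel versions. The sample log lines and the helper name `find_oom_kills` are made up for illustration.

```python
import re

# Matches both "Kill process" (older kernels) and "Killed process" (newer).
OOM_KILL_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")

# Made-up sample lines for illustration; real messages carry different numbers.
SAMPLE_LOG = """\
[12345.678901] Out of memory: Kill process 4242 (qemu-kvm) score 912 or sacrifice child
[12345.678950] Killed process 4242 (qemu-kvm) total-vm:1500000000kB, anon-rss:1400000000kB, file-rss:0kB
[12400.000001] Out of memory: Kill process 777 (python) score 10 or sacrifice child
"""


def find_oom_kills(log_text, name_hint="qemu"):
    """Return (pid, process_name) for OOM kills whose name contains name_hint."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_KILL_RE.search(line)
        if match and name_hint in match.group(2):
            kills.append((int(match.group(1)), match.group(2)))
    return kills


if __name__ == "__main__":
    for pid, name in find_oom_kills(SAMPLE_LOG):
        print(f"OOM killer targeted {name} (pid {pid})")
```

If this finds nothing on the host around the time the guest failed, a host-side OOM kill becomes a less likely explanation, though it does not rule out other memory pressure.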
Also they complained about my host bridge device; they say that we should have a newer one:
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
Where can I specify the host bridge?
You change this by specifying the machine type. You can use the q35 machine type instead. q35 is the replacement for i440fx, but when you enable it, it will change a lot of other parameters. I don't know if it will disable the virtio memory balloon or not, but if you are using a large amount of memory you should also be using hugepages to reduce the overhead and improve performance.

You can either set the machine type in the config
https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.hw_...

[libvirt]
hw_machine_type=x86_64=q35

or in the guest image
https://github.com/openstack/glance/blob/master/etc/metadefs/compute-libvirt...
e.g. hw_machine_type=q35

Note that in the image you don't include the arch.
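To make the difference between the two forms concrete, here is a small sketch that reads the `[libvirt]` option and splits it into an arch-to-machine-type mapping. It assumes the option is a comma-separated list of `arch=machine_type` pairs; the sample config text and the helper name `machine_types` are hypothetical, and this is not how nova itself parses the option.

```python
import configparser

# Made-up nova.conf fragment for illustration only.
SAMPLE_NOVA_CONF = """
[libvirt]
hw_machine_type = x86_64=q35
"""


def machine_types(conf_text):
    """Map each architecture to its machine type from [libvirt] hw_machine_type."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("libvirt", "hw_machine_type", fallback="")
    mapping = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        # Split "x86_64=q35" into the arch and the machine type.
        arch, _, mtype = item.partition("=")
        mapping[arch.strip()] = mtype.strip()
    return mapping


if __name__ == "__main__":
    print(machine_types(SAMPLE_NOVA_CONF))
```

The image property carries only the machine type (`q35`), because the image already implies its own architecture, whereas the host-wide config has to say which arch each machine type applies to.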
<bugs_> ok ozzzo one of the devices is called "virtio memory balloon"
[13:18:12] <bugs_> do you see that?
[13:18:21] <ozzzo> yes
[13:18:47] <bugs_> i suggest you google that and read about what it does - i think it would
[13:19:02] <bugs_> be worth trying to disable that device on your larger vm to see what happens
[13:19:18] <ozzzo> ok I will try that, thank you
[13:19:30] * Altiare (~Altiare@unaffiliated/altiare) has quit IRC (Quit: Leaving)
[13:21:45] * Sheogorath[m] (sheogora1@gateway/shell/matrix.org/x-uiiwpoddodtgrwwz) joins #centos
[13:22:06] <@TrevorH> I also notice that the VM seems to be using the very old 440FX and there's a newer model of hardware available that might be worth checking
[13:22:21] <@TrevorH> 440FX chipset is the old old pentium Pro chipset!
[13:22:32] <@TrevorH> I had one of those in about 1996
Yes, it is an old chipset from the 90s, but it is the default that openstack has used since it was created. We will likely change that in a cycle or two, but really, don't be surprised that we are using 440fx by default. It's not really emulating a platform from 1996. It started that way, but it has been updated while keeping the same name. With that said, it does not support PCIe or many other features, which is why we want to move to q35. q35, however, while much more modern and secure, uses more memory and does not support older operating systems, so there are trade-offs. If you need to run centos 5 or 6, I would not be surprised if you have issues with q35.