Re: [rdo-users] [rdo][ussuri][TripleO][nova][kvm] libvirt.libvirtError: internal error: process exited while connecting to monitor
Hi all!

I was able to analyze the attached log files and I hope that the results may help you understand what's going wrong with instance creation. You can find *Log_Tool's unique exported Error blocks* here: http://paste.openstack.org/show/795356/

*Some statistics and problematical messages:*

##### Statistics - Number of Errors/Warnings per Standard OSP log since: 2020-06-30 12:30:00 #####
Total_Number_Of_Errors --> 9
/home/ashtempl/Ruslanas/controller/neutron/server.log --> 1
/home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1
/home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7

*nova-compute.log*
*default default] Error launching a defined domain with XML: <domain type='kvm'>*
368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b 69134106b56941698e58c61... 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: internal *error*: qemu unexpectedly closed the monitor: 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0...
he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to set MSR 0x48e to 0xfff9fffe04006172* _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most recent call last):
375-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File "/usr/lib/python3.6/site-packages/nova/vir...

*server.log*
5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': 422} returned with failed status*

*ovn_controller.log*
272-2020-06-30T12:30:10.126079625+02:00 stderr F 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for network 'datacentre'*

Thanks!

On Tue, Jun 30, 2020 at 2:13 PM Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
attaching logs here. let's see if it will work.
On Tue, 30 Jun 2020 at 12:55, Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
Hi all,

I am back; I had some issues with MTU. Now it looks good, at least the deployment part.

So I have reinstalled what I had, and it is still failing at the same point as in the first message.

I have tried to use LogTool. How do I use it? I launched it, but it always fails with [0]. Detailed output:
  File "./PyTool.py", line 596, in <module>
    random_node=random.choice(overcloud_nodes)

I do not get how to make it work. Should it read the nodes from stackrc? As I see in the code:
  overcloud_nodes = []
  all_nodes = exec_command_line_command('source ' + source_rc_file_path + 'stackrc;openstack server list -f json')[
[0] http://paste.openstack.org/show/795345/
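A note on the traceback above: Python's `random.choice` raises `IndexError` when it is given an empty sequence, so a failure on `random_node=random.choice(overcloud_nodes)` is consistent with `overcloud_nodes` still being empty, for example because `openstack server list` returned nothing or stackrc was not found at `source_rc_file_path`. The paste [0] is not reproduced here, so this is an assumption rather than a confirmed diagnosis. A minimal sketch of a guard follows; `pick_random_node` is a hypothetical helper and not part of LogTool.

```python
import random


def pick_random_node(overcloud_nodes):
    """Return a random node, or fail with a readable message if none were found.

    Hypothetical helper for illustration only; it is not part of LogTool.
    """
    if not overcloud_nodes:
        # random.choice([]) would raise a bare IndexError at this point.
        raise RuntimeError(
            "No overcloud nodes found: check that stackrc is sourced and that "
            "'openstack server list' returns at least one server."
        )
    return random.choice(overcloud_nodes)


if __name__ == "__main__":
    # Show what the standard library does with an empty list.
    try:
        random.choice([])
    except IndexError as exc:
        print("random.choice on an empty list raised IndexError:", exc)

    # With at least one node the helper behaves like random.choice.
    print(pick_random_node(["compute-0"]))
```

Running `openstack server list -f json` by hand after `source ~/stackrc` is a quick way to see whether the node list is really empty.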
On Wed, 24 Jun 2020 at 20:02, Arkady Shtempler <ashtempl@redhat.com> wrote:
Hi Ruslanas!
Is it possible to get all logs under /var/log/containers somehow?
Thanks!
On Wed, Jun 24, 2020 at 2:18 AM Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
Hi Alfredo,

> Compute nodes are baremetal or virtualized?, I've seen similar bug
> reports when using nested virtualization in other OSes.

baremetal. Dell R630 if to be VERY precise.

> When using podman, the recommended way to restart containers is using systemd:
> https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/deployme...

Thank you, I will try. I also modified a file, and it looked like the podman container was relaunched once the config was changed. Either way, if I understand the Linux config correctly, the default value for user and group is root when the lines are commented out:
#user = "root"
#group = "root"

Also, in some logs I saw that it detected that it is not an AMD CPU :) and it is really not an AMD CPU.

Just for fun, it might be important, here is how my node info looks:
ComputeS01Parameters:
  NovaReservedHostMemory: 16384
  KernelArgs: "crashkernel=no rhgb"
ComputeS01ExtraConfig:
  nova::cpu_allocation_ratio: 4.0
  nova::compute::libvirt::rx_queue_size: 1024
  nova::compute::libvirt::tx_queue_size: 1024
  nova::compute::resume_guests_state_on_host_boot: true
_______________________________________________
users mailing list
users@lists.rdoproject.org
http://lists.rdoproject.org/mailman/listinfo/users
To unsubscribe: users-unsubscribe@lists.rdoproject.org
-- Ruslanas Gžibovskis +370 6030 7030
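For readers following the qemu-kvm error in the LogTool summary above: to my knowledge, Intel's Software Developer's Manual lists MSR index 0x48e as IA32_VMX_TRUE_PROCBASED_CTLS, one of the VMX capability registers, which reports allowed 0-settings in its low 32 bits and allowed 1-settings in its high 32 bits. That reading is an assumption added here and is not stated anywhere in the thread. The sketch below only extracts the index and value from the log line and splits the value into those two halves; `parse_msr_error` and `split_halves` are hypothetical names.

```python
import re

# Matches the fragment qemu-kvm printed in nova-compute.log.
MSR_ERROR = re.compile(r"failed to set MSR (0x[0-9a-fA-F]+) to (0x[0-9a-fA-F]+)")


def parse_msr_error(line):
    """Return (msr_index, value) as integers, or None if the line does not match."""
    match = MSR_ERROR.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def split_halves(value):
    """Split a 64-bit MSR value into its (low, high) 32-bit halves."""
    return value & 0xFFFFFFFF, value >> 32


if __name__ == "__main__":
    line = "qemu-kvm: error: failed to set MSR 0x48e to 0xfff9fffe04006172"
    index, value = parse_msr_error(line)
    low, high = split_halves(value)
    print(f"MSR index   : {index:#x}")
    print(f"low 32 bits : {low:#010x}")
    print(f"high 32 bits: {high:#010x}")
```

If that reading of the index is right, the failing write concerns a virtualization capability register rather than anything specific to the guest image, but the thread does not confirm a root cause.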
Hi all!

Here we go, we are in the second part of this interesting troubleshooting!

1) I have LogTool set up. Thank you, Arkady.
2) I have used OSP to create an instance, and I have used virsh to create an instance.
2.1) The OSP way fails either way, whether the instance is volume-based or image-based. [1] and [2]
2.2) When I create it using the CLI: [0] [3]

Any ideas what can be wrong? What options should I choose? I have one network/VLAN for the whole cloud. I am doing a proof of concept of remote booting, so I do not have br-ex set up, and I do not have br-provider.

Here are my compute [5] and controller [6] yaml files. Please help: how should they look so that br-ex and br-int are connected? br-int is now in UNKNOWN state, and br-ex does not exist. As I understand it, when a network in the roles data yaml has the tag external, it should create br-ex? Or am I wrong?

[0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is running.
[1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs
[2] http://paste.openstack.org/show/795431/ < controller logs
[3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/
[4] http://paste.openstack.org/show/795433/ < xml file for
[5] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS...
[6] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controll...

On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler <ashtempl@redhat.com> wrote:
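The ovn_controller warning `Bridge 'br-ex' not found for network 'datacentre'` indicates that a bridge mapping names a bridge that does not exist on the node, which matches the report that br-ex was never created. As a sketch of that check, assuming the mapping string has the usual `network:bridge[,network:bridge]` form (the value `datacentre:br-ex` is inferred from the warning text, and both function names are hypothetical), one can compare the mapping with the bridges actually present, for example the names printed by `ovs-vsctl list-br`:

```python
def parse_bridge_mappings(mappings):
    """Parse 'net1:br1,net2:br2' into a dict {network: bridge}."""
    pairs = {}
    for item in mappings.split(","):
        item = item.strip()
        if not item:
            continue
        network, sep, bridge = item.partition(":")
        network, bridge = network.strip(), bridge.strip()
        if not sep or not network or not bridge:
            raise ValueError(f"malformed bridge mapping: {item!r}")
        pairs[network] = bridge
    return pairs


def missing_bridges(mappings, existing_bridges):
    """Return {network: bridge} for every mapped bridge that is not present."""
    existing = set(existing_bridges)
    return {
        network: bridge
        for network, bridge in parse_bridge_mappings(mappings).items()
        if bridge not in existing
    }


if __name__ == "__main__":
    # Hypothetical node state: only br-int exists, as described in the thread.
    print(missing_bridges("datacentre:br-ex", ["br-int"]))
```

An empty result would mean every mapped bridge exists; a non-empty one lists the networks that ovn-controller cannot patch to a physical bridge.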
Hi All,

I have one idea why this might be the issue.

During the image creation step, I had missing packages:
pacemaker-remote
osops-tools-monitoring-oschecks
pacemaker
pcs

The pcs package can be found in the HA repo, so I enabled it, but "osops-tools-monitoring-oschecks" is ONLY in delorean for CentOS8...

I believe that is the cause... so it installed a kvm, or some dependent packages, that are not maintained by CentOS8....

How can I get osops-tools-monitoring-oschecks from the centos repos? It was last seen in the CentOS7 repos....

$ yum list --enablerepo=* --disablerepo "c7-media" | grep osops-tools-monitoring-oschecks -A2
osops-tools-monitoring-oschecks.noarch 0.0.1-0.20191202171903.bafe3f0.el7 rdo-trunk-train-tested
ostree-debuginfo.x86_64 2019.1-2.el7 base-debuginfo
(undercloud) [stack@ironic-poc ~]$

Can I somehow not include that package in image creation? Or, if it is essential, can I create a different repo for that one?

On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
By the way, in CentOS8, here is the error message I receive when searching around:

[stack@rdo-u ~]$ dnf list --enablerepo="*" --disablerepo "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks
Errors during downloading metadata for repository 'rdo-trunk-ussuri-tested':
  - Status code: 403 for https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repom... (IP: 3.87.151.16)
Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
[stack@rdo-u ~]$

On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
current-passed-ci is not a valid repo.
https://trunk.rdoproject.org/centos8-ussuri/
How are you configuring these repos?

On Thu, Jul 2, 2020 at 7:59 AM Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
by the way in CentOS8, here is an error message I receive when searching around
[stack@rdo-u ~]$ dnf list --enablerepo="*" --disablerepo "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks
Errors during downloading metadata for repository 'rdo-trunk-ussuri-tested':
  - Status code: 403 for https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repom... (IP: 3.87.151.16)
Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
[stack@rdo-u ~]$
On Thu, Jul 2, 2020 at 3:59 PM Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
> by the way in CentOS8, here is an error message I receive when searching around
>
> [stack@rdo-u ~]$ dnf list --enablerepo="*" --disablerepo "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks
> Errors during downloading metadata for repository 'rdo-trunk-ussuri-tested':
>   - Status code: 403 for https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repom... (IP: 3.87.151.16)
> Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
> [stack@rdo-u ~]$
Yep, the rdo-trunk-ussuri-tested repo included in the release rpm is disabled by default and no longer usable (I'll send a patch to retire it), so don't enable it.

Sorry, I'm not sure how adding osops-tools-monitoring-oschecks may lead to installing CentOS8 maintained kvm. BTW, I think that package should not be required in CentOS8:

https://opendev.org/openstack/tripleo-puppet-elements/commit/2d2bc4d8b20304d...
On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
Hi All,
I have one idea, why it might be the issue.
during image creation step, I have hadd missing packets: pacemaker-remote osops-tools-monitoring-oschecks pacemaker pcs PCS thing can be found in HA repo, so I enabled it, but "osops-tools-monitoring-oschecks" ONLY in delorene for CentOS8...
I believe that is a case... so it installed non CentOS8 maintained kvm or some dependent packages....
How can I get osops-tools-monitoring-oschecks from centos repos? it is last seen in CentOS7 repos....
$ yum list --enablerepo=* --disablerepo "c7-media" | grep osops-tools-monitoring-oschecks -A2 osops-tools-monitoring-oschecks.noarch 0.0.1-0.20191202171903.bafe3f0.el7
rdo-trunk-train-tested ostree-debuginfo.x86_64 2019.1-2.el7 base-debuginfo (undercloud) [stack@ironic-poc ~]$
can I somehow not include that package in image creation? OR if it is essential, can I create a different repo for that one?
On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
Hi all!
Here we go, we are in the second part of this interesting troubleshooting!
1) I have LogTool setup.Thank you Arkady.
2) I have user OSP to create instance, and I have used virsh to create instance. 2.1) OSP way is failing in either way, if it is volume-based or image-based, it is failing either way.. [1] and [2] 2.2) when I create it using CLI: [0] [3]
any ideas what can be wrong? What options I should choose? I have one network/vlan for whole cloud. I am doing proof of concept of remote booting, so I do not have br-ex setup. and I do not have br-provider.
There is my compute[5] and controller[6] yaml files, Please help, how it should look like so it would have br-ex and br-int connected? as br-int now is in UNKNOWN state. And br-ex do not exist. As I understand, in roles data yaml, when we have tag external it should create br-ex? or am I wrong?
[0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is running. [1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs [2] http://paste.openstack.org/show/795431/ < controller logs [3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/ [4] http://paste.openstack.org/show/795433/ < xml file for [5] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS... [6] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controll...
On Tue, 30 Jun 2020 at 16:02, Arkady Shtempler <ashtempl@redhat.com> wrote:
Hi all!
I was able to analyze the attached log files and I hope that the results may help you understand what's going wrong with instance creation. You can find *Log_Tool's unique exported Error blocks* here: http://paste.openstack.org/show/795356/
*Some statistics and problematical messages:* ##### Statistics - Number of Errors/Warnings per Standard OSP log since: 2020-06-30 12:30:00 ##### Total_Number_Of_Errors --> 9 /home/ashtempl/Ruslanas/controller/neutron/server.log --> 1 /home/ashtempl/Ruslanas/compute/stdouts/ovn_controller.log --> 1 /home/ashtempl/Ruslanas/compute/nova/nova-compute.log --> 7
*nova-compute.log* *default default] Error launching a defined domain with XML: <domain type='kvm'>* 368-2020-06-30 12:30:10.815 7 *ERROR* nova.compute.manager [req-87bef18f-ad3d-4147-a1b3-196b5b64b688 7bdb8c3bf8004f98aae1b16d938ac09b 69134106b56941698e58c61... 70dc50f] Instance *failed* to spawn: *libvirt.libvirtError*: internal *error*: qemu unexpectedly closed the monitor: 2020-06-30T10:30:10.182675Z qemu-kvm: *error*: failed to set MSR 0... he monitor: 2020-06-30T10:30:10.182675Z *qemu-kvm: error: failed to set MSR 0x48e to 0xfff9fffe04006172* _msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' *failed*. [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] *Traceback* (most recent call last): 375-2020-06-30 12:30:10.815 7* ERROR* nova.compute.manager [instance: 128f372c-cb2e-47d9-b1bf-ce17270dc50f] File "/usr/lib/python3.6/site-packages/nova/vir...
*server.log * 5821c815-d213-498d-9394-fe25c6849918', 'status': 'failed', *'code': 422} returned with failed status*
*ovn_controller.log* 272-2020-06-30T12:30:10.126079625+02:00 stderr F 2020-06-30T10:30:10Z|00247|patch|WARN|*Bridge 'br-ex' not found for network 'datacentre'*
Thanks!
Compute nodes are baremetal or virtualized?, I've seen similar bug
>>>> reports when using nested virtualization in other OSes. >>>> >>> baremetal. Dell R630 if to be VERY precise. >> >> Thank you, I will try. I also modified a file, and it looked like >> it relaunched podman container once config was changed. Either way, if I >> understand Linux config correctly, the default value for user and group is >> root, if commented out: >> #user = "root" >> #group = "root" >> >> also in some logs, I saw, that it detected, that it is not AMD CPU >> :) and it is really not AMD CPU. >> >> >> Just for fun, it might be important, here is how my node info looks. >> ComputeS01Parameters: >> NovaReservedHostMemory: 16384 >> KernelArgs: "crashkernel=no rhgb" >> ComputeS01ExtraConfig: >> nova::cpu_allocation_ratio: 4.0 >> nova::compute::libvirt::rx_queue_size: 1024 >> nova::compute::libvirt::tx_queue_size: 1024 >> nova::compute::resume_guests_state_on_host_boot: true >> _______________________________________________ >> >>
--
Ruslanas Gžibovskis
+370 6030 7030
_______________________________________________
users mailing list
users@lists.rdoproject.org
http://lists.rdoproject.org/mailman/listinfo/users
To unsubscribe: users-unsubscribe@lists.rdoproject.org
it is, I have the image build failing. I can modify the yaml used to create the image. can you remind me which files it would be?

and your question, "how it can impact kvm": in the image most of the packages get deployed from delorean repos. I believe part is from centos repos and part of the packages in overcloud-full.qcow2 are from delorean. so it might have a bit different minor version, that might be incompatible... at least it has happened for me previously with the train release, so I used tested ci fully from the beginning... I might for sure be wrong.

On Thu, 2 Jul 2020, 17:18 Alfredo Moralejo Alonso, <amoralej@redhat.com> wrote:
On Thu, Jul 2, 2020 at 3:59 PM Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
by the way in CentOS8, here is an error message I receive when searching around
[stack@rdo-u ~]$ dnf list --enablerepo="*" --disablerepo "c8-media-BaseOS,c8-media-AppStream" | grep osops-tools-monitoring-oschecks
Errors during downloading metadata for repository 'rdo-trunk-ussuri-tested':
  - Status code: 403 for https://trunk.rdoproject.org/centos8-ussuri/current-passed-ci/repodata/repom... (IP: 3.87.151.16)
Error: Failed to download metadata for repo 'rdo-trunk-ussuri-tested': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
[stack@rdo-u ~]$
Yep, the rdo-trunk-ussuri-tested repo included in the release rpm is disabled by default and no longer usable (I'll send a patch to retire it); don't enable it.
Sorry, I'm not sure how adding osops-tools-monitoring-oschecks may lead to installing CentOS8 maintained kvm. BTW, I think that package should not be required in CentOS8:
https://opendev.org/openstack/tripleo-puppet-elements/commit/2d2bc4d8b20304d...
On Thu, 2 Jul 2020 at 15:56, Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
Hi All,
I have one idea, why it might be the issue.
during the image creation step, I had missing packages: pacemaker-remote, osops-tools-monitoring-oschecks, pacemaker, pcs. The PCS thing can be found in the HA repo, so I enabled it, but "osops-tools-monitoring-oschecks" is ONLY in delorean for CentOS8...
I believe that is the case... so it installed non CentOS8 maintained kvm or some dependent packages....
How can I get osops-tools-monitoring-oschecks from centos repos? it is last seen in CentOS7 repos....
$ yum list --enablerepo=* --disablerepo "c7-media" | grep osops-tools-monitoring-oschecks -A2
osops-tools-monitoring-oschecks.noarch  0.0.1-0.20191202171903.bafe3f0.el7  rdo-trunk-train-tested
ostree-debuginfo.x86_64                 2019.1-2.el7                        base-debuginfo
(undercloud) [stack@ironic-poc ~]$
can I somehow not include that package in image creation? OR if it is essential, can I create a different repo for that one?
On Wed, 1 Jul 2020 at 14:19, Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
Hi all!
Here we go, we are in the second part of this interesting troubleshooting!
1) I have LogTool set up. Thank you Arkady.
2) I have used OSP to create an instance, and I have used virsh to create an instance.
2.1) The OSP way is failing either way, whether it is volume-based or image-based. [1] and [2]
2.2) when I create it using CLI: [0] [3]

any ideas what can be wrong? What options should I choose? I have one network/vlan for the whole cloud. I am doing a proof of concept of remote booting, so I do not have br-ex set up, and I do not have br-provider.

Here are my compute [5] and controller [6] yaml files. Please help, how should it look so that br-ex and br-int are connected? br-int is now in UNKNOWN state, and br-ex does not exist. As I understand, in roles data yaml, when we have the tag external it should create br-ex? or am I wrong?

[0] http://paste.openstack.org/show/Rdou7nvEWMxpGECfQHVm/ VM is running.
[1] http://paste.openstack.org/show/tp8P0NUYNFcl4E0QR9IM/ < compute logs
[2] http://paste.openstack.org/show/795431/ < controller logs
[3] http://paste.openstack.org/show/HExQgBo4MDxItAEPNaRR/
[4] http://paste.openstack.org/show/795433/ < xml file for
[5] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/computeS...
[6] https://github.com/qw3r3wq/homelab/blob/master/overcloud/net-config/controll...
On Thu, Jul 2, 2020 at 4:38 PM Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
> it is, i have image build failing. i can modify yaml used to create image. can you remind me which files it would be?

Right, I see that the patch must not be working fine for centos and the package is being installed from delorean repos in the log. I guess it needs an entry to cover the centos 8 case (I'm checking with the opstools maintainer). As a workaround I'd suggest using the package from: https://trunk.rdoproject.org/centos8-ussuri/component/cloudops/current-tripl... or alternatively applying a local patch to tripleo-puppet-elements.

> and your question, "how it can impact kvm":
> in image most of the packages get deployed from delorean repos. I believe part is from centos repos and part of whole packages in overcloud-full.qcow2 are from delorean. so it might have bit different minor version, that might be incompatible... at least it has happened for me previously with train release so i used tested ci fully from the beginning... I might be for sure wrong.

Delorean repos contain only OpenStack packages, things like nova, etc., not kvm or things included in CentOS repos. KVM will always be installed, and it should come from the "Advanced Virtualization" repository. Could you check which versions of qemu-kvm and libvirt got installed into the overcloud-full image? They should match the versions in: http://mirror.centos.org/centos/8/virt/x86_64/advanced-virtualization/Packag... like qemu-kvm-4.2.0-19.el8.x86_64.rpm and libvirt-6.0.0-17.el8.x86_64.rpm
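The version check asked for above can be scripted. What follows is only a minimal sketch: it assumes the installed packages are listed one per line in the usual name-version-release.arch form (for example, the output of `rpm -q qemu-kvm libvirt` run inside the image), and the helper names `parse_nvra` and `find_mismatches` are made up for illustration. The expected builds are the two file names quoted in the message.

```python
import re

# Expected builds, taken from the two file names quoted in the message above.
EXPECTED = {
    "qemu-kvm": "4.2.0-19.el8",
    "libvirt": "6.0.0-17.el8",
}

# name-version-release.arch, e.g. "qemu-kvm-4.2.0-19.el8.x86_64"
NVRA = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)


def parse_nvra(line):
    """Split one package string into (name, "version-release")."""
    text = line.strip()
    if text.endswith(".rpm"):
        text = text[:-4]  # accept file names as well as rpm -q output
    match = NVRA.match(text)
    if match is None:
        raise ValueError(f"not a name-version-release.arch string: {line!r}")
    return match["name"], f"{match['version']}-{match['release']}"


def find_mismatches(installed_lines, expected=EXPECTED):
    """Return {name: (found, wanted)} for expected packages that differ or are missing."""
    installed = dict(parse_nvra(line) for line in installed_lines if line.strip())
    return {
        name: (installed.get(name), wanted)
        for name, wanted in expected.items()
        if installed.get(name) != wanted
    }
```

An empty result from `find_mismatches` means both packages are at the quoted builds; a `None` in the "found" slot means the package was not in the list at all.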
On Thu, Jul 2, 2020 at 5:35 PM Alfredo Moralejo Alonso <amoralej@redhat.com> wrote:
> On Thu, Jul 2, 2020 at 4:38 PM Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
>> it is, i have image build failing. i can modify yaml used to create image. can you remind me which files it would be?
>
> Right, I see that the patch must not be working fine for centos and the package is being installed from delorean repos in the log. I guess it needs an entry to cover the centos 8 case (i'm checking with opstools maintainer).

https://review.opendev.org/739085
Hi Alfredo,

since you mentioned it is not essential to have that opstool, I have replaced it with "sysstat" in /usr/share/tripleo-puppet-elements/overcloud-opstools/pkg-map, so now it is: "default": { "oschecks_package": "sysstat" }

And you are absolutely right regarding delorean: it took only OSP packages from that, and kvm and libvirt are at your specified versions.

And then I believe I found the cause of the failing VM:

"6536f105-3f38-41bd-9ddd-6702d23c4ccb] Instance failed to spawn: nova.exception.PortBindingFailed: Binding failed for port af8ecd79-ddb8-4ba1-990d-1ccdb76f1442, please check"

so, my question is: I have only the control (pxe) network, which is distributed between sites, and OSP has only one network (ControlPlane). How should my controller and compute networks look? My controller network looks like [1] and compute like [2]. When I uncomment the br-provider part in compute, it does not deploy. MUST br-provider networks be interconnectable?

I would need instances within the cloud to be able to communicate over the local network (vxlan), and external connectivity would be done using a provider vlan. Each provider VLAN will be used on only one compute node. Is it possible?

[0] http://paste.openstack.org/show/lUAOzDZdzCCcDrrPCASq/ # full package list in libvirt container
[1] http://paste.openstack.org/show/795562/ # controller net-config
[2] http://paste.openstack.org/show/795563/ # compute net-config

On Thu, 2 Jul 2020 at 17:36, Alfredo Moralejo Alonso <amoralej@redhat.com> wrote:
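The pkg-map change described in the "Hi Alfredo" message above (setting "oschecks_package" to "sysstat" under "default") could also be applied with a small script rather than by hand. This is a sketch under assumptions: the function name `set_default_package` is hypothetical, and the `before` content is made up for the example, since the thread does not show the file's original content and the real file may contain other sections.

```python
import json


def set_default_package(pkg_map_text, key, package):
    """Return pkg-map JSON text with default[key] set to package.

    Any other sections already present in the JSON are kept as they are.
    """
    pkg_map = json.loads(pkg_map_text)
    pkg_map.setdefault("default", {})[key] = package
    return json.dumps(pkg_map, indent=2, sort_keys=True) + "\n"


# Hypothetical starting content, for illustration only.
before = '{"default": {"oschecks_package": "osops-tools-monitoring-oschecks"}}'
after = set_default_package(before, "oschecks_package", "sysstat")
print(after)
```

Working on the parsed JSON rather than with a text substitution keeps the rest of the file untouched and fails loudly if the file is not valid JSON.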
I have created a network with geneve and it worked. The previous network, which it used by default, was vlan.

First of all, thank you Arkady for LogTool ;)

Second, how do I modify my config to have VLAN working?

NeutronNetworkType: 'vlan,geneve'
NeutronTunnelTypes: 'vxlan'
NeutronBridgeMappings: 'default:br-provider'
NeutronGlobalPhysnetMtu: 1500
NeutronBridgeMappings: datacentre:br-ex
NeutronExternalNetworkBridge: 'br-ex'

my compute network layout.
[1] http://paste.openstack.org/show/795562/ # controller net-config
[2] http://paste.openstack.org/show/795563/ # compute net-config
[3] http://paste.openstack.org/show/795564/ # ip a s from compute

On Mon, 6 Jul 2020 at 10:46, Ruslanas Gžibovskis <ruslanas@lpic.lt> wrote:
-- Ruslanas Gžibovskis +370 6030 7030
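One thing visible in the parameter listing of the last message is that NeutronBridgeMappings appears twice. The sketch below only makes that explicit; it is not a diagnosis. It assumes all six lines sit in the same YAML mapping, where loaders such as PyYAML keep only the last value for a repeated key, and the helper names are hypothetical. It uses a plain line parser so that nothing beyond the standard library is needed.

```python
from collections import defaultdict

# Parameter lines copied from the message above.
LISTING = """\
NeutronNetworkType: 'vlan,geneve'
NeutronTunnelTypes: 'vxlan'
NeutronBridgeMappings: 'default:br-provider'
NeutronGlobalPhysnetMtu: 1500
NeutronBridgeMappings: datacentre:br-ex
NeutronExternalNetworkBridge: 'br-ex'
"""


def collect_values(listing):
    """Map each parameter name to every value given for it, in order."""
    values = defaultdict(list)
    for line in listing.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        values[key.strip()].append(value.strip().strip("'\""))
    return dict(values)


def parse_bridge_mappings(value):
    """Turn 'physnet:bridge,physnet2:bridge2' into a dict."""
    pairs = (item.split(":", 1) for item in value.split(",") if item.strip())
    return {physnet.strip(): bridge.strip() for physnet, bridge in pairs}


values = collect_values(LISTING)
# Parameters that were given more than once.
duplicates = {key: vals for key, vals in values.items() if len(vals) > 1}
# Under the last-value-wins assumption, this is the mapping that would apply.
effective = parse_bridge_mappings(values["NeutronBridgeMappings"][-1])
print(duplicates)
print(effective)
```

If both mappings are wanted, the bridge mappings value accepts a comma-separated list in a single entry, which is the form `parse_bridge_mappings` reads.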
participants (4)
- Alex Schultz
- Alfredo Moralejo Alonso
- Arkady Shtempler
- Ruslanas Gžibovskis