[Openstack] compute nodes down

Volodymyr Litovka doka.ua at gmx.com
Mon Dec 18 23:18:14 UTC 2017


Hi Jim,

switch debug to true in nova.conf and check *also* the other Nova logs -
nova-scheduler, nova-placement, nova-conductor.
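
A minimal example, assuming the stock Ubuntu packaging of Nova (config and
log paths may differ in your deployment):

  # /etc/nova/nova.conf, on a compute node that keeps going "down"
  [DEFAULT]
  debug = True

  # restart the agent and follow its log
  service nova-compute restart
  tail -f /var/log/nova/nova-compute.log

Then, on the controller(s), look for WARNING/ERROR entries in the other
services around the time a node flips to "down", e.g.:

  grep -E 'WARNING|ERROR' /var/log/nova/nova-scheduler.log \
      /var/log/nova/nova-conductor.log /var/log/nova/nova-placement-api.log

(the placement log file name varies between deployments - it may also live
under the Apache/httpd logs if placement runs under mod_wsgi).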

On 12/19/17 12:54 AM, Jim Okken wrote:
> hi list,
>
> hoping someone could shed some light on this issue I just started 
> seeing today
>
> all my compute nodes started showing as "Down" in the Horizon -> 
> Hypervisors -> Compute Nodes tab
>
>
> root at node-1:~# nova service-list
> +-----+--------------+-------------------+------+---------+-------+----------------------------+-----------------+
> | Id  | Binary       | Host              | Zone | Status  | State | Updated_at                 | Disabled Reason |
> +-----+--------------+-------------------+------+---------+-------+----------------------------+-----------------+
> | 325 | nova-compute | node-9.mydom.com  | nova | enabled | down  | 2017-12-18T21:59:38.000000 | -               |
> | 448 | nova-compute | node-14.mydom.com | nova | enabled | up    | 2017-12-18T22:41:42.000000 | -               |
> | 451 | nova-compute | node-17.mydom.com | nova | enabled | up    | 2017-12-18T22:42:04.000000 | -               |
> | 454 | nova-compute | node-11.mydom.com | nova | enabled | up    | 2017-12-18T22:42:02.000000 | -               |
> | 457 | nova-compute | node-12.mydom.com | nova | enabled | up    | 2017-12-18T22:42:12.000000 | -               |
> | 472 | nova-compute | node-16.mydom.com | nova | enabled | down  | 2017-12-18T00:16:01.000000 | -               |
> | 475 | nova-compute | node-10.mydom.com | nova | enabled | down  | 2017-12-18T00:26:09.000000 | -               |
> | 478 | nova-compute | node-13.mydom.com | nova | enabled | down  | 2017-12-17T23:54:06.000000 | -               |
> | 481 | nova-compute | node-15.mydom.com | nova | enabled | up    | 2017-12-18T22:41:34.000000 | -               |
> | 484 | nova-compute | node-8.mydom.com  | nova | enabled | down  | 2017-12-17T23:55:50.000000 | -               |
>
>
> if I stop and then start nova-compute on the down nodes, the stop takes
> several minutes and then the start is quick and fine, but after about 2
> hours the nova-compute service shows as down again.
>
> I am not seeing any ERROR entries in the nova logs.
>
> I get this for the status of a node that is showing as "UP":
>
>
>
> root at node-14:~# systemctl status nova-compute.service
> ● nova-compute.service - OpenStack Compute
>    Loaded: loaded (/lib/systemd/system/nova-compute.service; enabled; vendor preset: enabled)
>    Active: active (running) since Mon 2017-12-18 21:57:10 UTC; 35min ago
>      Docs: man:nova-compute(1)
>   Process: 32193 ExecStartPre=/bin/chown nova:adm /var/log/nova (code=exited, status=0/SUCCESS)
>   Process: 32190 ExecStartPre=/bin/chown nova:nova /var/lock/nova /var/lib/nova (code=exited, status=0/SUCCESS)
>   Process: 32187 ExecStartPre=/bin/mkdir -p /var/lock/nova /var/log/nova /var/lib/nova (code=exited, status=0/SUCCESS)
>  Main PID: 32196 (nova-compute)
>    CGroup: /system.slice/nova-compute.service
>            └─32196 /usr/bin/python /usr/bin/nova-compute --config-file=/etc/nova/nova-compute.conf --config-file=/etc/nova/nova.conf --log-file=/var/log/nova/nova-compute.log
>
> Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.570 32196 DEBUG oslo_messaging._drivers.amqpdriver [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] CALL msg_id: 2877b9707da144f3a91e7b80e2705fb3 exchange 'nova' topic 'conductor' _send /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:448
> Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.604 32196 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 2877b9707da144f3a91e7b80e2705fb3 __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:296
> Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.605 32196 INFO nova.compute.resource_tracker [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] Total usable vcpus: 40, total allocated vcpus: 0
> Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.606 32196 INFO nova.compute.resource_tracker [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] Final resource view: name=node-14.mydom.com phys_ram=128812MB used_ram=512MB phys_disk=6691GB used_disk=0GB total_vcpus=40 used_vcpus=0 pci_stats=[]
> Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.610 32196 DEBUG oslo_messaging._drivers.amqpdriver [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] CALL msg_id: ad32abe833f4440d86c15b911aa35c43 exchange 'nova' topic 'conductor' _send /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:448
> Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.632 32196 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: ad32abe833f4440d86c15b911aa35c43 __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:296
> Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.633 32196 WARNING nova.scheduler.client.report [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] Unable to refresh my resource provider record
> Dec 18 22:31:47 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:47.634 32196 INFO nova.compute.resource_tracker [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] Compute_service record updated for node-14.mydom.com:node-14.mydom.com
> Dec 18 22:31:52 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:52.247 32196 DEBUG oslo_messaging._drivers.amqpdriver [req-f30b2331-2097-4981-89c8-acea4a81f7f2 - - - - -] CALL msg_id: 4cbf019c36ce41cd89d34e59f6acc55f exchange 'nova' topic 'conductor' _send /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:448
> Dec 18 22:31:52 node-14.mydom.com nova-compute[32196]: 2017-12-18 22:31:52.265 32196 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 4cbf019c36ce41cd89d34e59f6acc55f __call__ /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:296
> root at node-14:~#
>
>
>
>
>
>
>
> I get this for the status of a node that is showing as "DOWN":
>
>
> root at node-9:~# systemctl status nova-compute.service
> ● nova-compute.service - OpenStack Compute
>    Loaded: loaded (/lib/systemd/system/nova-compute.service; enabled; vendor preset: enabled)
>    Active: active (running) since Mon 2017-12-18 21:20:30 UTC; 1h 11min ago
>      Docs: man:nova-compute(1)
>   Process: 9488 ExecStartPre=/bin/chown nova:adm /var/log/nova (code=exited, status=0/SUCCESS)
>   Process: 9485 ExecStartPre=/bin/chown nova:nova /var/lock/nova /var/lib/nova (code=exited, status=0/SUCCESS)
>   Process: 9482 ExecStartPre=/bin/mkdir -p /var/lock/nova /var/log/nova /var/lib/nova (code=exited, status=0/SUCCESS)
>  Main PID: 9491 (nova-compute)
>    CGroup: /system.slice/nova-compute.service
>            ├─ 9491 /usr/bin/python /usr/bin/nova-compute --config-file=/etc/nova/nova-compute.conf --config-file=/etc/nova/nova.conf --log-file=/var/log/nova/nova-compute.log
>            └─20428 /usr/bin/python /usr/bin/nova-compute --config-file=/etc/nova/nova-compute.conf --config-file=/etc/nova/nova.conf --log-file=/var/log/nova/nova-compute.log
>
> Dec 18 22:00:32 node-9.mydom.com nova-compute[9491]: 2017-12-18 22:00:32.065 9491 INFO nova.virt.libvirt.imagecache [req-7623143a-2263-448e-8100-b6248501ab42 - - - - -] image 2e42dd45-0b7d-468a-bc1e-3ece5bdc1638 at (/mnt/MSA_FC_Vol1/nodes/_base/fe073169880f11099449fef24d86f3c1f8bb9763): checking
> Dec 18 22:00:32 node-9.mydom.com nova-compute[9491]: 2017-12-18 22:00:32.066 9491 INFO nova.virt.libvirt.imagecache [req-7623143a-2263-448e-8100-b6248501ab42 - - - - -] image 2e42dd45-0b7d-468a-bc1e-3ece5bdc1638 at (/mnt/MSA_FC_Vol1/nodes/_base/fe073169880f11099449fef24d86f3c1f8bb9763): in use: on this node 0 local, 4 on other nodes sharing this
> Dec 18 22:00:32 node-9.mydom.com sudo[40588]:     nova : TTY=unknown ; PWD=/var/lib/nova ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf touch -c /mnt/MSA_FC_Vol1/nodes/_base/fe073169880f11099449fef24d86f3c1f8bb9763
> Dec 18 22:00:32 node-9.mydom.com sudo[40588]: pam_unix(sudo:session): session opened for user root by (uid=0)
> Dec 18 22:00:32 node-9.mydom.com sudo[40588]: pam_unix(sudo:session): session closed for user root
> Dec 18 22:00:32 node-9.mydom.com nova-compute[9491]: 2017-12-18 22:00:32.148 9491 INFO nova.virt.libvirt.imagecache [req-7623143a-2263-448e-8100-b6248501ab42 - - - - -] image 01db7edf-ee68-4fbc-bf3e-1ac4bc990488 at (/mnt/MSA_FC_Vol1/nodes/_base/a49721a231fdd7b45293b29dd13c34207c9c891b): checking
> Dec 18 22:00:32 node-9.mydom.com nova-compute[9491]: 2017-12-18 22:00:32.149 9491 INFO nova.virt.libvirt.imagecache [req-7623143a-2263-448e-8100-b6248501ab42 - - - - -] image 01db7edf-ee68-4fbc-bf3e-1ac4bc990488 at (/mnt/MSA_FC_Vol1/nodes/_base/a49721a231fdd7b45293b29dd13c34207c9c891b): in use: on this node 0 local, 2 on other nodes sharing this
> Dec 18 22:00:32 node-9.mydom.com sudo[40591]:     nova : TTY=unknown ; PWD=/var/lib/nova ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf touch -c /mnt/MSA_FC_Vol1/nodes/_base/a49721a231fdd7b45293b29dd13c34207c9c891b
> Dec 18 22:00:32 node-9.mydom.com sudo[40591]: pam_unix(sudo:session): session opened for user root by (uid=0)
> Dec 18 22:00:32 node-9.mydom.com sudo[40591]: pam_unix(sudo:session): session closed for user root
> lines 1-22/22 (END)
> root at node-9:~#
>
>
>
>
>
>
> Does anything in the status messages show what could be wrong? What do 
> the "nova : TTY=unknown" messages mean?
>
> thanks!!
>
> -- Jim
>
>
>

-- 
Volodymyr Litovka
   "Vision without Execution is Hallucination." -- Thomas Edison
