<div dir="auto">Be careful: a compute node showing as down may not actually be down at all. The agent may simply be unable to report back, or the conductor may be unable to update the DB. I was about to ask whether the VMs were offline or not. Glad you got it figured out!<div dir="auto"><br></div><div dir="auto">//adam</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Dec 29, 2017 1:04 PM, "Jim Okken" <<a href="mailto:jim@jokken.com">jim@jokken.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I believe this issue turned out to be caused by the shared storage device that backs each compute node.<div><br><div>It had an access issue, and access attempts on one instance's vHD files hung forever and never timed out.</div><div>This makes sense for one node having nova issues. But could it cause the nova services on all compute nodes to stop after some time? (In a shared storage setup, does each node access/query each vHD on the storage periodically?)<br></div></div><div><br></div><div>thanks!</div><div class="gmail_extra"><br clear="all"><div><div class="m_8395651661324353467gmail_signature" data-smartmail="gmail_signature">-- Jim</div></div>
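<p>A hang like the one described above can be probed for directly. The following is a minimal Python sketch, not something taken from this thread: it runs a stat call in a child process and gives up after a timeout, because a process blocked on stuck storage I/O may never return on its own. The function name path_responds, the timeout values, and the optional cmd parameter are assumptions made for illustration.</p>

```python
import subprocess
import sys


def path_responds(path, timeout=5.0, cmd=None):
    """Return True if a stat of `path` finishes within `timeout` seconds.

    The probe runs in a child process so that a hung mount blocks the
    child, not the caller. `cmd` is a hypothetical override used only to
    simulate a hang when trying the function out.
    """
    if cmd is None:
        # Default probe: a tiny Python child that stats the path.
        cmd = [sys.executable, "-c",
               "import os, sys; os.stat(sys.argv[1])", path]
    proc = subprocess.Popen(cmd,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        proc.wait(timeout=timeout)
        return True   # the call came back (even if the path is missing)
    except subprocess.TimeoutExpired:
        # Best effort only: a child stuck in uninterruptible I/O may
        # ignore the kill, so we deliberately do not wait for it.
        proc.kill()
        return False


# Simulated hang: the child sleeps longer than the timeout allows.
hung = path_responds("/", timeout=0.5,
                     cmd=[sys.executable, "-c", "import time; time.sleep(5)"])
```

<p>Calling path_responds on a directory under the shared mount, such as the image cache directory that appears in the quoted logs, on each compute node would show which nodes see the hang. A True result only means the call returned; it says nothing about whether the data is healthy.</p>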
<br><div class="gmail_quote">On Tue, Dec 19, 2017 at 3:45 AM, Tobias Urdin <span dir="ltr"><<a href="mailto:tobias.urdin@crystone.com" target="_blank">tobias.urdin@crystone.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">



<div bgcolor="#FFFFFF" text="#000000">
<p>Enable debug in nova.conf and check conductor and compute logs.</p>
<p>Check that your clocks are in sync with NTP; otherwise the time since a service's last alive check in the database may appear to exceed the service_down_time config value, and the service will be reported as down.<br>
</p>
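<p>To make the clock point concrete, here is a rough Python sketch of the kind of comparison involved: a service counts as up when its last heartbeat is no older than service_down_time. The constants are the commonly documented nova defaults (report_interval 10, service_down_time 60) and may differ in your nova.conf; the "now" timestamp and the 90-second skew are assumptions chosen for illustration, and the two heartbeat values are copied from the service-list output quoted below.</p>

```python
from datetime import datetime, timedelta

# Assumed defaults; check the values in your own nova.conf.
REPORT_INTERVAL = 10     # seconds between heartbeats from each service
SERVICE_DOWN_TIME = 60   # seconds without a heartbeat before "down"


def parse(ts):
    """Parse a timestamp in the format shown by `nova service-list`."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f")


def is_up(updated_at, now, down_time=SERVICE_DOWN_TIME):
    """Up if the last heartbeat is at most `down_time` seconds from `now`."""
    elapsed = abs((now - updated_at).total_seconds())
    return down_time >= elapsed


# Assumed moment at which the listing was taken (not stated in the thread).
now = parse("2017-12-18T22:42:15.000000")

heartbeats = {
    "node-9": "2017-12-18T21:59:38.000000",
    "node-14": "2017-12-18T22:41:42.000000",
}
states = {host: is_up(parse(ts), now) for host, ts in heartbeats.items()}

# The same healthy node, judged by a clock that runs 90 seconds ahead.
skewed = is_up(parse(heartbeats["node-14"]), now + timedelta(seconds=90))
```

<p>With these inputs node-14 counts as up and node-9 as down, and the skewed check reports node-14 as down even though its heartbeat is recent. That is why unsynchronised clocks can produce false "down" states.</p>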
<br>
<div class="m_8395651661324353467m_3361422542378429702moz-cite-prefix">On 12/19/2017 12:09 AM, Jim Okken wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">hi list,</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">hoping someone could shed some light on this issue I just started seeing today</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">all my compute nodes started showing as "Down" in the Horizon -> Hypervisors -> Compute Nodes tab</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">
<div class="gmail_extra">root@node-1:~# nova service-list</div>
<div class="gmail_extra">+-----+------------------+----<wbr>---------------+----------+---<wbr>------+-------+---------------<wbr>-------------+----------------<wbr>-+</div>
<div class="gmail_extra">| Id  | Binary           | Host              | Zone     | Status  | State | Updated_at                 | Disabled Reason |</div>
<div class="gmail_extra">+-----+------------------+----<wbr>---------------+----------+---<wbr>------+-------+---------------<wbr>-------------+----------------<wbr>-+</div>
<div class="gmail_extra">| 325 | nova-compute     | <a href="http://node-9.mydom.com" target="_blank">
node-9.mydom.com</a>  | nova     | enabled | down  | 2017-12-18T21:59:38.000000 | -               |</div>
<div class="gmail_extra">| 448 | nova-compute     | <a href="http://node-14.mydom.com" target="_blank">
node-14.mydom.com</a> | nova     | enabled | up    | 2017-12-18T22:41:42.000000 | -               |</div>
<div class="gmail_extra">| 451 | nova-compute     | <a href="http://node-17.mydom.com" target="_blank">
node-17.mydom.com</a> | nova     | enabled | up    | 2017-12-18T22:42:04.000000 | -               |</div>
<div class="gmail_extra">| 454 | nova-compute     | <a href="http://node-11.mydom.com" target="_blank">
node-11.mydom.com</a> | nova     | enabled | up    | 2017-12-18T22:42:02.000000 | -               |</div>
<div class="gmail_extra">| 457 | nova-compute     | <a href="http://node-12.mydom.com" target="_blank">
node-12.mydom.com</a> | nova     | enabled | up    | 2017-12-18T22:42:12.000000 | -               |</div>
<div class="gmail_extra">| 472 | nova-compute     | <a href="http://node-16.mydom.com" target="_blank">
node-16.mydom.com</a> | nova     | enabled | down  | 2017-12-18T00:16:01.000000 | -               |</div>
<div class="gmail_extra">| 475 | nova-compute     | <a href="http://node-10.mydom.com" target="_blank">
node-10.mydom.com</a> | nova     | enabled | down  | 2017-12-18T00:26:09.000000 | -               |</div>
<div class="gmail_extra">| 478 | nova-compute     | <a href="http://node-13.mydom.com" target="_blank">
node-13.mydom.com</a> | nova     | enabled | down  | 2017-12-17T23:54:06.000000 | -               |</div>
<div class="gmail_extra">| 481 | nova-compute     | <a href="http://node-15.mydom.com" target="_blank">
node-15.mydom.com</a> | nova     | enabled | up    | 2017-12-18T22:41:34.000000 | -               |</div>
<div class="gmail_extra">| 484 | nova-compute     | <a href="http://node-8.mydom.com" target="_blank">
node-8.mydom.com</a>  | nova     | enabled | down  | 2017-12-17T23:55:50.000000 | -               |</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">If I stop and then start nova-compute on the down nodes, the stop takes several minutes and the start is quick and fine. But after about 2 hours the nova-compute service shows as down again.</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">I am not seeing any ERRORs in the nova logs.</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">I get this for the status of a node that is showing as "UP"<br>
<br>
<div class="gmail_extra">
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">root@node-14:~# systemctl status nova-compute.service</div>
<div class="gmail_extra">● nova-compute.service - OpenStack Compute</div>
<div class="gmail_extra">   Loaded: loaded (/lib/systemd/system/nova-comp<wbr>ute.service; enabled; vendor preset: enabled)</div>
<div class="gmail_extra">   Active: active (running) since Mon 2017-12-18 21:57:10 UTC; 35min ago</div>
<div class="gmail_extra">     Docs: man:nova-compute(1)</div>
<div class="gmail_extra">  Process: 32193 ExecStartPre=/bin/chown nova:adm /var/log/nova (code=exited, status=0/SUCCESS)</div>
<div class="gmail_extra">  Process: 32190 ExecStartPre=/bin/chown nova:nova /var/lock/nova /var/lib/nova (code=exited, status=0/SUCCESS)</div>
<div class="gmail_extra">  Process: 32187 ExecStartPre=/bin/mkdir -p /var/lock/nova /var/log/nova /var/lib/nova (code=exited, status=0/SUCCESS)</div>
<div class="gmail_extra"> Main PID: 32196 (nova-compute)</div>
<div class="gmail_extra">   CGroup: /system.slice/nova-compute.ser<wbr>vice</div>
<div class="gmail_extra">           └─32196 /usr/bin/python /usr/bin/nova-compute --config-file=/etc/nova/nova-c<wbr>ompute.conf --config-file=/etc/nova/nova.c<wbr>onf --log-file=/var/log/nova/nova-<wbr>compute.log</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">Dec 18 22:31:47 <a href="http://node-14.mydom.com" target="_blank">
node-14.mydom.com</a> nova-compute[32196]: 2017-12-18 22:31:47.570 32196 DEBUG oslo_messaging._drivers.amqpdr<wbr>iver [req-f30b2331-2097-4981-89c8-a<wbr>cea4a81f7f2 - - - - -] CALL msg_id: 2877b9707da144f3a91e7b80e2705f<wbr>b3 exchange 'nova' topic 'conductor' _send /usr/lib/python2.7/dist-packag<wbr>es/oslo_messaging/_drivers/<wbr>amqpdriver.py:448</div>
<div class="gmail_extra">Dec 18 22:31:47 <a href="http://node-14.mydom.com" target="_blank">
node-14.mydom.com</a> nova-compute[32196]: 2017-12-18 22:31:47.604 32196 DEBUG oslo_messaging._drivers.amqpdr<wbr>iver [-] received reply msg_id: 2877b9707da144f3a91e7b80e2705f<wbr>b3 __call__ /usr/lib/python2.7/dist-packag<wbr>es/oslo_messaging/_drivers/<wbr>amqpdriver.py:296</div>
<div class="gmail_extra">Dec 18 22:31:47 <a href="http://node-14.mydom.com" target="_blank">
node-14.mydom.com</a> nova-compute[32196]: 2017-12-18 22:31:47.605 32196 INFO nova.compute.resource_tracker [req-f30b2331-2097-4981-89c8-a<wbr>cea4a81f7f2 - - - - -] Total usable vcpus: 40, total allocated vcpus: 0</div>
<div class="gmail_extra">Dec 18 22:31:47 <a href="http://node-14.mydom.com" target="_blank">
node-14.mydom.com</a> nova-compute[32196]: 2017-12-18 22:31:47.606 32196 INFO nova.compute.resource_tracker [req-f30b2331-2097-4981-89c8-a<wbr>cea4a81f7f2 - - - - -] Final resource view: name=<a href="http://node-14.mydom.com" target="_blank">node-14.mydom.com</a>
 phys_ram=128812MB used_ram=512MB phys_disk=6691GB used_disk=0GB total_vcpus=40 used_vcpus=0 pci_stats=[]</div>
<div class="gmail_extra">Dec 18 22:31:47 <a href="http://node-14.mydom.com" target="_blank">
node-14.mydom.com</a> nova-compute[32196]: 2017-12-18 22:31:47.610 32196 DEBUG oslo_messaging._drivers.amqpdr<wbr>iver [req-f30b2331-2097-4981-89c8-a<wbr>cea4a81f7f2 - - - - -] CALL msg_id: ad32abe833f4440d86c15b911aa35c<wbr>43 exchange 'nova' topic 'conductor' _send /usr/lib/python2.7/dist-packag<wbr>es/oslo_messaging/_drivers/<wbr>amqpdriver.py:448</div>
<div class="gmail_extra">Dec 18 22:31:47 <a href="http://node-14.mydom.com" target="_blank">
node-14.mydom.com</a> nova-compute[32196]: 2017-12-18 22:31:47.632 32196 DEBUG oslo_messaging._drivers.amqpdr<wbr>iver [-] received reply msg_id: ad32abe833f4440d86c15b911aa35c<wbr>43 __call__ /usr/lib/python2.7/dist-packag<wbr>es/oslo_messaging/_drivers/<wbr>amqpdriver.py:296</div>
<div class="gmail_extra">Dec 18 22:31:47 <a href="http://node-14.mydom.com" target="_blank">
node-14.mydom.com</a> nova-compute[32196]: 2017-12-18 22:31:47.633 32196 WARNING nova.scheduler.client.report [req-f30b2331-2097-4981-89c8-a<wbr>cea4a81f7f2 - - - - -] Unable to refresh my resource provider record</div>
<div class="gmail_extra">Dec 18 22:31:47 <a href="http://node-14.mydom.com" target="_blank">
node-14.mydom.com</a> nova-compute[32196]: 2017-12-18 22:31:47.634 32196 INFO nova.compute.resource_tracker [req-f30b2331-2097-4981-89c8-a<wbr>cea4a81f7f2 - - - - -] Compute_service record updated for node-14.mydom.com:node-14.mydom.com</div>
<div class="gmail_extra">Dec 18 22:31:52 <a href="http://node-14.mydom.com" target="_blank">
node-14.mydom.com</a> nova-compute[32196]: 2017-12-18 22:31:52.247 32196 DEBUG oslo_messaging._drivers.amqpdr<wbr>iver [req-f30b2331-2097-4981-89c8-a<wbr>cea4a81f7f2 - - - - -] CALL msg_id: 4cbf019c36ce41cd89d34e59f6acc5<wbr>5f exchange 'nova' topic 'conductor' _send /usr/lib/python2.7/dist-packag<wbr>es/oslo_messaging/_drivers/<wbr>amqpdriver.py:448</div>
<div class="gmail_extra">Dec 18 22:31:52 <a href="http://node-14.mydom.com" target="_blank">
node-14.mydom.com</a> nova-compute[32196]: 2017-12-18 22:31:52.265 32196 DEBUG oslo_messaging._drivers.amqpdr<wbr>iver [-] received reply msg_id: 4cbf019c36ce41cd89d34e59f6acc5<wbr>5f __call__ /usr/lib/python2.7/dist-packag<wbr>es/oslo_messaging/_drivers/<wbr>amqpdriver.py:296</div>
<div class="gmail_extra">root@node-14:~#</div>
<div class="gmail_extra"><br>
</div>
</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">I get this for the status of a node that is showing "DOWN"</div>
</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">
<div class="gmail_extra">root@node-9:~# systemctl status nova-compute.service</div>
<div class="gmail_extra">● nova-compute.service - OpenStack Compute</div>
<div class="gmail_extra">   Loaded: loaded (/lib/systemd/system/nova-comp<wbr>ute.service; enabled; vendor preset: enabled)</div>
<div class="gmail_extra">   Active: active (running) since Mon 2017-12-18 21:20:30 UTC; 1h 11min ago</div>
<div class="gmail_extra">     Docs: man:nova-compute(1)</div>
<div class="gmail_extra">  Process: 9488 ExecStartPre=/bin/chown nova:adm /var/log/nova (code=exited, status=0/SUCCESS)</div>
<div class="gmail_extra">  Process: 9485 ExecStartPre=/bin/chown nova:nova /var/lock/nova /var/lib/nova (code=exited, status=0/SUCCESS)</div>
<div class="gmail_extra">  Process: 9482 ExecStartPre=/bin/mkdir -p /var/lock/nova /var/log/nova /var/lib/nova (code=exited, status=0/SUCCESS)</div>
<div class="gmail_extra"> Main PID: 9491 (nova-compute)</div>
<div class="gmail_extra">   CGroup: /system.slice/nova-compute.ser<wbr>vice</div>
<div class="gmail_extra">           ├─ 9491 /usr/bin/python /usr/bin/nova-compute --config-file=/etc/nova/nova-c<wbr>ompute.conf --config-file=/etc/nova/nova.c<wbr>onf --log-file=/var/log/nova/nova-<wbr>compute.log</div>
<div class="gmail_extra">           └─20428 /usr/bin/python /usr/bin/nova-compute --config-file=/etc/nova/nova-c<wbr>ompute.conf --config-file=/etc/nova/nova.c<wbr>onf --log-file=/var/log/nova/nova-<wbr>compute.log</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">Dec 18 22:00:32 <a href="http://node-9.mydom.com" target="_blank">
node-9.mydom.com</a> nova-compute[9491]: 2017-12-18 22:00:32.065 9491 INFO nova.virt.libvirt.imagecache [req-7623143a-2263-448e-8100-b<wbr>6248501ab42 - - - - -] image 2e42dd45-0b7d-468a-bc1e-3ece5b<wbr>dc1638 at (/mnt/MSA_FC_Vol1/nodes/_base/<wbr>fe073169880f11099449fef24d86f3<wbr>c1f8bb9763):
 checking</div>
<div class="gmail_extra">Dec 18 22:00:32 <a href="http://node-9.mydom.com" target="_blank">
node-9.mydom.com</a> nova-compute[9491]: 2017-12-18 22:00:32.066 9491 INFO nova.virt.libvirt.imagecache [req-7623143a-2263-448e-8100-b<wbr>6248501ab42 - - - - -] image 2e42dd45-0b7d-468a-bc1e-3ece5b<wbr>dc1638 at (/mnt/MSA_FC_Vol1/nodes/_base/<wbr>fe073169880f11099449fef24d86f3<wbr>c1f8bb9763):
 in use: on this node 0 local, 4 on other nodes sharing this</div>
<div class="gmail_extra">Dec 18 22:00:32 <a href="http://node-9.mydom.com" target="_blank">
node-9.mydom.com</a> sudo[40588]:     nova : TTY=unknown ; PWD=/var/lib/nova ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf touch -c /mnt/MSA_FC_Vol1/nodes/_base/f<wbr>e073169880f11099449fef24d86f3c<wbr>1f8bb9763</div>
<div class="gmail_extra">Dec 18 22:00:32 <a href="http://node-9.mydom.com" target="_blank">
node-9.mydom.com</a> sudo[40588]: pam_unix(sudo:session): session opened for user root by (uid=0)</div>
<div class="gmail_extra">Dec 18 22:00:32 <a href="http://node-9.mydom.com" target="_blank">
node-9.mydom.com</a> sudo[40588]: pam_unix(sudo:session): session closed for user root</div>
<div class="gmail_extra">Dec 18 22:00:32 <a href="http://node-9.mydom.com" target="_blank">
node-9.mydom.com</a> nova-compute[9491]: 2017-12-18 22:00:32.148 9491 INFO nova.virt.libvirt.imagecache [req-7623143a-2263-448e-8100-b<wbr>6248501ab42 - - - - -] image 01db7edf-ee68-4fbc-bf3e-1ac4bc<wbr>990488 at (/mnt/MSA_FC_Vol1/nodes/_base/<wbr>a49721a231fdd7b45293b29dd13c34<wbr>207c9c891b):
 checking</div>
<div class="gmail_extra">Dec 18 22:00:32 <a href="http://node-9.mydom.com" target="_blank">
node-9.mydom.com</a> nova-compute[9491]: 2017-12-18 22:00:32.149 9491 INFO nova.virt.libvirt.imagecache [req-7623143a-2263-448e-8100-b<wbr>6248501ab42 - - - - -] image 01db7edf-ee68-4fbc-bf3e-1ac4bc<wbr>990488 at (/mnt/MSA_FC_Vol1/nodes/_base/<wbr>a49721a231fdd7b45293b29dd13c34<wbr>207c9c891b):
 in use: on this node 0 local, 2 on other nodes sharing this</div>
<div class="gmail_extra">Dec 18 22:00:32 <a href="http://node-9.mydom.com" target="_blank">
node-9.mydom.com</a> sudo[40591]:     nova : TTY=unknown ; PWD=/var/lib/nova ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf touch -c /mnt/MSA_FC_Vol1/nodes/_base/a<wbr>49721a231fdd7b45293b29dd13c342<wbr>07c9c891b</div>
<div class="gmail_extra">Dec 18 22:00:32 <a href="http://node-9.mydom.com" target="_blank">
node-9.mydom.com</a> sudo[40591]: pam_unix(sudo:session): session opened for user root by (uid=0)</div>
<div class="gmail_extra">Dec 18 22:00:32 <a href="http://node-9.mydom.com" target="_blank">
node-9.mydom.com</a> sudo[40591]: pam_unix(sudo:session): session closed for user root</div>
<div class="gmail_extra">root@node-9:~#</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">Does anything in the status messages show what could be wrong? What do the "nova : TTY=unknown" messages mean?</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">thanks!!</div>
<div class="gmail_extra"><br>
</div>
<div class="gmail_extra">-- Jim</div>
</div>
<div class="gmail_extra"><br>
</div>
</div>
</blockquote>
<br>
</div>

</blockquote></div><br></div></div>
<br>______________________________<wbr>_________________<br>
Mailing list: <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack" rel="noreferrer" target="_blank">http://lists.openstack.org/<wbr>cgi-bin/mailman/listinfo/<wbr>openstack</a><br>
Post to     : <a href="mailto:openstack@lists.openstack.org">openstack@lists.openstack.org</a><br>
Unsubscribe : <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack" rel="noreferrer" target="_blank">http://lists.openstack.org/<wbr>cgi-bin/mailman/listinfo/<wbr>openstack</a><br>
<br></blockquote></div></div>