<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<style>
    font{
        line-height: 1.5;
    }
</style>
<div style="font-family:'微软雅黑'; font-size: 13px; color:#000000; line-height:1.5;">
    
<div style="font-family:'微软雅黑'; font-size: 13px; color:#000000; line-height:1.5;">
    <div>When we deployed OpenStack Mitaka with Fuel 9.0, the controller node would always fail after several days of normal operation, and we had to restart the entire environment.</div>
<div>Running brctl show revealed that the virtual bridge interfaces normally present had all disappeared, as shown below:</div>
<div><div>root@node-5:/var/log# brctl show</div>
<div>bridge name<span class="Apple-tab-span" style="white-space:pre"> </span>bridge id<span class="Apple-tab-span" style="white-space:pre">           </span>STP enabled<span class="Apple-tab-span" style="white-space:pre"> </span>interfaces</div>
<div>br-ex<span class="Apple-tab-span" style="white-space:pre">              </span>8000.9457a5565678<span class="Apple-tab-span" style="white-space:pre">   </span>no<span class="Apple-tab-span" style="white-space:pre">          </span>eno1.104</div>
<div><span class="Apple-tab-span" style="white-space:pre">                                                     </span>p_ff798dba-0</div>
<div>br-fw-admin<span class="Apple-tab-span" style="white-space:pre">              </span>8000.9457a5565678<span class="Apple-tab-span" style="white-space:pre">   </span>no<span class="Apple-tab-span" style="white-space:pre">          </span>eno1</div>
<div><span class="Apple-tab-span" style="white-space:pre">                                                 </span>p_eeee51a2-0</div>
<div>br-mgmt<span class="Apple-tab-span" style="white-space:pre">          </span>8000.3215d3e4d700<span class="Apple-tab-span" style="white-space:pre">   </span>no<span class="Apple-tab-span" style="white-space:pre">          </span>eno1.101</div>
<div><span class="Apple-tab-span" style="white-space:pre">                                                     </span>mgmt-conntrd</div>
<div>br-storage<span class="Apple-tab-span" style="white-space:pre">               </span>8000.9457a5565678<span class="Apple-tab-span" style="white-space:pre">   </span>no<span class="Apple-tab-span" style="white-space:pre">          </span>eno1.102</div></div>
<div><br></div>
<div>Checking the logs of neutron, upstart, nova, rabbitmq and other components did not resolve the problem. We eventually found the following errors in the pacemaker log:</div>
<div><div>Dec 08 03:20:50 [7126] node-5.domain.tld       lrmd:   
notice: operation_finished:<span class="Apple-tab-span" style="white-space:pre">  </span>p_rabbitmq-server_notify_0:79939:stderr [ Error: rabbit application is not running on node rabbit@messaging-node-5. ]</div><div>Dec 08 03:20:50 [7126] node-5.domain.tld       lrmd:   notice: operation_finished:<span class="Apple-tab-span" style="white-space:pre">  </span>p_rabbitmq-server_notify_0:79939:stderr [  * Suggestion: start it with "rabbitmqctl start_app" and try again ]</div><div>Dec 08 03:20:50 [7126] node-5.domain.tld       lrmd:     info: log_finished:<span class="Apple-tab-span" style="white-space:pre">   </span>finished - rsc:p_rabbitmq-server action:notify call_id:200 pid:79939 exit-code:0 exec-time:3738ms queue-time:0ms</div><div>Dec 08 03:20:50 [7129] node-5.domain.tld       crmd:     info: match_graph_event:<span class="Apple-tab-span" style="white-space:pre">   </span>Action p_rabbitmq-server_notify_0 (116) confirmed on node-5.domain.tld (rc=0)</div><div>Dec 08 03:20:50 [7129] node-5.domain.tld       crmd:   notice: process_lrm_event:<span class="Apple-tab-span" style="white-space:pre">   </span>Operation p_rabbitmq-server_notify_0: ok (node=node-5.domain.tld, call=200, rc=0, cib-update=0, confirmed=true)</div><div>Dec 08 03:20:50 [7129] node-5.domain.tld       crmd:   notice: run_graph:<span class="Apple-tab-span" style="white-space:pre"> </span>Transition 53849 (Complete=18, Pending=0, Fired=0, Skipped=1, Incomplete=8, Source=/var/lib/pacemaker/pengine/pe-input-2696.bz2): Stopped</div><div>Dec 08 03:20:50 [7129] node-5.domain.tld       crmd:     info: do_state_transition:<span class="Apple-tab-span" style="white-space:pre">        </span>State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:   notice: unpack_config:<span class="Apple-tab-span" style="white-space:pre">   </span>On loss of CCM Quorum: Ignore</div><div>Dec 08 03:20:50 
[7128] node-5.domain.tld    pengine:     info: determine_online_status:<span class="Apple-tab-span" style="white-space:pre">     </span>Node node-5.domain.tld is online</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: determine_op_status:<span class="Apple-tab-span" style="white-space:pre">      </span>Operation monitor found resource p_vrouter:0 active on node-5.domain.tld</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: apply_system_health:<span class="Apple-tab-span" style="white-space:pre">      </span>Applying automated node health strategy: migrate-on-red</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: apply_system_health:<span class="Apple-tab-span" style="white-space:pre">       </span> Node node-5.domain.tld has an combined system health of -1000000</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre">     </span> Clone Set: clone_p_vrouter [p_vrouter]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre">       </span>     Stopped: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_print:<span class="Apple-tab-span" style="white-space:pre">        </span>vip__management<span class="Apple-tab-span" style="white-space:pre">     </span>(ocf::fuel:ns_IPaddr2):<span class="Apple-tab-span" style="white-space:pre">     </span>Stopped</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_print:<span class="Apple-tab-span" style="white-space:pre">      </span>vip__vrouter_pub<span class="Apple-tab-span" style="white-space:pre">    </span>(ocf::fuel:ns_IPaddr2):<span class="Apple-tab-span" style="white-space:pre">     </span>Stopped</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_print:<span class="Apple-tab-span" 
style="white-space:pre">      </span>vip__vrouter<span class="Apple-tab-span" style="white-space:pre">        </span>(ocf::fuel:ns_IPaddr2):<span class="Apple-tab-span" style="white-space:pre">     </span>Stopped</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_print:<span class="Apple-tab-span" style="white-space:pre">      </span>vip__public<span class="Apple-tab-span" style="white-space:pre"> </span>(ocf::fuel:ns_IPaddr2):<span class="Apple-tab-span" style="white-space:pre">     </span>Stopped</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre">       </span> Clone Set: clone_p_haproxy [p_haproxy]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre">       </span>     Stopped: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre"> </span> Clone Set: clone_p_mysqld [p_mysqld]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre"> </span>     Stopped: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre"> </span> Master/Slave Set: master_p_rabbitmq-server [p_rabbitmq-server]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre">       </span>     Slaves: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_print:<span class="Apple-tab-span" style="white-space:pre"> </span>p_aodh-evaluator<span class="Apple-tab-span" style="white-space:pre">    </span>(ocf::fuel:aodh-evaluator):<span class="Apple-tab-span" style="white-space:pre"> 
</span>Stopped</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_print:<span class="Apple-tab-span" style="white-space:pre">      </span>p_ceilometer-agent-central<span class="Apple-tab-span" style="white-space:pre">  </span>(ocf::fuel:ceilometer-agent-central):<span class="Apple-tab-span" style="white-space:pre">       </span>Stopped</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre">       </span> Clone Set: clone_neutron-openvswitch-agent [neutron-openvswitch-agent]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre">       </span>     Stopped: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre"> </span> Clone Set: clone_neutron-l3-agent [neutron-l3-agent]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre"> </span>     Stopped: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre"> </span> Clone Set: clone_neutron-metadata-agent [neutron-metadata-agent]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre">     </span>     Stopped: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre"> </span> Clone Set: clone_p_heat-engine [p_heat-engine]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre">       </span>     Stopped: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    
pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre"> </span> Clone Set: clone_neutron-dhcp-agent [neutron-dhcp-agent]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre">     </span>     Stopped: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_print:<span class="Apple-tab-span" style="white-space:pre">        </span>sysinfo_node-5.domain.tld<span class="Apple-tab-span" style="white-space:pre">   </span>(ocf::pacemaker:SysInfo):<span class="Apple-tab-span" style="white-space:pre">   </span>Stopped</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre">       </span> Clone Set: clone_p_dns [p_dns]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre">       </span>     Stopped: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre"> </span> Master/Slave Set: master_p_conntrackd [p_conntrackd]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre"> </span>     Stopped: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre"> </span> Clone Set: clone_p_ntp [p_ntp]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre">       </span>     Stopped: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: clone_print:<span class="Apple-tab-span" style="white-space:pre"> </span> Clone Set: clone_ping_vip__public 
[ping_vip__public]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: short_print:<span class="Apple-tab-span" style="white-space:pre"> </span>     Stopped: [ node-5.domain.tld ]</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: rsc_merge_weights:<span class="Apple-tab-span" style="white-space:pre">   </span>clone_p_vrouter: Rolling back scores from clone_p_dns</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: rsc_merge_weights:<span class="Apple-tab-span" style="white-space:pre">   </span>clone_p_vrouter: Rolling back scores from clone_p_ntp</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">        </span>Resource p_vrouter:0 cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: rsc_merge_weights:<span class="Apple-tab-span" style="white-space:pre">        </span>clone_p_haproxy: Rolling back scores from vip__management</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: rsc_merge_weights:<span class="Apple-tab-span" style="white-space:pre">       </span>clone_p_haproxy: Rolling back scores from vip__public</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">        </span>Resource p_haproxy:0 cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">     </span>Resource vip__management cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: rsc_merge_weights:<span class="Apple-tab-span" style="white-space:pre">    </span>vip__vrouter_pub: Rolling back scores from master_p_conntrackd</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: rsc_merge_weights:<span class="Apple-tab-span" style="white-space:pre">  
</span>vip__vrouter_pub: Rolling back scores from vip__vrouter</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">      </span>Resource vip__vrouter_pub cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">        </span>Resource vip__vrouter cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">    </span>Resource vip__public cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">     </span>Resource p_mysqld:0 cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">      </span>Resource p_rabbitmq-server:0 cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: master_color:<span class="Apple-tab-span" style="white-space:pre">     </span>master_p_rabbitmq-server: Promoted 0 instances of a possible 1 to master</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">     </span>Resource p_aodh-evaluator cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">        </span>Resource p_ceilometer-agent-central cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">      </span>Resource neutron-openvswitch-agent:0 cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" 
style="white-space:pre">     </span>Resource neutron-l3-agent:0 cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">      </span>Resource neutron-metadata-agent:0 cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">        </span>Resource p_heat-engine:0 cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre"> </span>Resource neutron-dhcp-agent:0 cannot run anywhere</div><div>Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: native_color:<span class="Apple-tab-span" style="white-space:pre">    </span>Resource sysinfo_node-5.domain.tld cannot run anywhere</div></div>
<div>The log shows that, under certain conditions, pacemaker computes a node health score of negative infinity (-1000000) and treats the node as unusable. Since no resource can find a node to run on, every resource is stopped. The key message is:</div>
<div>Applying automated node health strategy: migrate-on-red</div>
<div>This suggests the problem is related to the node health strategy. Searching Google for "pacemaker migrate-on-red" showed that, with this option set, problems reported by the operating system and similar sources set the node's health score to negative infinity, which makes the node unusable even though the server itself is still working.</div>
<div>The hardware health check probably raised some alarms. With a single controller node there is nowhere to fail over to, so the whole environment went down.</div>
<div>How to fix it:</div>
<div>1. Log in to the controller node.</div>
<div>2. Run crm to enter the pacemaker console.</div>
<div>3. Enter configure to go to the configuration level.</div>
<div>4. Enter edit and, <span style="line-height: 1.5;">in the property section, set node-health-strategy to none, as shown below:</span></div>
<div><div>property cib-bootstrap-options: \</div><div>        have-watchdog=false \</div><div>        dc-version=1.1.14-70404b0 \</div><div>        cluster-infrastructure=corosync \</div><div>        cluster-recheck-interval=190s \</div><div>        no-quorum-policy=ignore \</div><div>        stonith-enabled=false \</div><div>        start-failure-is-fatal=false \</div><div>        symmetric-cluster=false \</div><div>        last-lrm-refresh=1477747972 \</div><div>        node-health-strategy=none</div></div><div><br></div><div><br></div><div><br><br>
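<div>The diagnosis above came down to spotting one apply_system_health line in a very long log. As a rough aid, here is a minimal Python sketch that scans pacemaker log lines for that message and reports nodes whose combined health equals pacemaker's -INFINITY value (-1000000). The names find_red_nodes, HEALTH_RE and MINUS_INFINITY are made up for this illustration, and the pattern is derived only from the single log line quoted above, so other pacemaker versions may word the message differently.</div>
<pre>
```python
import re

# Pattern for the pengine message quoted in the log above, e.g.
# "... apply_system_health:  Node node-5.domain.tld has an combined system health of -1000000"
# Assumption: the wording matches the one line seen in this article.
HEALTH_RE = re.compile(
    r"apply_system_health:\s+Node (\S+) has an? combined system health of (-?\d+)"
)

# Pacemaker represents -INFINITY as -1000000.
MINUS_INFINITY = -1000000


def find_red_nodes(lines):
    """Return {node: score} for nodes whose combined health is -INFINITY."""
    red = {}
    for line in lines:
        match = HEALTH_RE.search(line)
        if match is None:
            continue
        score = int(match.group(2))
        if score > MINUS_INFINITY:
            continue  # node is still considered usable
        red[match.group(1)] = score
    return red


# Two lines copied from the log in this article.
sample = [
    "Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: "
    "apply_system_health:\t Node node-5.domain.tld has an combined system health of -1000000",
    "Dec 08 03:20:50 [7128] node-5.domain.tld    pengine:     info: "
    "determine_online_status:\tNode node-5.domain.tld is online",
]

print(find_red_nodes(sample))  # prints {'node-5.domain.tld': -1000000}
```
</pre>
<div>To check a real node, pass an open file object for the pacemaker log instead of sample. The log location depends on the installation and is not given in this article.</div>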
</div>
</div>
</div>
</body>
</html>