[Openstack] Masakari on queens

Torin Woltjer torin.woltjer at granddial.com
Fri Jun 29 21:04:42 UTC 2018


The wrong address was specified in the corosync configuration. I corrected that and now it runs without error. The important part here was the -c 1 switch of tcpdump: because the corosync configuration was incorrect, the timeout was reached before tcpdump captured a single packet. Once the timeout was reached, the command produced exit code 124, which triggered the exception in the host_handler.
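
For anyone hitting the same thing, here is a minimal Python sketch of how the two exit codes can be told apart. It is not the masakari-monitors code; the function name corosync_traffic_seen and the constant are made up for illustration. It relies only on the documented behaviour that GNU timeout exits with 124 when the time limit expires, and that tcpdump -c 1 exits 0 after capturing one packet.

```python
# Exit status GNU coreutils `timeout` returns when the time limit expires.
TIMEOUT_EXPIRED = 124


def corosync_traffic_seen(exit_code):
    """Interpret the exit code of `timeout N tcpdump -c 1 ...`.

    Hypothetical helper: returns True if a packet was captured, False if
    the timeout expired first, and raises for any other failure.
    """
    if exit_code == 0:
        # tcpdump captured its single packet and exited normally.
        return True
    if exit_code == TIMEOUT_EXPIRED:
        # No packet arrived in time, e.g. corosync is bound to the wrong address.
        return False
    raise RuntimeError("tcpdump failed with exit code %d" % exit_code)


if __name__ == "__main__":
    for code in (0, 124):
        print(code, corosync_traffic_seen(code))
```

An exit code of 124 therefore points at missing corosync traffic on the interface rather than at a problem with tcpdump itself.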

Torin Woltjer
 
Grand Dial Communications - A ZK Tech Inc. Company
 
616.776.1066 ext. 2006
www.granddial.com

----------------------------------------
From: "Torin Woltjer" <torin.woltjer at granddial.com>
Sent: 6/22/18 2:17 PM
To: "Tushar.Patil at nttdata.com" <tushar.patil at nttdata.com>
Subject: Re: Masakari on queens
Oddly enough, I never made changes to the original code to get that output. It is just masakari-monitor 4.0.0 as installed by pip.

Here are the changes and output for the code snippet you sent:
http://paste.openstack.org/show/723924/

I'd like to increase the logging, but I'm not familiar with the codebase and have only a rudimentary knowledge of Python. I've found where pip seems to have installed the files for masakari-hostmonitor, but I don't know which one contains the corosync bit.

----------------------------------------
From: "Patil, Tushar" <Tushar.Patil at nttdata.com>
Sent: 6/20/18 12:51 AM
To: "torin.woltjer at granddial.com" <torin.woltjer at granddial.com>
Subject: Re: Masakari on queens
Hi Torin,

Option -i is correct.

It seems that you have modified code to log error message: "ProcessExecutionError: Unexpected error while running command."

Could you please log 'stderr' and 'exit_code' as well in order to know the exact error you are getting?
I suspect you must be getting 124 exit code.
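
As a rough illustration of what logging 'stderr' and 'exit_code' could look like, here is a minimal sketch using only Python's standard subprocess module. It does not use the execution helper that masakari-monitors itself calls, and the name run_and_report is made up for this example.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit_code, stderr) instead of hiding them.

    Hypothetical helper for troubleshooting; cmd is a list of arguments.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text
    )
    return proc.returncode, proc.stderr.strip()


if __name__ == "__main__":
    # Stand-in command that fails on purpose; replace it with the real
    # timeout/tcpdump argument list when troubleshooting on a host.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('boom'); sys.exit(3)"]
    exit_code, stderr = run_and_report(failing)
    print("exit_code=%d stderr=%r" % (exit_code, stderr))
```

With both values in the log, a timeout (exit code 124) can be distinguished from a tcpdump usage error, which would normally print a message on stderr.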

This is a small program which I have created to simulate the error you are getting.
http://paste.openstack.org/show/723882/

Please specify interface and port as per your configuration and run the program.

Regards,
Tushar Patil

________________________________________
From: Torin Woltjer
Sent: Tuesday, June 19, 2018 9:58:32 PM
To: Patil, Tushar
Subject: Re: Masakari on queens

Thanks for the reply, Tushar Patil.

The command:
$timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405
returns:
"tcpdump: enp2s0f0: That device doesn't support monitor mode"

The command:
(lowercase i)
$timeout 5 tcpdump -n -c 1 -p -i vlan60 port 5405
Runs fine with no errors:
"tcpdump: listening on vlan60, link-type EN10MB (Ethernet), capture size 262144 bytes"

The in use interfaces on all of my nodes are as follows:

enp2s0f0=192.168.114.x
enp3s0f0=bond0=vlan60,vlan101
enp3s0f1=bond0=vlan60,vlan101
vlan60=management
vlan101=provider

From this part of handle_host.py I can't tell what is causing the command to raise the exception.

________________________________
From: "Patil, Tushar"
Sent: 6/18/18 9:10 PM
To: "openstack at lists.openstack.org", "torin.woltjer at granddial.com"
Subject: Re: Masakari on queens
Hi Torin,

Looking at the code, it seems to be trying to run the command below as the root user.

timeout <tcpdump_timeout> tcpdump -n -c 1 -p -I <multicast_interface> port <multicast_ports>

where,
tcpdump_timeout -> CONF.host.tcpdump_timeout -> default value is 5 seconds
multicast_interface -> corosync_multicast_interface -> vlan60
multicast_ports -> corosync_multicast_ports -> 5405
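
To make the substitution concrete, here is a small Python sketch that assembles the command line from those three values. It is only an illustration, not the masakari-monitors code, and the function name build_tcpdump_command is made up. It uses the lowercase -i option, which a later reply in this thread confirms as the correct one.

```python
def build_tcpdump_command(tcpdump_timeout, multicast_interface, multicast_port):
    """Return the tcpdump check as an argument list (hypothetical helper).

    tcpdump_timeout     -- seconds before `timeout` gives up
    multicast_interface -- interface carrying corosync traffic
    multicast_port      -- corosync multicast port to watch
    """
    return [
        "timeout", str(tcpdump_timeout),
        "tcpdump",
        "-n",          # do not resolve addresses to names
        "-c", "1",     # exit after capturing one packet
        "-p",          # do not put the interface into promiscuous mode
        "-i", multicast_interface,
        "port", str(multicast_port),
    ]


if __name__ == "__main__":
    # Values taken from the configuration discussed in this thread.
    print(" ".join(build_tcpdump_command(5, "vlan60", 5405)))
```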

Unfortunately, the error message is suppressed [1], so it's difficult to know the exact reason.
Can you please run the command below on the host where you are running the masakari-hostmonitor service? The error you get from running it should give you a hint for troubleshooting this issue further.

$timeout 5 tcpdump -n -c 1 -p -I vlan60 port 5405

[1] : https://github.com/openstack/masakari-monitors/blob/cde057bc685b7bbc35f5c425f9690b01766654b2/masakarimonitors/hostmonitor/host_handler/handle_host.py#L121

Regards,
Tushar Patil

________________________________________
From: Torin Woltjer
Sent: Tuesday, June 19, 2018 4:01:29 AM
To: Patil, Tushar; openstack at lists.openstack.org
Subject: Masakari on queens

Hello Tushar Patil,

I have upgraded to OpenStack Queens and am trying to run Masakari version 4.0.0. I'm curious what additional configuration is required to get this set up correctly.

/etc/masakarimonitors/masakarimonitors.conf
http://paste.openstack.org/show/723726/

masakari-hostmonitor is giving me errors like:
2018-06-18 12:44:44.812 18236 ERROR masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication is failed.: ProcessExecutionError: Unexpected error while running command.
2018-06-18 12:45:14.895 18236 INFO masakarimonitors.hostmonitor.host_handler.handle_host [-] 'UBNTU-OSTACK-COMPUTE2' is 'online'.
2018-06-18 12:46:20.047 18236 WARNING masakarimonitors.hostmonitor.host_handler.handle_host [-] Corosync communication using 'vlan60' is failed.: ProcessExecutionError: Unexpected error while running command.

Do you have any knowledge on this?
Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged,confidential, and proprietary data. If you are not the intended recipient,please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.

