<p dir="ltr">Yes, I created a custom ISO with debug output. It didn't help, so I built another one with strace. </p>
<div class="gmail_quote">On Jan 27, 2016 00:56, "Alex Schultz" <<a href="mailto:aschultz@mirantis.com">aschultz@mirantis.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Tue, Jan 26, 2016 at 2:16 PM, Stanislaw Bogatkin<br>
<<a href="mailto:sbogatkin@mirantis.com">sbogatkin@mirantis.com</a>> wrote:<br>
> When the stratum is too high, ntpdate detects this and always writes it<br>
> to its log. In our case there is simply no log entry: ntpdate sends the first<br>
> packet, gets an answer, and that's all. So I don't think fudging will save us.<br>
> Also, it's a really bad approach to fudge a server which doesn't have a real<br>
> clock onboard.<br>
<br>
Do you have the debug output of ntpdate somewhere? I'm not finding<br>
it in the bugs or in any of the snapshots for the failures. I did<br>
find one snapshot with the -v change that didn't have any response<br>
information, so maybe it's the other problem, where network<br>
connectivity isn't working correctly or the responses are<br>
getting dropped somewhere?<br>
<br>
-Alex<br>
<br>
><br>
> On Tue, Jan 26, 2016 at 10:41 PM, Alex Schultz <<a href="mailto:aschultz@mirantis.com">aschultz@mirantis.com</a>><br>
> wrote:<br>
>><br>
>> On Tue, Jan 26, 2016 at 11:42 AM, Stanislaw Bogatkin<br>
>> <<a href="mailto:sbogatkin@mirantis.com">sbogatkin@mirantis.com</a>> wrote:<br>
>> > Hi guys,<br>
>> ><br>
>> > for some time we have had a bug [0] with ntpdate. It doesn't reproduce 100%<br>
>> > of the time, but it breaks our BVT and swarm tests. We have not pinpointed<br>
>> > the root cause. To better understand it, some verbosity was added to the<br>
>> > ntpdate output, but in the logs we can see only that the packet exchange<br>
>> > between ntpdate and the server was started and never completed.<br>
>> ><br>
>><br>
>> So when I've hit this in my local environments there are usually two<br>
>> possible causes for this: 1) lack of network connectivity, so the ntp<br>
>> server never responds, or 2) the stratum is too high. My assumption is<br>
>> that we're running into #2 because of our revert-resume in testing.<br>
>> When we resume, the ntp server on the master may take a while to<br>
>> become stable. This sync in the deployment uses the fuel master for<br>
>> synchronization so if the stratum is too high, it will fail with this<br>
>> lovely useless error. My assumption is that this happens<br>
>> because we aren't using a set of internal ntp servers but rather<br>
>> relying on the standard <a href="http://ntp.org" rel="noreferrer" target="_blank">ntp.org</a> pools, so when the master is being<br>
>> resumed it struggles to find a good enough set of servers and<br>
>> takes a while to sync. This then causes these deployment tasks to fail<br>
>> because the master has not yet stabilized (might also be geolocation<br>
>> related). We could either address this by fudging the stratum on the<br>
>> master server in the configs or possibly introducing our own more<br>
>> stable local ntp servers. I have a feeling fudging the stratum might<br>
>> be better when we only use the master in our ntp configuration.<br>
>><br>
>> > As this bug is a blocker, I propose to merge [1] to better understand<br>
>> > what's going on. I created a custom ISO with this patchset and tried to<br>
>> > run about 10 BVT tests on this ISO, with absolutely no luck. So, if we<br>
>> > merge this, we would catch the problem much faster and understand the root<br>
>> > cause.<br>
>> ><br>
>><br>
>> I think we should merge the increased logging patch anyway because<br>
>> it'll be useful in troubleshooting but we also might want to look into<br>
>> getting an ntp peers list added into the snapshot.<br>
>><br>
>> > I appreciate your answers, folks.<br>
>> ><br>
>> ><br>
>> > [0] <a href="https://bugs.launchpad.net/fuel/+bug/1533082" rel="noreferrer" target="_blank">https://bugs.launchpad.net/fuel/+bug/1533082</a><br>
>> > [1] <a href="https://review.openstack.org/#/c/271219/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/271219/</a><br>
>> > --<br>
>> > with best regards,<br>
>> > Stan.<br>
>> ><br>
>><br>
>> Thanks,<br>
>> -Alex<br>
>><br>
>> __________________________________________________________________________<br>
>> OpenStack Development Mailing List (not for usage questions)<br>
>> Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
>> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
><br>
><br>
><br>
><br>
> --<br>
> with best regards,<br>
> Stan.<br>
><br>
><br>
<br>
</blockquote></div>
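<p dir="ltr">For anyone trying to tell apart the two causes discussed in the quoted thread (the server never answers vs. the server answers but is not usable as a time source), here is a minimal sketch in Python. It is not what ntpdate or the Fuel tasks do internally; it only sends one basic client request and reads the leap indicator and stratum from the first two bytes of the reply, following the header layout in RFC 5905. The names build_request, parse_reply, classify and probe are made up for this illustration, and the "unsynchronized" rule (leap indicator 3, stratum 0, or stratum above 15) is my assumption of a reasonable check, not a copy of ntpdate's logic.</p>

```python
import socket
import struct

NTP_PACKET_LEN = 48      # basic NTP header without extension fields
MAX_USABLE_STRATUM = 15  # 16 means "unsynchronized" in RFC 5905


def build_request():
    """Return a 48-byte client request: LI=0, version=3, mode=3 (client)."""
    return bytes([0x1B]) + bytes(NTP_PACKET_LEN - 1)


def parse_reply(data):
    """Extract leap indicator, version, mode and stratum from a raw reply."""
    if NTP_PACKET_LEN > len(data):
        raise ValueError("short NTP packet: %d bytes" % len(data))
    first_byte, stratum = struct.unpack("!BB", data[:2])
    return {
        "leap": first_byte >> 6,
        "version": (first_byte >> 3) & 0x7,
        "mode": first_byte & 0x7,
        "stratum": stratum,
    }


def classify(reply):
    """Map a parsed reply (or None for a timeout) to a coarse diagnosis."""
    if reply is None:
        return "no-response"      # cause 1: connectivity or dropped replies
    if (reply["leap"] == 3 or reply["stratum"] == 0
            or reply["stratum"] > MAX_USABLE_STRATUM):
        return "unsynchronized"   # cause 2: server answers but has no usable clock
    return "ok"


def probe(host, port=123, timeout=2.0):
    """Send one request to host and classify what comes back."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.sendto(build_request(), (host, port))
        data, _ = sock.recvfrom(512)
    except socket.timeout:
        return classify(None)
    finally:
        sock.close()
    return classify(parse_reply(data))
```

<p dir="ltr">Calling probe() with the master node's address (for example probe("10.20.0.2"), a hypothetical address here) right after a revert-resume would show whether the reply is missing entirely or arrives with an unusable stratum. It needs network access, so only the parsing and classification parts can be checked offline.</p>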