[openstack-dev] [Fuel][Bugs] Time sync problem when testing.

Maksim Malchuk mmalchuk at mirantis.com
Wed Jan 27 09:57:56 UTC 2016


But you've used 'logger -t ntpdate' - this can fail again and the logs can
be empty again.
In my opinion, we should use output redirection to the log file directly.
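
For example, something like this - the flags, the log path and the server
name below are only an illustration, not what the task uses today:

    # send ntpdate's own stdout and stderr straight to a file,
    # instead of piping it through logger
    ntpdate -v -u <ntp-server> >> /var/log/ntpdate.log 2>&1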


On Wed, Jan 27, 2016 at 11:21 AM, Stanislaw Bogatkin <sbogatkin at mirantis.com> wrote:

> Yes, I have created a custom ISO with debug output. It didn't help, so
> another one with strace was created.
> On Jan 27, 2016 00:56, "Alex Schultz" <aschultz at mirantis.com> wrote:
>
>> On Tue, Jan 26, 2016 at 2:16 PM, Stanislaw Bogatkin
>> <sbogatkin at mirantis.com> wrote:
>> > When the stratum is too high, ntpdate detects this and always writes
>> > it into its log. In our case there is simply no log - ntpdate sends the
>> > first packet, gets an answer, and that's all. So fudging won't save us,
>> > I think. Also, it's a really bad approach to fudge a server which
>> > doesn't have a real clock onboard.
>>
>> Do you have the debug output of ntpdate somewhere? I'm not finding it
>> in the bugs or in the snapshots for the failures. I did find one
>> snapshot with the -v change that didn't have any response information,
>> so maybe it's the other problem, where network connectivity isn't
>> working correctly or the responses are getting dropped somewhere?
>>
>> -Alex
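
For reference, the debug output being asked about here can also be grabbed
by hand on an affected node with something like the following (the address
is just a placeholder for the master's admin IP):

    # -d: debug mode - print every packet exchanged, do not set the clock
    # -u: use an unprivileged source port, in case udp/123 is filtered
    ntpdate -d -u <fuel-master-ip>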
>>
>> >
>> > On Tue, Jan 26, 2016 at 10:41 PM, Alex Schultz <aschultz at mirantis.com>
>> > wrote:
>> >>
>> >> On Tue, Jan 26, 2016 at 11:42 AM, Stanislaw Bogatkin
>> >> <sbogatkin at mirantis.com> wrote:
>> >> > Hi guys,
>> >> >
>> >> > for some time we have had a bug [0] with ntpdate. It isn't
>> >> > reproduced 100% of the time, but it breaks our BVT and swarm tests.
>> >> > There is no exact point where the root of the problem is located. To
>> >> > understand this better, some verbosity was added to the ntpdate
>> >> > output, but in the logs we can only see that the packet exchange
>> >> > between ntpdate and the server was started and never completed.
>> >> >
>> >>
>> >> When I've hit this in my local environments there are usually one of
>> >> two possible causes: 1) lack of network connectivity, so the ntp
>> >> server never responds, or 2) the stratum is too high.  My assumption
>> >> is that we're running into #2 because of our revert-resume in testing.
>> >> When we resume, the ntp server on the master may take a while to
>> >> become stable. The sync in the deployment uses the fuel master for
>> >> synchronization, so if the stratum is too high, it will fail with this
>> >> lovely useless error.  My guess at what is happening: because we
>> >> aren't using a set of internal ntp servers but rather relying on the
>> >> standard ntp.org pools, the master struggles to find a good enough set
>> >> of servers when it is resumed, so it takes a while to sync. This then
>> >> causes these deployment tasks to fail because the master has not yet
>> >> stabilized (it might also be geolocation related).  We could address
>> >> this either by fudging the stratum on the master server in the configs
>> >> or by introducing our own more stable local ntp servers. I have a
>> >> feeling fudging the stratum might be better, given that we only use
>> >> the master in our ntp configuration.
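
For reference, "fudging the stratum" on the master would typically mean
something like the following in ntpd's /etc/ntp.conf; the local clock
driver and the stratum value below are only an illustration, not
necessarily what Fuel templates today:

    # local undisciplined clock as a fallback time source
    server 127.127.1.0
    # serve it at a fixed stratum, so nodes syncing against the master
    # still get an acceptable answer while the upstream pool servers are
    # unreachable or not yet selected
    fudge  127.127.1.0 stratum 10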
>> >>
>> >> > As this bug is a blocker, I propose merging [1] to better understand
>> >> > what's going on. I created a custom ISO with this patchset and tried
>> >> > to run about 10 BVT tests on it, with absolutely no luck. So if we
>> >> > merge this, we would catch the problem much faster and understand
>> >> > the root cause.
>> >> >
>> >>
>> >> I think we should merge the increased logging patch anyway because
>> >> it'll be useful for troubleshooting, but we also might want to look
>> >> into getting an ntp peers list added to the snapshot.
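
A peers list for the snapshot could be collected with something along
these lines; wiring it into the snapshot generator would be the actual
change needed:

    # peer list with numeric addresses (no DNS lookups)
    ntpq -pn
    # system variables, including the current stratum and sync status
    ntpq -c rv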
>> >>
>> >> > I appreciate your answers, folks.
>> >> >
>> >> >
>> >> > [0] https://bugs.launchpad.net/fuel/+bug/1533082
>> >> > [1] https://review.openstack.org/#/c/271219/
>> >> > --
>> >> > with best regards,
>> >> > Stan.
>> >> >
>> >>
>> >> Thanks,
>> >> -Alex
>> >>
>> >>
>> >
>> >
>> >
>> >
>> > --
>> > with best regards,
>> > Stan.


-- 
Best Regards,
Maksim Malchuk,
Senior DevOps Engineer,
MOS: Product Engineering,
Mirantis, Inc
<vgordon at mirantis.com>

