[openstack-dev] [Fuel][Bugs] Time sync problem when testing.

Stanislaw Bogatkin sbogatkin at mirantis.com
Wed Jan 27 10:49:47 UTC 2016


> But you've used 'logger -t ntpdate' - this can fail again and logs can
> be empty again.

What do you mean by 'fail again'? Piping to logger uses standard blocking
I/O: logger reads everything written to the pipe, so it gets all the output
strace produces. If ntpdate hangs for some reason, we should see it in the
strace output. If ntpdate exits, we will see that too.
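
To illustrate the blocking-pipe argument, here is a minimal sketch in Python.
It is not the actual Fuel change: PRODUCER and CONSUMER are hypothetical
stand-ins for "strace ... ntpdate" and "logger -t ntpdate", and the helper name
pipe_through is made up for this example. The producer's stderr is merged into
its stdout because strace writes its trace to stderr by default. The consumer
reads until end-of-file, so it receives every line the producer wrote, and the
producer's exit status is still available to the caller.

```python
import subprocess
import sys

# Hypothetical stand-in for "strace ... ntpdate ...": prints a few lines, exits with 3.
PRODUCER = [
    sys.executable, "-c",
    "import sys\nfor i in range(5):\n    print('line', i)\nsys.exit(3)",
]

# Hypothetical stand-in for "logger -t ntpdate": tags every line it reads from stdin.
CONSUMER = [
    sys.executable, "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write('ntpdate: ' + line)",
]


def pipe_through(producer_cmd, consumer_cmd):
    """Run 'producer 2>&1 | consumer'; return (producer exit code, consumer output)."""
    producer = subprocess.Popen(
        producer_cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # same effect as 2>&1 in a shell pipeline
    )
    consumer = subprocess.Popen(
        consumer_cmd,
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
    )
    # Close our copy of the read end so the consumer sees EOF once the producer exits.
    producer.stdout.close()
    out, _ = consumer.communicate()  # blocks until the consumer has read everything
    return producer.wait(), out.decode()


if __name__ == "__main__":
    rc, logged = pipe_through(PRODUCER, CONSUMER)
    print("producer exit code:", rc)
    print(logged, end="")
```

If the producer hung instead of exiting, the consumer would simply keep
blocking on the pipe, and whatever had been written up to that point would
already have been passed along.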

On Wed, Jan 27, 2016 at 12:57 PM, Maksim Malchuk <mmalchuk at mirantis.com>
wrote:

> But you've used 'logger -t ntpdate' - this can fail again and logs can
> be empty again.
> In my opinion, we should redirect the output directly to the log file.
>
>
> On Wed, Jan 27, 2016 at 11:21 AM, Stanislaw Bogatkin <
> sbogatkin at mirantis.com> wrote:
>
>> Yes, I have created a custom ISO with debug output. It didn't help, so
>> another one with strace was created.
>> On Jan 27, 2016 00:56, "Alex Schultz" <aschultz at mirantis.com> wrote:
>>
>>> On Tue, Jan 26, 2016 at 2:16 PM, Stanislaw Bogatkin
>>> <sbogatkin at mirantis.com> wrote:
>>> > When the stratum is too high, ntpdate detects this and always writes
>>> > it to its log. In our case there is simply no log entry - ntpdate sends
>>> > the first packet, gets an answer, and that's all. So I don't think
>>> > fudging will save us. Also, it's a really bad approach to fudge a
>>> > server which doesn't have a real clock onboard.
>>>
>>> Do you have a debug output of the ntpdate somewhere? I'm not finding
>>> it in the bugs or in some of the snapshots for the failures. I did
>>> find one snapshot with the -v change that didn't have any response
>>> information so maybe it's the other problem, where network
>>> connectivity isn't working correctly or the responses are
>>> getting dropped somewhere?
>>>
>>> -Alex
>>>
>>> >
>>> > On Tue, Jan 26, 2016 at 10:41 PM, Alex Schultz <aschultz at mirantis.com>
>>> > wrote:
>>> >>
>>> >> On Tue, Jan 26, 2016 at 11:42 AM, Stanislaw Bogatkin
>>> >> <sbogatkin at mirantis.com> wrote:
>>> >> > Hi guys,
>>> >> >
>>> >> > for some time we have had a bug [0] with ntpdate. It doesn't
>>> >> > reproduce 100% of the time, but it breaks our BVT and swarm tests.
>>> >> > The root cause has not been pinpointed. To better understand it,
>>> >> > some verbosity was added to the ntpdate output, but in the logs we
>>> >> > can see only that the packet exchange between ntpdate and the
>>> >> > server was started and never completed.
>>> >> >
>>> >>
>>> >> So when I've hit this in my local environments there is usually one of
>>> >> two possible causes: 1) lack of network connectivity, so the ntp
>>> >> server never responds, or 2) the stratum is too high.  My assumption is
>>> >> that we're running into #2 because of our revert-resume in testing.
>>> >> When we resume, the ntp server on the master may take a while to
>>> >> become stable. This sync in the deployment uses the fuel master for
>>> >> synchronization so if the stratum is too high, it will fail with this
>>> >> lovely useless error.  My assumption about what is happening is that
>>> >> we aren't using a set of internal ntp servers but are instead
>>> >> relying on the standard ntp.org pools.  So when the master is being
>>> >> resumed it's struggling to find a good enough set of servers so it
>>> >> takes a while to sync. This then causes these deployment tasks to fail
>>> >> because the master has not yet stabilized (might also be geolocation
>>> >> related).  We could either address this by fudging the stratum on the
>>> >> master server in the configs or possibly introducing our own more
>>> >> stable local ntp servers. I have a feeling fudging the stratum might
>>> >> be better when we only use the master in our ntp configuration.
>>> >>
>>> >> > As this bug is a blocker, I propose merging [1] to better understand
>>> >> > what's going on. I created a custom ISO with this patchset and tried
>>> >> > to run about 10 BVT tests on this ISO, with absolutely no luck. So,
>>> >> > if we merge this, we will catch the problem much faster and
>>> >> > understand the root cause.
>>> >> >
>>> >>
>>> >> I think we should merge the increased logging patch anyway because
>>> >> it'll be useful in troubleshooting but we also might want to look into
>>> >> getting an ntp peers list added into the snapshot.
>>> >>
>>> >> > I appreciate your answers, folks.
>>> >> >
>>> >> >
>>> >> > [0] https://bugs.launchpad.net/fuel/+bug/1533082
>>> >> > [1] https://review.openstack.org/#/c/271219/
>>> >> > --
>>> >> > with best regards,
>>> >> > Stan.
>>> >> >
>>> >>
>>> >> Thanks,
>>> >> -Alex
>>> >>
>>> >>
>>> >> __________________________________________________________________________
>>> >> OpenStack Development Mailing List (not for usage questions)
>>> >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > with best regards,
>>> > Stan.
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>
>>
>>
>
>
> --
> Best Regards,
> Maksim Malchuk,
> Senior DevOps Engineer,
> MOS: Product Engineering,
> Mirantis, Inc
> <vgordon at mirantis.com>
>
>
>


-- 
with best regards,
Stan.