[openstack-dev] [Fuel][Bugs] Time sync problem when testing.

Maksim Malchuk mmalchuk at mirantis.com
Wed Jan 27 11:25:12 UTC 2016


I think we shouldn't depend on other services like syslog and logger when
trying to catch the problem; it is better to write the logs ourselves.
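
To illustrate the direct-redirection idea, here is a minimal Python sketch.
It is not what the patch under review does. The helper name
run_with_own_log, the timestamp format, and the log path and server address
in the usage note are assumptions made up for illustration. The child's
stdout and stderr go straight to a file we open ourselves, so nothing
depends on syslog or logger being alive.

```python
import subprocess
import time


def _stamp():
    # Local wall-clock time; good enough to order events in a debug log.
    return time.strftime("%Y-%m-%dT%H:%M:%S")


def run_with_own_log(cmd, log_path, timeout=None):
    """Run cmd, appending its combined stdout/stderr directly to log_path.

    Returns the exit code, or None if the command was killed on timeout.
    """
    # Append mode, so the child's writes and ours always land at the end.
    with open(log_path, "a") as log:
        log.write("%s START %s\n" % (_stamp(), " ".join(cmd)))
        log.flush()  # make sure the header lands before the child's output
        try:
            rc = subprocess.run(
                cmd,
                stdout=log,
                stderr=subprocess.STDOUT,
                timeout=timeout,
            ).returncode
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this.
            rc = None
        log.write("%s END rc=%s\n" % (_stamp(), rc))
    return rc
```

A hypothetical call would be
run_with_own_log(["ntpdate", "-v", "10.20.0.2"], "/var/log/ntpdate-debug.log", timeout=60).
The START and END lines mean a hang (rc=None) and an empty run can be told
apart even when ntpdate itself prints nothing.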


On Wed, Jan 27, 2016 at 1:49 PM, Stanislaw Bogatkin <sbogatkin at mirantis.com>
wrote:

> >But you've used 'logger -t ntpdate' - this can fail again and the logs can
> >be empty again.
> What do you mean by 'fail again'? Piping to logger uses standard blocking
> I/O, so logger receives everything strace writes to the pipe. If ntpdate
> hangs for some reason, we should see it in the strace output. If ntpdate
> exits, we will see that too.
>
> On Wed, Jan 27, 2016 at 12:57 PM, Maksim Malchuk <mmalchuk at mirantis.com>
> wrote:
>
>> But you've used 'logger -t ntpdate' - this can fail again and the logs can
>> be empty again.
>> In my opinion, we should redirect the output to the log file directly.
>>
>>
>> On Wed, Jan 27, 2016 at 11:21 AM, Stanislaw Bogatkin <
>> sbogatkin at mirantis.com> wrote:
>>
>>> Yes, I have created a custom ISO with debug output. It didn't help, so
>>> another one with strace was created.
>>> On Jan 27, 2016 00:56, "Alex Schultz" <aschultz at mirantis.com> wrote:
>>>
>>>> On Tue, Jan 26, 2016 at 2:16 PM, Stanislaw Bogatkin
>>>> <sbogatkin at mirantis.com> wrote:
>>>> > When the stratum is too high, ntpdate can detect this and always
>>>> > writes it to its log. In our case there is just no log: ntpdate sends
>>>> > the first packet, gets an answer, and that's all. So fudging won't
>>>> > save us, I think. Also, it's a really bad approach to fudge a server
>>>> > which doesn't have a real clock onboard.
>>>>
>>>> Do you have debug output from ntpdate somewhere? I'm not finding
>>>> it in the bugs or in some of the snapshots for the failures. I did
>>>> find one snapshot with the -v change that didn't have any response
>>>> information, so maybe it's the other problem, where network
>>>> connectivity isn't working correctly or the responses are
>>>> getting dropped somewhere?
>>>>
>>>> -Alex
>>>>
>>>> >
>>>> > On Tue, Jan 26, 2016 at 10:41 PM, Alex Schultz
>>>> > <aschultz at mirantis.com> wrote:
>>>> >>
>>>> >> On Tue, Jan 26, 2016 at 11:42 AM, Stanislaw Bogatkin
>>>> >> <sbogatkin at mirantis.com> wrote:
>>>> >> > Hi guys,
>>>> >> >
>>>> >> > for some time we have had a bug [0] with ntpdate. It doesn't
>>>> >> > reproduce 100% of the time, but it breaks our BVT and swarm tests.
>>>> >> > The exact root of the problem hasn't been located. To better
>>>> >> > understand it, some verbosity was added to the ntpdate output, but
>>>> >> > in the logs we can only see that the packet exchange between
>>>> >> > ntpdate and the server was started and never completed.
>>>> >> >
>>>> >>
>>>> >> So when I've hit this in my local environments it is usually one
>>>> >> of two possible causes: 1) lack of network connectivity, so the ntp
>>>> >> server never responds, or 2) the stratum is too high. My assumption is
>>>> >> that we're running into #2 because of our revert-resume in testing.
>>>> >> When we resume, the ntp server on the master may take a while to
>>>> >> become stable. This sync in the deployment uses the fuel master for
>>>> >> synchronization, so if the stratum is too high, it will fail with this
>>>> >> lovely useless error. My assumption on what is happening is that
>>>> >> we aren't using a set of internal ntp servers but are rather
>>>> >> relying on the standard ntp.org pools. So when the master is being
>>>> >> resumed it's struggling to find a good enough set of servers, and it
>>>> >> takes a while to sync. This then causes these deployment tasks to fail
>>>> >> because the master has not yet stabilized (might also be geolocation
>>>> >> related). We could address this either by fudging the stratum on the
>>>> >> master server in the configs or possibly by introducing our own more
>>>> >> stable local ntp servers. I have a feeling fudging the stratum might
>>>> >> be better when we only use the master in our ntp configuration.
>>>> >>
>>>> >> > As this bug is a blocker, I propose to merge [1] to better understand
>>>> >> > what's going on. I created a custom ISO with this patchset and tried
>>>> >> > running about 10 BVT tests on this ISO, with absolutely no luck. So,
>>>> >> > if we merge this, we would catch the problem much faster and
>>>> >> > understand the root cause.
>>>> >> >
>>>> >>
>>>> >> I think we should merge the increased logging patch anyway because
>>>> >> it'll be useful in troubleshooting, but we also might want to look into
>>>> >> getting an ntp peers list added to the snapshot.
>>>> >>
>>>> >> > I appreciate your answers, folks.
>>>> >> >
>>>> >> >
>>>> >> > [0] https://bugs.launchpad.net/fuel/+bug/1533082
>>>> >> > [1] https://review.openstack.org/#/c/271219/
>>>> >> > --
>>>> >> > with best regards,
>>>> >> > Stan.
>>>> >> >
>>>> >>
>>>> >> Thanks,
>>>> >> -Alex
>>>> >>
>>>> >>
>>>> >> __________________________________________________________________________
>>>> >> OpenStack Development Mailing List (not for usage questions)
>>>> >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > with best regards,
>>>> > Stan.
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Best Regards,
>> Maksim Malchuk,
>> Senior DevOps Engineer,
>> MOS: Product Engineering,
>> Mirantis, Inc
>> <vgordon at mirantis.com>
>>
>>
>>
>
>
> --
> with best regards,
> Stan.
>
>
>


-- 
Best Regards,
Maksim Malchuk,
Senior DevOps Engineer,
MOS: Product Engineering,
Mirantis, Inc
<vgordon at mirantis.com>