[openstack-dev] [Fuel][Bugs] Time sync problem when testing.
aschultz at mirantis.com
Tue Jan 26 21:54:39 UTC 2016
On Tue, Jan 26, 2016 at 2:16 PM, Stanislaw Bogatkin
<sbogatkin at mirantis.com> wrote:
> When the stratum is too high, ntpdate detects this and always writes it to
> its log. In our case there is simply no such log message: ntpdate sends the
> first packet, gets an answer, and that's all. So I don't think fudging will
> save us. Also, it's a really bad approach to fudge a server that doesn't
> have a real clock onboard.
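For reference, the stratum that ntpdate checks is a single byte in the NTP reply header. Per RFC 5905, byte 0 packs the leap indicator, version and mode, and byte 1 is the stratum (0 = unspecified, 1 = primary, 2-15 = secondary, 16 = unsynchronized). The Python sketch below decodes those fields from a raw reply. `parse_ntp_header` and `is_usable_server` are hypothetical helpers, and the acceptance rule is a simplified approximation, not ntpdate's actual filtering logic.

```python
import struct

# Stratum values per RFC 5905: 0 = unspecified / kiss-o'-death,
# 1 = primary reference, 2-15 = secondary, 16 = unsynchronized.
UNSYNCHRONIZED = 16


def parse_ntp_header(packet: bytes) -> dict:
    """Decode the first four bytes of a 48-byte NTP packet header."""
    if len(packet) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    # Unsigned flags and stratum; poll and precision are signed log2 seconds.
    flags, stratum, poll, precision = struct.unpack("!BBbb", packet[:4])
    return {
        "leap": flags >> 6,             # 3 = alarm: server clock not synchronized
        "version": (flags >> 3) & 0x7,
        "mode": flags & 0x7,            # 4 = server reply
        "stratum": stratum,
        "poll": poll,
        "precision": precision,
    }


def is_usable_server(packet: bytes, max_stratum: int = 15) -> bool:
    """Simplified sanity check: a synchronized server reply with a sane stratum."""
    hdr = parse_ntp_header(packet)
    return (
        hdr["mode"] == 4
        and hdr["leap"] != 3
        and 1 <= hdr["stratum"] <= max_stratum
    )
```

A reply that fails this check would correspond to the "strata too high" case Stan describes; a missing reply is a different failure altogether.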
Do you have debug output from ntpdate somewhere? I'm not finding it in
the bugs or in some of the snapshots for the failures. I did find one
snapshot with the -v change that didn't have any response information,
so maybe it's the other problem, where network connectivity isn't
working correctly or the responses are getting dropped somewhere?
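To tell "no reply at all" (connectivity problems or dropped responses) apart from "a reply that was rejected", one can send a single SNTP client request with a timeout. This is a minimal Python sketch assuming plain UDP on port 123; `query_ntp` and `build_client_request` are hypothetical helpers. It does not replace ntpdate's debug output, it only separates the two failure modes discussed in this thread.

```python
import socket
from typing import Optional

NTP_PORT = 123


def build_client_request() -> bytes:
    # First byte: LI = 0, VN = 4, Mode = 3 (client); the other 47 bytes are zero.
    return bytes([0x23]) + bytes(47)


def query_ntp(host: str, port: int = NTP_PORT, timeout: float = 2.0) -> Optional[bytes]:
    """Send one client request; return the raw reply, or None if nothing arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_client_request(), (host, port))
        try:
            data, _addr = sock.recvfrom(512)
        except socket.timeout:
            return None  # request went out, but no answer came back in time
        return data
```

A `None` result points at the network path; a non-empty reply can then be inspected for its stratum byte (`reply[1]`).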
> On Tue, Jan 26, 2016 at 10:41 PM, Alex Schultz <aschultz at mirantis.com>
> wrote:
>> On Tue, Jan 26, 2016 at 11:42 AM, Stanislaw Bogatkin
>> <sbogatkin at mirantis.com> wrote:
>> > Hi guys,
>> > for some time now we have had a bug with ntpdate. It doesn't reproduce
>> > 100% of the time, but it breaks our BVT and swarm tests. We haven't
>> > been able to pin down where the root cause lies. To understand it
>> > better, we added some verbosity to the ntpdate output, but in the logs
>> > we can only see that the packet exchange between ntpdate and the
>> > server was started and never completed.
>> So when I've hit this in my local environments, it usually comes down
>> to one of two possible causes: 1) a lack of network connectivity, so
>> the ntp server never responds, or 2) the stratum is too high. My
>> assumption is that we're running into #2 because of our revert-resume
>> in testing. When we resume, the ntp server on the master may take a
>> while to become stable. The sync during deployment uses the Fuel
>> master, so if its stratum is too high, it will fail with this lovely
>> useless error. I suspect this happens because we aren't using a set of
>> internal ntp servers but are instead relying on the standard ntp.org
>> pools. So when the master is being resumed, it struggles to find a
>> good enough set of servers and takes a while to sync. This then causes
>> these deployment tasks to fail because the master has not yet
>> stabilized (this might also be geolocation related). We could address
>> this either by fudging the stratum in the master server's configs or
>> by introducing our own, more stable, local ntp servers. I have a
>> feeling fudging the stratum might be better, given that we only use
>> the master in our ntp configuration.
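The "master needs time to stabilize after resume" theory can be made concrete as a bounded wait: poll the master's stratum until it is usable or a deadline passes, rather than judging it on the first attempt. This is a hypothetical Python sketch, not something Fuel's deployment tasks are known to do; `wait_for_stratum` and its `get_stratum` callback are assumptions, and the `clock`/`sleep` parameters exist only so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional


def wait_for_stratum(
    get_stratum: Callable[[], Optional[int]],
    max_stratum: int = 15,
    timeout: float = 300.0,
    interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the server reports a usable stratum or `timeout` seconds pass.

    get_stratum is a hypothetical callback returning the stratum from the
    server's reply, or None when the server did not answer at all.
    """
    deadline = clock() + timeout
    while True:
        stratum = get_stratum()
        if stratum is not None and 1 <= stratum <= max_stratum:
            return True   # server looks synchronized
        if clock() >= deadline:
            return False  # still unsynchronized (or silent) at the deadline
        sleep(interval)
```

If such a wait eventually succeeds on failing runs, that would support cause #2; if the master never answers, cause #1 is more likely.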
>> > As this bug is a blocker, I propose merging the logging patch to better
>> > understand what's going on. I created a custom ISO with this patchset
>> > and ran about 10 BVT tests on it, with absolutely no luck reproducing
>> > the problem. So if we merge this, we will catch the problem much faster
>> > and understand the root cause.
>> I think we should merge the increased logging patch anyway, because
>> it'll be useful in troubleshooting, but we might also want to look into
>> getting an ntp peers list added to the snapshot.
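If an ntp peers list is added to the snapshot, the output of `ntpq -pn` on the master is a natural thing to capture: the peer ntpd has selected is marked with `*` in the first column, and the `st` column carries each peer's stratum. Below is a minimal Python sketch of a parser for that table; `parse_ntpq_peers`, `system_peer` and the `Peer` fields are hypothetical, and it assumes the classic ten-column `ntpq -p` layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peer:
    tally: str    # first column: '*' system peer, '+' candidate, ' ' not selected, ...
    remote: str
    refid: str
    stratum: int
    reach: str    # reachability register, printed in octal (377 = last 8 polls answered)


def parse_ntpq_peers(output: str) -> List[Peer]:
    """Parse the peer table printed by `ntpq -pn` (classic 10-column layout)."""
    peers: List[Peer] = []
    for line in output.splitlines():
        stripped = line.strip()
        # Skip blank lines, the column header and the '=====' separator.
        if not stripped or stripped.startswith("remote") or stripped.startswith("="):
            continue
        fields = line[1:].split()
        if len(fields) < 10:
            continue  # not a complete peer row (e.g. a wrapped long hostname)
        peers.append(Peer(tally=line[0], remote=fields[0], refid=fields[1],
                          stratum=int(fields[2]), reach=fields[6]))
    return peers


def system_peer(peers: List[Peer]) -> Optional[Peer]:
    """Return the peer ntpd has selected for synchronization, if any."""
    return next((p for p in peers if p.tally == "*"), None)
```

In a failed run's snapshot, no system peer (or only stratum-16 peers) would be consistent with the "master not yet stable" theory.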
>> > I appreciate your answers, folks.
>> >  https://bugs.launchpad.net/fuel/+bug/1533082
>> >  https://review.openstack.org/#/c/271219/
>> > --
>> > with best regards,
>> > Stan.
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> with best regards,