Open Stack

Tue Jan 26 19:41:14 UTC 2016

On Tue, Jan 26, 2016 at 11:42 AM, Stanislaw Bogatkin
<sbogatkin at mirantis.com> wrote:
> Hi guys,
>
> for some time now we have had a bug [0] with ntpdate. It doesn't reproduce
> 100% of the time, but it breaks our BVT and swarm tests. We haven't pinned
> down where the root cause is. To understand it better, we added more
> verbosity to the ntpdate output, but in the logs we can only see that the
> packet exchange between ntpdate and the server started and never completed.
>

When I've hit this in my local environments, there is usually one of
two causes: 1) lack of network connectivity, so the ntp server never
responds, or 2) the stratum is too high. My assumption is that we're
running into #2 because of the revert-resume we do in testing. When we
resume, the ntp server on the master may take a while to become
stable. This sync step in the deployment uses the Fuel master for
synchronization, so if the master's stratum is too high, it fails with
this lovely, useless error.

I suspect this happens because we aren't using a set of internal ntp
servers and instead rely on the standard ntp.org pools. When the
master is resumed, it struggles to find a good enough set of servers,
so it takes a while to sync (this might also be related to
geolocation). The deployment tasks then fail because the master has
not yet stabilized. We could address this either by fudging the
stratum on the master in its configs or by introducing our own, more
stable local ntp servers. I have a feeling fudging the stratum might
be better, given that the master is the only server in our ntp
configuration.
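One way to tell the two causes apart from a node is to query the master
directly and look at what comes back. Below is a minimal, hypothetical
Python SNTP probe (not part of Fuel or ntpdate; the function names are
made up for illustration). It assumes UDP port 123 on the master is
reachable from the node and uses the RFC 5905 header layout: a timeout
points at cause 1, while a reply with stratum 16 (or 0) or leap
indicator 3 means the server answered but is not synchronized yet,
which is cause 2.

```python
import socket
import struct

NTP_PORT = 123
# RFC 5905: stratum 16 means "unsynchronized"; stratum 0 means
# "unspecified or invalid" (also used by Kiss-o'-Death replies).
MAX_USABLE_STRATUM = 15
LEAP_ALARM = 3  # leap indicator 3 = clock not synchronized


def build_request(version=3):
    """Build a 48-byte SNTP client request (LI=0, VN=version, Mode=3)."""
    first = (0 << 6) | (version << 3) | 3
    return bytes([first]) + bytes(47)


def parse_header(data):
    """Return (leap, version, mode, stratum) from an NTP packet header."""
    if len(data) < 48:
        raise ValueError("short NTP packet: %d bytes" % len(data))
    first, stratum = struct.unpack("!BB", data[:2])
    return first >> 6, (first >> 3) & 0x7, first & 0x7, stratum


def classify(data):
    """Map a reply (or None on timeout) to one of the two failure causes."""
    if data is None:
        return "no response (connectivity?)"
    leap, _version, _mode, stratum = parse_header(data)
    if leap == LEAP_ALARM or stratum == 0 or stratum > MAX_USABLE_STRATUM:
        return "unsynchronized (stratum %d, leap %d)" % (stratum, leap)
    return "ok (stratum %d)" % stratum


def query(host, timeout=5.0):
    """Send one SNTP request; return the raw reply, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (host, NTP_PORT))
        try:
            data, _addr = sock.recvfrom(512)
        except socket.timeout:
            return None
    return data
```

Running `print(classify(query("<master-ip>")))` in a loop right after a
resume would show how long the master stays unsynchronized before the
stratum drops to a usable value.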

> As this bug is a blocker, I propose merging [1] to better understand
> what's going on. I created a custom ISO with this patchset and ran
> about 10 BVT tests on it, but couldn't reproduce the problem. So if we
> merge this, we should catch the problem much faster and understand the
> root cause.
>

I think we should merge the increased logging patch anyway, because
it'll be useful for troubleshooting, but we might also want to look
into adding the ntp peers list to the snapshot.
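If the snapshot did capture the master's peers list, a small helper
like the following could summarize it. This is a hypothetical sketch,
not existing Fuel snapshot code: it runs `ntpq -pn` and parses the
default column layout (remote, refid, st, t, when, poll, reach, delay,
offset, jitter), assuming one peer per line. The `*` tally marks the
peer ntpd has selected, so no `*` suggests the master has not
synchronized yet.

```python
import subprocess


def parse_ntpq_peers(text):
    """Parse `ntpq -pn` output into a list of peer dicts.

    Assumes the default 10-column layout with one peer per line;
    wrapped or malformed lines are skipped in this sketch.
    """
    peers = []
    for line in text.splitlines()[2:]:  # skip the header and '=====' rule
        if not line.strip():
            continue
        fields = line[1:].split()  # column 0 is the tally code
        if len(fields) < 10:
            continue
        peers.append({
            "tally": line[0],             # '*' = selected, '+' = candidate
            "remote": fields[0],
            "refid": fields[1],
            "stratum": int(fields[2]),
            "reach": fields[6],           # octal reachability register
            "offset_ms": float(fields[8]),
        })
    return peers


def selected_peer(peers):
    """Return the peer ntpd has selected ('*'), or None if there is none."""
    for peer in peers:
        if peer["tally"] == "*":
            return peer
    return None


def collect_peers():
    """Run `ntpq -pn` locally; needs ntpq installed and ntpd running."""
    out = subprocess.run(["ntpq", "-pn"], capture_output=True,
                         text=True, check=True)
    return parse_ntpq_peers(out.stdout)
```

A `reach` of `0` and a stratum of 16 on every upstream peer would match
the "master still struggling to find good servers" theory above.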

> I appreciate your answers, folks.
>
>
> [0] https://bugs.launchpad.net/fuel/+bug/1533082
> [1] https://review.openstack.org/#/c/271219/
> --
> with best regards,
> Stan.
>

Thanks,
-Alex

[openstack-dev] [Fuel][Bugs] Time sync problem when testing.
