<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<br>
<div class="moz-cite-prefix">On 24/12/2015 02:35, Morgan Fainberg
wrote:<br>
</div>
<blockquote
cite="mid:CAGnj6atLZw_QWZk+dOMaZcLGxXgOgdndRaKDM8L3n=RruHjCXg@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Dec 23, 2015 at 10:32 AM, Jay
Pipes <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">On 12/23/2015 12:27 PM, Lars Kellogg-Stedman
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
I've been looking into the startup constraints
involved when launching<br>
Nova services with systemd using Type=notify (which
causes systemd to<br>
wait for an explicit notification from the service
before considering<br>
it to be "started"). Some services (e.g.,
nova-conductor) will happily<br>
"start" even if the backing database is currently
unavailable (and<br>
will enter a retry loop waiting for the database).<br>
<br>
Other services -- specifically, nova-scheduler -- will
block waiting<br>
for the database *before* providing systemd with the
necessary<br>
notification.<br>
<br>
nova-scheduler blocks because it wants to initialize a
list of<br>
available aggregates (in
scheduler.host_manager.HostManager.__init__),<br>
which it gets by calling
objects.AggregateList.get_all.<br>
<br>
Does it make sense to block service startup at this
stage? The<br>
database disappearing during runtime isn't a hard
error -- we will<br>
retry and reconnect when it comes back -- so should
the same situation<br>
at startup be a hard error? As an operator, I am more
interested in<br>
"did my configuration files parse correctly?" at
startup, and would<br>
generally prefer the service to start (and permit any
dependent<br>
services to start) even when the database isn't up
(because that's<br>
probably a situation of which I am already aware).<br>
</blockquote>
<br>
</span>
If your configuration file parsed correctly but has the
wrong database connection URI, what good is the service in
an active state? It won't be able to do anything at all.<br>
<br>
This is why I think it's better to have hard checks, such as
for database connections, on startup and not have services active if
they won't be able to do anything useful.<span class=""><br>
<br>
</span></blockquote>
<div><br>
</div>
<div>Are you advocating that the scheduler bails out and ceases
to run, or that it doesn't mark itself as active? I am in
favour of the second scenario but not the first. There are
cases where it would be nice to start the scheduler and
have it at least report "hey I can't contact the DB" without
marking itself active, while it continues to run and, every
&lt;interval&gt;, reports and tries to reconnect.<br>
<br>
</div>
<div>It isn't clear which level of "hard check" you're
advocating in your response and I want to clarify for the
sake of conversation. <br>
</div>
<div> </div>
</div>
</div>
</div>
</blockquote>
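<br>
For the sake of discussion, here is a minimal sketch of the second scenario Morgan describes: the process keeps running, logs that it can't contact the DB, retries at an interval, and only tells systemd it is ready once the DB answers. This is not nova code. <code>check_db</code>, <code>wait_for_db_then_notify</code> and the <code>interval</code> parameter are hypothetical names, and I am assuming the usual sd_notify convention of sending a <code>READY=1</code> datagram to the socket named in <code>NOTIFY_SOCKET</code>.<br>

```python
import os
import socket
import time


def sd_notify(message: bytes) -> bool:
    """Send a notification datagram to systemd, if NOTIFY_SOCKET is set."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by systemd with Type=notify
    if path.startswith("@"):
        path = "\0" + path[1:]  # "@" prefix means an abstract namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, path)
    return True


def wait_for_db_then_notify(check_db, interval=5.0, log=print, sleep=time.sleep):
    """Stay alive while the DB is down; report READY only once it answers.

    check_db is a hypothetical callable that raises when the DB is unreachable.
    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            check_db()
        except Exception as exc:
            # Report the problem, but keep running and try again later.
            log("cannot contact the DB (attempt %d): %s" % (attempts, exc))
            sleep(interval)
            continue
        sd_notify(b"READY=1")  # only now does systemd see the unit as started
        return attempts
```

One caveat with this approach: with Type=notify, systemd still applies its start timeout, so a unit that never sends READY=1 will eventually be treated as having failed to start.<br>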
<br>
So, to be clear, the scheduler calls the DB at startup to get the list of
aggregates and instances so that filters can look at an in-memory copy
instead of calling the DB every time they need to check them.<br>
While that data is only needed by those filters, it still
means that if the DB is unhealthy, the scheduler won't work: even if
the service is running, any request to the
scheduler would raise an exception.<br>
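<br>
To make the trade-off concrete, here is a rough sketch of the two options being discussed: loading the aggregates in <code>__init__</code> versus loading them on first reference. It is not the actual HostManager code. The class names are made up, and <code>fetch_aggregates</code> is a hypothetical callable standing in for the <code>objects.AggregateList.get_all</code> call.<br>

```python
class EagerHostManager:
    """Loads aggregates in __init__, so an unreachable DB fails service startup."""

    def __init__(self, fetch_aggregates):
        # Raises right away if the DB is unreachable (fail early).
        self.aggregates = list(fetch_aggregates())


class LazyHostManager:
    """Defers the DB call until something first asks for the aggregates."""

    def __init__(self, fetch_aggregates):
        self._fetch = fetch_aggregates
        self._aggregates = None  # not loaded yet

    @property
    def aggregates(self):
        if self._aggregates is None:
            # First reference: a DB failure surfaces here (fail late).
            # Nothing is cached on failure, so the next access retries.
            self._aggregates = list(self._fetch())
        return self._aggregates
```

In both cases a request that needs the aggregates fails while the DB is down. The only difference is whether the failure shows up at startup or on the first scheduling request.<br>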
<br>
So, which do you think is better? Having a scheduler that says in an error
log "heh cool, the DB is bad, but okay, you can call me", or one that says
"meh, you have a config issue, please review it"?<br>
<br>
To be honest, we could probably do a better job of documenting why the
scheduler does not start when it can't reach the DB, but
I'm not sure it's a good idea to make the scheduler resilient to DB outages.<br>
<br>
-Sylvain<br>
<br>
<blockquote
cite="mid:CAGnj6atLZw_QWZk+dOMaZcLGxXgOgdndRaKDM8L3n=RruHjCXg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
It would be relatively easy to have the scheduler
lazy-load the list<br>
of aggregates on first reference, rather than at
__init__.<br>
</blockquote>
<br>
</span>
Sure, but if the root cause of the issue is a problem due
to misconfigured connection string, then that lazy-load
will just bomb out and the scheduler will be useless
anyway. I'd rather have a fail-early/fast occur here than
a fail-late.<br>
<br>
Best,<br>
-jay<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex"><span
class="">
I'm not familiar enough with the nova code to know if there
would be any<br>
undesirable implications of this behavior. We're
already punting<br>
initializing the list of instances to an asynchronous
task in order to<br>
avoid blocking service startup.<br>
<br>
Does it make sense to permit nova-scheduler to
complete service<br>
startup in the absence of the database (and then retry
the connection<br>
in the background)?<br>
<br>
<br>
<br>
</span>
__________________________________________________________________________<br>
OpenStack Development Mailing List (not for usage
questions)<br>
Unsubscribe: <a moz-do-not-send="true"
href="mailto:OpenStack-dev-request@lists.openstack.org?subject:unsubscribe"
rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
<a moz-do-not-send="true"
href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev"
rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
<br>
</blockquote>
<br>
</blockquote>
</div>
<br>
</div>
</div>
<br>
</blockquote>
<br>
</body>
</html>