<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Dec 23, 2015 at 10:32 AM, Jay Pipes <span dir="ltr">&lt;<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 12/23/2015 12:27 PM, Lars Kellogg-Stedman wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I've been looking into the startup constraints involved when launching<br>

Nova services with systemd using Type=notify (which causes systemd to<br>

wait for an explicit notification from the service before considering<br>

it to be "started").  Some services (e.g., nova-conductor) will happily<br>

"start" even if the backing database is currently unavailable (and<br>

will enter a retry loop waiting for the database).<br>

<br>

Other services -- specifically, nova-scheduler -- will block waiting<br>

for the database *before* providing systemd with the necessary<br>

notification.<br>

<br>

nova-scheduler blocks because it wants to initialize a list of<br>

available aggregates (in scheduler.host_manager.HostManager.__init__),<br>

which it gets by calling objects.AggregateList.get_all.<br>

<br>

Does it make sense to block service startup at this stage?  The<br>

database disappearing during runtime isn't a hard error -- we will<br>

retry and reconnect when it comes back -- so should the same situation<br>

at startup be a hard error?  As an operator, I am more interested in<br>

"did my configuration files parse correctly?" at startup, and would<br>

generally prefer the service to start (and permit any dependent<br>

services to start) even when the database isn't up (because that's<br>

probably a situation of which I am already aware).<br>

</blockquote>

<br></span>

If your configuration file parsed correctly but has the wrong database connection URI, what good is the service in an active state? It won't be able to do anything at all.<br>

<br>

This is why I think it's better to have hard checks, such as for database connections, on startup and not have services active if they won't be able to do anything useful.<span class=""><br>

<br></span></blockquote><div><br></div><div>Are you advocating that the scheduler bail out and stop running, or only that it not mark itself as active? I am in favour of the second scenario but not the first. There are cases where it would be nice to start the scheduler and have it at least report "hey, I can't contact the DB" without marking itself active, while it continues to run and, every &lt;interval&gt;, reports the failure and tries to reconnect.<br><br></div><div>It isn't clear which level of "hard check" you're advocating in your response, and I want to clarify that for the sake of the conversation. <br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

It would be relatively easy to have the scheduler lazy-load the list<br>

of aggregates on first reference, rather than at __init__.<br>

</blockquote>
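<br>
For concreteness, the lazy-load idea quoted above might look roughly like the sketch below. The HostManager here is a toy stand-in, not nova's real class, and load_aggregates is a hypothetical callable standing in for objects.AggregateList.get_all.<br>

```python
import functools


class HostManager:
    """Toy stand-in for the scheduler's host manager (not nova's real class)."""

    def __init__(self, load_aggregates):
        # Store the loader only; construction performs no database access,
        # so it cannot block service startup.
        self._load_aggregates = load_aggregates

    @functools.cached_property
    def aggregates(self):
        # Runs on first access. If the loader raises, nothing is cached,
        # so the next access calls the loader again.
        return self._load_aggregates()
```

With this shape the constructor returns immediately, and a database outage only surfaces on the first read of manager.aggregates.<br>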

<br></span>

Sure, but if the root cause of the issue is a misconfigured connection string, then that lazy-load will just bomb out and the scheduler will be useless anyway. I'd rather fail early and fast here than fail late.<br>

<br>

Best,<br>

-jay<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

I'm not familiar enough with the nova code to know if there would be any<br>

undesirable implications of this behavior.  We're already punting<br>

initializing the list of instances to an asynchronous task in order to<br>

avoid blocking service startup.<br>

<br>

Does it make sense to permit nova-scheduler to complete service<br>

startup in the absence of the database (and then retry the connection<br>

in the background)?<br>

<br>

<br>

<br></span>

__________________________________________________________________________<br>

OpenStack Development Mailing List (not for usage questions)<br>

Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

<br>

</blockquote>

<br>


</blockquote></div><br></div></div>
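<div>To make the "keep running, but don't mark active" option concrete, here is a minimal sketch of what I have in mind. It is not nova code: check_db is a hypothetical callable that raises while the database is unreachable, and wait_until_ready and its parameters are names made up for illustration. The only systemd-specific part is the notification protocol itself, which sends datagrams such as READY=1 or STATUS=... to the socket named in the NOTIFY_SOCKET environment variable.</div>

```python
import os
import socket
import time


def sd_notify(message):
    """Send a state string to systemd; return False when not run under it."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode(), path)
    return True


def wait_until_ready(check_db, interval=5.0, notify=sd_notify, sleep=time.sleep):
    """Keep running while the database is down; signal READY once it is up."""
    attempts = 0
    while True:
        attempts += 1
        try:
            check_db()  # hypothetical probe; raises while the DB is unreachable
        except Exception:
            # Report the problem without exiting and without marking active.
            notify("STATUS=cannot contact the DB, retrying (attempt %d)" % attempts)
            sleep(interval)
            continue
        notify("READY=1")
        return attempts
```

<div>One caveat with this approach: under Type=notify, systemd applies its start timeout while it waits for READY=1, so a long database outage would still end in a failed start unless that timeout is raised.</div>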