<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <br>
    <div class="moz-cite-prefix">On 24/12/2015 02:35, Morgan Fainberg
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAGnj6atLZw_QWZk+dOMaZcLGxXgOgdndRaKDM8L3n=RruHjCXg@mail.gmail.com"
      type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Wed, Dec 23, 2015 at 10:32 AM, Jay
            Pipes <span dir="ltr">&lt;<a moz-do-not-send="true"
                href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
                class="">On 12/23/2015 12:27 PM, Lars Kellogg-Stedman
                wrote:<br>
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  I've been looking into the startup constraints
                  involved when launching<br>
                  Nova services with systemd using Type=notify (which
                  causes systemd to<br>
                  wait for an explicit notification from the service
                  before considering<br>
                  it to be "started").  Some services (e.g.,
                  nova-conductor) will happily<br>
                  "start" even if the backing database is currently
                  unavailable (and<br>
                  will enter a retry loop waiting for the database).<br>
                  <br>
                  Other services -- specifically, nova-scheduler -- will
                  block waiting<br>
                  for the database *before* providing systemd with the
                  necessary<br>
                  notification.<br>
                  <br>
                  nova-scheduler blocks because it wants to initialize a
                  list of<br>
                  available aggregates (in
                  scheduler.host_manager.HostManager.__init__),<br>
                  which it gets by calling
                  objects.AggregateList.get_all.<br>
                  <br>
                  Does it make sense to block service startup at this
                  stage?  The<br>
                  database disappearing during runtime isn't a hard
                  error -- we will<br>
                  retry and reconnect when it comes back -- so should
                  the same situation<br>
                  at startup be a hard error?  As an operator, I am more
                  interested in<br>
                  "did my configuration files parse correctly?" at
                  startup, and would<br>
                  generally prefer the service to start (and permit any
                  dependent<br>
                  services to start) even when the database isn't up
                  (because that's<br>
                  probably a situation of which I am already aware).<br>
                </blockquote>
                <br>
              </span>
              If your configuration file parsed correctly but has the
              wrong database connection URI, what good is the service in
              an active state? It won't be able to do anything at all.<br>
              <br>
              This is why I think it's better to have hard checks, such
              as for database connections, on startup and not have
              services active if they won't be able to do anything
              useful.<span class=""><br>
                <br>
              </span></blockquote>
            <div><br>
            </div>
            <div>Are you advocating that scheduler bails out and ceases
              to run or that it doesn't mark itself as active? I am in
              favour of the second scenario but not the first. There are
              cases where it would be nice to start the scheduler and
              have it at least report "hey I can't contact the DB" but
              not mark itself active, but continue to run and on
              &lt;interval&gt; report/try to reconnect.<br>
              <br>
            </div>
            <div>It isn't clear which level of "hard check" you're
              advocating in your response and I want to clarify for the
              sake of conversation. <br>
            </div>
            <div> </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    So, to be clear, the scheduler calls the DB at startup to get the
    list of aggregates and instances, so that filters can look them up
    in memory rather than calling the DB every time they need to check
    them.<br>
    While that data is only needed by those filters, it still means
    that if the DB is down, the scheduler won't work: even if the
    service is running, any request to the scheduler would return an
    exception.<br>
    <br>
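    To make the trade-off concrete, here is a minimal Python sketch. It
    is not the actual Nova code: EagerHostManager, LazyHostManager,
    fetch_aggregates and broken_fetch are hypothetical names standing in
    for HostManager.__init__ and objects.AggregateList.get_all. The
    eager variant fails while the service is being constructed, the
    lazy one starts fine and fails on the first request.<br>
    <pre>
```python
class DBUnavailable(Exception):
    """Raised by the (hypothetical) fetch function when the DB is unreachable."""


class EagerHostManager:
    """Loads aggregates in __init__, so a bad DB blocks or fails startup."""

    def __init__(self, fetch_aggregates):
        # Fail fast: any DB problem surfaces before the service is "started".
        self.aggregates = list(fetch_aggregates())

    def get_aggregates(self):
        return self.aggregates


class LazyHostManager:
    """Defers the DB call to first use, so startup never touches the DB."""

    def __init__(self, fetch_aggregates):
        self._fetch = fetch_aggregates
        self._aggregates = None  # not loaded yet

    def get_aggregates(self):
        if self._aggregates is None:
            # Fail late: a DB problem surfaces on the first scheduling request.
            self._aggregates = list(self._fetch())
        return self._aggregates


def broken_fetch():
    # Stand-in for a DB call made with a wrong connection string.
    raise DBUnavailable("cannot connect to database")
```
</pre>
    With broken_fetch, EagerHostManager(broken_fetch) raises right away,
    while LazyHostManager(broken_fetch) is created without error and
    only raises once get_aggregates() is called.<br>
    <br>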
    So, which do you think is better? Having the scheduler say in an
    error log "heh cool, the DB is bad, but okay, you can call me", or
    rather "meh, you have a config issue, please review it"?<br>
    <br>
    To be honest, we could probably do a better job of documenting why
    the scheduler does not start when it cannot reach the DB, but I'm
    not sure it's a good idea to make the scheduler resilient to a DB
    outage.<br>
    <br>
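    For the sake of discussion, the middle ground Morgan describes
    (keep running, say clearly why we are not ready, and only tell
    systemd we are started once the DB answers) could look roughly like
    the Python sketch below. This is an assumption-laden illustration,
    not Nova code: check_db and wait_for_db_then_notify are hypothetical
    names, and sd_notify here is a small hand-written helper that
    writes "READY=1" to the socket named by NOTIFY_SOCKET, which is how
    Type=notify services report readiness.<br>
    <pre>
```python
import logging
import os
import socket
import time

LOG = logging.getLogger(__name__)


def sd_notify(message):
    """Send a state string to systemd; return False when not run by systemd."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # A leading "@" denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), path)
    return True


def wait_for_db_then_notify(check_db, notify=sd_notify, interval=5.0,
                            max_attempts=None, sleep=time.sleep):
    """Retry check_db() until it succeeds, then report READY=1.

    check_db is a hypothetical callable that raises while the DB is
    unreachable. Returns True once ready, or False if max_attempts
    is exhausted.
    """
    attempt = 0
    while max_attempts is None or attempt != max_attempts:
        attempt += 1
        try:
            check_db()
        except Exception as exc:
            # Say explicitly why the service is not (yet) marked as started.
            LOG.error("Database unreachable (attempt %d): %s; "
                      "check the connection settings", attempt, exc)
            sleep(interval)
            continue
        notify("READY=1")
        return True
    return False
```
</pre>
    With max_attempts left at None this retries forever, which matches
    "continue to run and try to reconnect"; setting it gives the
    fail-fast behaviour after a bounded number of tries.<br>
    <br>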
    -Sylvain<br>
    <br>
    <blockquote
cite="mid:CAGnj6atLZw_QWZk+dOMaZcLGxXgOgdndRaKDM8L3n=RruHjCXg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
                class="">
                <blockquote class="gmail_quote" style="margin:0 0 0
                  .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  It would be relatively easy to have the scheduler
                  lazy-load the list<br>
                  of aggregates on first reference, rather than at
                  __init__.<br>
                </blockquote>
                <br>
              </span>
              Sure, but if the root cause of the issue is a
              misconfigured connection string, then that lazy-load
              will just bomb out and the scheduler will be useless
              anyway. I'd rather have a fail-early/fast occur here than
              a fail-late.<br>
              <br>
              Best,<br>
              -jay<br>
              <br>
              > I'm not<br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex"><span
                  class="">
                  familiar enough with the nova code to know if there
                  would be any<br>
                  undesirable implications of this behavior.  We're
                  already punting<br>
                  initializing the list of instances to an asynchronous
                  task in order to<br>
                  avoid blocking service startup.<br>
                  <br>
                  Does it make sense to permit nova-scheduler to
                  complete service<br>
                  startup in the absence of the database (and then retry
                  the connection<br>
                  in the background)?<br>
                  <br>
                  <br>
                  <br>
                </span>
__________________________________________________________________________<br>
                OpenStack Development Mailing List (not for usage
                questions)<br>
                Unsubscribe: <a moz-do-not-send="true"
href="mailto:OpenStack-dev-request@lists.openstack.org?subject:unsubscribe"
                  rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
                <a moz-do-not-send="true"
                  href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev"
                  rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
                <br>
              </blockquote>
              <br>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>