<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <br>

    <div class="moz-cite-prefix">On 24/12/2015 02:35, Morgan Fainberg
      wrote:<br>

    </div>

    <blockquote

cite="mid:CAGnj6atLZw_QWZk+dOMaZcLGxXgOgdndRaKDM8L3n=RruHjCXg@mail.gmail.com"

      type="cite">

      <div dir="ltr"><br>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Wed, Dec 23, 2015 at 10:32 AM, Jay

            Pipes <span dir="ltr">&lt;<a moz-do-not-send="true"
                href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>&gt;</span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex"><span

                class="">On 12/23/2015 12:27 PM, Lars Kellogg-Stedman

                wrote:<br>

                <blockquote class="gmail_quote" style="margin:0 0 0

                  .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  I've been looking into the startup constraints

                  involved when launching<br>

                  Nova services with systemd using Type=notify (which

                  causes systemd to<br>

                  wait for an explicit notification from the service

                  before considering<br>

                  it to be "started").  Some services (e.g.,

                  nova-conductor) will happily<br>

                  "start" even if the backing database is currently

                  unavailable (and<br>

                  will enter a retry loop waiting for the database).<br>

                  <br>

                  Other services -- specifically, nova-scheduler -- will

                  block waiting<br>

                  for the database *before* providing systemd with the

                  necessary<br>

                  notification.<br>

                  <br>

                  nova-scheduler blocks because it wants to initialize a

                  list of<br>

                  available aggregates (in

                  scheduler.host_manager.HostManager.__init__),<br>

                  which it gets by calling

                  objects.AggregateList.get_all.<br>

                  <br>

                  Does it make sense to block service startup at this

                  stage?  The<br>

                  database disappearing during runtime isn't a hard

                  error -- we will<br>

                  retry and reconnect when it comes back -- so should

                  the same situation<br>

                  at startup be a hard error?  As an operator, I am more

                  interested in<br>

                  "did my configuration files parse correctly?" at

                  startup, and would<br>

                  generally prefer the service to start (and permit any

                  dependent<br>

                  services to start) even when the database isn't up

                  (because that's<br>

                  probably a situation of which I am already aware).<br>

                </blockquote>

                <br>

              </span>

              If your configuration file parsed correctly but has the

              wrong database connection URI, what good is the service in

              an active state? It won't be able to do anything at all.<br>

              <br>

              This is why I think it's better to have hard checks like

              for connections on startup and not have services active if

              they won't be able to do anything useful.<span class=""><br>

                <br>

              </span></blockquote>

            <div><br>

            </div>

            <div>Are you advocating that scheduler bails out and ceases

              to run or that it doesn't mark itself as active? I am in

              favour of the second scenario but not the first. There are

              cases where it would be nice to start the scheduler and

              have it at least report "hey I can't contact the DB" but

              not mark itself active, but continue to run and on

              &lt;interval&gt; report/try to reconnect.<br>

              <br>

            </div>

            <div>It isn't clear which level of "hard check" you're

              advocating in your response and I want to clarify for the

              sake of conversation. <br>

            </div>

            <div> </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    So, to be clear, the scheduler calls the DB to get the list of
    aggregates and instances so that filters can look at an in-memory
    copy instead of calling the DB every time they want to check those.<br>
    While that data is only needed by those filters, it still means
    that if the DB is unavailable, the scheduler won't work: even if the
    service is running, any request sent to the scheduler would raise
    an exception.<br>

    <br>
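    To illustrate that point, here is a minimal sketch. It is not the
    actual Nova code: the class names, the fetch_aggregates callable
    (standing in for objects.AggregateList.get_all) and the dict layout
    are all made up for the example. It only shows that lazy-loading
    moves the failure from startup to the first request, it doesn't
    remove it.<br>
    <pre>
```python
class DatabaseUnavailable(Exception):
    """Raised by the hypothetical fetch function when the DB is unreachable."""


class EagerHostManager:
    """Loads aggregates in __init__, so construction fails if the DB is down."""

    def __init__(self, fetch_aggregates):
        # fetch_aggregates is a stand-in for objects.AggregateList.get_all
        self.aggregates = list(fetch_aggregates())

    def hosts_in_aggregate(self, name):
        # Filters read the in-memory copy, never the DB.
        return [a["host"] for a in self.aggregates if a["name"] == name]


class LazyHostManager:
    """Defers the DB call until a filter first asks for aggregates."""

    def __init__(self, fetch_aggregates):
        self._fetch = fetch_aggregates
        self._aggregates = None

    def hosts_in_aggregate(self, name):
        if self._aggregates is None:
            # First reference: a broken DB now surfaces here, per request.
            self._aggregates = list(self._fetch())
        return [a["host"] for a in self._aggregates if a["name"] == name]


def broken_fetch():
    """Simulates a DB that cannot be reached."""
    raise DatabaseUnavailable("cannot connect")
```
</pre>
    With broken_fetch, EagerHostManager(broken_fetch) raises right away,
    while LazyHostManager(broken_fetch) constructs fine and only raises
    on the first hosts_in_aggregate() call.<br>
    <br>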

    So, which do you think is better? A scheduler that says in an error
    log "heh cool, the DB is bad, but okay, you can call me", or one
    that says "meh, you have a config issue, please review it"?<br>

    <br>
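    For the sake of the discussion, the middle ground Morgan describes
    (keep running, log that the DB can't be contacted, retry on an
    interval, and only tell systemd we are ready once the DB answers)
    could look roughly like the sketch below. This is an assumption
    about how it might be wired, not existing Nova code: the
    start_when_db_ready and load_aggregates names are hypothetical, and
    sd_notify here is a hand-rolled helper that writes to the socket
    systemd exposes in NOTIFY_SOCKET for Type=notify units.<br>
    <pre>
```python
import os
import socket
import time


def sd_notify(state):
    """Send a state string such as "READY=1" to systemd, if it is listening.

    Returns False (and does nothing) when NOTIFY_SOCKET is not set,
    e.g. when the process is not started by systemd.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    if path.startswith("@"):
        # Abstract namespace socket: the leading "@" stands for a NUL byte.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), path)
    return True


def start_when_db_ready(load_aggregates, notify=sd_notify, interval=5.0,
                        sleep=time.sleep, log=print):
    """Retry load_aggregates until it succeeds, then report readiness."""
    attempts = 0
    while True:
        attempts += 1
        try:
            aggregates = load_aggregates()
        except ConnectionError as exc:
            # Stay alive but not active: no READY=1 has been sent yet.
            log("cannot contact the DB (attempt %d): %s" % (attempts, exc))
            sleep(interval)
            continue
        notify("READY=1")
        return aggregates
```
</pre>
    Note that with Type=notify, systemd still applies its start timeout
    while waiting for the notification, so a DB that stays down long
    enough would end up as a failed start anyway.<br>
    <br>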

    To be honest, we could maybe do a better job of documenting why the
    scheduler does not start when it can't reach the DB, but I'm not
    sure it's good to have a scheduler that is resilient to the DB
    being unavailable.<br>

    <br>

    -Sylvain<br>

    <br>

    <blockquote

cite="mid:CAGnj6atLZw_QWZk+dOMaZcLGxXgOgdndRaKDM8L3n=RruHjCXg@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex"><span

                class="">

                <blockquote class="gmail_quote" style="margin:0 0 0

                  .8ex;border-left:1px #ccc solid;padding-left:1ex">

                  It would be relatively easy to have the scheduler

                  lazy-load the list<br>

                  of aggregates on first references, rather than at

                  __init__.<br>

                </blockquote>

                <br>

              </span>

              Sure, but if the root cause of the issue is a problem due

              to misconfigured connection string, then that lazy-load

              will just bomb out and the scheduler will be useless

              anyway. I'd rather have a fail-early/fast occur here than

              a fail-late.<br>

              <br>

              Best,<br>

              -jay<br>

              <br>

              > I'm not<br>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"><span

                  class="">

                  familiar enough with the nova code to know if there

                  would be any<br>

                  undesirable implications of this behavior.  We're

                  already punting<br>

                  initializing the list of instances to an asynchronous

                  task in order to<br>

                  avoid blocking service startup.<br>

                  <br>

                  Does it make sense to permit nova-scheduler to

                  complete service<br>

                  startup in the absence of the database (and then retry

                  the connection<br>

                  in the background)?<br>

                  <br>

                  <br>

                  <br>

                </span>

__________________________________________________________________________<br>

                OpenStack Development Mailing List (not for usage

                questions)<br>

                Unsubscribe: <a moz-do-not-send="true"

href="mailto:OpenStack-dev-request@lists.openstack.org?subject:unsubscribe"

                  rel="noreferrer" target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>

                <a moz-do-not-send="true"

                  href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev"

                  rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>

                <br>

              </blockquote>

              <br>


            </blockquote>

          </div>

          <br>

        </div>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>