<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 05/13/2015 09:51 AM, Simon Pasquier
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAOq3GZU1cB+fAJMxKAZ86+p_69s2Px5YsdVKTTjS0gyav=y0Sg@mail.gmail.com"
      type="cite">
      <meta http-equiv="Context-Type" content="text/html; charset=UTF-8">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Wed, May 13, 2015 at 3:27 PM,
            David Kranz <span dir="ltr">&lt;<a moz-do-not-send="true"
                href="mailto:dkranz@redhat.com" target="_blank">dkranz@redhat.com</a>&gt;</span>
            wrote:<br>
            <blockquote class="gmail_quote">
              <div><span class="">
                  <div>On 05/13/2015 09:06 AM, Simon Pasquier wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div>
                        <div>
                          <div>
                            <div>Hello,<br>
                              <br>
                              Like many others who commented before, I don't
                              quite understand how unique the
                              CloudPulse use cases are.<br>
                              <br>
                              For operators, I got the feeling that
                              existing solutions fit well:<br>
                              - Traditional monitoring tools (Nagios,
                              Zabbix, ...) are necessary anyway for
                              infrastructure monitoring (CPU, RAM,
                              disks, operating system, RabbitMQ,
                              databases and more) and diagnostic
                              purposes. Adding OpenStack service checks
                              is fairly easy if you already have the
                              toolchain.<br>
                            </div>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </span> Is it really so easy? RabbitMQ has an
                "aliveness" test that is easy to hook into. I don't know
                exactly what it does, other than what the doc says, but
                I should not have to. If I want my standard monitoring
                system to call into a cloud and ask "is nova healthy?",
                "is glance healthy?", etc., are there such calls? <br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Regarding the RabbitMQ aliveness test, it has its own
              limits (more on that later, I've got an "interesting"
              RabbitMQ outage that I'm going to discuss in a new thread)
              and it doesn't replicate exactly what the clients (e.g.
              OpenStack services) are doing.<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    I'm sure it has limits, but my point was that the developers of
    RabbitMQ understood that it would be difficult for users to know
    exactly what should be poked at inside to check health, so they
    provide a call to do it. <br>
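    <br>
    To make that concrete, here is a minimal sketch of how a monitoring
    script might hook into that call. It assumes the RabbitMQ management
    plugin is enabled and listening on its default port 15672; the host
    name, the credentials and the helper names are made up for
    illustration and are not taken from any existing tool.<br>

<pre>
```python
import base64
import json
import urllib.parse
import urllib.request


def aliveness_url(host, vhost="/", port=15672):
    """Build the management-API URL for the aliveness test of one vhost."""
    # The vhost is a path segment, so "/" has to be sent as "%2F".
    quoted = urllib.parse.quote(vhost, safe="")
    return f"http://{host}:{port}/api/aliveness-test/{quoted}"


def is_alive(body):
    """Interpret the JSON body returned by the aliveness test."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object: count it as a failure.
        return False


def check_rabbitmq(host, user, password, vhost="/", timeout=5.0):
    """Return True if the broker answers the aliveness test with status ok."""
    request = urllib.request.Request(aliveness_url(host, vhost))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return is_alive(response.read().decode())
    except OSError:
        # Connection refused, timeout, HTTP error: all treated as unhealthy.
        return False
```
</pre>

    A check would then be a single call such as
    <code>check_rabbitmq("mq.example.org", "monitor", "secret")</code>, and
    the caller never has to know what the broker does internally to
    answer it.<br>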
    <blockquote
cite="mid:CAOq3GZU1cB+fAJMxKAZ86+p_69s2Px5YsdVKTTjS0gyav=y0Sg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> <br>
            </div>
            <div>Regarding the service checks, there are already plenty
              of scripts that exist for Nagios, Collectd and so on. Some
              of them are listed in the Wiki [1].<br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    I understand, and that is what I meant by "after-market". If someone
    puts a new feature in service X that requires some monitoring to
    be healthy, then all those different scripts need to chase after it
    to keep up to date. Poking at service internals to check the health
    of a service is an abstraction violation. As someone on this thread
    said, Tempest/Rally can be used to check a certain kind of health,
    but it is akin to black-box testing, whereas health monitoring should
    be more akin to white-box testing.<br>
    <blockquote
cite="mid:CAOq3GZU1cB+fAJMxKAZ86+p_69s2Px5YsdVKTTjS0gyav=y0Sg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div><br>
            </div>
            <blockquote class="gmail_quote">
              <div> <br>
                There are various sets of calls associated with nagios,
                zabbix, etc. but those seem like "after-market" parts
                for a car. Seems to me the services themselves would
                know best how to check if they are healthy, particularly
                as that could change from version to version. Has there been
                discussion of adding a health-check (admin) API in each
                service? Lacking that, is there documentation from any
                OpenStack projects about "how to check the health of
                nova"? When I saw this thread start, that is what I
                thought it was going to be about.<span class=""><br>
                </span></div>
            </blockquote>
            <div><br>
            </div>
            <div>Starting with Kilo, you can configure your OpenStack
              API services with the healthcheck middleware [2]. This was
              inspired by what Swift has been doing for some time now
              [3]. IIUC the default healthcheck is minimalist and doesn't
              check that dependent services (like RabbitMQ, database)
              are healthy, but the framework is extensible and more
              healthchecks can be added.<br>
            </div>
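            <div>As a rough illustration, a monitoring system could poll
              such an endpoint along the lines below. The sketch assumes
              the middleware is mounted at a <code>/healthcheck</code>
              path and treats anything other than HTTP 200 as a failure;
              the function names and the example URL are hypothetical.<br>
            </div>

<pre>
```python
import urllib.error
import urllib.request

# Exit codes understood by Nagios-compatible schedulers.
NAGIOS_OK = 0
NAGIOS_CRITICAL = 2


def nagios_status(http_status):
    """Map the HTTP status of a healthcheck request to a Nagios exit code."""
    return NAGIOS_OK if http_status == 200 else NAGIOS_CRITICAL


def probe(url, timeout=5.0):
    """Request the healthcheck URL and return (exit_code, message)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        # The endpoint answered, but not with a success status.
        status = error.code
    except OSError as error:
        return NAGIOS_CRITICAL, f"CRITICAL: {url} unreachable ({error})"
    code = nagios_status(status)
    label = "OK" if code == NAGIOS_OK else "CRITICAL"
    return code, f"{label}: {url} returned HTTP {status}"
```
</pre>

            <div>A plugin would call something like
              <code>probe("http://controller.example.org:8774/healthcheck")</code>,
              print the message and exit with the returned code.<br>
            </div>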
          </div>
        </div>
      </div>
    </blockquote>
    I can see that, but the real value would be in abstracting the
    details of what it means for a service to be healthy inside the
    implementation and exporting an API. If that were present, the
    question of whether calling it used middleware or not would be
    secondary. I'm not sure what the value-add of middleware would be in
    this case.<br>
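    <br>
    Roughly the shape I have in mind, as a sketch only: the service
    registers its own checks, and an admin API would simply return the
    aggregated report. <code>HealthRegistry</code> and everything in it
    are hypothetical names, not an existing OpenStack interface.<br>

<pre>
```python
class HealthRegistry:
    """Collects the health checks that a service registers about itself."""

    def __init__(self):
        self._checks = {}

    def register(self, name, check):
        """Register a zero-argument callable that returns True when healthy."""
        self._checks[name] = check

    def report(self):
        """Run every check and return a summary an admin API could expose."""
        details = {}
        for name, check in self._checks.items():
            try:
                details[name] = bool(check())
            except Exception:
                # A check that raises counts as a failed check.
                details[name] = False
        return {"healthy": all(details.values()), "details": details}


# The service, not the monitoring tool, decides what "healthy" means.
registry = HealthRegistry()
registry.register("database", lambda: True)
registry.register("message_bus", lambda: False)
summary = registry.report()
# summary is {"healthy": False,
#             "details": {"database": True, "message_bus": False}}
```
</pre>

    When a new feature needs a new check, it gets registered inside the
    service and every monitoring system calling the API picks it up
    without changes.<br>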
    <br>
     -David<br>
    <br>
    <br>
    <br>
    <br>
    <blockquote
cite="mid:CAOq3GZU1cB+fAJMxKAZ86+p_69s2Px5YsdVKTTjS0gyav=y0Sg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote">
              <div><span class=""> <br>
                   -David</span>
                <div>
                  <div class="h5"><br>
                  </div>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>BR,<br>
            </div>
            <div>Simon<br>
            </div>
            <div><br>
              [1] <a moz-do-not-send="true"
href="https://wiki.openstack.org/wiki/Operations/Tools#Monitoring_and_Trending">https://wiki.openstack.org/wiki/Operations/Tools#Monitoring_and_Trending</a><br>
              [2] <a moz-do-not-send="true"
href="http://docs.openstack.org/developer/oslo.middleware/api.html#oslo_middleware.Healthcheck">http://docs.openstack.org/developer/oslo.middleware/api.html#oslo_middleware.Healthcheck</a><br>
              [3] <a moz-do-not-send="true"
href="http://docs.openstack.org/kilo/config-reference/content/object-storage-healthcheck.html">http://docs.openstack.org/kilo/config-reference/content/object-storage-healthcheck.html</a><br>
               </div>
            <blockquote class="gmail_quote">
              <div>
                <div>
                  <div class="h5"> <br>
                    <blockquote type="cite">
                      <div dir="ltr">
                        <div>
                          <div>
                            <div>
                              <div>- OpenStack projects like Rally or
                                Tempest can generate synthetic loads and
                                run end-to-end tests. Integrating them
                                with a monitoring system isn't terribly
                                difficult either.<br>
                              </div>
                            </div>
                            <br>
                            As far as Monitoring-as-a-service is
                            concerned, do you have plans to
                            integrate/leverage Ceilometer?<br>
                            <br>
                          </div>
                          BR,<br>
                        </div>
                        Simon</div>
                      <div class="gmail_extra"><br>
                        <div class="gmail_quote">On Tue, May 12, 2015 at
                          7:20 PM, Vinod Pandarinathan (vpandari) <span
                            dir="ltr">&lt;<a moz-do-not-send="true"
                              href="mailto:vpandari@cisco.com"
                              target="_blank">vpandari@cisco.com</a>&gt;</span>
                          wrote:<br>
                          <blockquote class="gmail_quote">
                            <div>
                              <div>
                                <div> <span>Hello,</span></div>
                                <div> <br>
                                </div>
                                <div>   I'm pleased to announce the
                                  development of a new project called
                                  CloudPulse.  CloudPulse provides
                                  OpenStack</div>
                                <div> <span>health-checking services to
                                    operators, tenants, and
                                    applications. This project will
                                    begin as </span></div>
                                <div> <span>a StackForge project based
                                    upon an empty cookiecutter[1] repo. 
                                    The repos to work in are:</span></div>
                                <div> <span>Server:   </span><span><a
                                      moz-do-not-send="true"
                                      href="https://github.com/stackforge/cloudpulse"
                                      target="_blank">https://github.com/stackforge/cloudpulse</a></span></div>
                                <div> <span>Client:     </span><span><a
                                      moz-do-not-send="true"
                                      href="https://github.com/stackforge/python-cloudpulseclient"
                                      target="_blank">https://github.com/stackforge/python-cloudpulseclient</a></span></div>
                                <div> <br>
                                </div>
                                <div> <span>Please join us via IRC on
                                    #openstack-cloudpulse on freenode.</span></div>
                                <div> <br>
                                </div>
                                <div> <span>I am holding a doodle poll
                                    to select times for our first
                                    meeting the week after summit.  This
                                    doodle poll will close May 24th and
                                    meeting times will be announced on
                                    the mailing list at that time.  At
                                    our first IRC meeting, </span></div>
                                <div> <span>we will draft additional
                                    core team members, so if you're
                                    interested in joining a fresh new
                                    development effort, please attend
                                    our first meeting.  </span></div>
                                <div> Please take a moment if you're
                                  interested in CloudPulse to fill out
                                  the doodle poll here: </div>
                                <div> <br>
                                </div>
                                <div> <span><a moz-do-not-send="true"
                                      href="https://doodle.com/kcpvzy8kfrxe6rvb"
                                      target="_blank">https://doodle.com/kcpvzy8kfrxe6rvb</a></span></div>
                                <div> <br>
                                </div>
                                <div> The initial core team is composed
                                  of</div>
                                <div> <span>Ajay Kalambur,  </span></div>
                                <div> <span>Behzad Dastur, </span><span>Ian
                                    Wells, </span><span>Pradeep
                                    chandrasekhar, </span><span>Steven
                                    Dake</span><span> and</span><span>
                                    Vinod Pandarinathan</span><span>.</span><span> 
                                    <br>
                                  </span></div>
                                <div> <span>I expect more members to
                                    join during our initial meeting.</span></div>
                                <div> <br>
                                </div>
                                <div>  A little bit about CloudPulse:</div>
                                <div> <span> Cloud operators need
                                    notification of OpenStack failures
                                    before a customer reports the
                                    failure. Cloud operators can then
                                    take timely corrective actions with
                                    minimal disruption to applications. 
                                    Many cloud applications, including </span></div>
                                <div> <span>those I am interested in
                                    (NFV) have very stringent service
                                    level agreements.  Loss of service
                                    can trigger contractual</span></div>
                                <div> <span>costs associated with the
                                    service.  Application high
                                    availability requires an operational
                                    OpenStack Cloud, and the reality</span></div>
                                <div> <span>is that occasionally
                                    OpenStack clouds fail in some
                                    mysterious ways.  This project
                                    intends to identify when those
                                    failures </span></div>
                                <div> <span>occur so corrective actions
                                    may be taken by operators, tenants,
                                    and the applications themselves.</span></div>
                                <div> <span><br>
                                  </span></div>
                                <div> <span></span>OpenStack is
                                  considered healthy when OpenStack API
                                  services respond appropriately. 
                                  Further, OpenStack is</div>
                                <div> <span>healthy when network
                                    traffic can be sent between the
                                    tenant networks and </span><span>can
                                    access the Internet.  Finally,
                                    OpenStack</span></div>
                                <div> <span>is healthy when all
                                    infrastructure cluster elements are
                                    in an operational state.</span></div>
                                <div> <br>
                                </div>
                                <div> <span>For information about
                                    blueprints check out:</span></div>
                                <div> <span> </span><span><a
                                      moz-do-not-send="true"
                                      href="https://blueprints.launchpad.net/cloudpulse"
                                      target="_blank">https://blueprints.launchpad.net/cloudpulse</a></span></div>
                                <div> <span><a moz-do-not-send="true"
                                      href="https://blueprints.launchpad.net/python-cloudpulseclient"
                                      target="_blank">https://blueprints.launchpad.net/python-cloudpulseclient</a></span></div>
                                <div> <br>
                                </div>
                                <div> For more details, check out our
                                  Wiki:</div>
                                <div> <span><a moz-do-not-send="true"
                                      href="https://wiki.openstack.org/wiki/Cloudpulse"
                                      target="_blank">https://wiki.openstack.org/wiki/Cloudpulse</a></span></div>
                                <div> <br>
                                </div>
                                <div> Please join the CloudPulse team in
                                  designing and implementing a
                                  world-class Carrier Grade system for
                                  checking</div>
                                <div> <span>the health of OpenStack
                                    clouds.  We look forward to seeing
                                    you on IRC on #openstack-cloudpulse.</span></div>
                                <div> <br>
                                </div>
                                <div> Regards,</div>
                                <div> <span>Vinod Pandarinathan</span></div>
                                <div> <span>[1] </span><span><a
                                      moz-do-not-send="true"
                                      href="https://github.com/openstack-dev/cookiecutter"
                                      target="_blank">https://github.com/openstack-dev/cookiecutter</a></span></div>
                              </div>
                              <div><br>
                              </div>
                            </div>
                            <br>
__________________________________________________________________________<br>
                            OpenStack Development Mailing List (not for
                            usage questions)<br>
                            Unsubscribe: <a moz-do-not-send="true"
href="mailto:OpenStack-dev-request@lists.openstack.org?subject:unsubscribe"
                              target="_blank">OpenStack-dev-request@lists.openstack.org?subject:unsubscribe</a><br>
                            <a moz-do-not-send="true"
                              href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev"
                              target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev</a><br>
                            <br>
                          </blockquote>
                        </div>
                        <br>
                      </div>
                      <br>
                      <fieldset></fieldset>
                      <br>
                    </blockquote>
                    <br>
                  </div>
                </div>
              </div>
              <br>
              <br>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
    </blockquote>
    <br>
  </body>
</html>