[Openstack-operators] [openstack-dev] [all] Can we get some sanity in the Neutron logs please?

Rochelle Grober rochelle.grober at huawei.com
Fri Nov 20 01:34:05 UTC 2015


Thanks both Armando and Matt!

I am cross-posting this to the operators' list (as the main post; operators can simply reply without spamming the dev list).

The logging tag should be a tag in all projects.  I'm glad it's already there in Neutron.  And this sort of feedback is exactly what we need operators and gate debuggers to provide when they hit stuff in the logs that is classified at the wrong level (e.g., warning instead of info), is too verbose, generates a traceback rather than an error, or is too chatty and happens too often.

Please file these bugs against the appropriate projects and tag them "logging".  Generally these are easy fixes, or "while you're in there" fixes made while a bug is being addressed, so reporting them will most often get them fixed quickly.  I know that doesn't help the Ops folks as much as the developers right now, but it will help operators when they upgrade to releases in which the logging has been fixed.

These sorts of issues are major maintainability issues for operators, so a focus on this for the Mitaka release will help make it a release operators want to upgrade to :-)

Thanks for the ear, and for helping with debugging OpenStack issues by improving the signal-to-noise ratio.

--Rocky

From: Armando M. [mailto:armamig at gmail.com]
Sent: Tuesday, November 10, 2015 11:39 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] Can we get some sanity in the Neutron logs please?



On 10 November 2015 at 11:12, Matt Riedemann <mriedem at linux.vnet.ibm.com> wrote:


On 11/10/2015 1:10 PM, Matt Riedemann wrote:


On 11/10/2015 12:51 PM, Armando M. wrote:


On 10 November 2015 at 10:33, Matt Riedemann <mriedem at linux.vnet.ibm.com> wrote:

    Let me qualify by saying I'm not a Neutron person.

    We know that gate-tempest-dsvm-neutron-full is failing hard as of
    the last 24 hours [1].

    An error that's been showing up in tempest runs with neutron a lot
is:

    "AssertionError: 0 == 0 : No IPv4 addresses found in: []"

    So checking logstash [2], it's hitting a lot. It only looks recent
    because that failure message is new to Tempest in the last day or
    so, but it already has a lot of hits, so whatever the cause, it's
    failing frequently.

    So the next step is usually digging into service logs looking for
    errors. I check the q-svc logs first. Not many errors but a
    bazillion warnings for things not found (networks and devices). [3]

    For example:

    2015-11-10 17:13:02.542 WARNING neutron.plugins.ml2.rpc
    [req-15a73753-1512-4689-9404-9658a0cd0c09 None None] Device
    aaa525be-14eb-44a5-beb0-ed722896be93 requested by agent
    ovs-agent-devstack-trusty-rax-iad-5785199 not found in database

    2015-11-10 17:14:17.754 WARNING neutron.api.rpc.handlers.dhcp_rpc
    [req-3d7e9848-6151-4780-907f-43f11a2a8545 None None] Network
    b07ad9b2-e63e-4459-879d-3721074704e5 could not be found, it might
    have been deleted concurrently.

    Are several hundred of these warnings useful to an operator trying
    to debug a problem? The point of the CI gate testing is to try and
    simulate a production cloud environment. When something goes wrong,
    you check the logs. With the amount of warning/error level logging
    that is in the neutron logs, finding a real problem is like looking
    for a needle in a haystack. Since everything is async, 404s are
    expected when racing to delete a resource and they should be handled
    gracefully.
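
    To make that concrete, here is a rough sketch of what handling such
    a race gracefully could look like (illustrative only, not actual
    Neutron code; plugin.get_network, n_exc.NetworkNotFound and LOG are
    stand-in names for the example):

        # Illustrative sketch only, not actual Neutron code; plugin,
        # n_exc.NetworkNotFound and LOG are stand-ins for the example.
        def get_network_if_present(plugin, context, network_id):
            try:
                return plugin.get_network(context, network_id)
            except n_exc.NetworkNotFound:
                # Deleted concurrently: an expected race in an async
                # system, so log at debug instead of warning and move on.
                LOG.debug("Network %s not found; probably deleted "
                          "concurrently.", network_id)
                return None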

    Anyway, the server log isn't useful so I go digging in the agent
    logs and stacktraces there are aplenty. [4]

    Particularly this:

    "Exception: Port tapcea51630-e1 is not ready, resync needed"

    That's due to a new change landing in the last 24 hours [5]. But the
    trace shows up over 16K times since it landed [6].

    Checking the code, it's basically a loop processing events. When it
    hits an event it can't handle, it punts by raising, which breaks the
    loop so the events after it never get processed (that's a bug), and
    the code that eventually handles the exception just catches all
    Exceptions and traces them out, assuming they are really bad.
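
    Roughly, the shape of the code is something like this (paraphrased,
    not the literal change; port_is_ready, handle and LOG are made-up
    names for the sketch):

        # Paraphrased sketch of the problematic pattern: one bad event
        # aborts the whole batch, and the caller just logs a traceback.
        def process_events(events):
            for event in events:
                if not port_is_ready(event.port):
                    # Raising here breaks the loop, so the remaining
                    # events in this batch are never processed.
                    raise Exception("Port %s is not ready, resync needed"
                                    % event.port)
                handle(event)

        # A friendlier shape: handle the expected condition per event,
        # keep processing the rest, and save tracebacks for surprises.
        def process_events_per_event(events):
            for event in events:
                if not port_is_ready(event.port):
                    LOG.debug("Port %s not ready yet; will retry on "
                              "resync.", event.port)
                    continue
                handle(event)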

    At this point, as a non-neutron person, i.e. not well versed in the
    operations of neutron or how to debug it in great detail, I assume
    something is bad here but I don't really know - and the logs are so
    full of noise that I can't distinguish real failures.

    I don't mean to pick on this particular change, but it's a good
    example of a recent thing.

    I'd like to know if this is all known issues or WIP-type stuff. I've
    complained about excessively noisy neutron logs in channel before,
    and I'm usually told that they are either necessary (for whatever
    reason) or that rather than complain about the verbosity, I should
    fix the race that is causing it - which is not likely to happen
    since I don't have the async-RPC-happy nature of everything in
    neutron in my head to debug it (I doubt many do).


I am not sure that's a fair statement: we usually point out that just
lowering log levels does not really solve the underlying issue
(whichever it may be), and that comment really applies to any project,
not just Neutron. That said, we have had examples where we took your
input and drove the right fix ourselves.

We have a 'logging' tag for Neutron bugs that we use to identify these
types of cleanups. We need your attention to detail to alert us to
issues like these; we'll take care of the right fixes. Currently the
queue is pretty dry. If you can top it up, that would be great. Going
off on a log cleanup rampage doesn't seem like the best course of
action; I'd rather knock issues out one by one as they come, like the
one you just mentioned.

[1] https://bugs.launchpad.net/neutron/+bugs?field.tag=logging

Tagged:

https://bugs.launchpad.net/nova/+bug/1514935

Although it must not be an official tag since it didn't auto-complete
for me, that should be fixed.

Crap, never mind that, my bad - I had created the bug against nova (force of habit) rather than neutron. Confirmed that 'logging' is a neutron tag in LP.

Ah, gotcha!





    Anyway, this is a plea for sanity in the logs. There are logging
    guidelines for OpenStack [7]. Let's please abide by them. Let's keep
    operators in mind when we're looking at logs and be proactive about
    making them useful (which includes more granular error handling and
    fewer global try/except Exception: LOG.exception constructs).
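
    For example, instead of the blanket form, catch the failures you
    actually expect at an appropriate level and save LOG.exception for
    the genuinely unexpected (a sketch with made-up names like do_work
    and ResourceNotFound, not code from any particular module):

        # Blanket form that buries real failures in noise:
        try:
            do_work()
        except Exception:
            LOG.exception("something went wrong")

        # More granular: expected conditions get debug/info, and
        # LOG.exception is reserved for genuinely unexpected errors.
        try:
            do_work()
        except ResourceNotFound:
            LOG.debug("Resource already gone; nothing to do.")
        except Exception:
            LOG.exception("Unexpected failure while doing work")
            raise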

Your point is duly noted. We have documented this, and we are being more
meticulous during reviews.


    [1] http://tinyurl.com/ne3ex4v
    [2]

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message:%5C%22AssertionError:%200%20==%200%20:%20No%20IPv4%20addresses%20found%20in:%20%5B%5D%5C%22%20AND%20tags:%5C%22console%5C%22

    [3]

http://logs.openstack.org/85/239885/2/gate/gate-tempest-dsvm-neutron-full/602d864/logs/screen-q-svc.txt.gz?level=TRACE

    [4]

http://logs.openstack.org/85/239885/2/gate/gate-tempest-dsvm-neutron-full/602d864/logs/screen-q-agt.txt.gz?level=TRACE

    [5] https://review.openstack.org/#/c/164880/
    [6]

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message:%5C%22Exception:%20Port%5C%22%20AND%20message:%5C%22is%20not%20ready,%20resync%20needed%5C%22%20AND%20tags:%5C%22screen-q-agt.txt%5C%22

    [7]

http://specs.openstack.org/openstack/openstack-specs/specs/log-guidelines.html


    --

    Thanks,

    Matt Riedemann









--

Thanks,

Matt Riedemann




