[openstack-dev] Can we get some sanity in the Neutron logs please?
Rossella Sblendido
rsblendido at suse.com
Wed Nov 11 10:21:07 UTC 2015
Hello Matt,
On 11/10/2015 07:33 PM, Matt Riedemann wrote:
> Let me qualify by saying I'm not a Neutron person.
>
> We know that gate-tempest-dsvm-neutron-full is failing hard as of the
> last 24 hours [1].
>
> An error that's been showing up in tempest runs with neutron a lot is:
>
> "AssertionError: 0 == 0 : No IPv4 addresses found in: []"
>
> So checking logstash [2] it's hitting a lot. The hits are only recent
> because that failure message is new to Tempest in the last day or so,
> but there are already a lot of them, so whatever this is, it's failing
> frequently.
>
> So the next step is usually digging into service logs looking for
> errors. I check the q-svc logs first. Not many errors but a bazillion
> warnings for things not found (networks and devices). [3]
>
> For example:
>
> 2015-11-10 17:13:02.542 WARNING neutron.plugins.ml2.rpc
> [req-15a73753-1512-4689-9404-9658a0cd0c09 None None] Device
> aaa525be-14eb-44a5-beb0-ed722896be93 requested by agent
> ovs-agent-devstack-trusty-rax-iad-5785199 not found in database
>
> 2015-11-10 17:14:17.754 WARNING neutron.api.rpc.handlers.dhcp_rpc
> [req-3d7e9848-6151-4780-907f-43f11a2a8545 None None] Network
> b07ad9b2-e63e-4459-879d-3721074704e5 could not be found, it might have
> been deleted concurrently.
>
> Are several hundred of these warnings useful to an operator trying to
> debug a problem? The point of the CI gate testing is to try to simulate
> a production cloud environment. When something goes wrong, you check the
> logs. With the amount of warning/error level logging that is in the
> neutron logs, finding a real problem is like looking for a needle in a
> haystack. Since everything is async, 404s are expected when racing to
> delete a resource and they should be handled gracefully.
>
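For what it's worth, a minimal sketch of the kind of handling this asks
for (the helper below is made up for illustration and is not the actual
Neutron RPC code; the exception and logging modules only roughly match
what Neutron uses):

    from oslo_log import log as logging

    from neutron.common import exceptions as n_exc

    LOG = logging.getLogger(__name__)


    def get_network_if_present(plugin, context, network_id):
        # A network vanishing mid-request is a routine outcome of the
        # async delete race described above, so record it at debug level
        # and let the caller handle the absence, instead of emitting
        # hundreds of WARNINGs an operator has to wade through.
        try:
            return plugin.get_network(context, network_id)
        except n_exc.NetworkNotFound:
            LOG.debug("Network %s not found, it was probably deleted "
                      "concurrently", network_id)
            return None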
> Anyway, the server log isn't useful, so I go digging in the agent logs,
> where stack traces are aplenty. [4]
>
> Particularly this:
>
> "Exception: Port tapcea51630-e1 is not ready, resync needed"
>
> That's due to a new change landing in the last 24 hours [5]. But the
> trace shows up over 16K times since it landed [6].
>
> Checking the code, it's basically a loop processing events. When it hits
> an event it can't handle, it punts, breaking the loop so the events after
> it are never processed - which is a bug - and the code that eventually
> handles it just catches all Exception and traces them out, assuming they
> are really bad.
As you noticed in the review [1], there was a dependent patch that was
solving this. It was a big and pretty complex change, so I tried to split
it into a few patches. I should have split it in a better way.
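Just to make the pattern concrete - all names below are made up and this
is not the actual agent code - per-event handling along these lines lets
a port that is not ready yet be queued for a resync instead of aborting
the whole loop and being traced out as an unexpected error:

    import logging

    LOG = logging.getLogger(__name__)


    class PortNotReady(Exception):
        """A known, recoverable per-port condition (illustrative only)."""


    def process_events(events, handle_event):
        # Handle each event on its own so one port that is not ready yet
        # does not abort processing of the events that come after it.
        needs_resync = []
        for event in events:
            try:
                handle_event(event)
            except PortNotReady:
                # Expected transient state: note it for a later resync at
                # debug level and keep going, rather than raising out of
                # the loop into a blanket LOG.exception.
                LOG.debug("Event %s: port not ready yet, queuing resync",
                          event)
                needs_resync.append(event)
        return needs_resync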
>
> At this point, as a non-neutron person, i.e. not well versed in the
> operations of neutron or how to debug it in great detail, I assume
> something is bad here but I don't really know - and the logs are so full
> of noise that I can't distinguish real failures.
>
> I don't mean to pick on this particular change, but it's a good example
> of a recent thing.
>
> I'd like to know if this is all known-issue or WIP type stuff. I've
> complained about excessively noisy neutron logs in channel before and
> I'm usually told that they are either necessary (for whatever reason) or
> that rather than complain about the verbosity, I should fix the race
> that is causing it - which is not likely to happen since I don't have
> the async rpc happy nature of everything in neutron in my head to debug
> it (I doubt many do).
Yes, in this case it's a WIP; the logs were not meant to stay there.
They were actually cleaned up by the dependent patch.
>
> Anyway, this is a plea for sanity in the logs. There are logging
> guidelines for OpenStack [7]. Let's please abide by them. Let's keep
> operators in mind when we're looking at logs and be proactive about
> making them useful (which includes more granular error handling and
> fewer global try/except Exception: LOG.exception constructs).
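One possible reading of that last parenthetical, as a hypothetical sketch
(none of these names are real Neutron code): keep the handlers narrow for
the cases we understand, and reserve LOG.exception for what we genuinely
did not expect.

    import logging

    LOG = logging.getLogger(__name__)


    class PortNotFound(Exception):
        """The port was deleted out from under us (illustrative only)."""


    def wire_port(port_id, do_wiring):
        try:
            do_wiring(port_id)
        except PortNotFound:
            # Routine outcome of an async delete race: debug, no traceback.
            LOG.debug("Port %s disappeared before wiring, skipping",
                      port_id)
        except Exception:
            # Only genuinely unexpected failures get a traceback at ERROR.
            LOG.exception("Unexpected error wiring port %s", port_id)
            raise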
We are aware of the guidelines and when reviewing code in Neutron we try
to enforce them. We all want better logs :) so please keep giving
feedback. Thanks for raising this point.
cheers,
Rossella
>
> [1] http://tinyurl.com/ne3ex4v
> [2]
> http://logstash.openstack.org/#dashboard/file/logstash.json?query=message:%5C%22AssertionError:%200%20==%200%20:%20No%20IPv4%20addresses%20found%20in:%20%5B%5D%5C%22%20AND%20tags:%5C%22console%5C%22
>
> [3]
> http://logs.openstack.org/85/239885/2/gate/gate-tempest-dsvm-neutron-full/602d864/logs/screen-q-svc.txt.gz?level=TRACE
>
> [4]
> http://logs.openstack.org/85/239885/2/gate/gate-tempest-dsvm-neutron-full/602d864/logs/screen-q-agt.txt.gz?level=TRACE
>
> [5] https://review.openstack.org/#/c/164880/
> [6]
> http://logstash.openstack.org/#dashboard/file/logstash.json?query=message:%5C%22Exception:%20Port%5C%22%20AND%20message:%5C%22is%20not%20ready,%20resync%20needed%5C%22%20AND%20tags:%5C%22screen-q-agt.txt%5C%22
>
> [7]
> http://specs.openstack.org/openstack/openstack-specs/specs/log-guidelines.html
>
>