[neutron][gate] verbose q-svc log files and e-r indexing
Hi all, Recently we've noticed elastic search indexing is behind 115 hours [1] and we looked for abnormally large log files being generated in the gate. We found that the q-svc log is very large, one example being 71.6M [2]. There is a lot of Time-Cost profiling output in the log, like this: Aug 17 14:22:23.210076 ubuntu-bionic-ovh-bhs1-0019298855 neutron-server[5168]: DEBUG neutron_lib.utils.helpers [req-75719db1-4abf-4500-bb0a-6d24e82cd4fd req-d88e7052-7da9-4bc9-8b35-5730ae76dcad service neutron] Time-cost: call 48e628cc-8c3a-408d-a36f-b219524480e0 function apply_funcs start {{(pid=5554) wrapper /usr/local/lib/python3.6/dist-packages/neutron_lib/utils/helpers.py:218}} We saw that there was a recent-ish change to remove some of the profiling output [3] but it was only for the get_objects method. Looking at the total number of lines in the file vs the number of lines without apply_funcs Time-Cost output: $ wc -l screen-q-svc.txt 186387 screen-q-svc.txt $ grep -v "function apply_funcs" screen-q-svc.txt|wc -l 102593 Would it be possible to remove this profiling output from the gate log to give elastic search indexing a better chance at keeping up? Or is there something else I've missed that could be made less verbose in the logging? Thanks for your help. Cheers, -melanie [1] http://status.openstack.org/elastic-recheck [2] https://b6ba3b9af8fd7de57099-18aa39cea11f738aa67ebd6bc9fb5e4c.ssl.cf2.rackcd... [3] https://review.opendev.org/741540
Hi, I opened LP for that [1] and I will propose some fix for it ASAP. On Mon, Aug 17, 2020 at 10:50:15AM -0700, melanie witt wrote:
Hi all,
Recently we've noticed elastic search indexing is behind 115 hours [1] and we looked for abnormally large log files being generated in the gate.
We found that the q-svc log is very large, one example being 71.6M [2]. There is a lot of Time-Cost profiling output in the log, like this:
Aug 17 14:22:23.210076 ubuntu-bionic-ovh-bhs1-0019298855 neutron-server[5168]: DEBUG neutron_lib.utils.helpers [req-75719db1-4abf-4500-bb0a-6d24e82cd4fd req-d88e7052-7da9-4bc9-8b35-5730ae76dcad service neutron] Time-cost: call 48e628cc-8c3a-408d-a36f-b219524480e0 function apply_funcs start {{(pid=5554) wrapper /usr/local/lib/python3.6/dist-packages/neutron_lib/utils/helpers.py:218}}
We saw that there was a recent-ish change to remove some of the profiling output [3] but it was only for the get_objects method.
Looking at the total number of lines in the file vs the number of lines without apply_funcs Time-Cost output:
$ wc -l screen-q-svc.txt 186387 screen-q-svc.txt
$ grep -v "function apply_funcs" screen-q-svc.txt|wc -l 102593
Would it be possible to remove this profiling output from the gate log to give elastic search indexing a better chance at keeping up? Or is there something else I've missed that could be made less verbose in the logging?
Thanks for your help.
Cheers, -melanie
[1] http://status.openstack.org/elastic-recheck [2] https://b6ba3b9af8fd7de57099-18aa39cea11f738aa67ebd6bc9fb5e4c.ssl.cf2.rackcd... [3] https://review.opendev.org/741540
[1] https://bugs.launchpad.net/neutron/+bug/1892017 -- Slawek Kaplonski Principal software engineer Red Hat
Hi, I proposed patch [1] which seems that decreased size of the neutron-server log a bit - see [2] but it's still about 40M :/ [1] https://review.opendev.org/#/c/730879/ [2] https://48dcf568cd222acfbfb6-11d92d8452a346ca231ad13d26a55a7d.ssl.cf2.rackcd... On Tue, Aug 18, 2020 at 12:33:23PM +0200, Slawek Kaplonski wrote:
Hi,
I opened LP for that [1] and I will propose some fix for it ASAP.
On Mon, Aug 17, 2020 at 10:50:15AM -0700, melanie witt wrote:
Hi all,
Recently we've noticed elastic search indexing is behind 115 hours [1] and we looked for abnormally large log files being generated in the gate.
We found that the q-svc log is very large, one example being 71.6M [2]. There is a lot of Time-Cost profiling output in the log, like this:
Aug 17 14:22:23.210076 ubuntu-bionic-ovh-bhs1-0019298855 neutron-server[5168]: DEBUG neutron_lib.utils.helpers [req-75719db1-4abf-4500-bb0a-6d24e82cd4fd req-d88e7052-7da9-4bc9-8b35-5730ae76dcad service neutron] Time-cost: call 48e628cc-8c3a-408d-a36f-b219524480e0 function apply_funcs start {{(pid=5554) wrapper /usr/local/lib/python3.6/dist-packages/neutron_lib/utils/helpers.py:218}}
We saw that there was a recent-ish change to remove some of the profiling output [3] but it was only for the get_objects method.
Looking at the total number of lines in the file vs the number of lines without apply_funcs Time-Cost output:
$ wc -l screen-q-svc.txt 186387 screen-q-svc.txt
$ grep -v "function apply_funcs" screen-q-svc.txt|wc -l 102593
Would it be possible to remove this profiling output from the gate log to give elastic search indexing a better chance at keeping up? Or is there something else I've missed that could be made less verbose in the logging?
Thanks for your help.
Cheers, -melanie
[1] http://status.openstack.org/elastic-recheck [2] https://b6ba3b9af8fd7de57099-18aa39cea11f738aa67ebd6bc9fb5e4c.ssl.cf2.rackcd... [3] https://review.opendev.org/741540
[1] https://bugs.launchpad.net/neutron/+bug/1892017
-- Slawek Kaplonski Principal software engineer Red Hat
-- Slawek Kaplonski Principal software engineer Red Hat
On 8/18/20 08:00, Slawek Kaplonski wrote:
Hi,
I proposed patch [1] which seems that decreased size of the neutron-server log a bit - see [2] but it's still about 40M :/
[1] https://review.opendev.org/#/c/730879/ [2] https://48dcf568cd222acfbfb6-11d92d8452a346ca231ad13d26a55a7d.ssl.cf2.rackcd...
Thanks for jumping in to help, Slawek! Indeed your proposed patch improves things from 60M-70M => 40M (good!). With your patch applied, the most frequent potential log message I see now is like this: Aug 18 14:40:21.294549 ubuntu-bionic-rax-iad-0019321276 neutron-server[5829]: DEBUG neutron_lib.callbacks.manager [None req-eadfbe92-eaee-4e3e-a5c0-f18aa8ba9772 None None] Notify callbacks ['neutron.services.segments.db._update_segment_host_mapping_for_agent-8764691834039', 'neutron.plugins.ml2.plugin.Ml2Plugin._retry_binding_revived_agents-4033733'] for agent, after_update {{(pid=6206) _notify_loop /opt/stack/neutron-lib/neutron_lib/callbacks/manager.py:193}} with the line count difference being with and without: $ wc -l "screen-q-svc.txt" 102493 screen-q-svc.txt $ grep -v "neutron_lib.callbacks.manager" "screen-q-svc.txt" |wc -l 83261 so I suppose we could predict a decrease in file size of about 40M => 32M if we were able to remove the neutron_lib.callbacks.manager output. But I'm not sure whether that's a critical debugging element or not. -melanie
Hi, On Tue, Aug 18, 2020 at 02:10:40PM -0700, melanie witt wrote:
On 8/18/20 08:00, Slawek Kaplonski wrote:
Hi,
I proposed patch [1] which seems that decreased size of the neutron-server log a bit - see [2] but it's still about 40M :/
[1] https://review.opendev.org/#/c/730879/ [2] https://48dcf568cd222acfbfb6-11d92d8452a346ca231ad13d26a55a7d.ssl.cf2.rackcd...
Thanks for jumping in to help, Slawek! Indeed your proposed patch improves things from 60M-70M => 40M (good!).
With your patch applied, the most frequent potential log message I see now is like this:
Aug 18 14:40:21.294549 ubuntu-bionic-rax-iad-0019321276 neutron-server[5829]: DEBUG neutron_lib.callbacks.manager [None req-eadfbe92-eaee-4e3e-a5c0-f18aa8ba9772 None None] Notify callbacks ['neutron.services.segments.db._update_segment_host_mapping_for_agent-8764691834039', 'neutron.plugins.ml2.plugin.Ml2Plugin._retry_binding_revived_agents-4033733'] for agent, after_update {{(pid=6206) _notify_loop /opt/stack/neutron-lib/neutron_lib/callbacks/manager.py:193}}
with the line count difference being with and without:
$ wc -l "screen-q-svc.txt" 102493 screen-q-svc.txt
$ grep -v "neutron_lib.callbacks.manager" "screen-q-svc.txt" |wc -l 83261
so I suppose we could predict a decrease in file size of about 40M => 32M if we were able to remove the neutron_lib.callbacks.manager output.
I was looking at this again today but I'm really not sure if we should get rid of those messages from the log. For now I think that indexing of screen-q-svc.txt file is disabled so this size of the log shouldn't be big problem (I hope) and I would like to not remove any other debug messages from it if that will not be really necessary.
But I'm not sure whether that's a critical debugging element or not.
-melanie
-- Slawek Kaplonski Principal software engineer Red Hat
On Fri, Aug 28, 2020, at 3:04 AM, Slawek Kaplonski wrote:
Hi,
On Tue, Aug 18, 2020 at 02:10:40PM -0700, melanie witt wrote:
On 8/18/20 08:00, Slawek Kaplonski wrote:
Hi,
I proposed patch [1] which seems that decreased size of the neutron-server log a bit - see [2] but it's still about 40M :/
[1] https://review.opendev.org/#/c/730879/ [2] https://48dcf568cd222acfbfb6-11d92d8452a346ca231ad13d26a55a7d.ssl.cf2.rackcd...
Thanks for jumping in to help, Slawek! Indeed your proposed patch improves things from 60M-70M => 40M (good!).
With your patch applied, the most frequent potential log message I see now is like this:
Aug 18 14:40:21.294549 ubuntu-bionic-rax-iad-0019321276 neutron-server[5829]: DEBUG neutron_lib.callbacks.manager [None req-eadfbe92-eaee-4e3e-a5c0-f18aa8ba9772 None None] Notify callbacks ['neutron.services.segments.db._update_segment_host_mapping_for_agent-8764691834039', 'neutron.plugins.ml2.plugin.Ml2Plugin._retry_binding_revived_agents-4033733'] for agent, after_update {{(pid=6206) _notify_loop /opt/stack/neutron-lib/neutron_lib/callbacks/manager.py:193}}
with the line count difference being with and without:
$ wc -l "screen-q-svc.txt" 102493 screen-q-svc.txt
$ grep -v "neutron_lib.callbacks.manager" "screen-q-svc.txt" |wc -l 83261
so I suppose we could predict a decrease in file size of about 40M => 32M if we were able to remove the neutron_lib.callbacks.manager output.
I was looking at this again today but I'm really not sure if we should get rid of those messages from the log. For now I think that indexing of screen-q-svc.txt file is disabled so this size of the log shouldn't be big problem (I hope) and I would like to not remove any other debug messages from it if that will not be really necessary.
Maybe as an option we split this into two log files. One that is INFO and above and the other that includes everything with DEBUG? Then we can index the INFO and above contents only. One thing to keep in mind here is that this system tends to act like a canary for when our logs would create problems for people in production. The q-svc logs here are significantly more chatty than the other services. Not necessarily a problem, but don't be surprised if people notice after they upgrade and start complaining.
But I'm not sure whether that's a critical debugging element or not.
-melanie
-- Slawek Kaplonski Principal software engineer Red Hat
participants (3)
-
Clark Boylan
-
melanie witt
-
Slawek Kaplonski