[openstack-dev] Jenkins test logs and their retention period
sean at dague.net
Mon Mar 24 10:49:57 UTC 2014
Here are some preliminary views (this currently ignores the ceilometer
logs; I haven't had a chance to dive in there yet).
It actually looks like a huge part of the issue is oslo.messaging: the
bulk of screen-n-cond is oslo.messaging debug messages. It seems that in
debug mode oslo.messaging is basically a 100% trace mode, which includes
logging every time a UUID is created and every payload.
I'm not convinced that's useful. We don't log every SQL statement
we run (with full payload).
The recent integration of oslo.messaging would also explain the new
growth of logs.
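For anyone who wants to repeat this kind of breakdown on another log, here
is a rough sketch of tallying DEBUG lines per logger. The line layout
(date, time, pid, level, logger name, message) is an assumption about the
log format, the sample lines are made up for illustration, and
tally_by_logger is a hypothetical helper, not an existing tool.

```python
import re
from collections import Counter

# Assumed line layout: "<date> <time> <pid> <LEVEL> <logger> <message>".
LINE_RE = re.compile(r"^\S+ \S+ \d+ (?P<level>[A-Z]+) (?P<logger>\S+) ")


def tally_by_logger(lines, level="DEBUG", depth=2):
    """Count lines at `level`, grouped by the first `depth` dotted parts
    of the logger name."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == level:
            group = ".".join(match.group("logger").split(".")[:depth])
            counts[group] += 1
    return counts


# Made-up sample lines, only to show the shape of the input.
sample = [
    "2014-03-24 10:00:00.001 1234 DEBUG oslo.messaging._drivers.amqp [-] UNIQUE_ID is abc",
    "2014-03-24 10:00:00.002 1234 DEBUG oslo.messaging._drivers.amqp [-] sending payload",
    "2014-03-24 10:00:00.003 1234 DEBUG nova.openstack.common.lockutils [-] Got semaphore",
    "2014-03-24 10:00:00.004 1234 INFO nova.conductor.manager [-] started",
]

for name, count in tally_by_logger(sample).most_common():
    print(name, count)
```

Against a real file you would pass the opened (and gunzipped) log in place
of the sample list; the regex will need adjusting if the format differs.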
Other issues include other oslo utils that have really verbose debug
modes. Like lockutils emitting 4 DEBUG messages for every lock acquired.
Part of the challenge is that turning off DEBUG for a library is currently
embedded in code in oslo log, which makes it kind of awkward to set sane
log levels for included libraries, because any change requires an oslo
round trip with code to all the projects.
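To illustrate what per-library levels look like, here is a minimal sketch
using only the Python standard library logging module. It is not how oslo
log does it today; the logger names and the LIBRARY_LEVELS table are
assumptions for illustration.

```python
import logging

# Hypothetical per-library overrides; the logger names are assumptions.
LIBRARY_LEVELS = {
    "oslo.messaging": logging.INFO,
    "nova.openstack.common.lockutils": logging.INFO,
}


def apply_library_levels(levels=LIBRARY_LEVELS):
    """Raise the threshold for noisy libraries without touching the root."""
    for name, level in levels.items():
        logging.getLogger(name).setLevel(level)


logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)  # the service itself stays at DEBUG
apply_library_levels()

# Child loggers inherit the override from their dotted parent name.
logging.getLogger("nova.compute").debug("still emitted at DEBUG")
logging.getLogger("oslo.messaging._drivers.amqp").debug("filtered out")
```

Keeping the table as data rather than code is the point: it could live in
configuration, so changing a level would not need a code sync.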
On 03/21/2014 07:23 PM, Clark Boylan wrote:
> Hello everyone,
> Back at the Portland summit the Infra team committed to archiving six months
> of test logs for OpenStack. Since then we have managed to do just that.
> However, more recently we have seen the growth rate of those logs climb
> beyond what is currently sustainable.
> For reasons, we currently store logs on a filesystem backed by cinder
> volumes. Rackspace limits the size and number of volumes attached to a
> single host meaning the upper bound on the log archive filesystem is ~12TB
> and we are almost there. You can see real numbers and pretty graphs on our
> cacti server (URL below).
> Long term we are trying to move to putting all of the logs in swift, but it
> turns out there are some use case issues we need to sort out around that
> before we can do so (but this is being worked on so should happen). Until
> that day arrives we need to work on logging more smartly, and if we can't do
> that we will have to reduce the log retention period.
> So what can you do? Well it appears that our log files may need a diet. I
> have listed the worst offenders below (after a small sampling, there may be
> more) and it would be great if we could go through those with a comb and
> figure out if we are logging actually useful data. The great thing about
> doing this is that it will make life better for deployers of OpenStack too.
> Some initial checking indicates a lot of this noise may be related to
> ceilometer. It looks like it is logging AMQP stuff frequently and inflating
> the logs of individual services as it polls them.
> Offending files from tempest tests:
> screen-n-cond.txt.gz 7.3M
> screen-ceilometer-collector.txt.gz 6.0M
> screen-n-api.txt.gz 3.7M
> screen-n-cpu.txt.gz 3.6M
> tempest.txt.gz 2.7M
> screen-ceilometer-anotification.txt.gz 1.9M
> subunit_log.txt.gz 1.5M
> screen-g-api.txt.gz 1.4M
> screen-ceilometer-acentral.txt.gz 1.4M
> screen-n-net.txt.gz 1.4M
> from: http://logs.openstack.org/52/81252/2/gate/gate-tempest-dsvm-full/488bc4e/logs/?C=S;O=D
> Unittest offenders:
> Nova subunit_log.txt.gz 14M
> Neutron subunit_log.txt.gz 7.8M
> Keystone subunit_log.txt.gz 4.8M
> Note all of the above files are compressed with gzip -9 and the filesizes
> above reflect compressed file sizes.
> Debug logs are important to you guys when dealing with Jenkins results. We
> want your feedback on how we can make this better for everyone.
> http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=717&rra_id=all
> Thank you,
> Clark Boylan
Samsung Research America
sean at dague.net / sean.dague at samsung.com