[devstack][infra] POST_FAILURE on export-devstack-journal : Export journal
Hi,

one of our jobs (python-tempestconf project) is frequently failing with POST_FAILURE [1] during the following task:

export-devstack-journal : Export journal

I'm bringing this to a broader audience as we're not sure where exactly the issue might be. Did you encounter a similar issue lately or in the past?

[1] https://zuul.opendev.org/t/openstack/builds?job_name=python-tempestconf-tempest-devstack-admin-plugins&project=osf/python-tempestconf

Thanks for any advice,
-- Martin Kopec
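For context, the failing task dumps the node's systemd journal to a file and compresses it before the job logs are uploaded. A minimal Python sketch of that general idea, purely illustrative: the real step is an Ansible task in devstack's export-devstack-journal role calling the system tools, and the paths and options below are assumptions.

    import lzma
    import subprocess

    # Dump the systemd journal in export format: a binary stream that can
    # later be re-imported (e.g. with systemd-journal-remote) and browsed
    # with "journalctl --file ...". The path is illustrative.
    with open("devstack.journal", "wb") as raw:
        subprocess.run(["journalctl", "-o", "export"], stdout=raw, check=True)

    # Compress the dump before uploading it with the job logs. Higher xz
    # presets compress better but need considerably more memory.
    with open("devstack.journal", "rb") as src, \
            lzma.open("devstack.journal.xz", "wb", preset=6) as dst:
        while chunk := src.read(1 << 20):
            dst.write(chunk)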
I am testing whether replacing xz with gzip would solve the problem [1] [2].

[1] https://review.opendev.org/c/openstack/devstack/+/784964
[2] https://review.opendev.org/c/osf/python-tempestconf/+/784967

-yoctozepto

On Tue, Apr 6, 2021 at 1:21 PM Martin Kopec <mkopec@redhat.com> wrote:
> one of our jobs (python-tempestconf project) is frequently failing with POST_FAILURE [1] during the following task:
> export-devstack-journal : Export journal
[...]
On Tue, Apr 6, 2021, at 8:14 AM, Radosław Piliszek wrote:
> I am testing whether replacing xz with gzip would solve the problem [1] [2].
The reason we used xz is that these files are very large, gz compression is very poor compared to xz for them, and they are not really human readable as is (you need to load them into journald first). Let's test it and see what the gz file sizes look like, but if they are still quite large then this is unlikely to be an appropriate fix.
[...]
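Since the question at this point is empirical (how much worse does gzip do on a journal dump than xz?), here is a hedged comparison sketch, assuming you have a local devstack.journal file to test against; the path and helper name are illustrative:

    import gzip
    import lzma
    import os

    SRC = "devstack.journal"  # illustrative path to an exported journal dump

    def compressed_size(opener, suffix):
        """Compress SRC with the given opener and return the output size in bytes."""
        out = SRC + suffix
        with open(SRC, "rb") as src, opener(out, "wb") as dst:
            while chunk := src.read(1 << 20):
                dst.write(chunk)
        return os.path.getsize(out)

    raw = os.path.getsize(SRC)
    gz = compressed_size(gzip.open, ".gz")
    xz = compressed_size(lzma.open, ".xz")
    print(f"raw: {raw / 2**20:.1f} MiB, "
          f"gzip: {gz / 2**20:.1f} MiB, "
          f"xz: {xz / 2**20:.1f} MiB")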
On 2021-04-06 13:21:17 +0200 (+0200), Martin Kopec wrote:
> one of our jobs (python-tempestconf project) is frequently failing with POST_FAILURE [1] during the following task:
> export-devstack-journal : Export journal
[...]
Looking at the error, I strongly suspect memory exhaustion. We could try tuning xz to use less memory when compressing.

-- Jeremy Stanley
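One concrete knob, in case it helps: most of xz's compression memory goes to the LZMA2 dictionary, so lowering the preset (or shrinking the dictionary) cuts memory use sharply at some cost in ratio. The job itself calls the xz binary, so the Python sketch below is only to illustrate the trade-off; the dictionary size chosen is an assumption and the figures in the comments are approximate.

    import lzma

    # The default preset (6) uses an 8 MiB dictionary and needs roughly
    # 100 MiB to compress; preset 9 uses a 64 MiB dictionary and needs
    # several hundred MiB. An explicit small dictionary keeps the
    # compressor's footprint modest on already memory-starved test nodes.
    low_memory_filters = [
        {"id": lzma.FILTER_LZMA2, "dict_size": 4 * 1024 * 1024},  # ~4 MiB dictionary
    ]

    with open("devstack.journal", "rb") as src, \
            lzma.open("devstack.journal.xz", "wb", filters=low_memory_filters) as dst:
        while chunk := src.read(1 << 20):
            dst.write(chunk)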
On Tue, Apr 6, 2021 at 6:02 PM Jeremy Stanley <fungi@yuggoth.org> wrote:
> Looking at the error, I strongly suspect memory exhaustion. We could try tuning xz to use less memory when compressing.
That was my hunch as well, hence why I am testing gzip.

On Tue, Apr 6, 2021 at 5:51 PM Clark Boylan <cboylan@sapwetik.org> wrote:
[...]
> Let's test it and see what the gz file sizes look like, but if they are still quite large then this is unlikely to be an appropriate fix.
Let's see how bad the file sizes are. If they are acceptable, we can keep gzip and be happy. Otherwise, we can try to tune the params to make xz a better citizen, as fungi suggested.

-yoctozepto
On Tue, Apr 6, 2021 at 6:11 PM Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
[...]
> Let's see how bad the file sizes are.
devstack.journal.gz 23.6M

That is less than all the other logs together, so I would not mind. I wonder how it looks in other jobs (this is from the failing one).

-yoctozepto
On Tue, Apr 6, 2021, at 9:15 AM, Radosław Piliszek wrote:
[...]
> devstack.journal.gz 23.6M
> That is less than all the other logs together, so I would not mind. I wonder how it looks in other jobs (this is from the failing one).
There does seem to be a range (likely depending on how much logging the job workload generates in journald), from a few megabytes to eighty-something MB [3]. This is probably acceptable. Just keep an eye out for jobs that end up with much larger file sizes and we can reevaluate if we notice them.

[3] https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/z...
On Tue, Apr 6, 2021, at 9:11 AM, Radosław Piliszek wrote:
> On Tue, Apr 6, 2021 at 6:02 PM Jeremy Stanley <fungi@yuggoth.org> wrote:
> > Looking at the error, I strongly suspect memory exhaustion. We could try tuning xz to use less memory when compressing.
Worth noting that we continue to suspect memory pressure, and in particular diving into swap, for random failures that appear timing or performance related. I still think it would be a helpful exercise for OpenStack to look at its memory consumption (remember, end users will experience this too) and see if there are any unexpected areas of memory use. I think the last time I skimmed the logs, the privsep daemon was a large consumer because a separate instance is run for each service, and they all add up.
> That was my hunch as well, hence why I am testing gzip.
[...]
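A hedged sketch of the kind of survey Clark suggests: summing resident memory per process name from /proc, so repeated daemons (for example, one privsep instance per service) show up as an aggregate rather than as many individually small processes. RSS double-counts shared pages, so treat the output as a rough survey; nothing below is specific to devstack.

    from collections import Counter
    from pathlib import Path

    def rss_kib(pid_dir: Path) -> int:
        """Return VmRSS in KiB for one process, or 0 if it has exited."""
        try:
            for line in (pid_dir / "status").read_text().splitlines():
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
        except (FileNotFoundError, ProcessLookupError):
            pass
        return 0

    totals = Counter()
    for pid_dir in Path("/proc").iterdir():
        if not pid_dir.name.isdigit():
            continue
        try:
            name = (pid_dir / "comm").read_text().strip()
        except (FileNotFoundError, ProcessLookupError):
            continue
        totals[name] += rss_kib(pid_dir)

    # Biggest aggregate consumers first; per-service daemons that look small
    # individually can still add up to a lot in total.
    for name, kib in totals.most_common(15):
        print(f"{kib / 1024:8.1f} MiB  {name}")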
Hmm, it seems we have hit the issue again, though in a different job now.

Latest logs: https://zuul.opendev.org/t/openstack/build/0565c3d252194f9ba67f4af20e8be65d
Link to the review where it occurred: https://review.opendev.org/c/osf/refstack-client/+/788743

On Tue, 6 Apr 2021 at 18:47, Clark Boylan <cboylan@sapwetik.org> wrote:
> Worth noting that we continue to suspect memory pressure, and in particular diving into swap, for random failures that appear timing or performance related.
[...]
-- Martin
On 2021-05-07 13:48:36 +0200 (+0200), Martin Kopec wrote:
> Hmm, it seems we have hit the issue again, though in a different job now.
> Latest logs: https://zuul.opendev.org/t/openstack/build/0565c3d252194f9ba67f4af20e8be65d
> Link to the review where it occurred: https://review.opendev.org/c/osf/refstack-client/+/788743
[...]
It was addressed in the master branch a month ago with https://review.opendev.org/784964 but wasn't backported to any older branches (or if it was, then the backports haven't merged yet). Looking at the zuul._inheritance_path from the inventory for your build, it seems to have used stable/wallaby of devstack rather than master, which explains why you're still seeing xz used.

-- Jeremy Stanley
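For anyone repeating that check on another build: the inventory is typically uploaded with the job logs as zuul-info/inventory.yaml, and zuul._inheritance_path lists the job variants that contributed to the build, including which branch of each repo (such as devstack) they came from. A hedged sketch that digs the key out without assuming exactly where it sits in the YAML tree (the file path is an assumption about how the logs are laid out):

    import yaml  # PyYAML

    def find_key(node, key):
        """Recursively yield values stored under `key` anywhere in a nested structure."""
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key:
                    yield v
                yield from find_key(v, key)
        elif isinstance(node, list):
            for item in node:
                yield from find_key(item, key)

    with open("zuul-info/inventory.yaml") as f:
        inventory = yaml.safe_load(f)

    # Print each entry of the inheritance path so the source branch of every
    # contributing job variant is visible.
    for path in find_key(inventory, "_inheritance_path"):
        for entry in path:
            print(entry)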
Right, thank you. I've proposed a backport to wallaby: https://review.opendev.org/c/openstack/devstack/+/790353 and I'm verifying that it solves the problem here: https://review.opendev.org/c/osf/refstack-client/+/788743

On Fri, 7 May 2021 at 15:35, Jeremy Stanley <fungi@yuggoth.org> wrote:
> It was addressed in the master branch a month ago with https://review.opendev.org/784964 but wasn't backported to any older branches (or if it was, then the backports haven't merged yet).
[...]
-- Martin
participants (4)
- Clark Boylan
- Jeremy Stanley
- Martin Kopec
- Radosław Piliszek