[devstack][infra] POST_FAILURE on export-devstack-journal : Export journal
Hi,

one of our jobs (python-tempestconf project) is frequently failing with POST_FAILURE [1] during the following task:

export-devstack-journal : Export journal

I'm bringing this to a broader audience as we're not sure where exactly the issue might be. Did you encounter a similar issue lately or in the past?

[1] https://zuul.opendev.org/t/openstack/builds?job_name=python-tempestconf-tempest-devstack-admin-plugins&project=osf/python-tempestconf

Thanks for any advice,
-- Martin Kopec
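For context, the failing task dumps the node's systemd journal to a file and compresses it before the job logs are uploaded. A minimal Python sketch of that general idea, purely illustrative: the real step is an Ansible task in devstack's export-devstack-journal role calling the system tools, and the paths and options below are assumptions.

    import lzma
    import subprocess

    # Dump the systemd journal in export format: a binary stream that can
    # later be re-imported (e.g. with systemd-journal-remote) and browsed
    # with "journalctl --file ...". The path is illustrative.
    with open("devstack.journal", "wb") as raw:
        subprocess.run(["journalctl", "-o", "export"], stdout=raw, check=True)

    # Compress the dump before uploading it with the job logs. Higher xz
    # presets compress better but need considerably more memory.
    with open("devstack.journal", "rb") as src, \
            lzma.open("devstack.journal.xz", "wb", preset=6) as dst:
        while chunk := src.read(1 << 20):
            dst.write(chunk)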
I am testing whether replacing xz with gzip would solve the problem [1] [2].

[1] https://review.opendev.org/c/openstack/devstack/+/784964
[2] https://review.opendev.org/c/osf/python-tempestconf/+/784967

-yoctozepto

On Tue, Apr 6, 2021 at 1:21 PM Martin Kopec <mkopec@redhat.com> wrote:
> one of our jobs (python-tempestconf project) is frequently failing with POST_FAILURE [1] during the following task:
> export-devstack-journal : Export journal
[...]
On Tue, Apr 6, 2021, at 8:14 AM, Radosław Piliszek wrote:
> I am testing whether replacing xz with gzip would solve the problem [1] [2].
The reason we used xz is that these files are very large, gz compression is very poor compared to xz for them, and they are not really human readable as is (you need to load them into journald first). Let's test it and see what the gz file sizes look like, but if they are still quite large then this is unlikely to be an appropriate fix.
[...]
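Since the question at this point is empirical (how much worse does gzip do on a journal dump than xz?), here is a hedged comparison sketch, assuming you have a local devstack.journal file to test against; the path and helper name are illustrative:

    import gzip
    import lzma
    import os

    SRC = "devstack.journal"  # illustrative path to an exported journal dump

    def compressed_size(opener, suffix):
        """Compress SRC with the given opener and return the output size in bytes."""
        out = SRC + suffix
        with open(SRC, "rb") as src, opener(out, "wb") as dst:
            while chunk := src.read(1 << 20):
                dst.write(chunk)
        return os.path.getsize(out)

    raw = os.path.getsize(SRC)
    gz = compressed_size(gzip.open, ".gz")
    xz = compressed_size(lzma.open, ".xz")
    print(f"raw: {raw / 2**20:.1f} MiB, "
          f"gzip: {gz / 2**20:.1f} MiB, "
          f"xz: {xz / 2**20:.1f} MiB")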
On 2021-04-06 13:21:17 +0200 (+0200), Martin Kopec wrote:
> one of our jobs (python-tempestconf project) is frequently failing with POST_FAILURE [1] during the following task:
> export-devstack-journal : Export journal
[...]
Looking at the error, I strongly suspect memory exhaustion. We could try tuning xz to use less memory when compressing.

-- Jeremy Stanley
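One concrete knob, in case it helps: most of xz's compression memory goes to the LZMA2 dictionary, so lowering the preset (or shrinking the dictionary) cuts memory use sharply at some cost in ratio. The job itself calls the xz binary, so the Python sketch below is only to illustrate the trade-off; the dictionary size chosen is an assumption and the figures in the comments are approximate.

    import lzma

    # The default preset (6) uses an 8 MiB dictionary and needs roughly
    # 100 MiB to compress; preset 9 uses a 64 MiB dictionary and needs
    # several hundred MiB. An explicit small dictionary keeps the
    # compressor's footprint modest on already memory-starved test nodes.
    low_memory_filters = [
        {"id": lzma.FILTER_LZMA2, "dict_size": 4 * 1024 * 1024},  # ~4 MiB dictionary
    ]

    with open("devstack.journal", "rb") as src, \
            lzma.open("devstack.journal.xz", "wb", filters=low_memory_filters) as dst:
        while chunk := src.read(1 << 20):
            dst.write(chunk)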
On Tue, Apr 6, 2021 at 6:02 PM Jeremy Stanley <fungi@yuggoth.org> wrote:
> Looking at the error, I strongly suspect memory exhaustion. We could try tuning xz to use less memory when compressing.
That was my hunch as well, hence why I am testing gzip.

On Tue, Apr 6, 2021 at 5:51 PM Clark Boylan <cboylan@sapwetik.org> wrote:
[...]
> Let's test it and see what the gz file sizes look like, but if they are still quite large then this is unlikely to be an appropriate fix.
Let's see how bad the file sizes are. If they are acceptable, we can keep gzip and be happy. Otherwise, we can try to tune the params to make xz a better citizen, as fungi suggested.

-yoctozepto
On Tue, Apr 6, 2021 at 6:11 PM Radosław Piliszek <radoslaw.piliszek@gmail.com> wrote:
[...]
> Let's see how bad the file sizes are.
devstack.journal.gz 23.6M

That is less than all the other logs together, so I would not mind. I wonder how it looks in other jobs (this is from the failing one).

-yoctozepto
On Tue, Apr 6, 2021, at 9:15 AM, Radosław Piliszek wrote:
[...]
> devstack.journal.gz 23.6M
> That is less than all the other logs together, so I would not mind. I wonder how it looks in other jobs (this is from the failing one).
There does seem to be a range (likely depending on how much logging the job workload generates in journald), from a few megabytes to eighty-something MB [3]. This is probably acceptable. Just keep an eye out for jobs that end up with much larger file sizes and we can reevaluate if we notice them.

[3] https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/z...
On Tue, Apr 6, 2021, at 9:11 AM, Radosław Piliszek wrote:
> On Tue, Apr 6, 2021 at 6:02 PM Jeremy Stanley <fungi@yuggoth.org> wrote:
> > Looking at the error, I strongly suspect memory exhaustion. We could try tuning xz to use less memory when compressing.
Worth noting that we continue to suspect memory pressure, and in particular diving into swap, for random failures that appear timing or performance related. I still think it would be a helpful exercise for OpenStack to look at its memory consumption (remember, end users will experience this too) and see if there are any unexpected areas of memory use. I think the last time I skimmed the logs, the privsep daemon was a large consumer because a separate instance is run for each service, and they all add up.
> That was my hunch as well, hence why I am testing gzip.
[...]
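A hedged sketch of the kind of survey Clark suggests: summing resident memory per process name from /proc, so repeated daemons (for example, one privsep instance per service) show up as an aggregate rather than as many individually small processes. RSS double-counts shared pages, so treat the output as a rough survey; nothing below is specific to devstack.

    from collections import Counter
    from pathlib import Path

    def rss_kib(pid_dir: Path) -> int:
        """Return VmRSS in KiB for one process, or 0 if it has exited."""
        try:
            for line in (pid_dir / "status").read_text().splitlines():
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
        except (FileNotFoundError, ProcessLookupError):
            pass
        return 0

    totals = Counter()
    for pid_dir in Path("/proc").iterdir():
        if not pid_dir.name.isdigit():
            continue
        try:
            name = (pid_dir / "comm").read_text().strip()
        except (FileNotFoundError, ProcessLookupError):
            continue
        totals[name] += rss_kib(pid_dir)

    # Biggest aggregate consumers first; per-service daemons that look small
    # individually can still add up to a lot in total.
    for name, kib in totals.most_common(15):
        print(f"{kib / 1024:8.1f} MiB  {name}")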
Hmm, it seems we have hit the issue again, though in a different job now.

Latest logs: https://zuul.opendev.org/t/openstack/build/0565c3d252194f9ba67f4af20e8be65d
Link to the review where it occurred: https://review.opendev.org/c/osf/refstack-client/+/788743

On Tue, 6 Apr 2021 at 18:47, Clark Boylan <cboylan@sapwetik.org> wrote:
> Worth noting that we continue to suspect memory pressure, and in particular diving into swap, for random failures that appear timing or performance related.
[...]
-- Martin
On 2021-05-07 13:48:36 +0200 (+0200), Martin Kopec wrote:
> Hmm, it seems we have hit the issue again, though in a different job now.
> Latest logs: https://zuul.opendev.org/t/openstack/build/0565c3d252194f9ba67f4af20e8be65d
> Link to the review where it occurred: https://review.opendev.org/c/osf/refstack-client/+/788743
[...]
It was addressed in the master branch a month ago with https://review.opendev.org/784964 but wasn't backported to any older branches (or if it was, then the backports haven't merged yet). Looking at the zuul._inheritance_path from the inventory for your build, it seems to have used stable/wallaby of devstack rather than master, which explains why you're still seeing xz used.

-- Jeremy Stanley
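For anyone repeating that check on another build: the inventory is typically uploaded with the job logs as zuul-info/inventory.yaml, and zuul._inheritance_path lists the job variants that contributed to the build, including which branch of each repo (such as devstack) they came from. A hedged sketch that digs the key out without assuming exactly where it sits in the YAML tree (the file path is an assumption about how the logs are laid out):

    import yaml  # PyYAML

    def find_key(node, key):
        """Recursively yield values stored under `key` anywhere in a nested structure."""
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key:
                    yield v
                yield from find_key(v, key)
        elif isinstance(node, list):
            for item in node:
                yield from find_key(item, key)

    with open("zuul-info/inventory.yaml") as f:
        inventory = yaml.safe_load(f)

    # Print each entry of the inheritance path so the source branch of every
    # contributing job variant is visible.
    for path in find_key(inventory, "_inheritance_path"):
        for entry in path:
            print(entry)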
Right, thank you. I've proposed a backport to wallaby: https://review.opendev.org/c/openstack/devstack/+/790353 and I'm verifying that it solves the problem here: https://review.opendev.org/c/osf/refstack-client/+/788743

On Fri, 7 May 2021 at 15:35, Jeremy Stanley <fungi@yuggoth.org> wrote:
> It was addressed in the master branch a month ago with https://review.opendev.org/784964 but wasn't backported to any older branches (or if it was, then the backports haven't merged yet).
[...]
-- Martin
participants (4)
- Clark Boylan
- Jeremy Stanley
- Martin Kopec
- Radosław Piliszek