[OpenStack-Infra] [nodepool.o.o] Image builder cleanup

Ian Wienand iwienand at redhat.com
Mon Nov 7 05:08:00 UTC 2016


Hi all,

I noticed that nodepool was failing to build images because it was
out of space again.  We haven't had a build in about 3 days.

Unlike last time, there wasn't anything to clean up in the cache; it
all seemed to be images.

---
ianw@nodepool:/opt$ sudo du -sh ./*/
16G	./dib_cache/
12G	./dib_tmp/
704K	./gear/
16K	./lost+found/
7.2M	./nodepool/
914G	./nodepool_dib/
66M	./system-config/
5.6G	./test_images/
---

The image list at the time I started looked like this:

nodepool@nodepool:~$ nodepool dib-image-list
2016-11-06 23:41:11,267 INFO gear.Connection.nodepool: Disconnected from zuul.openstack.org port 4730
2016-11-06 23:41:11,311 INFO gear.Connection.nodepool: Connected to zuul.openstack.org port 4730
+------+----------------+---------------------------------------------+------------+-------+-------------+
| ID   | Image          | Filename                                    | Version    | State | Age         |
+------+----------------+---------------------------------------------+------------+-------+-------------+
| 1357 | centos-7       | /opt/nodepool_dib/centos-7-1477305240       | 1477305240 | ready | 13:08:03:05 |
| 1415 | centos-7       | /opt/nodepool_dib/centos-7-1478169240       | 1478169240 | ready | 03:08:42:23 |
| 1355 | debian-jessie  | /opt/nodepool_dib/debian-jessie-1477305240  | 1477305240 | ready | 13:09:21:24 |
| 1413 | debian-jessie  | /opt/nodepool_dib/debian-jessie-1478169240  | 1478169240 | ready | 03:09:58:31 |
| 1411 | fedora-23      | /opt/nodepool_dib/fedora-23-1478169240      | 1478169240 | ready | 03:11:41:23 |
| 1418 | fedora-23      | /opt/nodepool_dib/fedora-23-1478255640      | 1478255640 | ready | 02:11:38:16 |
| 1354 | fedora-24      | /opt/nodepool_dib/fedora-24-1477305240      | 1477305240 | ready | 13:10:35:50 |
| 1361 | fedora-24      | /opt/nodepool_dib/fedora-24-1477391640      | 1477391640 | ready | 12:10:28:52 |
| 1342 | ubuntu-precise | /opt/nodepool_dib/ubuntu-precise-1477132440 | 1477132440 | ready | 15:07:10:03 |
| 1349 | ubuntu-precise | /opt/nodepool_dib/ubuntu-precise-1477218840 | 1477218840 | ready | 14:07:24:06 |
| 1344 | ubuntu-trusty  | /opt/nodepool_dib/ubuntu-trusty-1477132440  | 1477132440 | ready | 15:04:45:19 |
| 1416 | ubuntu-trusty  | /opt/nodepool_dib/ubuntu-trusty-1478169240  | 1478169240 | ready | 03:06:59:33 |
| 1345 | ubuntu-xenial  | /opt/nodepool_dib/ubuntu-xenial-1477132440  | 1477132440 | ready | 15:03:23:41 |
| 1417 | ubuntu-xenial  | /opt/nodepool_dib/ubuntu-xenial-1478169240  | 1478169240 | ready | 03:05:01:40 |
+------+----------------+---------------------------------------------+------------+-------+-------------+

There were a lot of leftover builds in /opt/nodepool_dib; I've dumped
a list of them into /opt/nodepool_dib/ianw-cleanup-2016-11.07.txt

I removed all the old builds listed in that file (i.e. all builds not
listed above).  This got us to a usable amount of free space:

 /dev/mapper/main-nodepoolbuild 1008G  579G  430G  58% /opt
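
For anyone repeating this cleanup, the "not listed above" check can be
sketched roughly as below.  This is only an illustration, not what I
actually ran: the helper names are made up, and it assumes each
on-disk artifact is named after the Filename column, either exactly or
with a dotted suffix (e.g. an image format extension).  It only
reports candidates; it does not delete anything.

```python
import os


def parse_table(text):
    """Parse a '|'-delimited ASCII table into a list of dicts keyed by header."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip log lines and +----+ border lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row is the column header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def leftover_entries(table_text, dir_entries):
    """Return directory entries that do not belong to any listed build.

    Assumption: an entry belongs to a build if its name equals the
    build's base filename or starts with that name plus a dot.
    """
    known = {os.path.basename(r["Filename"]) for r in parse_table(table_text)}

    def belongs(name):
        return any(name == k or name.startswith(k + ".") for k in known)

    return sorted(n for n in dir_entries if not belongs(n))
```

Feeding it the saved dib-image-list output and
os.listdir("/opt/nodepool_dib") would give a list to review by hand
before removing anything.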

I then noticed that nodepool was stuck building *a lot* of old images:

 nodepool@nodepool:/opt/nodepool_dib$ nodepool image-list | grep building | wc -l
 826

I went through and did an image-delete on each of these building
instances to clear things out.  I have started some image builds now
to see what the deal is.  I will keep an eye on them.
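
With 826 entries, picking out the IDs by hand isn't practical.  A
rough sketch of pulling them from the table output follows; the
function name is hypothetical, and it assumes image-list prints the
same kind of ASCII table as dib-image-list above, with "ID" and
"State" columns.

```python
def ids_in_state(table_text, state="building"):
    """Return the ID of every table row whose State column equals `state`."""
    ids = []
    header = None
    id_col = state_col = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip log lines and +----+ border lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            # First '|' row is the header; locate the columns by name
            # rather than by position, since layouts may differ.
            header = cells
            id_col = header.index("ID")
            state_col = header.index("State")
            continue
        if cells[state_col] == state:
            ids.append(cells[id_col])
    return ids
```

Each returned ID is then one candidate for image-delete; it is worth
eyeballing the list first, since anything legitimately building at
that moment would be matched too.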

It's currently very hard to debug the upload process.  I'll soon
propose some changes to split the upload logs out into per-provider
log files, similar to the way we split the build logs out into
separate files.  I think this will help to diagnose issues on specific
providers much more quickly.

-i


