[Openstack-operators] /var/lib/nova/instances fs filled up corrupting my Linux instances

Michael Still mikal at stillhq.com
Thu Mar 14 15:00:42 UTC 2013


On Thu, Mar 14, 2013 at 10:51 AM, Joe Topjian <joe.topjian at cybera.ca> wrote:
> https://bugs.launchpad.net/nova/+bug/1126375
>
> Admittedly, the bug report does not explain the scenario in detail, but I
> noted "No matter how many precautions are taken, some scenarios will still
> slip by." which I still firmly believe. My intention was to push for the
> cleanup to be turned off by default before discussing the possible ways it
> wouldn't work as expected. I felt that if I simply described the scenario,
> that single scenario would be accounted for, but thought would not go into
> any other ways it could happen (I feel this is what happened with the NeCTAR
> incident).
>
> I fully admit to being difficult with this, but it's something I believe
> strongly in. I have never run into another service or package that has a
> task enabled by default which deletes (rather than archives or recycles)
> data. I am all for these types of cleanup tasks, but feel they must be
> opt-in.

I vetoed that review and I'd do it again. Nothing you have said has
convinced me that the cleaner should be turned off by default. What
went wrong with NeCTAR is that they deployed code without testing it
in their environment. We've already discussed my feelings about that.

Frankly, I think it's much worse to disable it and have compute nodes
fill their disks than to have automated cleanup. The whole point of
cloud infrastructure is to manage machines so you don't have to.
Performing a manual cleanup on 10,000 compute nodes is not something
we should force operators to do.

A simple example of something which cleans caches would be Squid. I'm
sure I can find other examples trivially. We can't ship software with
an unbounded disk cache -- it's guaranteed to hurt users.

If you find bugs, report the actual bug. The devs aren't clairvoyant,
and we can only fix things we're told about. I've now spent a year
talking to ops people and trying to get them to help us help them
(tagging bugs with the ops tag for example). I've spent the majority
of my development time trying to make things easier for ops folks. I
am very offended that you think we're deliberately trying to make your
life harder, and to be honest it makes me wonder why I bother spending
time trying to help.

Michael


