[Openstack-operators] /var/lib/nova/instances fs filled up corrupting my Linux instances

Michael Still mikal at stillhq.com
Fri Mar 15 01:13:46 UTC 2013


On Thu, Mar 14, 2013 at 8:50 PM, Blair Bethwaite
<blair.bethwaite at gmail.com> wrote:

> I think Joe has hit on the crux of this here. There are so many permutations
> of possible deployments that it just isn't reasonable to test everything,
> let alone even think of all the possibilities, e.g., to continue with the
> current example, what happens if I have several different pools of shared
> storage servicing distinct sets of compute nodes under the same Nova
> deployment...? So it seems reasonable when looking at issues like this that
> they should be viewed through the prism of a novice OS operator (experienced
> sysadmin but little domain knowledge).
>
> For this particular problem I don't think either the on or off default is
> satisfactory. Nova needs to know whether it's dealing with shared storage or
> not. Where it is, the default should be off, where it's not, on.

Grizzly attempts to handle this very situation: it tries to learn
the topology of your storage and do sensible things. Unfortunately, Joe
was aware of a bug but didn't report it, and now we don't have time to
fix it for grizzly RC1, which is a shame. Perhaps we'll get it fixed
for RC2, although it seems a bug still hasn't been filed.

I'd expect to see this feature on by default in grizzly. It's been off
for two releases, which should be ample time for operators to have run
it in their labs. It's now time to deploy it for real. It's important
that we enable it because nova is too hard to configure correctly, and
we need to reduce the number of "make it work" flags that operators
need to add to deployments. Ultimately nova is in the business of
managing resources on compute nodes: it's creating and destroying
VMs, disk volumes, iSCSI endpoints and so forth. Cached disk images
are no different.

I think my overall learning from this thread is that there's no point
disabling features for a few releases so that operators can test in a
staged manner -- the reality is that testing doesn't occur.

Michael
