[openstack-dev] [nova] Fixing the console.log grows forever bug.

Dave Walker email at daviey.com
Mon Dec 8 13:20:19 UTC 2014


On 8 December 2014 at 10:33, Daniel P. Berrange <berrange at redhat.com> wrote:
> On Sat, Dec 06, 2014 at 04:38:52PM +1100, Tony Breeds wrote:
>> Hi All,
>>     In the most recent team meeting we briefly discussed [1], where the
>> console.log grows indefinitely, eventually causing guest stalls.  I mentioned
>> that I was working on a spec to fix this issue.
>>
>> My original plan was fairly similar to [2], in that we'd switch libvirt/qemu to
>> using a unix domain socket and write a simple helper to read from that socket
>> and write to disk.  That helper would close and reopen the on-disk file upon
>> receiving a HUP (so logrotate just works).  Life would be good and we could
>> all move on.
>>
>> However, I was encouraged to investigate fixing this in qemu, such that qemu
>> could process the HUP and make life better for all.  This is certainly doable
>> and I'm happy[3] to do this work.  I've floated the idea past qemu-devel and
>> they seem okay with it.  My main concerns are the lag and supporting versions
>> of qemu/libvirt that can't handle this option.
>
> As mentioned in my reply on qemu-devel, I think the right long term solution
> for this is to fix it in libvirt. We have a general security goal to remove
> QEMU's ability to open any files whatsoever, instead having it receive all
> host resources as pre-opened file descriptors from libvirt. So what we
> anticipate is a new libvirt daemon for processing logs, virtlogd. Anywhere
> where QEMU currently gets a file to log to (<serial> devices, and its
> stdout/stderr), it would instead be given a FD that's connected to virtlogd.
> virtlogd would simply write the data out to file & would be able to close
> & re-open files to integrate with logrotate.
>
>> For the sake of discussion, I'll lay out my best guess right now on fixing this
>> in qemu.
>>
>> qemu 2.2.0 /should/ release this year (the ETA is 2014-12-09[4]), so the fix I'm
>> proposing would be available in qemu 2.3.0, which I think will be released in
>> June/July 2015.  So we'd be into 'L' development before this fix is available,
>> possibly 'M' before the community distros (Fedora and Ubuntu)[5] include it,
>> and almost certainly longer for Enterprise distros.  Along with the qemu
>> development I expect there to be some libvirt development as well, but right now
>> I don't think that's critical to the feature or this discussion.
>>
>> So if that timeline is approximately correct:
>>
>> - Can we wait this long to fix the bug, as opposed to having it squashed in Kilo?
>> - What do we do in nova for the next ~12 months while we know there isn't a qemu to fix this?
>> - Then once there is a qemu that fixes the issue, do we just say 'thou must use
>>   qemu 2.3.0', or would nova still need to support both old and new qemus?
>
> FWIW, by comparison libvirt is on a monthly release schedule, so a fix done in
> libvirt has potential to be available sooner, though obviously there's bigger
> dev work to be done in libvirt for this.
>
> Regards,
> Daniel
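For reference, the socket-reading helper from Tony's original plan (read from a unix domain socket, write to disk, reopen the file on HUP so logrotate just works) could look roughly like the Python sketch below. All names here (ReopeningLogWriter, serve, the socket and log paths) are hypothetical, and it assumes the helper listens on the socket while qemu connects to it as a client. It is not code from nova, libvirt or the proposed virtlogd.

```python
import os
import signal
import socket


class ReopeningLogWriter:
    """Append console bytes to a file, reopening it after SIGHUP.

    Hypothetical helper for illustration; not part of nova or libvirt.
    """

    def __init__(self, path):
        self.path = path
        self._reopen_requested = False
        self._fh = open(path, "ab")

    def request_reopen(self, signum=None, frame=None):
        # Usable as a signal handler: only set a flag; the reopen itself
        # happens on the next write.
        self._reopen_requested = True

    def write(self, data):
        if self._reopen_requested:
            # logrotate has renamed the old file; start a new one at self.path.
            self._fh.close()
            self._fh = open(self.path, "ab")
            self._reopen_requested = False
        self._fh.write(data)
        self._fh.flush()

    def close(self):
        self._fh.close()


def serve(socket_path, log_path, bufsize=4096):
    """Accept one connection on a unix stream socket and log what it sends.

    Assumes (hypothetically) that qemu connects to socket_path as a client.
    """
    writer = ReopeningLogWriter(log_path)
    signal.signal(signal.SIGHUP, writer.request_reopen)
    if os.path.exists(socket_path):
        os.unlink(socket_path)  # remove a stale socket from a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(socket_path)
        server.listen(1)
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(bufsize)
                if not data:  # peer closed the connection
                    break
                writer.write(data)
    finally:
        server.close()
        writer.close()
```

Reopening lazily on the next write keeps the signal handler trivial; the trade-off is that the rotated file stays open until more console output arrives. Because the helper reads continuously, the writer on the other end of the socket is never left facing a full buffer, and the on-disk size is bounded by whatever rotation policy logrotate applies.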

Hey,

This thread started with the suggestion of a scheduled task reading from
a unix socket.  I don't think this can really be considered an
acceptable fix, as the guest does indeed lock up when the buffer is
full.
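To illustrate why a periodic reader is not enough: a pipe or socket only buffers a fixed amount of data in the kernel (commonly 64 KiB for a pipe on Linux), and once that buffer is full a blocking writer cannot make progress until something reads. The sketch below uses a plain pipe as a stand-in for the guest console channel, which is an assumption for illustration and not qemu's actual chardev code. The hypothetical fill_until_full function switches the write end to non-blocking mode so the point where a blocking writer would stall shows up as BlockingIOError instead of a hang.

```python
import os


def fill_until_full(chunk=b"x" * 1024):
    """Write into a pipe that nobody reads until the kernel buffer is full.

    Stand-in for a console channel whose reader is not keeping up. With a
    blocking fd the final write would simply hang; non-blocking mode makes
    that point observable. Returns the number of bytes buffered.
    """
    r, w = os.pipe()
    os.set_blocking(w, False)
    written = 0
    try:
        while True:
            written += os.write(w, chunk)
    except BlockingIOError:
        pass  # buffer full: a blocking writer would be stuck here
    finally:
        os.close(r)
        os.close(w)
    return written
```

The exact number returned is platform-dependent; the point is only that it is finite, so any reader that runs on a schedule rather than continuously leaves a window in which the writer stalls.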

I originally proposed a quick fix for this back in 2011, which provided
a config option to enable a kernel-level ring buffer via a
non-mainline module called emlog.  This was not merged, for
understandable reasons (it predates gerrit):
http://bazaar.launchpad.net/~davewalker/nova/832507_with_emlog/revision/1509/nova/virt/libvirt/connection.py

Later that same year, Robie Basak presented a change which introduced
similar ring buffer logic in the nova code itself, making use of
eventlet.  This seems quite a reasonable fix, but there was concern it
might lock up guests: https://review.openstack.org/#/c/706/

I think shortly after this, it was fairly widely agreed that Nova is
not the correct layer to fix this in.  Personally, I struggle to see
qemu or libvirt as the right layer either.  I don't think that
treating a console as a flat log file is the best default behavior.

I still quite like the emlog approach, as a ring buffer device type
in the kernel provides exactly what we need and is pretty simple
to implement.
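The ring buffer semantics being argued for here (writes never block; once full, the oldest bytes are discarded so only the most recent output is kept) can be sketched in a few lines. This is a user-space Python illustration with a hypothetical ByteRingBuffer class; it is not emlog's kernel implementation, nor the code from the review linked above.

```python
class ByteRingBuffer:
    """Fixed-capacity buffer that keeps only the most recent `capacity` bytes.

    Writes never block: when the buffer is full, the oldest bytes are
    overwritten.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._buf = bytearray(capacity)
        self._start = 0  # index of the oldest retained byte
        self._size = 0   # number of valid bytes currently held

    def write(self, data):
        data = bytes(data)
        if len(data) >= self.capacity:
            # Only the tail of an oversized write can survive.
            self._buf[:] = data[-self.capacity:]
            self._start = 0
            self._size = self.capacity
            return
        for b in data:
            end = (self._start + self._size) % self.capacity
            self._buf[end] = b
            if self._size < self.capacity:
                self._size += 1
            else:
                # Full: we just overwrote the oldest byte, so advance start.
                self._start = (self._start + 1) % self.capacity

    def read(self):
        """Return the retained bytes, oldest first."""
        end = self._start + self._size
        if end <= self.capacity:
            return bytes(self._buf[self._start:end])
        return bytes(self._buf[self._start:] + self._buf[:end - self.capacity])


rb = ByteRingBuffer(8)
rb.write(b"hello ")
rb.write(b"world")
print(rb.read())  # b'lo world'
```

Memory use is fixed at `capacity` bytes no matter how much the guest writes, which is exactly the property a flat console.log lacks; the cost is that older console output is lost.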

Does anyone know whether this generic ring buffer kernel support was
ever proposed for the mainline kernel?

--
Kind Regards,
Dave Walker


