[OpenStack-Infra] Zuul memory leak

Mikhail Medvedev mihailmed at gmail.com
Tue Mar 8 16:13:20 UTC 2016


On Mon, Mar 7, 2016 at 6:04 PM, Joshua Hesketh <joshua.hesketh at gmail.com> wrote:
> Hi Mikhail,
>
> Okay thanks, that's helpful.
>
> You mentioned that you might try restarting zuul periodically to see if that
> helps. Perhaps instead you could do a reload (or HUP) first to see if that
> clears the cache and alleviates the issue for you?

SIGHUP (kill -1) does reload the configuration (according to the logs), but
I saw no immediate effect on the memory footprint. At the time of the test,
zuul-server was at 3GB (24 hours earlier it was at 2GB). Unfortunately I had
to restart zuul-server due to unrelated problems, so now I need to wait some
time before I can test again. I would definitely go the periodic SIGHUP route
if it proves to work; that is a good idea.
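
For the next round of testing, a small script along these lines could record
the resident memory right before and some time after the SIGHUP. This is only
a sketch and not something I have run against zuul-server: it assumes Linux
(/proc/<pid>/status), the function names are made up, and the 60 second settle
time is an arbitrary guess.

```python
import os
import signal
import time


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t 3145728 kB".
            return int(line.split()[1])
    return None


def rss_kb(pid):
    """Read the resident set size of a running process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vmrss_kb(f.read())


def sighup_and_compare(pid, settle_seconds=60):
    """Send SIGHUP to pid and return (rss_before_kb, rss_after_kb)."""
    before = rss_kb(pid)
    os.kill(pid, signal.SIGHUP)  # same signal as `kill -1 <pid>`
    time.sleep(settle_seconds)   # assumed settle time; tune as needed
    return before, rss_kb(pid)
```

Calling sighup_and_compare() with the zuul-server PID would give two numbers
to compare, which is easier to put in a report than eyeballing top.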

>
> Cheers,
> Josh
>
> On Tue, Mar 8, 2016 at 10:53 AM, Mikhail Medvedev <mihailmed at gmail.com>
> wrote:
>>
>> Hi Josh,
>>
>> On Mon, Mar 7, 2016 at 5:25 PM, Joshua Hesketh <joshua.hesketh at gmail.com>
>> wrote:
>> > Hi Mikhail,
>> >
>> > Thank you for the extra details. I'll continue to look into this.
>> >
>> > With the daily bumps when you do the log rotation, I assume you aren't
>> > reloading zuul at that point and the freed memory is likely due to
>> > another
>> > process?
>>
>> I was puzzled by the bumps, and checked the syslog. They are definitely
>> due to
>> "run-parts --report /etc/cron.daily" being triggered at 06:25, and not
>> zuul reloads.
>> The memory bumps could be due to any of the cron jobs. logrotate seemed
>> likely.
>> For the record:
>>
>> root at zuul:~# ls /etc/cron.daily
>> apache2  apport  apt  aptitude  bsdmainutils  dpkg  exim4-base
>> logrotate  man-db  mlocate  ntp  passwd  update-notifier-common
>> upstart
>>
>> I have also confirmed there were no changes to zuul layout for the
>> interval that
>> the graph shows.
>>
>> >
>> > Cheers,
>> > Josh
>> >
>> > On Tue, Mar 8, 2016 at 10:17 AM, Mikhail Medvedev <mihailmed at gmail.com>
>> > wrote:
>> >>
>> >> On Wed, Feb 10, 2016 at 10:57 AM, James E. Blair <corvus at inaugust.com>
>> >> wrote:
>> >> > Michael Still <mikal at stillhq.com> writes:
>> >> >
>> >> >> On Tue, Feb 9, 2016 at 4:59 AM, Joshua Hesketh
>> >> >> <joshua.hesketh at gmail.com>
>> >> >> wrote:
>> >> >>
>> >> >>> On Thu, Feb 4, 2016 at 2:44 AM, James E. Blair
>> >> >>> <corvus at inaugust.com>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> On the subject of clearing the cache more often, I think we may
>> >> >>>> not
>> >> >>>> want
>> >> >>>> to wipe out the cache more often than we do now -- in fact, I
>> >> >>>> think
>> >> >>>> we
>> >> >>>> may want to look into ways to keep from doing even that, because
>> >> >>>> whenever we reload now, Zuul slows down considerably as it has to
>> >> >>>> query
>> >> >>>> Gerrit again for all of the data previously in its cache.
>> >> >>>>
>> >> >>>
>> >> >>> I can see a lot of 3rd parties or simpler CI's not needing to
>> >> >>> reload
>> >> >>> zuul
>> >> >>> very often so this cache would never get cleared. Perhaps cached
>> >> >>> objects
>> >> >>> should have an expiry time (of a day or so) and can be cleaned up
>> >> >>> periodically? Additionally if clearing the cache on a reload is
>> >> >>> causing
>> >> >>> pain maybe we should move the cache into the scheduler and keep it
>> >> >>> between
>> >> >>> reloads?
>> >> >>>
>> >> >>
>> >> >> Do you guys use oslo at all? I ask because the oslo memcache stuff
>> >> >> does
>> >> >> exactly this, so it should be trivial to implement if you don't mind
>> >> >> depending on oslo.
>> >> >
>> >> > One of the main things we use the cache for is to ensure that every
>> >> > change is represented by a single Change object in Zuul's memory.
>> >> > The
>> >> > graph of enqueued Items link to their respective Changes which may
>> >> > link
>> >> > to each other due to dependencies.  When something changes in Gerrit,
>> >> > we
>> >> > want that reflected immediately and consistently in all of the
>> >> > objects
>> >> > in that graph.  Using the cache means that every time we add a new
>> >> > Change object to that graph, we use the same object for a given
>> >> > change.
>> >> >
>> >> > This is why we can't use time-based expiry -- we must not drop
>> >> > objects
>> >> > from the cache if they are still in the graph.  Otherwise we will
>> >> > create
>> >> > new duplicative objects and the ones still in the graph will not be
>> >> > updated.
>> >> >
>> >> > Perhaps we should change these objects to something more ephemeral
>> >> > that
>> >> > can proxy for some other mechanism that can operate more like a
>> >> > traditional cache (with time-based expiry).  But I think changes to
>> >> > this
>> >> > system should happen in Zuulv3 -- it works well enough for Zuulv2 for
>> >> > now.
>> >> >
>> >> > -Jim
>> >> >
>> >>
>> >> We are one of the third-party CIs and are using "Zuul version:
>> >> 2.1.1.dev123", which is one commit after [1]. That extra commit is not
>> >> in tree; I am applying [2] on top.
>> >>
>> >> The VM has 8GB of RAM. The zuul-server memory footprint goes up
>> >> consistently over the course of a week. Normally it takes about 3-4
>> >> days to get to 3GB. About a week ago I witnessed zuul-server get to 95%
>> >> of RAM, at which point the kernel started killing other processes. The
>> >> graph [3] shows memory usage, and it reflects zuul-server consumption.
>> >> The daily bumps on the graph are the daily cron doing log rotation
>> >> etc., possibly flushing caches.
>> >>
>> >> I can not say 100% that it is still the leak. Could simply be that
>> >> zuul-server
>> >> requires more ram now.
>> >>
>> >> [1]
>> >>
>> >> https://review.openstack.org/#q,I81ee47524cda71a500c55a95a2280f491b1b63d9,n,z
>> >> [2]
>> >>
>> >> https://review.openstack.org/#q,If3a418fa2d4993a149d454e02a9b26529e4b6825,n,z
>> >> [3] http://imgur.com/SzqSA1H
>> >>
>> >> Mikhail Medvedev (mmedvede)
>> >>
>> >> _______________________________________________
>> >> OpenStack-Infra mailing list
>> >> OpenStack-Infra at lists.openstack.org
>> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
>> >
>> >
>
>
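
To make sure I follow Jim's point in the quoted thread about why time-based
expiry is a problem, here is a toy model of it. It is not Zuul's code: Change
and ChangeCache are made-up stand-ins, and expire() just simulates a timer
dropping an entry while the object is still referenced by an enqueued item.

```python
class Change(object):
    """Stand-in for a cached change; not Zuul's real class."""

    def __init__(self, number):
        self.number = number
        self.status = "NEW"


class ChangeCache(object):
    """Hands out exactly one Change object per change number."""

    def __init__(self):
        self._changes = {}

    def get(self, number):
        change = self._changes.get(number)
        if change is None:
            change = Change(number)
            self._changes[number] = change
        return change

    def expire(self, number):
        # Simulates time-based expiry dropping an entry.
        self._changes.pop(number, None)


cache = ChangeCache()
enqueued = cache.get(1234)  # held by the graph of enqueued items
cache.expire(1234)          # expiry removes the cache entry
fresh = cache.get(1234)     # a second object for the same change
fresh.status = "MERGED"     # the update lands on the new object only
```

After this runs, enqueued and fresh are two different objects for the same
change, and the one still in the graph never sees the update. That matches
the duplication Jim describes, so expiry would only be safe for objects that
nothing in the graph refers to anymore.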


