<div dir="ltr">Hi Mikhail,<div><br></div><div>Okay thanks, that's helpful.</div><div><br></div><div>You mentioned that you might try restarting zuul periodically to see if that helps. Perhaps instead you could do a reload (or HUP) first to see if that clears the cache and alleviates the issue for you?</div><div><br></div><div>Cheers,<br>Josh</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 8, 2016 at 10:53 AM, Mikhail Medvedev <span dir="ltr"><<a href="mailto:mihailmed@gmail.com" target="_blank">mihailmed@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Josh,<br>
<span class=""><br>
On Mon, Mar 7, 2016 at 5:25 PM, Joshua Hesketh <<a href="mailto:joshua.hesketh@gmail.com">joshua.hesketh@gmail.com</a>> wrote:<br>
> Hi Mikhail,<br>
><br>
> Thank you for the extra details. I'll continue to look into this.<br>
><br>
> With the daily bumps when you do the log rotation, I assume you aren't<br>
> reloading zuul at that point and the freed memory is likely due to another<br>
> process?<br>
<br>
</span>I was puzzled by the bumps, and checked the syslog. They are definitely due to<br>
"run-parts --report /etc/cron.daily" being triggered at 06:25, and not<br>
zuul reloads.<br>
The memory bumps could be due to any of the cron jobs. logrotate seemed likely.<br>
For the record:<br>
<br>
root@zuul:~# ls /etc/cron.daily<br>
apache2 apport apt aptitude bsdmainutils dpkg exim4-base<br>
logrotate man-db mlocate ntp passwd update-notifier-common<br>
upstart<br>
<br>
I have also confirmed there were no changes to zuul layout for the interval that<br>
the graph shows.<br>
<div class="HOEnZb"><div class="h5"><br>
><br>
> Cheers,<br>
> Josh<br>
><br>
> On Tue, Mar 8, 2016 at 10:17 AM, Mikhail Medvedev <<a href="mailto:mihailmed@gmail.com">mihailmed@gmail.com</a>><br>
> wrote:<br>
>><br>
>> On Wed, Feb 10, 2016 at 10:57 AM, James E. Blair <<a href="mailto:corvus@inaugust.com">corvus@inaugust.com</a>><br>
>> wrote:<br>
>> > Michael Still <<a href="mailto:mikal@stillhq.com">mikal@stillhq.com</a>> writes:<br>
>> ><br>
>> >> On Tue, Feb 9, 2016 at 4:59 AM, Joshua Hesketh<br>
>> >> <<a href="mailto:joshua.hesketh@gmail.com">joshua.hesketh@gmail.com</a>><br>
>> >> wrote:<br>
>> >><br>
>> >>> On Thu, Feb 4, 2016 at 2:44 AM, James E. Blair <<a href="mailto:corvus@inaugust.com">corvus@inaugust.com</a>><br>
>> >>> wrote:<br>
>> >>>><br>
>> >>>> On the subject of clearing the cache more often, I think we may not<br>
>> >>>> want<br>
>> >>>> to wipe out the cache more often than we do now -- in fact, I think<br>
>> >>>> we<br>
>> >>>> may want to look into ways to keep from doing even that, because<br>
>> >>>> whenever we reload now, Zuul slows down considerably as it has to<br>
>> >>>> query<br>
>> >>>> Gerrit again for all of the data previously in its cache.<br>
>> >>>><br>
>> >>><br>
>> >>> I can see a lot of 3rd parties or simpler CIs not needing to reload<br>
>> >>> zuul<br>
>> >>> very often so this cache would never get cleared. Perhaps cached<br>
>> >>> objects<br>
>> >>> should have an expiry time (of a day or so) and can be cleaned up<br>
>> >>> periodically? Additionally if clearing the cache on a reload is<br>
>> >>> causing<br>
>> >>> pain maybe we should move the cache into the scheduler and keep it<br>
>> >>> between<br>
>> >>> reloads?<br>
>> >>><br>
>> >><br>
>> >> Do you guys use oslo at all? I ask because the oslo memcache stuff does<br>
>> >> exactly this, so it should be trivial to implement if you don't mind<br>
>> >> depending on oslo.<br>
>> ><br>
>> > One of the main things we use the cache for is to ensure that every<br>
>> > change is represented by a single Change object in Zuul's memory. The<br>
>> > graph of enqueued Items link to their respective Changes which may link<br>
>> > to each other due to dependencies. When something changes in Gerrit, we<br>
>> > want that reflected immediately and consistently in all of the objects<br>
>> > in that graph. Using the cache means that every time we add a new<br>
>> > Change object to that graph, we use the same object for a given change.<br>
>> ><br>
>> > This is why we can't use time-based expiry -- we must not drop objects<br>
>> > from the cache if they are still in the graph. Otherwise we will create<br>
>> > new duplicative objects and the ones still in the graph will not be<br>
>> > updated.<br>
>> ><br>
>> > Perhaps we should change these objects to something more ephemeral that<br>
>> > can proxy for some other mechanism that can operate more like a<br>
>> > traditional cache (with time-based expiry). But I think changes to this<br>
>> > system should happen in Zuulv3 -- it works well enough for Zuulv2 for<br>
>> > now.<br>
>> ><br>
>> > -Jim<br>
>> ><br>
>><br>
>> We are one of the third-party CIs and are using "Zuul version: 2.1.1.dev123",<br>
>> which is one commit after [1]. That one commit after is not in tree - I am<br>
>> applying [2] on top.<br>
>><br>
>> The VM has 8GB of RAM. zuul-server memory footprint goes up consistently<br>
>> over<br>
>> the course of a week. Normally it takes about 3-4 days to get over 3GB.<br>
>> About a week ago I witnessed zuul-server get to 95% of RAM, at which point<br>
>> the kernel started killing other processes. The graph [3] shows memory usage, and it<br>
>> reflects zuul-server consumption. The daily bumps on the graph are daily<br>
>> cron<br>
>> doing log rotation etc, possibly flushing caches.<br>
>><br>
>> I cannot say with 100% certainty that it is still the leak. It could simply be that<br>
>> zuul-server<br>
>> requires more RAM now.<br>
>><br>
>> [1]<br>
>> <a href="https://review.openstack.org/#q,I81ee47524cda71a500c55a95a2280f491b1b63d9,n,z" rel="noreferrer" target="_blank">https://review.openstack.org/#q,I81ee47524cda71a500c55a95a2280f491b1b63d9,n,z</a><br>
>> [2]<br>
>> <a href="https://review.openstack.org/#q,If3a418fa2d4993a149d454e02a9b26529e4b6825,n,z" rel="noreferrer" target="_blank">https://review.openstack.org/#q,If3a418fa2d4993a149d454e02a9b26529e4b6825,n,z</a><br>
>> [3] <a href="http://imgur.com/SzqSA1H" rel="noreferrer" target="_blank">http://imgur.com/SzqSA1H</a><br>
>><br>
>> Mikhail Medvedev (mmedvede)<br>
>><br>
>> _______________________________________________<br>
>> OpenStack-Infra mailing list<br>
>> <a href="mailto:OpenStack-Infra@lists.openstack.org">OpenStack-Infra@lists.openstack.org</a><br>
>> <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra</a><br>
><br>
><br>
</div></div></blockquote></div><br></div>
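<div><br></div><div>For readers following the thread, here is a minimal sketch of the constraint Jim describes: every reference to a given change must resolve to the same in-memory object, so an entry cannot be dropped on a timer while something still points at it. This is not Zuul's code. The <code>Change</code> and <code>ChangeCache</code> classes, the change number, and the status strings are hypothetical, and the use of Python's <code>weakref.WeakValueDictionary</code> is an assumption chosen only to show one way an entry could be released once nothing references it, instead of after a fixed time.</div>
<pre>
```python
import gc
import weakref


class Change:
    """Hypothetical stand-in for a Gerrit change tracked by the scheduler."""

    def __init__(self, number):
        self.number = number
        self.status = "NEW"


class ChangeCache:
    """Hands out one shared Change object per change number."""

    def __init__(self):
        # Weak values: an entry disappears only once nothing else
        # (for example a queue item) still references the Change.
        self._changes = weakref.WeakValueDictionary()

    def get(self, number):
        change = self._changes.get(number)
        if change is None:
            change = Change(number)
            self._changes[number] = change
        return change

    def __len__(self):
        return len(self._changes)


cache = ChangeCache()

# Two queue items that refer to the same change share one object.
queue_item_a = cache.get(1234)
queue_item_b = cache.get(1234)
same_object = queue_item_a is queue_item_b

# An update made through one reference is visible through the other.
queue_item_a.status = "MERGED"
status_seen_by_b = queue_item_b.status

# Once no item references the change, the entry can go away on its own.
del queue_item_a, queue_item_b
gc.collect()
entries_after_dequeue = len(cache)
```
</pre>
<div>With a time-based expiry, the second <code>get</code> could instead build a fresh object while the first one is still in the graph, and the two would then drift apart, which is the duplication problem raised in the thread.</div>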