[openstack-dev] [ceilometer]ceilometer-collector high CPU usage

Gyorgy Szombathelyi gyorgy.szombathelyi at doclerholding.com
Wed Feb 17 15:46:29 UTC 2016


Hi all,

I did some more debugging with pdb, and it seems the problem is somehow connected to this eventlet issue:
https://github.com/eventlet/eventlet/issues/30

I don't have a clue whether it has any connection to the RabbitMQ heartbeat thing, but if I change self.wait(0)
to self.wait(0.1) in eventlet/hubs/hub.py, the CPU usage drops significantly.
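
To illustrate why that single value matters, here is a minimal, self-contained sketch (explicitly not eventlet's actual code): with a zero timeout, epoll returns immediately and the loop spins; with 0.1 it blocks for up to 100 ms per iteration and the process goes idle.

    # demo only, not eventlet's code: compare a zero vs. a 100 ms poll timeout
    import select
    import time

    def spin(timeout, duration=2.0):
        ep = select.epoll()          # nothing registered, so every poll is "idle"
        calls = 0
        deadline = time.time() + duration
        while time.time() < deadline:
            ep.poll(timeout)         # analogous to hub.wait(timeout)
            calls += 1
        ep.close()
        return calls

    print('wait(0)   ->', spin(0), 'polls in 2 seconds')    # spins, ~100% CPU
    print('wait(0.1) ->', spin(0.1), 'polls in 2 seconds')  # ~20 polls, near idle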

Br,
György

> -----Original Message-----
> From: Gyorgy Szombathelyi
> [mailto:gyorgy.szombathelyi at doclerholding.com]
> Sent: Wednesday, 17 February 2016 14:47
> To: 'openstack-dev at lists.openstack.org' <openstack-
> dev at lists.openstack.org>
> Subject: Re: [openstack-dev] [ceilometer]ceilometer-collector high CPU
> usage
> 
> >
> > hi,
> Hi Gordon,
> 
> >
> > this seems to be similar to a bug we were tracking in earlier[1].
> > basically, any service with a listener never seemed to idle properly.
> >
> > based on earlier investigation, we found it relates to the heartbeat
> > functionality in oslo.messaging. i'm not entirely sure if it's because
> > of it or some combination of things including it. the short answer is
> > to disable heartbeat by setting heartbeat_timeout_threshold = 0 and
> > see if it fixes your cpu usage. you can track the comments in bug.
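> >
> > for what it's worth, the change is just a config edit, roughly like this
> > (the section name is the real one; which file it goes in, e.g. ceilometer.conf,
> > depends on the service):
> >
> >     [oslo_messaging_rabbit]
> >     # 0 turns the rabbit heartbeat off completely
> >     heartbeat_timeout_threshold = 0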
> 
> As I see in the bug report, you mention that the problem is only with the
> notification agent, and that the collector is fine. I'm in the exact opposite
> situation.
> 
> strace-ing the two processes:
> 
> Notification agent:
> ----------------------
> epoll_wait(4, {}, 1023, 43)             = 0
> epoll_wait(4, {}, 1023, 0)              = 0
> epoll_ctl(4, EPOLL_CTL_DEL, 8, {EPOLLWRNORM|EPOLLMSG|EPOLLERR|EPOLLHUP|EPOLLRDHUP|EPOLLONESHOT|EPOLLET|0x1ec88000, {u32=32738, u64=24336577484324834}}) = 0
> recvfrom(8, 0x7fe2da3a4084, 7, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_ctl(4, EPOLL_CTL_ADD, 8, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=8, u64=40046962262671368}}) = 0
> epoll_wait(4, {}, 1023, 1)              = 0
> epoll_ctl(4, EPOLL_CTL_DEL, 24, {EPOLLWRNORM|EPOLLMSG|EPOLLERR|EPOLLHUP|EPOLLRDHUP|EPOLLONESHOT|EPOLLET|0x1ec88000, {u32=32738, u64=24336577484324834}}) = 0
> recvfrom(24, 0x7fe2da3a4084, 7, 0, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
> epoll_ctl(4, EPOLL_CTL_ADD, 24, {EPOLLIN|EPOLLPRI|EPOLLERR|EPOLLHUP, {u32=24, u64=40046962262671384}}) = 0
> epoll_wait(4, {}, 1023, 0)              = 0
> 
> ceilometer-collector:
> -------------------------
> epoll_wait(4, {}, 1023, 0)              = 0
> epoll_wait(4, {}, 1023, 0)              = 0
> epoll_wait(4, {}, 1023, 0)              = 0
> epoll_wait(4, {}, 1023, 0)              = 0
> epoll_wait(4, {}, 1023, 0)              = 0
> epoll_wait(4, {}, 1023, 0)              = 0
> epoll_wait(4, {}, 1023, 0)              = 0
> epoll_wait(4, {}, 1023, 0)              = 0
> epoll_wait(4, {}, 1023, 0)              = 0
> epoll_wait(4, {}, 1023, 0)              = 0
> 
> So the notification agent at least does something between the crazy epoll()s.
> 
> It is the same with or without heartbeat_timeout_threshold = 0 in
> [oslo_messaging_rabbit].
> Then something must still be wrong with the listeners; the bug [1] should not
> be closed, I think.
> 
> Br,
> György
> 
> >
> > [1] https://bugs.launchpad.net/oslo.messaging/+bug/1478135
> >
> > On 17/02/2016 4:14 AM, Gyorgy Szombathelyi wrote:
> > > Hi!
> > >
> > > Excuse me if the following question/problem is a basic one, an already
> > > known problem, or even a bad setup on my side.
> > >
> > > I just noticed that the most CPU-consuming process in an idle
> > > OpenStack cluster is ceilometer-collector. Even when there are only
> > > 10-15 samples/minute, it constantly eats about 15-20% CPU.
> > >
> > > I started to debug, and noticed that it epoll()s constantly with a
> > > zero timeout, so it seems it just polls for events in a tight loop.
> > > I found out that _maybe_ the Python side of the problem is
> > > oslo_messaging.get_notification_listener() with the eventlet executor.
> > > A quick search showed that this function is only used in
> > > aodh_listener and ceilometer_collector, and both use
> > > relatively high CPU even when they're just 'listening'.
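> > >
> > > For reference, the way such a listener is created looks roughly like this
> > > (a simplified sketch, not the exact ceilometer code; the topic and the
> > > endpoint are illustrative):
> > >
> > >     import oslo_messaging
> > >     from oslo_config import cfg
> > >
> > >     class Endpoint(object):
> > >         # called by the dispatcher for notifications of 'info' priority
> > >         def info(self, ctxt, publisher_id, event_type, payload, metadata):
> > >             pass
> > >
> > >     transport = oslo_messaging.get_notification_transport(cfg.CONF)
> > >     targets = [oslo_messaging.Target(topic='notifications')]
> > >     listener = oslo_messaging.get_notification_listener(
> > >         transport, targets, [Endpoint()], executor='eventlet')
> > >     listener.start()   # the tight epoll() loop shows up after this
> > >     listener.wait()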
> > >
> > > My skills for further debugging are limited, but I'm just curious why
> > > this listener uses so much CPU, while other executors, which also
> > > use eventlet, are not that bad. Excuse me if it is a basic
> > > question, an already known problem, or even a bad setup on my side.
> > >
> > > Br,
> > > György
> > >
> > >
> >
> > --
> > gord
> >
> >
> 


