[openstack-dev] [nova][ceilometer] FloatingIp pollster spamming n-api logs (bug 1328694)

Eoghan Glynn eglynn at redhat.com
Sat Jun 14 14:22:25 UTC 2014



> On 6/12/2014 10:31 AM, John Garbutt wrote:
> > On 11 June 2014 20:07, Joe Gordon <joe.gordon0 at gmail.com> wrote:
> >> On Wed, Jun 11, 2014 at 11:38 AM, Matt Riedemann
> >> <mriedem at linux.vnet.ibm.com> wrote:
> >>> On 6/11/2014 10:01 AM, Eoghan Glynn wrote:
> >>>> Thanks for bringing this to the list Matt, comments inline ...
> >>>>
> >>>>> tl;dr: some pervasive changes were made to nova to enable polling in
> >>>>> ceilometer which broke some things and in my opinion shouldn't have
> >>>>> been
> >>>>> merged as a bug fix but rather should have been a blueprint.
> >>>>>
> >>>>> ===
> >>>>>
> >>>>> The detailed version:
> >>>>>
> >>>>> I opened bug 1328694 [1] yesterday and found that came back to some
> >>>>> changes made in ceilometer for bug 1262124 [2].
> >>>>>
> >>>>> Upon further inspection, the original ceilometer bug 1262124 made some
> >>>>> changes to the nova os-floating-ips API extension and the database API
> >>>>> [3], and changes to python-novaclient [4] to enable ceilometer to use
> >>>>> the new API changes (basically pass --all-tenants when listing floating
> >>>>> IPs).
> >>>>>
> >>>>> The original nova change introduced bug 1328694 which spams the
> >>>>> nova-api
> >>>>> logs due to the ceilometer change [5] which does the polling, and right
> >>>>> now in the gate ceilometer is polling every 15 seconds.
> >>>>
> >>>>
> >>>> IIUC that polling cadence in the gate is in the process of being
> >>>> reverted
> >>>> to the out-of-the-box default of 600s.
> >>>>
> >>>>> I pushed a revert in ceilometer to fix the spam bug and a separate
> >>>>> patch
> >>>>> was pushed to nova to fix the problem in the network API.
> >>>>
> >>>>
> >>>> Thank you for that. The revert is just now approved on the ceilometer
> >>>> side,
> >>>> and is wending its merry way through the gate.
> >>>>
> >>>>> The bigger problem I see here is that these changes were all made under
> >>>>> the guise of a bug when I think this is actually a blueprint.  We have
> >>>>> changes to the nova API, changes to the nova database API, CLI changes,
> >>>>> potential performance impacts (ceilometer can be hitting the nova
> >>>>> database a lot when polling here), security impacts (ceilometer needs
> >>>>> admin access to the nova API to list floating IPs for all tenants),
> >>>>> documentation impacts (the API and CLI changes are not documented),
> >>>>> etc.
> >>>>>
> >>>>> So right now we're left with, in my mind, two questions:
> >>>>>
> >>>>> 1. Do we just fix the spam bug 1328694 and move on, or
> >>>>> 2. Do we revert the nova API/CLI changes and require this goes through
> >>>>> the nova-spec blueprint review process, which should have happened in
> >>>>> the first place.
> >>>>
> >>>>
> >>>> So just to repeat the points I made on the unlogged #os-nova IRC channel
> >>>> earlier, for posterity here ...
> >>>>
> >>>> Nova already exposed an all_tenants flag in multiple APIs (servers,
> >>>> volumes,
> >>>> security-groups etc.) and these would have:
> >>>>
> >>>>     (a) generally pre-existed ceilometer's usage of the corresponding
> >>>>     APIs
> >>>>
> >>>> and:
> >>>>
> >>>>     (b) been tracked and proposed at the time via straight-forward LP
> >>>> bugs,
> >>>>         as  opposed to being considered blueprint material
> >>>>
> >>>> So the manner of the addition of the all_tenants flag to the
> >>>> floating_ips
> >>>> API looks like it just followed existing custom & practice.
> >>>>
> >>>> Though that said, the blueprint process and in particular the nova-specs
> >>>> aspect, has been tightened up since then.
> >>>>
> >>>> My preference would be to fix the issue in the underlying API, but to
> >>>> use
> >>>> this as "a teachable moment" ... i.e. to require more oversight (in the
> >>>> form of a reviewed & approved BP spec) when such API changes are
> >>>> proposed
> >>>> in the future.
> >>>>
> >>>> Cheers,
> >>>> Eoghan
> >>>>
> >>>>> Are there other concerns here?  If there are no major objections to the
> >>>>> code that's already merged, then #2 might be excessive but we'd still
> >>>>> need docs changes.
> >>>>>
> >>>>> I've already put this on the nova meeting agenda for tomorrow.
> >>>>>
> >>>>> [1] https://bugs.launchpad.net/ceilometer/+bug/1328694
> >>>>> [2] https://bugs.launchpad.net/nova/+bug/1262124
> >>>>> [3] https://review.openstack.org/#/c/81429/
> >>>>> [4] https://review.openstack.org/#/c/83660/
> >>>>> [5] https://review.openstack.org/#/c/83676/
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Matt Riedemann
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> OpenStack-dev mailing list
> >>>>> OpenStack-dev at lists.openstack.org
> >>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>>>>
> >>>>
> >>>> _______________________________________________
> >>>> OpenStack-dev mailing list
> >>>> OpenStack-dev at lists.openstack.org
> >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>>>
> >>>
> >>> While there is precedent for --all-tenants with some of the other APIs,
> >>> I'm concerned about where this stops.  When ceilometer wants polling on
> >>> some
> >>> other resources that the nova API exposes, will it need the same thing?
> >>> Doing all of this polling for resources in all tenants in nova puts an
> >>> undue
> >>> burden on the nova API and the database.
> >>>
> >>> Can we do something with notifications here instead?  That's where the
> >>> nova-spec process would have probably caught this.
> >>
> >> ++ to notifications and not polling.
> >
> > Yeah, I think we need to revert this, and go through the specs
> > process. Its been released in Juno-1 now, so this revert feels bad,
> > but perhaps its the best of a bad situation?
> >
> > Word of caution, we need to get notifications versioned correctly if
> > we want this as a more formal external "API". I think Heat have
> > similar issues in this area, efficiently knowing about something
> > happening in Nova. So we do need to "solve this".
> >
> > Some kind of callback to ceilometer's REST API sounds attractive, but
> > messes up dependencies. Maybe implemented as a plugin to a Nova
> > "notification system", could work better somehow? A versioned
> > interface that issues "server life cycle" events (some versioned
> > subset of notifications), and allow services to implement their own
> > plugins to that interface. But maybe we should probably just version
> > the current notifications, rather than re-invent another interface.
> >
> > Thanks,
> > John
> >
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> 
> The revert patch is here:
> 
> https://review.openstack.org/#/c/99938/

As stated on gerrit ...

I would disagree with jumping to revert this API flag, for the following reasons:

1. the underlying bug has a fix under review:
   https://review.openstack.org/83667

2. reverting has the effect of retroactively applying the policy introduced in:
   http://lists.openstack.org/pipermail/openstack-dev/2014-June/037536.html

3. there are existing all_tenants flags on other nova APIs

4. surely the core problem here is the *rate* at which this API was called by
   ceilometer in the gate, and the statndard way to protect a service in that
   case is to use rate-limiting, as opposed to just removing the API?

Thanks,
Eoghan



More information about the OpenStack-dev mailing list