[openstack-dev] [nova][ceilometer] FloatingIp pollster spamming n-api logs (bug 1328694)
John Garbutt
john at johngarbutt.com
Thu Jun 12 15:31:25 UTC 2014
On 11 June 2014 20:07, Joe Gordon <joe.gordon0 at gmail.com> wrote:
> On Wed, Jun 11, 2014 at 11:38 AM, Matt Riedemann
> <mriedem at linux.vnet.ibm.com> wrote:
>> On 6/11/2014 10:01 AM, Eoghan Glynn wrote:
>>> Thanks for bringing this to the list Matt, comments inline ...
>>>
>>>> tl;dr: some pervasive changes were made to nova to enable polling in
>>>> ceilometer which broke some things and in my opinion shouldn't have been
>>>> merged as a bug fix but rather should have been a blueprint.
>>>>
>>>> ===
>>>>
>>>> The detailed version:
>>>>
>>>> I opened bug 1328694 [1] yesterday and found that came back to some
>>>> changes made in ceilometer for bug 1262124 [2].
>>>>
>>>> Upon further inspection, the original ceilometer bug 1262124 made some
>>>> changes to the nova os-floating-ips API extension and the database API
>>>> [3], and changes to python-novaclient [4] to enable ceilometer to use
>>>> the new API changes (basically pass --all-tenants when listing floating
>>>> IPs).
>>>>
>>>> The original nova change introduced bug 1328694 which spams the nova-api
>>>> logs due to the ceilometer change [5] which does the polling, and right
>>>> now in the gate ceilometer is polling every 15 seconds.
>>>
>>>
>>> IIUC that polling cadence in the gate is in the process of being reverted
>>> to the out-of-the-box default of 600s.
>>>
>>>> I pushed a revert in ceilometer to fix the spam bug and a separate patch
>>>> was pushed to nova to fix the problem in the network API.
>>>
>>>
>>> Thank you for that. The revert is just now approved on the ceilometer
>>> side,
>>> and is wending its merry way through the gate.
>>>
>>>> The bigger problem I see here is that these changes were all made under
>>>> the guise of a bug when I think this is actually a blueprint. We have
>>>> changes to the nova API, changes to the nova database API, CLI changes,
>>>> potential performance impacts (ceilometer can be hitting the nova
>>>> database a lot when polling here), security impacts (ceilometer needs
>>>> admin access to the nova API to list floating IPs for all tenants),
>>>> documentation impacts (the API and CLI changes are not documented), etc.
>>>>
>>>> So right now we're left with, in my mind, two questions:
>>>>
>>>> 1. Do we just fix the spam bug 1328694 and move on, or
>>>> 2. Do we revert the nova API/CLI changes and require this goes through
>>>> the nova-spec blueprint review process, which should have happened in
>>>> the first place.
>>>
>>>
>>> So just to repeat the points I made on the unlogged #os-nova IRC channel
>>> earlier, for posterity here ...
>>>
>>> Nova already exposed an all_tenants flag in multiple APIs (servers,
>>> volumes,
>>> security-groups etc.) and these would have:
>>>
>>> (a) generally pre-existed ceilometer's usage of the corresponding APIs
>>>
>>> and:
>>>
>>> (b) been tracked and proposed at the time via straight-forward LP
>>> bugs,
>>> as opposed to being considered blueprint material
>>>
>>> So the manner of the addition of the all_tenants flag to the floating_ips
>>> API looks like it just followed existing custom & practice.
>>>
>>> Though that said, the blueprint process and in particular the nova-specs
>>> aspect, has been tightened up since then.
>>>
>>> My preference would be to fix the issue in the underlying API, but to use
>>> this as "a teachable moment" ... i.e. to require more oversight (in the
>>> form of a reviewed & approved BP spec) when such API changes are proposed
>>> in the future.
>>>
>>> Cheers,
>>> Eoghan
>>>
>>>> Are there other concerns here? If there are no major objections to the
>>>> code that's already merged, then #2 might be excessive but we'd still
>>>> need docs changes.
>>>>
>>>> I've already put this on the nova meeting agenda for tomorrow.
>>>>
>>>> [1] https://bugs.launchpad.net/ceilometer/+bug/1328694
>>>> [2] https://bugs.launchpad.net/nova/+bug/1262124
>>>> [3] https://review.openstack.org/#/c/81429/
>>>> [4] https://review.openstack.org/#/c/83660/
>>>> [5] https://review.openstack.org/#/c/83676/
>>>>
>>>> --
>>>>
>>>> Thanks,
>>>>
>>>> Matt Riedemann
>>>>
>>>>
>>>> _______________________________________________
>>>> OpenStack-dev mailing list
>>>> OpenStack-dev at lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>
>>>
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>> While there is precedent for --all-tenants with some of the other APIs,
>> I'm concerned about where this stops. When ceilometer wants polling on some
>> other resources that the nova API exposes, will it need the same thing?
>> Doing all of this polling for resources in all tenants in nova puts an undue
>> burden on the nova API and the database.
>>
>> Can we do something with notifications here instead? That's where the
>> nova-spec process would have probably caught this.
>
> ++ to notifications and not polling.
Yeah, I think we need to revert this, and go through the specs
process. Its been released in Juno-1 now, so this revert feels bad,
but perhaps its the best of a bad situation?
Word of caution, we need to get notifications versioned correctly if
we want this as a more formal external "API". I think Heat have
similar issues in this area, efficiently knowing about something
happening in Nova. So we do need to "solve this".
Some kind of callback to ceilometer's REST API sounds attractive, but
messes up dependencies. Maybe implemented as a plugin to a Nova
"notification system", could work better somehow? A versioned
interface that issues "server life cycle" events (some versioned
subset of notifications), and allow services to implement their own
plugins to that interface. But maybe we should probably just version
the current notifications, rather than re-invent another interface.
Thanks,
John
More information about the OpenStack-dev
mailing list