[openstack-dev] [Heat] event table is a ticking time bomb
Angus Salkeld
asalkeld at redhat.com
Mon Aug 12 23:06:29 UTC 2013
On 13/08/13 10:48 +1200, Steve Baker wrote:
>On 08/13/2013 10:39 AM, Angus Salkeld wrote:
>> On 12/08/13 16:52 -0400, Doug Hellmann wrote:
>>> On Mon, Aug 12, 2013 at 4:11 PM, Clint Byrum <clint at fewbar.com> wrote:
>>>
>>>> Excerpts from Doug Hellmann's message of 2013-08-12 12:08:58 -0700:
>>>> > On Fri, Aug 9, 2013 at 11:56 AM, Clint Byrum <clint at fewbar.com> wrote:
>>>> >
>>>> > > Excerpts from Sandy Walsh's message of 2013-08-09 06:16:55 -0700:
>>>> > > >
>>>> > > > On 08/08/2013 11:36 PM, Angus Salkeld wrote:
>>>> > > > > On 08/08/13 13:16 -0700, Clint Byrum wrote:
>>>> > > > >> Last night while reviewing a feature which would add more
>>>> > > > >> events to the event table, it dawned on me that the event
>>>> > > > >> table really must be removed.
>>>> > > > >
>>>> > > > >
>>>> > > > >>
>>>> > > > >> https://bugs.launchpad.net/heat/+bug/1209492
>>>> > > > >>
>>>> > > > >> tl;dr: users can write an infinite number of rows to the
>>>> > > > >> event table at a fairly alarming rate just by creating and
>>>> > > > >> updating a very large stack that has no resources that cost
>>>> > > > >> any time or are even billable (like an autoscaling launch
>>>> > > > >> configuration).
>>>> > > > >>
>>>> > > > >> The table has no purge function, so the only way to clear
>>>> > > > >> out old events is to delete the stack, or manually remove
>>>> > > > >> them directly in the database.
>>>> > > > >>
>>>> > > > >> We've all been through this before, logging to a database
>>>> > > > >> seems great until you actually do it.
>>>> > > > >>
>>>> > > > >> I have some ideas for how to solve it, but I wanted to get
>>>> > > > >> a wider audience:
>>>> > > > >>
>>>> > > > >> 1) Make the event list a ring buffer. Have rows 0 -
>>>> > > > >> $MAX_BUFFER_SIZE in each stack, and simply write each new
>>>> > > > >> event to the next open position, wrapping at
>>>> > > > >> $MAX_BUFFER_SIZE. Pros: little change to current code, just
>>>> > > > >> need an offset column added and code that will properly
>>>> > > > >> wrap to 0 at $MAX_BUFFER_SIZE. Cons: still can incur heavy
>>>> > > > >> transactional load on the database server.
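
To make 1) concrete, the write path could be as small as this; the
schema and the use of sqlite here are purely illustrative, not Heat's
actual model:

    # Illustrative only: a per-stack ring buffer keyed on
    # (stack_id, buf_offset), so each slot is overwritten once the
    # buffer wraps. Table and column names are invented.
    import sqlite3

    MAX_BUFFER_SIZE = 1000

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE event ("
                 " stack_id TEXT,"
                 " buf_offset INTEGER,"
                 " message TEXT,"
                 " PRIMARY KEY (stack_id, buf_offset))")

    def record_event(stack_id, seq, message):
        # seq is a monotonically increasing per-stack counter; the
        # modulo maps it onto one of MAX_BUFFER_SIZE reusable slots.
        conn.execute("INSERT OR REPLACE INTO event"
                     " (stack_id, buf_offset, message) VALUES (?, ?, ?)",
                     (stack_id, seq % MAX_BUFFER_SIZE, message))
        conn.commit()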
>>>> > > > >>
>>>> > > > >> 1.b) Same, but instead of rows, just maintain a blob and
>>>> > > > >> append the rows as a json list. Lowers transactional load
>>>> > > > >> but would push some load onto the API servers and such to
>>>> > > > >> parse these out, and would make pagination challenging.
>>>> > > > >> Blobs also can be a drain on DB server performance.
>>>> > > > >>
>>>> > > > >> 2) Write a purge script. Delete old ones. Pros: No code
>>>> > > > >> change, just new code to do purging. Cons: same as 1, plus
>>>> > > > >> more vulnerability to an aggressive attacker who can fit a
>>>> > > > >> lot of data in between purges. Also large scale deletes can
>>>> > > > >> be really painful (see: keystone sql token backend).
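
On the "large scale deletes" pain: a purge that deletes in small
batches at least avoids one giant locking transaction. A sketch,
again with invented schema and sqlite flavour:

    # Illustrative batched purge: a few hundred rows per transaction
    # rather than one huge DELETE, so the table is never locked long.
    import time

    def purge_events(conn, older_than, batch_size=500):
        while True:
            cur = conn.execute(
                "DELETE FROM event WHERE rowid IN ("
                " SELECT rowid FROM event"
                " WHERE created_at < ? LIMIT ?)",
                (older_than, batch_size))
            conn.commit()
            if cur.rowcount < batch_size:
                break  # nothing (or not much) left to purge
            time.sleep(0.1)  # let other writers in between batches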
>>>> > > > >>
>>>> > > > >> 3) Log events to Swift. I can't seem to find information
>>>> > > > >> on how/if appending works there. Tons of tiny single-row
>>>> > > > >> files is an option, but I want to hear from people with
>>>> > > > >> more swift knowledge if that is a viable, performant
>>>> > > > >> option. Pros: Scale to the moon. Can charge tenant for
>>>> > > > >> usage and let them purge events as needed. Cons: Adds
>>>> > > > >> swift as a requirement of Heat.
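
As far as I know Swift has no append, so 3) really would mean one tiny
object per event. Roughly, assuming python-swiftclient (container and
object naming invented):

    # Sketch only: one small JSON object per event, in a per-stack
    # container, written as the tenant so the storage lands on their
    # bill and they can purge it themselves.
    import json

    def store_event(conn, stack_id, event_id, event):
        # conn: an authenticated swiftclient.client.Connection
        container = 'heat-events-%s' % stack_id
        conn.put_container(container)
        conn.put_object(container, '%08d.json' % event_id,
                        json.dumps(event),
                        content_type='application/json')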
>>>> > > > >>
>>>> > > > >> 4) Provide a way for users to receive logs via HTTP POST.
>>>> > > > >> Pros: Simple and punts the problem to the users. Cons:
>>>> > > > >> users will be SoL if they don't have a place to have logs
>>>> > > > >> posted to.
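
4) is about as small as it gets; in essence (the user-supplied
callback_url parameter is hypothetical, nothing in Heat accepts one
today):

    # Option 4 in miniature: fire each event at a user-supplied URL
    # and forget about it.
    import json
    import urllib2

    def post_event(callback_url, event):
        req = urllib2.Request(callback_url, json.dumps(event),
                              {'Content-Type': 'application/json'})
        try:
            urllib2.urlopen(req, timeout=5)
        except urllib2.URLError:
            pass  # their endpoint, their problem (fire and forget)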
>>>> > > > >>
>>>> > > > >> 5) Provide a way for users to receive logs via a messaging
>>>> > > > >> service like Marconi. Pros/Cons: same as HTTP, but perhaps
>>>> > > > >> a little more confusing and ambitious given Marconi's
>>>> > > > >> short existence.
>>>> > > > >>
>>>> > > > >> 6) Provide a pluggable backend for logging. This seems
>>>> > > > >> like the way most OpenStack projects solve these issues,
>>>> > > > >> which is to let the deployers choose and/or provide their
>>>> > > > >> own way to handle a sticky problem. Pros: Simple and
>>>> > > > >> flexible for the future. Cons: Would require writing at
>>>> > > > >> least one backend provider that does what the previous 5
>>>> > > > >> options suggest.
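
The contract for 6) is small; a rough sketch of what the abstraction
might look like (class and method names invented, nothing like this
exists in Heat today):

    # Deployer-selectable sink for stack events.
    import abc

    class EventBackend(object):

        __metaclass__ = abc.ABCMeta  # 2013-era Python 2 idiom

        @abc.abstractmethod
        def store(self, stack_id, event):
            """Persist one event for a stack."""

        @abc.abstractmethod
        def list(self, stack_id, marker=None, limit=None):
            """Return a stack's events, paginated by marker/limit."""

        @abc.abstractmethod
        def purge(self, stack_id):
            """Drop all stored events for a stack (e.g. on delete)."""

    # A deployer would then pick an implementation via a config
    # option, say event_backend = database|swift|marconi, with
    # "database" keeping today's behaviour.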
>>>> > > > >>
>>>> > > > >> To be clear: Heat cannot really exist without this, as it
>>>> > > > >> is the only way to find out what your stack is doing or
>>>> > > > >> has done.
>>>> > > > >
>>>> > > > > btw Clint I have ditched that "Recorder" patch as Ceilometer
>>>> > > > > is getting an Alarm History API soon, so we can defer to that
>>>> > > > > for that functionality (alarm transitions).
>>>> > > > >
>>>> > > > > But we still need a better way to record events/logs for the
>>>> > > > > user. So I made this blueprint a while ago:
>>>> > > > > https://blueprints.launchpad.net/heat/+spec/user-visible-logs
>>>> > > > >
>>>> > > > > I am becoming more in favor of user options rather than
>>>> > > > > deployer options if possible. So provide resources for
>>>> > > > > Marconi, Meniscus and whatever...
>>>> > > > > Although what is nice about Marconi is you could then hook
>>>> > > > > up whatever you want to it.
>>>> > > >
>>>> > > > Logs are one thing (and Meniscus is a great choice for that),
>>>> > > > but events are the very thing CM is designed to handle.
>>>> > > > Wouldn't it make sense to push them back into there?
>>>> > > >
>>>> > >
>>>> > > I'm not sure these events make sense in the current Ceilometer
>>>> > > (I assume that is "CM" above) context. These events are:
>>>> > >
>>>> > > ... Creating stack A
>>>> > > ... Creating stack A resource A
>>>> > > ... Created stack A resource A
>>>> > > ... Created stack A
>>>> > >
>>>> > > Users will want to be able to see all of the events for a stack,
>>>> > > and likely we need to be able to paginate through them as well.
>>>> > >
>>>> > > They are fundamental and low level enough for Heat that I'm not
>>>> > > sure putting them in Ceilometer makes much sense, but maybe I
>>>> > > don't understand Ceilometer.. or "CM" is something else
>>>> > > entirely. :)
>>>> > >
>>>> >
>>>> > CM is indeed ceilometer.
>>>> >
>>>> > The plan for the event API there is to make it admin-only (at
>>>> > least for now). If this is data the user wants to see, that may
>>>> > change the plan for the API or may mean storing it in ceilometer
>>>> > isn't a good fit.
>>>> >
>>>>
>>>> Visibility into these events is critical to tracking the progress of
>>>> any action done to a Heat stack:
>>>>
>>>>
>>>> +---------------------+----+------------------------+--------------------+----------------------+
>>>> | logical_resource_id | id | resource_status_reason | resource_status    | event_time           |
>>>> +---------------------+----+------------------------+--------------------+----------------------+
>>>> | AccessPolicy        | 24 | state changed          | CREATE_IN_PROGRESS | 2013-08-12T19:45:36Z |
>>>> | AccessPolicy        | 25 | state changed          | CREATE_COMPLETE    | 2013-08-12T19:45:36Z |
>>>> | User                | 26 | state changed          | CREATE_IN_PROGRESS | 2013-08-12T19:45:36Z |
>>>> | Key                 | 28 | state changed          | CREATE_IN_PROGRESS | 2013-08-12T19:45:38Z |
>>>> | User                | 27 | state changed          | CREATE_COMPLETE    | 2013-08-12T19:45:38Z |
>>>> | Key                 | 29 | state changed          | CREATE_COMPLETE    | 2013-08-12T19:45:39Z |
>>>> | notcompute          | 30 | state changed          | CREATE_IN_PROGRESS | 2013-08-12T19:45:40Z |
>>>> +---------------------+----+------------------------+--------------------+----------------------+
>>>>
>>>>
>>>> So unless there is a plan to make this a user centric service, it does
>>>> not seem like a good fit.
>>>>
>>>> > Are these "events" transmitted in the same way as notifications?
>>>> > If so, we may already have the data.
>>>> >
>>>>
>>>> The Heat engine records them while working on the stack. They have a
>>>> fairly narrow, well defined interface, so it should be fairly easy to
>>>> address the storage issue with a backend abstraction.
>>>>
>>>
>>> OK. The term "event" frequently means "notification" for ceilometer,
>>> but it sounds like it's completely different in this case.
>>
>> Yeah, not really related. But we need to add RPC Notifications to Heat
>> soon so people can bill on a stack basis (create/update/delete/exist).
>>
>> What we are talking about here is really logging, but we want to get
>> it to the end user (who does not have access to the infrastructure's
>> syslog).
>>
>Do we really want to enable having a billing policy on stacks? A stack
>invokes resources which have their own billing usage policies.
If we send notifications then deployers can at least decide that
themselves. At the moment they can't, as we don't generate that info.
It's not really our decision whether to bill or not. But there
might be some use cases for this (special rates on specific templates
etc.).
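
As a strawman, something like this via the oslo notifier that other
projects already emit billing events with; the event types and payload
fields below are invented, not an agreed design:

    # Strawman only: stack-level notifications mirroring what
    # nova/glance emit for billing.
    from heat.openstack.common.notifier import api as notifier_api

    def notify_stack(context, stack, suffix):
        # suffix would be e.g. 'create.end', 'update.end', 'delete.end'
        payload = {'stack_identity': stack.identifier().arn(),
                   'stack_name': stack.name,
                   'tenant_id': context.tenant_id,
                   'state': stack.state}
        notifier_api.notify(context,
                            notifier_api.publisher_id('orchestration'),
                            'orchestration.stack.%s' % suffix,
                            notifier_api.INFO,
                            payload)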
-Angus
>
>_______________________________________________
>OpenStack-dev mailing list
>OpenStack-dev at lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev