[openstack-dev] [Heat] event table is a ticking time bomb
Steve Baker
sbaker at redhat.com
Mon Aug 12 22:48:01 UTC 2013
On 08/13/2013 10:39 AM, Angus Salkeld wrote:
> On 12/08/13 16:52 -0400, Doug Hellmann wrote:
>> On Mon, Aug 12, 2013 at 4:11 PM, Clint Byrum <clint at fewbar.com> wrote:
>>
>>> Excerpts from Doug Hellmann's message of 2013-08-12 12:08:58 -0700:
>>> > On Fri, Aug 9, 2013 at 11:56 AM, Clint Byrum <clint at fewbar.com> wrote:
>>> >
>>> > > Excerpts from Sandy Walsh's message of 2013-08-09 06:16:55 -0700:
>>> > > >
>>> > > > On 08/08/2013 11:36 PM, Angus Salkeld wrote:
>>> > > > > On 08/08/13 13:16 -0700, Clint Byrum wrote:
>>> > > > >> Last night while reviewing a feature which would add more events
>>> > > > >> to the event table, it dawned on me that the event table really
>>> > > > >> must be removed.
>>> > > > >>
>>> > > > >> https://bugs.launchpad.net/heat/+bug/1209492
>>> > > > >>
>>> > > > >> tl;dr: users can write an infinite number of rows to the event
>>> > > > >> table at a fairly alarming rate just by creating and updating a
>>> > > > >> very large stack that has no resources that cost any time or are
>>> > > > >> even billable (like an autoscaling launch configuration).
>>> > > > >>
>>> > > > >> The table has no purge function, so the only way to clear out old
>>> > > > >> events is to delete the stack, or manually remove them directly in
>>> > > > >> the database.
>>> > > > >> We've all been through this before, logging to a database seems
>>> > > > >> great until you actually do it.
>>> > > > >>
>>> > > > >> I have some ideas for how to solve it, but I wanted to get a wider
>>> > > > >> audience:
>>> > > > >>
>>> > > > >> 1) Make the event list a ring buffer. Have rows 0 - $MAX_BUFFER_SIZE
>>> > > > >> in each stack, and simply write each new event to the next open
>>> > > > >> position, wrapping at $MAX_BUFFER_SIZE. Pros: little change to
>>> > > > >> current code, just need an offset column added and code that will
>>> > > > >> properly wrap to 0 at $MAX_BUFFER_SIZE. Cons: still can incur heavy
>>> > > > >> transactional load on the database server. (A rough sketch of the
>>> > > > >> wrap logic follows after this list.)
>>> > > > >>
>>> > > > >> 1.b) Same, but instead of rows, just maintain a blob and append the
>>> > > > >> rows as a JSON list. Lowers transactional load but would push some
>>> > > > >> load onto the API servers and such to parse these out, and would
>>> > > > >> make pagination challenging. Blobs also can be a drain on DB server
>>> > > > >> performance.
>>> > > > >>
>>> > > > >> 2) Write a purge script. Delete old ones. Pros: No code change,
>>> > > > >> just new code to do purging. Cons: same as 1, plus more
>>> > > > >> vulnerability to an aggressive attacker who can fit a lot of data
>>> > > > >> in between purges. Also large scale deletes can be really painful
>>> > > > >> (see: keystone sql token backend). (A batched-delete sketch follows
>>> > > > >> below.)
>>> > > > >>
>>> > > > >> 3) Log events to Swift. I can't seem to find information on how/if
>>> > > > >> appending works there. Tons of tiny single-row files is an option,
>>> > > > >> but I want to hear from people with more swift knowledge if that is
>>> > > > >> a viable, performant option. Pros: Scale to the moon. Can charge
>>> > > > >> tenant for usage and let them purge events as needed. Cons: Adds
>>> > > > >> swift as a requirement of Heat.
>>> > > > >>
>>> > > > >> 4) Provide a way for users to receive logs via HTTP POST. Pros:
>>> > > > >> Simple and punts the problem to the users. Cons: users will be SoL
>>> > > > >> if they don't have a place to have logs posted to.
>>> > > > >>
>>> > > > >> 5) Provide a way for users to receive logs via a messaging service
>>> > > > >> like Marconi. Pros/Cons: same as HTTP, but perhaps a little more
>>> > > > >> confusing and ambitious given Marconi's short existence.
>>> > > > >>
>>> > > > >> 6) Provide a pluggable backend for logging. This seems like the way
>>> > > > >> most OpenStack projects solve these issues, which is to let the
>>> > > > >> deployers choose and/or provide their own way to handle a sticky
>>> > > > >> problem. Pros: Simple and flexible for the future. Cons: Would
>>> > > > >> require writing at least one backend provider that does what the
>>> > > > >> previous 5 options suggest.
>>> > > > >>
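For option 1 above, the wrap logic could look something like this minimal
sketch. The Event model, column names and MAX_BUFFER_SIZE value here are
illustrative only, not Heat's actual sqlalchemy schema:

    # Sketch only -- not Heat's real schema. Each stack gets MAX_BUFFER_SIZE
    # slots; new events overwrite the oldest slot once the buffer wraps.
    import datetime

    from sqlalchemy import (Column, DateTime, Integer, String, Text,
                            UniqueConstraint)
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()
    MAX_BUFFER_SIZE = 1000  # illustrative per-stack cap


    class Event(Base):
        __tablename__ = 'event'
        __table_args__ = (UniqueConstraint('stack_id', 'position'),)
        id = Column(Integer, primary_key=True)
        stack_id = Column(String(36), nullable=False)
        position = Column(Integer, nullable=False)  # the proposed offset column
        created_at = Column(DateTime, default=datetime.datetime.utcnow)
        body = Column(Text)


    def append_event(session, stack_id, body):
        """Write an event into the next slot, wrapping at MAX_BUFFER_SIZE."""
        newest = (session.query(Event)
                  .filter_by(stack_id=stack_id)
                  .order_by(Event.created_at.desc(), Event.id.desc())
                  .first())
        position = 0 if newest is None else (newest.position + 1) % MAX_BUFFER_SIZE
        # Overwrite whatever occupied this slot on the previous lap.
        session.query(Event).filter_by(stack_id=stack_id,
                                       position=position).delete()
        session.add(Event(stack_id=stack_id, position=position, body=body))
        session.commit()

Reads and pagination keep working as they do today, but every write is still an
INSERT (plus a DELETE once the buffer wraps), so the transactional-load concern
stands.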
>>> > > > >> To be clear: Heat cannot really exist without this, as it is the
>>> > > > >> only way to find out what your stack is doing or has done.
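And for option 2, a purge could at least run in small batches so the database
never has to absorb one enormous DELETE. A rough sketch against the same
illustrative Event model as the previous snippet:

    # Sketch only: delete events older than a cutoff in small batches, to
    # avoid the giant-delete pain mentioned above (keystone sql token backend).
    import datetime


    def purge_events(session, max_age_hours=24, batch_size=500):
        cutoff = (datetime.datetime.utcnow()
                  - datetime.timedelta(hours=max_age_hours))
        while True:
            ids = [row.id for row in
                   session.query(Event.id)
                          .filter(Event.created_at < cutoff)
                          .limit(batch_size)]
            if not ids:
                break
            (session.query(Event)
             .filter(Event.id.in_(ids))
             .delete(synchronize_session=False))
            session.commit()

Batching only keeps each delete cheap; it does nothing about the window an
aggressive writer gets between purge runs.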
>>> > > > >
>>> > > > > btw Clint I have ditched that "Recorder" patch as Ceilometer is
>>> > > > > getting an Alarm History API soon, so we can defer to that for that
>>> > > > > functionality (alarm transitions).
>>> > > > >
>>> > > > > But we still need a better way to record events/logs for the user.
>>> > > > > So I made this blueprint a while ago:
>>> > > > > https://blueprints.launchpad.net/heat/+spec/user-visible-logs
>>> > > > >
>>> > > > > I am becoming more in favor of user options rather than deployer
>>> > > > > options if possible. So provide resources for Marconi, Meniscus and
>>> > > > > whatever... Although what is nice about Marconi is you could then
>>> > > > > hook up whatever you want to it.
>>> > > >
>>> > > > Logs are one thing (and Meniscus is a great choice for that), but
>>> > > > events are the very thing CM is designed to handle. Wouldn't it make
>>> > > > sense to push them back into there?
>>> > > >
>>> > >
>>> > > I'm not sure these events make sense in the current Ceilometer (I
>>> > > assume that is "CM" above) context. These events are:
>>> > >
>>> > > ... Creating stack A
>>> > > ... Creating stack A resource A
>>> > > ... Created stack A resource A
>>> > > ... Created stack A
>>> > >
>>> > > Users will want to be able to see all of the events for a stack, and
>>> > > likely we need to be able to paginate through them as well.
>>> > >
>>> > > They are fundamental and low level enough for Heat that I'm not sure
>>> > > putting them in Ceilometer makes much sense, but maybe I don't
>>> > > understand Ceilometer.. or "CM" is something else entirely. :)
>>> > >
>>> >
>>> > CM is indeed ceilometer.
>>> >
>>> > The plan for the event API there is to make it admin-only (at least for
>>> > now). If this is data the user wants to see, that may change the plan for
>>> > the API or may mean storing it in ceilometer isn't a good fit.
>>> >
>>>
>>> Visibility into these events is critical to tracking the progress of
>>> any action done to a Heat stack:
>>>
>>> +---------------------+----+------------------------+--------------------+----------------------+
>>> | logical_resource_id | id | resource_status_reason | resource_status    | event_time           |
>>> +---------------------+----+------------------------+--------------------+----------------------+
>>> | AccessPolicy        | 24 | state changed          | CREATE_IN_PROGRESS | 2013-08-12T19:45:36Z |
>>> | AccessPolicy        | 25 | state changed          | CREATE_COMPLETE    | 2013-08-12T19:45:36Z |
>>> | User                | 26 | state changed          | CREATE_IN_PROGRESS | 2013-08-12T19:45:36Z |
>>> | Key                 | 28 | state changed          | CREATE_IN_PROGRESS | 2013-08-12T19:45:38Z |
>>> | User                | 27 | state changed          | CREATE_COMPLETE    | 2013-08-12T19:45:38Z |
>>> | Key                 | 29 | state changed          | CREATE_COMPLETE    | 2013-08-12T19:45:39Z |
>>> | notcompute          | 30 | state changed          | CREATE_IN_PROGRESS | 2013-08-12T19:45:40Z |
>>> +---------------------+----+------------------------+--------------------+----------------------+
>>>
>>> So unless there is a plan to make this a user centric service, it does
>>> not seem like a good fit.
>>>
>>> > Are these "events" transmitted in the same way as notifications? If so,
>>> > we may already have the data.
>>> >
>>>
>>> The Heat engine records them while working on the stack. They have a
>>> fairly narrow, well defined interface, so it should be fairly easy to
>>> address the storage issue with a backend abstraction.
>>>
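If that backend abstraction (option 6 in the original mail) were taken, the
interface could stay quite small. A sketch only; the class and method names
below are hypothetical, not anything Heat ships today:

    # Sketch only: a deployer-pluggable store for stack events. A database
    # backend would keep today's behaviour; Swift/Marconi/HTTP-POST backends
    # could implement the same two calls.
    import abc
    import collections


    class EventBackend(abc.ABC):

        @abc.abstractmethod
        def store(self, context, stack_id, event):
            """Persist one event for a stack."""

        @abc.abstractmethod
        def list(self, context, stack_id, limit=None):
            """Return events for a stack, newest first."""


    class InMemoryEventBackend(EventBackend):
        """Toy implementation, just to show the shape of the interface."""

        def __init__(self, max_per_stack=1000):
            self._events = collections.defaultdict(
                lambda: collections.deque(maxlen=max_per_stack))

        def store(self, context, stack_id, event):
            self._events[stack_id].appendleft(event)

        def list(self, context, stack_id, limit=None):
            events = list(self._events[stack_id])
            return events[:limit] if limit else events

The engine would load one implementation from a config option and call store()
wherever it currently writes an event row.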
>>
>> OK. The term "event" frequently means "notification" for ceilometer, but it
>> sounds like it's completely different in this case.
>
> Yeah, not really related. But we need to add RPC Notifications to Heat
> soon so people can bill on a stack basis (create/update/delete/exist).
>
> What we are talking about here is really logging, but we want to get it to
> the end user (who does not have access to the infrastructure's syslog).
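For the stack-level billing notifications, one possible shape using
oslo.messaging's notifier; the event_type string and payload keys are purely
illustrative, nothing here is settled:

    # Sketch only: emit a billable stack notification. Event type and payload
    # keys below are illustrative, not an agreed format.
    from oslo_config import cfg
    import oslo_messaging

    transport = oslo_messaging.get_notification_transport(cfg.CONF)
    notifier = oslo_messaging.Notifier(transport,
                                       publisher_id='orchestration.example-host',
                                       driver='messaging',
                                       topics=['notifications'])

    notifier.info({},  # request context
                  'orchestration.stack.create.end',
                  {'stack_name': 'A',
                   'tenant_id': 'some-tenant',
                   'state': 'CREATE_COMPLETE'})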
>
Do we really want to enable billing policies on stacks? A stack invokes
resources which have their own billing and usage policies.