[openstack-dev] [oslo.db] innodb OPTIMIZE TABLE ?

Gorka Eguileor geguileo at redhat.com
Mon Apr 9 09:53:06 UTC 2018


On 06/04, Michael Bayer wrote:
> On Wed, Apr 4, 2018 at 5:00 AM, Gorka Eguileor <geguileo at redhat.com> wrote:
> > On 03/04, Jay Pipes wrote:
> >> On 04/03/2018 11:07 AM, Michael Bayer wrote:
> >> > The MySQL / MariaDB variants we use nowadays default to
> >> > innodb_file_per_table=ON and we also set this flag to ON in installer
> >> > tools like TripleO.     The reason we like file per table is so that
> >> > we don't grow an enormous ibdata file that can't be shrunk without
> >> > rebuilding the database.  Instead, we have lots of little .ibd
> >> > datafiles for each table throughout each openstack database.
> >> >
> >> > But now we have the issue that these files also can benefit from
> >> > periodic optimization which can shrink them and also have a beneficial
> >> > effect on performance.   The OPTIMIZE TABLE statement achieves this,
> >> > but as would be expected it itself can lock tables for potentially a
> >> > long time.   Googling around reveals a lot of controversy, as various
> >> > users and publications suggest that OPTIMIZE is never needed and would
> >> > have only a negligible effect on performance.   However here we seek
> >> > to use OPTIMIZE so that we can reclaim disk space on tables that have
> >> > lots of DELETE activity, such as keystone "token" and ceilometer
> >> > "sample".
> >> >
> >> > Questions for the group:
> >> >
> >> > 1. is OPTIMIZE table worthwhile to be run for tables where the
> >> > datafile has grown much larger than the number of rows we have in the
> >> > table?
> >>
> >> Possibly, though it's questionable to use MySQL/InnoDB for storing transient
> >> data that is deleted often like ceilometer samples and keystone tokens. A
> >> much better solution is to use RDBMS partitioning so you can simply ALTER
> >> TABLE .. DROP PARTITION those partitions that are no longer relevant (and
> >> don't even bother DELETEing individual rows) or, in the case of Ceilometer
> >> samples, don't use a traditional RDBMS for timeseries data at all...
> >>
> >> But since that is unfortunately already the case, yes it is probably a good
> >> idea to OPTIMIZE TABLE on those tables.
> >>
> >> > 2. from people's production experience how safe is it to run OPTIMIZE,
> >> > e.g. how long is it locking tables, etc.
> >>
> >> Is it safe? Yes.
> >>
> >> Does it lock the entire table for the duration of the operation? No. It uses
> >> online DDL operations:
> >>
> >> https://dev.mysql.com/doc/refman/5.7/en/innodb-file-defragmenting.html
> >>
> >> Note that OPTIMIZE TABLE is mapped to ALTER TABLE tbl_name FORCE for InnoDB
> >> tables.
> >>
> >> > 3. is there a heuristic we can use to measure when we might run this
> >> > -.e.g my plan is we measure the size in bytes of each row in a table
> >> > and then compare that in some ratio to the size of the corresponding
> >> > .ibd file, if the .ibd file is N times larger than the logical data
> >> > size we run OPTIMIZE ?
> >>
> >> I don't believe so, no. Most things I see recommended is to simply run
> >> OPTIMIZE TABLE in a cron job on each table periodically.
> >>
> >> > 4. I'd like to propose this job of scanning table datafile sizes in
> >> > ratio to logical data sizes, then running OPTIMIZE, be a utility
> >> > script that is delivered via oslo.db, and would run for all innodb
> >> > tables within a target MySQL/ MariaDB server generically.  That is, I
> >> > really *dont* want this to be a script that Keystone, Nova, Ceilometer
> >> > etc. are all maintaining delivering themselves.   this should be done
> >> > as a generic pass on a whole database (noting, again, we are only
> >> > running it for very specific InnoDB tables that we observe have a poor
> >> > logical/physical size ratio).
> >>
> >> I don't believe this should be in oslo.db. This is strictly the purview of
> >> deployment tools and should stay there, IMHO.
> >>
> >
> > Hi,
> >
> > As far as I know most projects do "soft deletes" where we just flag the
> > rows as deleted and don't remove them from the DB, so it's only when we
> > use a management tool and run the "purge" command that we actually
> > remove these rows.
> >
> > Since running the optimize without purging would be meaningless, I'm
> > wondering if we should trigger the OPTIMIZE also within the purging
> > code.  This way we could avoid innefective runs of the optimize command
> > when no purge has happened and even when we do the optimization we could
> > skip the ratio calculation altogether for tables where no rows have been
> > deleted (the ratio hasn't changed).
> >
>
> the issue is that this OPTIMIZE will block on Galera unless it is run
> on a per-individual node basis along with the changing of the
> wsrep_OSU_method parameter, this is way out of scope both to be
> redundantly hardcoded in multiple openstack projects, as well as
> there's no portable way for Keystone and others to get at the
> individual Galera node addresses.    Putting it in oslo.db would at
> least be a place that most of this logic can live but even then it
> needs to run for multiple Galera nodes and needs to have
> deployment-specific configuration.   *unless* we say, the OPTIMIZE
> here will short for a purged table, let's just let it block.
>

I see... What about a hybrid solution?  Use the alter table as mentioned
in the comment [1] to not block the table for systems that support it,
and going with the RSU mode when it's not supported?


[1] https://mariadb.com/kb/en/library/optimize-table/#comment_3191


>
> > Ideally the ratio calculation and optimization code would be provided by
> > oslo.db to reduce code duplication between projects.
>
> I was hoping to have this be part of oslo.db but there's disagreement on that :)
>
> If this can't be in oslo.db then the biggest issue facing me on this
> is building out a new application and getting it packaged since this
> feature has no home, unless I can ship it as some kind of script
> packaged in tripleo.
>
>

I think the oslo.db home you proposed has the great benefit of making it
available in all deployments regardless of the installer, if that's not
possible I would go with the TripleO script before creating yet another
project that needs to be packaged and maintained.

Cheers,
Gorka.


> >
> >
> >> > 5. for Galera this gets more tricky, as we might want to run OPTIMIZE
> >> > on individual nodes directly.  The script at [1] illustrates how to
> >> > run this on individual nodes one at a time.
> >> >
> >> > More succinctly, the Q is:
> >> >
> >> > a. OPTIMIZE, yes or no?
> >>
> >> Yes.
> >>
> >> > b. oslo.db script to run generically, yes or no?
> >>
> >> No. Just have Triple-O install galera_innoptimizer and run it in a cron job.
> >>
> >> Best,
> >> -jay
> >>
> >> > thanks for your thoughts!
> >> >
> >> >
> >> >
> >> > [1] https://github.com/deimosfr/galera_innoptimizer
> >> >
> >> > __________________________________________________________________________
> >> > OpenStack Development Mailing List (not for usage questions)
> >> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >> >
> >>
> >> __________________________________________________________________________
> >> OpenStack Development Mailing List (not for usage questions)
> >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> > __________________________________________________________________________
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



More information about the OpenStack-dev mailing list