[ops] database archiving tool

Pierre-Samuel LE STANG pierre-samuel.le-stang at corp.ovh.com
Wed May 22 08:34:37 UTC 2019


Thomas Goirand <zigo at debian.org> wrote on Wed. [2019-May-22 09:09:25 +0200]:
> On 5/9/19 5:14 PM, Pierre-Samuel LE STANG wrote:
> > Hi all,
> >
> > At OVH we needed to write our own tool that archives data from OpenStack
> > databases, to prevent some side effects related to huge tables (slower
> > response times, changing MariaDB query plans) and to address some legal
> > requirements.
> >
> > So we started to write a Python tool called OSArchiver, which I briefly
> > presented in Denver a few days ago in the "Optimizing OpenStack at large
> > scale" talk. We think that this tool could be helpful to others and are ready
> > to open source it, but first we would like to get the opinion of the ops
> > community about the tool.
> >
> > To sum up, OSArchiver is written to work with any OpenStack project. The
> > tool relies on the fact that soft-deleted data are recognizable by their
> > 'deleted' column, which is set to 1 or a UUID, and their 'deleted_at' column,
> > which is set to the date of deletion.
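The detection rule described above can be sketched in Python as a simple row predicate. This is a minimal illustration, not OSArchiver's code: it assumes rows are plain dicts keyed by column name, that a live row has 'deleted' set to 0, False, '0', an empty string or NULL, and that the caller has already computed the retention cutoff. The name `is_archivable` is hypothetical.

```python
from datetime import datetime
from typing import Any, Mapping

# Assumption: values that mean "not deleted" in the 'deleted' column.
# Soft-deleted rows hold something else, e.g. 1 or a UUID string.
NOT_DELETED = (None, 0, False, "0", "")


def is_archivable(row: Mapping[str, Any], cutoff: datetime) -> bool:
    """Return True if `row` is soft-deleted and was deleted before `cutoff`."""
    # Rows from tables without a 'deleted' column are never candidates.
    if "deleted" not in row or row["deleted"] in NOT_DELETED:
        return False
    deleted_at = row.get("deleted_at")
    # Assumption: a soft-deleted row without a deletion date is left alone.
    return deleted_at is not None and deleted_at < cutoff


if __name__ == "__main__":
    cutoff = datetime(2019, 4, 22)
    rows = [
        {"id": 1, "deleted": 0, "deleted_at": None},
        {"id": 2, "deleted": 1, "deleted_at": datetime(2019, 3, 1)},
        {"id": 3, "deleted": "3f2b8a9e-1c4d-4e5f-9a6b-7c8d9e0f1a2b",
         "deleted_at": datetime(2019, 5, 20)},
    ]
    print([row["id"] for row in rows if is_archivable(row, cutoff)])  # [2]
```

Row 3 is soft-deleted but newer than the cutoff, so only row 2 qualifies.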
> >
> > The points to keep in mind about OSArchiver:
> > * It has no knowledge of business objects
> > * A table can be archived if it contains a 'deleted' column
> > * Child rows are archived before parent rows
> > * A row cannot be deleted if it fails to be archived
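The last two points imply an ordering and a guard. As a rough sketch (not OSArchiver's implementation), the Python below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to order tables so that child tables come before the parents they reference, and only deletes a batch once archiving it has succeeded. The foreign-key map, the table names and the `archive`/`delete` callables are hypothetical stand-ins for what a real tool would read from the schema and run as SQL.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set


def archive_order(foreign_keys: Dict[str, Set[str]]) -> List[str]:
    """Order tables so each child table precedes every parent it references.

    `foreign_keys` maps a child table to the tables it references.
    Self-references are ignored; other cycles raise graphlib.CycleError.
    """
    graph: Dict[str, Set[str]] = {}
    for child, parents in foreign_keys.items():
        graph.setdefault(child, set())
        for parent in parents:
            if parent != child:
                # A parent is only "ready" once its children are processed.
                graph.setdefault(parent, set()).add(child)
    return list(TopologicalSorter(graph).static_order())


def archive_then_delete(rows: List[Any],
                        archive: Callable[[List[Any]], None],
                        delete: Callable[[List[Any]], None]) -> bool:
    """Delete `rows` only if archiving them did not raise."""
    try:
        archive(rows)
    except Exception:
        # Never delete data that could not be archived.
        return False
    delete(rows)
    return True


if __name__ == "__main__":
    fks = {"instance_metadata": {"instances"}, "instance_faults": {"instances"}}
    print(archive_order(fks))  # both child tables are listed before 'instances'
```

Self-referencing tables would additionally need their rows ordered within the table, which this sketch does not attempt.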
> >
> > Here are the features already implemented:
> > * Archive data into another database and/or a file (currently SQL and CSV
> > formats are supported) so that it can be easily imported
> > * Delete data from OpenStack databases
> > * Customizable (retention, excluded DBs, excluded tables, bulk insert/delete)
> > * Multiple archiving configurations
> > * Dry-run mode
> > * Easily extensible: you can add your own destination module (other file
> > formats, remote storage, etc.)
> > * Archive-only and/or delete-only modes
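To make the "destination module" idea concrete, here is a minimal Python sketch of how such a plug-in interface could look, with a CSV destination, a dry-run destination and bulk batching. The class names, the `write()` signature and the `bulk_size` parameter are assumptions for illustration, not OSArchiver's API; the CSV module writes into in-memory buffers to stay self-contained, where a real module would write files.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


class Destination(ABC):
    """Hypothetical base class: one subclass per archive target."""

    @abstractmethod
    def write(self, table: str, rows: Sequence[Row]) -> None:
        """Store one batch of rows coming from `table`."""


class CsvDestination(Destination):
    """Keeps one CSV document (header + rows) per table, in memory."""

    def __init__(self) -> None:
        self.buffers: Dict[str, io.StringIO] = {}

    def write(self, table: str, rows: Sequence[Row]) -> None:
        if not rows:
            return
        first_batch = table not in self.buffers
        buf = self.buffers.setdefault(table, io.StringIO())
        writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
        if first_batch:
            writer.writeheader()  # header only once per table
        writer.writerows(rows)


class DryRunDestination(Destination):
    """Records what would be archived without storing anything."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def write(self, table: str, rows: Sequence[Row]) -> None:
        self.log.append(f"would archive {len(rows)} row(s) from {table}")


def archive_table(table: str, rows: Sequence[Row],
                  destinations: Sequence[Destination],
                  bulk_size: int = 1000) -> None:
    """Send `rows` to every destination in batches of `bulk_size`."""
    for start in range(0, len(rows), bulk_size):
        batch = rows[start:start + bulk_size]
        for destination in destinations:
            destination.write(table, batch)
```

With this shape, supporting another file format or remote storage would only mean writing one more `Destination` subclass.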
> >
> > It also means that by design you can run osarchiver not only on OpenStack
> > databases but also on archived OpenStack databases.
> >
> > Thanks in advance for your feedback.
> >
>
> Hi Pierre,
>
> That's really the kind of project that I would prefer didn't need to
> exist. By this I mean, it'd be a lot nicer if this could be taken care
> of project by project, with something like what Nova does (i.e.
> nova-manage db archive_deleted_rows).
>
> In such a configuration, that's typically something that could be added
> as a cron job, automatically configured by packages.
>
> Now, a question for other operators reading this thread: how long should
> the retention be? In Debian, we have an unwritten policy that we don't
> want too much retention, to protect privacy. Though in operations, we may
> need at least a few weeks of history so that we can provide support. If I
> were to configure a cron job for Nova, for example, what value should I
> pass to --before (once #556751 is merged)? My instinct would be:
>
> nova-manage db archive_deleted_rows \
> 	--before $(date -d "-1 month" +%Y-%m-%d)
>
> Your thoughts, everyone?
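For a cron job like the one above, the cutoff is today's date shifted back by the retention period. A small Python sketch of that computation follows; it assumes `--before` takes a YYYY-MM-DD date as in the example command, and the helper names are hypothetical. One deliberate difference: when the target month is shorter, this sketch clamps to that month's last day, whereas GNU `date -d "-1 month"` rolls over into the following month, so the two can disagree by a few days around month ends.

```python
import calendar
from datetime import date
from typing import List


def months_before(day: date, months: int) -> date:
    """Shift `day` back by `months` calendar months.

    If the target month is too short, the result is clamped to its last
    day (2019-03-31 minus one month gives 2019-02-28).
    """
    index = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(index, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(day.day, last_day))


def archive_command(today: date, retention_months: int = 1) -> List[str]:
    """Build the nova-manage invocation from the example above."""
    cutoff = months_before(today, retention_months)
    return ["nova-manage", "db", "archive_deleted_rows",
            "--before", cutoff.isoformat()]


if __name__ == "__main__":
    print(" ".join(archive_command(date(2019, 5, 22))))
    # nova-manage db archive_deleted_rows --before 2019-04-22
```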

Hi Thomas,

Thanks for your feedback; I really appreciate it. The tool is designed to be
customized to fit your needs. This means that you can have one configuration
per project or one configuration for all projects.

So you could, for example, have a configuration for Glance that excludes the
images table and another configuration for Nova with a higher or lower
retention.
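As an illustration of the per-project setup described above, the Python sketch below models one archiving configuration per project: a Glance configuration that skips the images table and a Nova configuration with a shorter retention. The `ArchiveConfig` class, its fields and the retention values are hypothetical examples, not OSArchiver's configuration format.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import FrozenSet, Iterable, List, Mapping


@dataclass(frozen=True)
class ArchiveConfig:
    """Hypothetical per-project archiving settings."""
    name: str
    databases: FrozenSet[str]
    retention: timedelta
    excluded_tables: FrozenSet[str] = frozenset()

    def tables_to_process(self, database: str,
                          schema: Mapping[str, Iterable[str]]) -> List[str]:
        """Return the tables of `database` this configuration would archive.

        `schema` maps each table name to its column names; only tables with
        a 'deleted' column that are not explicitly excluded are kept.
        """
        if database not in self.databases:
            return []
        return sorted(table for table, columns in schema.items()
                      if "deleted" in columns
                      and table not in self.excluded_tables)


CONFIGS = [
    # Example values only: keep Glance images out of the archive entirely.
    ArchiveConfig("glance", frozenset({"glance"}), timedelta(days=90),
                  excluded_tables=frozenset({"images"})),
    # Example values only: archive Nova rows after a shorter retention.
    ArchiveConfig("nova", frozenset({"nova", "nova_api"}), timedelta(days=30)),
]
```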

--
PS



More information about the openstack-discuss mailing list