[openstack-dev] [swift] LTFS integration with OpenStack Swift for scenario like - Data Archival as a Service .

Tim Bell Tim.Bell at cern.ch
Mon Nov 17 18:43:57 UTC 2014


> -----Original Message-----
> From: Christian Schwede [mailto:christian.schwede at enovance.com]
> Sent: 17 November 2014 14:36
> To: openstack-dev at lists.openstack.org
> Subject: Re: [openstack-dev] [swift] LTFS integration with OpenStack Swift for
> scenario like - Data Archival as a Service .
> 
> On 14.11.14 20:43, Tim Bell wrote:
> > It would need to be tiered (i.e. migrate whole collections rather than
> > files) and a local catalog would be needed to map containers to tapes.
> > Timeouts would be an issue since we are often waiting hours for recall
> > (to ensure that multiple recalls for the same tape are grouped).
> >
> > It is not an insolvable problem but it is not just a 'use LTFS' answer.
> 
> There were some ad-hoc discussions during the last summit about using the
> Swift API to access data that is stored on tape. At the same time we talked
> about possible data migrations from one storage policy to another, and this
> might be an option to think about.
> 
> Something like this:
> 
> 1. Data is stored in a container with a Storage Policy (SP) that defines a
>    time-based data migration to some other place.
> 2. After some time, data is migrated to tape, and only some stubs
>    (zero-byte objects) are left on disk.
> 3. If a client requests such an object, the client gets an error stating
>    that the object is temporarily not available (unfortunately there is no
>    suitable HTTP response code for this yet).
> 4. At this time the object is scheduled to be restored from tape.
> 5. Finally the object is read from tape and stored on disk again. It will
>    be deleted from disk again after some time.
> 
> With this approach only small modifications are required inside Swift, for
> example sending a notification to an external consumer to migrate data back
> and forth, and handling requests for empty stub files. The migration itself
> should be done by an external worker that works with existing solutions
> from tape vendors.
> 
> Just an idea, but it might be worth investigating further (more and more
> people seem to be interested in this, especially in the science community).
> 

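The five-step flow described above could be sketched as a small per-object state machine. This is only an illustration of the idea, not Swift code; all names are hypothetical, and 503 with a Retry-After header is one plausible stand-in for the missing "temporarily not available" response code:

```python
from enum import Enum, auto

class ObjectState(Enum):
    """Lifecycle of an object under a hypothetical tape-backed storage policy."""
    ON_DISK = auto()      # step 1: stored normally under the SP
    STUB = auto()         # step 2: migrated to tape, zero-byte stub left on disk
    RECALLING = auto()    # step 4: restore from tape has been scheduled
    RESTORED = auto()     # step 5: back on disk, expires to STUB again later

def handle_get(state: ObjectState) -> tuple[int, ObjectState]:
    """Return (HTTP status, next state) for a GET against an object."""
    if state in (ObjectState.ON_DISK, ObjectState.RESTORED):
        return 200, state                      # served directly from disk
    if state == ObjectState.STUB:
        return 503, ObjectState.RECALLING      # step 3: schedule the recall
    return 503, ObjectState.RECALLING          # recall already in flight

status, next_state = handle_get(ObjectState.STUB)
print(status, next_state.name)  # 503 RECALLING
```

The external worker would be the component that actually moves STUB objects to tape and RECALLING objects back, so Swift itself only needs the stub handling and the notification hook.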
This sounds something like DMAPI (http://en.wikipedia.org/wiki/DMAPI) .... there may be concepts there that would help in constructing an API definition for the driver.

If you work on the basis that a container is either online or offline, you would need a basic data store that tells you which robot/tape holds that container, plus some method for handling containers spanning multiple tapes and multiple containers on a single tape.
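Such a catalog could be a simple mapping from containers to ordered tape segments, which covers both the spanning case and the shared-tape case. A minimal sketch (hypothetical names; a real catalog would live in a database):

```python
from collections import defaultdict

class TapeCatalog:
    """Sketch of a container-to-tape catalog for an offline tier."""

    def __init__(self):
        # container -> ordered list of (tape_id, segment_no); a container
        # spanning multiple tapes simply has segments on several tapes
        self._segments = defaultdict(list)

    def record_segment(self, container: str, tape_id: str) -> None:
        segment_no = len(self._segments[container])
        self._segments[container].append((tape_id, segment_no))

    def tapes_for(self, container: str) -> list[str]:
        """Which tapes must be mounted to recall this container."""
        seen, tapes = set(), []
        for tape_id, _ in self._segments[container]:
            if tape_id not in seen:
                seen.add(tape_id)
                tapes.append(tape_id)
        return tapes

    def containers_on(self, tape_id: str) -> list[str]:
        """Multiple containers may share a tape; needed when recycling media."""
        return [c for c, segs in self._segments.items()
                if any(t == tape_id for t, _ in segs)]

cat = TapeCatalog()
cat.record_segment("physics-2014", "TAPE001")
cat.record_segment("physics-2014", "TAPE002")   # container spans two tapes
cat.record_segment("logs-archive", "TAPE002")   # two containers on one tape
print(cat.tapes_for("physics-2014"))            # ['TAPE001', 'TAPE002']
print(cat.containers_on("TAPE002"))             # ['physics-2014', 'logs-archive']
```

Grouping recalls by the output of `tapes_for` is also what lets multiple recalls for the same tape be batched, as mentioned earlier in the thread.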

The semantics for adding a new object to a container would also need to be defined for the different scenarios (such as the container already being offline, or the container currently being archived/recalled).
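One way to think about those semantics is a simple dispatch on container state. This is purely illustrative; the actual policy choices (reject vs. queue) would need to be decided per deployment:

```python
# Hypothetical write semantics for a PUT against a container whose data
# may be offline or mid-migration; "queue" vs. "reject" is a policy choice.
def handle_put(container_state: str) -> str:
    actions = {
        "online":    "write object directly to disk",
        "offline":   "reject, or queue the write until the container is recalled",
        "archiving": "exclude this object from the running archival pass, write to disk",
        "recalling": "queue the write, apply once the recall completes",
    }
    return actions[container_state]

print(handle_put("offline"))
```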

Other operations needed would include moving a container to new media (such as when recycling old tapes), initialising tape media as empty but defined in the system, handling container deletion on offline media (with associated garbage collection), validating an offline tape, ...
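Those media-management operations could hang off a small interface like the following sketch (entirely hypothetical; the actual tape I/O would come from the vendor tooling, and this only tracks the catalog side):

```python
class MediaManager:
    """Sketch of the catalog-side media operations listed above."""

    def __init__(self):
        self.tapes = {}        # tape_id -> set of containers on that tape
        self.deleted = set()   # containers deleted while offline, awaiting GC

    def initialise_tape(self, tape_id: str) -> None:
        """Register blank media as empty but defined in the system."""
        self.tapes[tape_id] = set()

    def move_container(self, container: str, src: str, dest: str) -> None:
        """Relocate a container to new media, e.g. when recycling old tapes."""
        self.tapes[src].discard(container)
        self.tapes[dest].add(container)

    def delete_offline(self, container: str, tape_id: str) -> None:
        """Mark deleted in the catalog; space is reclaimed by a later GC pass."""
        self.tapes[tape_id].discard(container)
        self.deleted.add(container)

    def garbage_collect(self) -> set:
        """Drop records for containers deleted on offline media."""
        collected, self.deleted = self.deleted, set()
        return collected
```

Validation of an offline tape would then amount to reading the tape's own index (e.g. the LTFS index partition) and comparing it against `self.tapes`.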
 
Certainly not impossible, and there is lots of prior art in the various HSM systems such as HPSS or CERN's CASTOR.

Tim

> Christian
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
