EPrints-Archivematica Export Structure Compatibility #29

Closed
photomedia opened this issue Jun 26, 2018 · 3 comments
Labels
Confirmed: Out-of-scope Use case will not be included in the upcoming version of the spec or implementation notes.

Comments

@photomedia

I apologize in advance if this is not the best place to raise this question; if so, please direct me to a more appropriate place (I noticed that this group has several email lists and a Slack channel).

I was introduced to OCFL at OR2018 and immediately saw its potential both to inform something I am working on and to act as a bridge across repository systems. At the same OR2018, I co-presented a proposal for an EPrints-to-Archivematica export format for preservation. This format uses a folder structure, and ideally that structure would be compatible with OCFL.

Here are the details of the proposal: https://spectrum.library.concordia.ca/983933/

Right away, I see two places where our proposal diverges from OCFL, and I would like to discuss them:

  1. In our proposal, the last-modified date is placed directly in the folder name of the top-level object. This also means that the entire object is replicated whenever any modification is made. This is not storage-efficient, but it has its own advantages: clarity and ease of retrieval later on. OCFL instead uses sequential version folders ("version 1...x") that contain only the changed files.

  2. In our proposal, BagIt is used to create manifests, whereas OCFL uses the inventory.jsonld format for this.
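
To make the storage trade-off in point 1 concrete, here is a minimal Python sketch that counts stored bytes under both strategies. It assumes a toy model in which each version is simply a mapping of file path to content; the file names are hypothetical, and real objects would also carry metadata and manifests.

```python
from typing import Dict, List

# Toy model (assumption): a version maps logical file path -> file content.
Version = Dict[str, bytes]

def full_copy_bytes(versions: List[Version]) -> int:
    """Date-stamped folders: every modification stores a complete copy of the object."""
    return sum(len(data) for v in versions for data in v.values())

def changed_only_bytes(versions: List[Version]) -> int:
    """Sequential version folders: each version stores only new or changed files."""
    total = 0
    previous: Version = {}
    for v in versions:
        for path, data in v.items():
            if previous.get(path) != data:
                total += len(data)
        previous = v
    return total

history = [
    {"thesis.pdf": b"x" * 1000, "metadata.xml": b"<v1/>"},
    {"thesis.pdf": b"x" * 1000, "metadata.xml": b"<v2/>"},  # only the metadata changed
]
print(full_copy_bytes(history))     # 2010
print(changed_only_bytes(history))  # 1010
```

The gap grows with the size of the unchanged files, which is the cost the date-stamped approach pays in exchange for self-contained folders.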

I would like to understand the reasoning behind OCFL's choices and, if it is compelling, possibly modify my proposal/plan.

@ahankinson
Contributor

Hi Tomasz!

OCFL joins the lineage of BagIt-inspired specs. It is most like the Moab spec developed by Richard Anderson (http://journal.code4lib.org/articles/8482), but is being developed to address some of the problems and potential optimizations identified by Moab's implementers. The inventory.jsonld file in the object root is envisioned as a means of tracking the contents of an object, in much the same way as the manifest-<algorithm>.txt files (e.g. manifest-md5.txt) do in a BagIt bag.
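
A rough sketch of the analogy: a BagIt payload manifest lists one digest and path per file, while an inventory can key content by digest and record each version's state separately. The field names in inventory_sketch (manifest, versions, state) and the v1/content/ path prefix are illustrative assumptions, not the draft inventory.jsonld schema discussed here.

```python
import hashlib
import json
from typing import Dict, List

def bagit_manifest(files: Dict[str, bytes]) -> str:
    """BagIt-style payload manifest: one '<digest>  <path>' line per file under data/."""
    lines = [
        f"{hashlib.sha512(files[path]).hexdigest()}  data/{path}"
        for path in sorted(files)
    ]
    return "\n".join(lines) + "\n"

def inventory_sketch(files: Dict[str, bytes]) -> dict:
    """Hypothetical inventory: content keyed by digest, plus a per-version state."""
    manifest: Dict[str, List[str]] = {}  # digest -> stored content paths
    state: Dict[str, List[str]] = {}     # digest -> logical paths in this version
    for path in sorted(files):
        digest = hashlib.sha512(files[path]).hexdigest()
        manifest.setdefault(digest, []).append(f"v1/content/{path}")
        state.setdefault(digest, []).append(path)
    return {"manifest": manifest, "versions": {"v1": {"state": state}}}

example = {"thesis.pdf": b"identical content", "copy-of-thesis.pdf": b"identical content"}
print(bagit_manifest(example))
print(json.dumps(inventory_sketch(example), indent=2))
```

Two identical files produce two lines in the BagIt-style manifest but share a single digest key in the inventory sketch, which is what later makes deduplication across versions possible.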

The main advantage of OCFL over BagIt is the ability to store versioned contents. We are trying to bake versioning into the spec so that the changes to an object over time can be programmatically determined. As you might imagine, however, the addition of versioning brings with it a host of issues that BagIt didn't have to deal with.
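
As an illustration of what "programmatically determined" could mean, the sketch below diffs two version states, each assumed (hypothetically) to map logical paths to content digests. It is not an OCFL API, only the kind of comparison that recording per-version state makes possible.

```python
from typing import Dict, List

# Hypothetical version state: logical path -> content digest.
State = Dict[str, str]

def diff_states(old: State, new: State) -> Dict[str, List[str]]:
    """Report which logical paths were added, removed, or modified between two versions."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and old[p] != new[p])
    return {"added": added, "removed": removed, "modified": modified}

v1 = {"thesis.pdf": "d1", "metadata.xml": "d2"}
v2 = {"thesis.pdf": "d1", "metadata.xml": "d3", "license.txt": "d4"}
print(diff_states(v1, v2))
# {'added': ['license.txt'], 'removed': [], 'modified': ['metadata.xml']}
```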

The problem of file modification and content duplication is a tricky one, and one that we are currently trying to figure out (see #26, for example). Since in-scope objects may include files and data collections that are potentially petabytes in size, storage efficiency is a high priority. To this end, and in combination with our work on versioning, we are exploring forward-versioning and content addressability (through hashes).
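
Content addressability can be sketched as follows: each distinct blob is stored once under its digest, and each new version records only path-to-digest mappings plus any content not seen before. This is a hypothetical in-memory model; the class name and the choice of SHA-256 are assumptions, and a real implementation would store content on disk alongside an inventory.

```python
import hashlib
from typing import Dict, List

class ContentAddressedObject:
    """Hypothetical sketch: each distinct content blob is stored once, keyed by digest."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}         # digest -> stored content
        self.versions: List[Dict[str, str]] = []  # per version: logical path -> digest

    def add_version(self, files: Dict[str, bytes]) -> int:
        """Record a new version; return how many bytes of new content had to be stored."""
        stored = 0
        state: Dict[str, str] = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blobs:  # only previously unseen content costs storage
                self.blobs[digest] = data
                stored += len(data)
            state[path] = digest
        self.versions.append(state)
        return stored

    def retrieve(self, version: int, path: str) -> bytes:
        """Reconstruct a file as it was in a given version (1-based)."""
        return self.blobs[self.versions[version - 1][path]]

obj = ContentAddressedObject()
print(obj.add_version({"thesis.pdf": b"draft", "notes.txt": b"a"}))  # 6
print(obj.add_version({"thesis.pdf": b"final", "notes.txt": b"a"}))  # 5
print(obj.add_version({"thesis.pdf": b"draft", "notes.txt": b"a"}))  # 0
print(obj.retrieve(2, "thesis.pdf"))                                  # b'final'
```

The third version reverts thesis.pdf and costs no additional storage, whereas a scheme that only compares each file with its immediately previous version would store it again.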

If you would like further discussion, I encourage you to join the OCFL Community Google Group (https://groups.google.com/forum/#!search/ocfl-community) and our monthly community calls, or hop onto the Slack channel.

@zimeon
Contributor

zimeon commented Sep 5, 2018

We are using Archivematica at Cornell, and my hope is that we will take the AIP that Archivematica produces and make it a version of an OCFL object in archival storage. (If we update the content and reprocess it, the result would then become v2, etc.)
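
The workflow described above could look roughly like this: each AIP that Archivematica produces is copied into the next sequential version directory of the object. This is a minimal sketch under stated assumptions: the vN/content layout is assumed, the function name is hypothetical, content is copied in full (no deduplication), and no inventory is written.

```python
import shutil
import tempfile
from pathlib import Path

def add_aip_as_version(object_root: Path, aip_dir: Path) -> Path:
    """Hypothetical helper: copy an AIP directory into the next version folder (v1, v2, ...)."""
    object_root.mkdir(parents=True, exist_ok=True)
    # Find existing version directories named v<integer>.
    existing = [
        int(p.name[1:])
        for p in object_root.glob("v*")
        if p.is_dir() and p.name[1:].isdigit()
    ]
    target = object_root / f"v{max(existing, default=0) + 1}" / "content"
    # Full copy of the AIP; this sketch does no deduplication and writes no inventory.
    shutil.copytree(aip_dir, target)
    return target

with tempfile.TemporaryDirectory() as tmp:
    root, aip = Path(tmp) / "object", Path(tmp) / "aip"
    (aip / "data").mkdir(parents=True)
    (aip / "data" / "file.txt").write_text("hello")
    print(add_aip_as_version(root, aip).relative_to(root).as_posix())  # v1/content
    print(add_aip_as_version(root, aip).relative_to(root).as_posix())  # v2/content
```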

@ahankinson
Contributor

F2F 2018.09.05: Discussion of this issue resulted in a decision that the spec would support this use case, but that it does not have a direct bearing on the shape of the spec.

@ahankinson ahankinson added the Confirmed: Out-of-scope Use case will not be included in the upcoming version of the spec or implementation notes. label Sep 5, 2018
3 participants