Keith's Assignment 13

Standards-based Interfaces for Harvesting and Obtaining Assets from Digital Repositories
Jeroen Baekert
University of Ghent, 2006

Interoperability between distributed digital repositories is a challenging problem. Developing protocols for requesting data and responding to those requests is only part of the solution. Because the landscape of formats for storing digital resources and their metadata is wide and varied, it is difficult to provide a standard specification for how repository systems compose requests and respond to queries for information from other repositories. In his dissertation, "Standards-based Interfaces for Harvesting and Obtaining Assets from Digital Repositories", Baekert proposes using existing protocols and resource packaging formats to promote greater interoperability between digital repositories. Specifically, he suggests that there are four dimensions to such interoperability: data modeling, resource packaging, harvesting protocol and context-sensitive resource exchange.

The dissertation is broken up into three segments. The first segment reviews the proposed solution, which is built on existing standards for storing resources within digital repositories and communicating information between them. For data modeling, the author proposes the OAIS, or Open Archival Information System Reference Model, an ISO standard, as an approach to the representation, identification, and versioning of digital assets within a repository. For resource packaging, the author suggests MPEG-21 DIDL, the Moving Picture Experts Group Digital Item Declaration Language, since the format is robust, has a large standards body behind it and has enough industry buy-in to support its continued success and growth. For the harvesting protocol dimension, the author selects the OAI-PMH, or the Open Archives Initiative Protocol for Metadata Harvesting, which has been widely adopted as the protocol for exchanging resource metadata among digital repositories. Finally, the OpenURL Framework, the NISO (National Information Standards Organization) standard for specifying and communicating context-sensitive services, covers the fourth dimension, context-sensitive resource exchange. The first segment thus lays the foundation for how these components come together to form the basis of the author's proposed standards-based interoperability framework.
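
To make the harvesting dimension concrete, the following is a minimal sketch of how a harvester might compose an OAI-PMH request. It is my own illustration, not code from the dissertation. The base URL and the helper name build_oai_request are hypothetical; the verb names and the metadataPrefix and from arguments come from the OAI-PMH protocol itself, and oai_dc is the one metadata format every OAI-PMH repository is required to support.

```python
from urllib.parse import urlencode

# Hypothetical base URL, used only for illustration; a real repository
# publishes its own OAI-PMH endpoint.
BASE_URL = "https://repository.example.org/oai"

# The six request verbs defined by OAI-PMH.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_oai_request(base_url, verb, **arguments):
    """Return the URL for an OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    params = {"verb": verb}
    for name, value in arguments.items():
        if value is not None:
            # "from" is a Python keyword, so callers pass from_ instead.
            params[name.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


# Ask for all Dublin Core records added or changed since 1 January 2006.
url = build_oai_request(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", from_="2006-01-01"
)
print(url)
```

In the framework the dissertation describes, the same kind of request would presumably name a different metadataPrefix so that the response carries an MPEG-21 DIDL package rather than a plain Dublin Core record; the exact prefix value is not given in this summary, so none is shown here.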

The second segment of the dissertation describes two experiments performed at Los Alamos National Laboratory (LANL) in New Mexico. The first experiment exercises the core features of the framework proposed in the first segment, using the LANL digital repository for scientific publications. It consisted of developing the necessary software interfaces and exploiting the existing architecture (aDORe) of the LANL system to demonstrate the technical and practical feasibility of the proposed solution. The interoperability component of the experiment capitalized on a specific feature of the LANL environment: multiple repositories exist within the broader repository, which provided a way to exercise the proposed concepts. The second experiment tested the framework by connecting to the American Physical Society (APS) digital repository, which contains hundreds of thousands of papers and publications in digital form. This experiment tested the framework in the context of repository mirroring: the core APS repository assets were mirrored at another repository located within LANL. The author analyzed both experiments in terms of the four dimensions (data modeling, resource packaging, harvesting protocol and context-sensitive resource exchange).

The third segment of the work discusses generalizing the proposed framework to three widely used digital repository systems: Fedora, EPrints and DSpace. The author carefully considers the capabilities of each system and analyzes how each would accommodate his proposed solution, which was built on aDORe, a system developed at LANL. In particular, he methodically addresses each of the four dimensions of his solution and the strengths or shortcomings of each system in light of his own experiments with aDORe.

I felt this dissertation made a substantive contribution to the area of interoperability for digital repositories. One of its core contributions is the careful consideration of content harvesting as well as metadata harvesting. Since most harvesting protocols deal solely with metadata, this work takes a bold leap forward by developing standards-based extensions on top of the widely used harvesting protocol OAI-PMH, along with OpenURL, to devise a sound solution for harvesting both content and metadata. MPEG-21 is a relatively new format, and I believe that choosing this newer, more forward-looking format was a wise decision. Since the author has contributed to the MPEG-21 specification, the choice may, on the other hand, simply reflect a preference for a solution he is more familiar with, though I did not think that was the case. Since packaging and metadata formats are difficult to agree on from institution to institution, it remains largely irrelevant at the moment whether MPEG-21, METS (Metadata Encoding and Transmission Standard) or something else is chosen. Still, I would have liked to see a deeper discussion of this point, since many institutions have made huge investments in other formats and will not consider using MPEG-21 in the near future.

Last modified 4 December 2007 at 2:12 pm by K:M