I2: Background

Posted by Michael Giarlo on May 19, 2009

This is the first in a series of posts about institutional identifiers[1].

In my last post, I alluded to some documentation that I've written. That was somewhat misleading, which will soon be apparent, but I liked the parallel construction I had going, and I am but a slave to orderliness.

For about the past six months, I have been working with a NISO group looking into how institutions are identified within information systems:

The I2 (Institutional Identifiers — pronounced "I 2") working group will build on work from the Journal Supply Chain Efficiency Improvement Pilot (http://www.journalsupplychain.com/), which demonstrated the improved efficiencies of using an institutional identifier in the journal supply chain. The NISO working group will develop a standard for an institutional identifier that can be implemented in all library and publishing environments. The standard will include definition of the metadata required to be collected with the identifier and what uses can be made of that metadata. …

The I2 group is split into a few subgroups which have been charged with looking into how institutional identifiers are used in particular scenarios. These scenarios are e-resources, repositories and e-learning systems, and library resource workflows. The scenario names pain me a bit, but so be it; this is our industry, and there are bigger windmills to tilt at.

I am currently co-chairing the subgroup looking at repositories and e-learning, and apparently I am its "tech lead." I don't want to get caught up in names and roles and titles, though; this series isn't about those at all. I'm just setting the scene, explaining why my head's in this space, and laying bare my stake in the issue.

The remainder of this series will provide a bit more detail on the issues around institutional identifiers, share how the repository subgroup is grappling with identifier issues and engaging the repository community to assess needs, propose an approach for an identifier system that may meet said needs, and explore what seems to be the thorniest issue[2].

Notes
  1. I offer that very tentatively, knowing what a spectacular failure my last attempt at a series was.
  2. Hint: management. I know, "duh," right?


Plugin updates

Posted by Michael Giarlo on November 16, 2008

I finally pushed out some embarrassingly outdated WordPress plugin updates a few moments ago.

  • Updated unAPI plugin with a patch contributed by Jay Luker that removes the hard-coded "wp_" table prefix. The updated version of the plugin has been tagged as 1.4.1.
  • Updated LinkPURL plugin with a patch contributed by Mark Matienzo that enables partial redirects. I made some additional tweaks to the plugin to make this feature configurable via the WordPress management UI. This has been tagged as 1.1.
  • Created a new unAPI plugin branch for Mark Matienzo's Scriblio-oriented tweaks. The branch is called 1.4.1-anarchivist-scriblio and it contains the scriblio.diff file. I have yet to integrate the diffs, as the file that was patched has changed since the patch was issued. If anyone is interested in working on unAPI/Scriblio integration, please get in touch with me.
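For anyone unfamiliar with partial redirects, here is a minimal sketch of the general idea in Python. It is not the LinkPURL plugin's code (the plugin is a WordPress plugin, and I am not reproducing its internals here); the registry contents, the example.org URLs, and the function name `resolve_partial` are all made up for illustration. A partial redirect registers a path prefix rather than a full path, and whatever follows the prefix is carried over onto the target.

```python
from typing import Dict, Optional

# Hypothetical registry: registered path prefix -> target base URL.
PARTIAL_REDIRECTS: Dict[str, str] = {
    "/net/docs/": "http://example.org/archive/documents/",
}


def resolve_partial(path: str,
                    table: Dict[str, str] = PARTIAL_REDIRECTS) -> Optional[str]:
    """Return the redirect target for `path`, or None if no prefix matches."""
    # Try the longest registered prefix first so nested registrations win.
    for prefix in sorted(table, key=len, reverse=True):
        if path.startswith(prefix):
            # Append the unmatched remainder of the path to the target base.
            return table[prefix] + path[len(prefix):]
    return None


target = resolve_partial("/net/docs/2008/report.pdf")
missing = resolve_partial("/elsewhere/report.pdf")
```

With this toy registry, a request under the registered prefix keeps its trailing path on the target side, and anything outside the prefix resolves to nothing.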

And here is my to-do list, which I hope will keep me honest.

  • Update OAI-ORE plugin to support version 1.0 of the ORE specification.
  • Add per-post (and per-page?) resource maps that wrap all embedded images and links.
  • Enable "cool URIs" for all resource maps.

It is my hope that I'll get to those sometime before the summer begins. :)

Use cases for Handle identifiers?

Posted by Michael Giarlo on October 05, 2007

Reading Adam Smith's D-Lib article has got me thinking about identifiers again. I don't agree with some of the assertions in the section titled "A Persistent Identifier Primer" — URIs are in fact persistent; we just break them through poor management — and so I'm led to a fundamental question: what are the good use cases for Handle (or ARK, or PURL) identifiers?

I get the need for persistent and globally unique identifiers; I'm just wondering why one needs special software with a separate URI namespace to gain persistence.

One potential use case might be resources that are outside of the organization's control — i.e., licensed content from vendors — but surely folks are using Handles for many resources that are created and managed within the organization. And I'm curious why they have decided that Handles are more durable than native URIs (the URIs to which Handles redirect), and how they deal with the problem of downstream (post-redirection) citation and bookmarking. How useful is this sort of identifier scheme if your users never even see the supposedly more persistent URI for a resource?
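To make the downstream citation problem concrete, here is a toy simulation in Python. Everything in it is an assumption for illustration: the `Resolver` class is a stand-in for a Handle-style lookup table rather than the Handle System's actual API, and the identifier `hdl:12345/42` and the repo.example.org URLs are invented.

```python
class Resolver:
    """Toy stand-in for a Handle-style resolver: identifier -> current URL."""

    def __init__(self):
        self._targets = {}

    def register(self, identifier, url):
        # Create or update the mapping for an identifier.
        self._targets[identifier] = url

    def resolve(self, identifier):
        return self._targets.get(identifier)


# The set of URLs that currently serve the resource.
live_urls = {"http://repo.example.org/old/item42"}

resolver = Resolver()
resolver.register("hdl:12345/42", "http://repo.example.org/old/item42")

# A reader follows the handle, is redirected, and bookmarks the URL
# showing in the browser's address bar, which is the native one.
bookmark = resolver.resolve("hdl:12345/42")

# Later the resource moves, and someone dutifully updates the handle.
live_urls = {"http://repo.example.org/new/item42"}
resolver.register("hdl:12345/42", "http://repo.example.org/new/item42")

handle_ok = resolver.resolve("hdl:12345/42") in live_urls
bookmark_ok = bookmark in live_urls
```

In this sketch the handle still resolves after the move, but the reader's bookmark does not, because the reader only ever saw the post-redirection URL. The handle survived the move only because someone updated it.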

Coming from a former proponent of Handles and ARKs, this may seem like a hypocritical question to pose. If I had to answer my own question, I would say that Handles seem like a good option because they save you some work and headaches in the short term: you don't need to get together with your web team and come up with a scalable and sustainable URI policy. You just assign native URIs in the usual haphazard way and generate Handles to compensate for a lack of identifier policies.

But if you're already making an organizational commitment to identifier persistence — and if you're rolling out Handles, I'd wager that's likely — why not do so by minting carefully-considered cool URIs? Less management and technology overhead and less confusion for your users are two good reasons to consider it.

Digital preservation for archivists

Posted by Michael Giarlo on June 12, 2007

At long last, the paper that Ron Jantz and I wrote for the Journal of Archival Organization has been published in a special double issue. It's titled "Digital Archiving and Preservation: Technologies and Processes for a Trusted Repository" and is intended to be a fairly nitty-gritty piece on digital preservation (in the context of trusted repositories) for archivists. The abstract:

This article examines what is implied by the term "trusted" in the phrase "trusted digital repositories." Digital repositories should be able to preserve electronic materials for periods at least comparable to existing preservation methods. Our collective lack of experience with preserving digital objects and consensus about the reliability of our technological infrastructure raises questions about how we should proceed with digital-based preservation practices, an emerging role for academic libraries and archival institutions. This article reviews issues relating to building a trusted digital repository, highlighting some of the issues raised and possible solutions proposed by the authors in their work of implementing and acculturating a digital repository at Rutgers University Libraries.

This special double issue of JAO will also be released as the monograph "Archives and the Digital Library."

Thanks to editors Bill Landis, Robin Chandler, Tom Frusciano, and Caryn Radick for seeing this through. And of course to Ron Jantz for getting me interested in this crazy stuff at a time when I had no direction or interest in my career.

Identifier Persistence: Fundamentals

Posted by Michael Giarlo on June 05, 2007

A friend and former colleague asked if I would comment on a chapter in her upcoming book on digital rights management and I agreed. The chapter is about identification and authenticity of web resources. Throughout my review of the chapter, I kept coming back to a couple of very basic notions that underlie any effort to provide persistent identifiers for web resources. These notions are, to my mind, central to identifier persistence, and any other concerns rely upon this foundation:

  1. Identifier persistence requires an organizational commitment. Persistence cannot be ensured by a few renegades in the skunk-works, nor can it be mandated from on high without the support of those who manage the identifiers or produce web resources. All individuals involved in the life-cycle of web resources must be committed to persistence in perpetuity if true persistence of identifiers is to be achieved.
  2. No technology, no standard, no identifier scheme, no information architecture will get you persistence. Whether you choose native URIs, Handles, DOIs, PURLs, ARKs, UUIDs, or XRIs, you will never achieve identifier persistence without active management of your identifiers and web resources. This requires the aforementioned organizational commitment since such management cannot occur without sufficient resources. Management of web resources and identifiers requires time and due diligence and those don't come for free.

And, at the risk of being reductive, that's about it. Once you've got an organizational commitment and a person or team to manage your identifiers and web resources, the rest of the decisions are secondary. If you like semantically meaningful URLs that redirect, choose Handles; if you prefer opaque identifiers, go with ARKs; if you don't want to run your own software, consider PURLs. At that point, it really doesn't matter which scheme you choose, as long as its characteristics match your organization's values. You've already done the heavy lifting; rest easy.