Exploring curation micro-services
As far as I'm concerned, the most exciting developments this year in repositories and digital curation have come out of the California Digital Library. It has been impossible not to notice their papers and presentations. Put simply, their idea is that digital curation is enabled by "micro-services" built upon well-known abstractions such as the filesystem. The benefits are obvious: filesystem tools are ubiquitous and cross-platform, and there are strong market forces to ensure the filesystem persists. The idea is radically simple and straightforward, though many questions remain about such a paradigm. I'll return to those later.
If you have not yet taken a look at CDL's curation micro-service specifications, most of which may be printed on as few as one or two sheets of paper, see the Digital Library Building Blocks.
My co-workers in the LC Repository Development Center have been chatting about these specs on and off throughout the year. After months of procrastinating, I finally read all of the specs on Thursday; it's wonderful that you can do so in the course of one reading session, I might add. Yesterday a bunch of us RDCers got together to chat (informally) about the specs: what they're for, how they work, and how they interact with one another. I learn by doing, by examples, so I combed through each of the specs in advance of our meeting and tried to construct a minimal repository[1] based on micro-services.
Continue reading…
Notes
- Perhaps it's more in line with the specs to refer to this space as "a managed filesystem that drives repository and curation services," given the CDL philosophy that preservation is not a place/repository. But it's easier to say "repository," so there you go.
Sustaining digital libraries
About a month ago, I read on my colleague's blog that the Emory University Digital Library published a new book on sustaining digital libraries. I've finally started reading it and figured I would post a note here.
The articles of this monograph provide resources for digital library stakeholders who seek to better understand how to effectively evolve such efforts from short-term projects to long-term sustainable programs. The monograph includes contributions from leaders in major digital libraries that have made such transitions or which are systematically considering the question of programmatic sustainability, including representatives from the National Digital Infrastructure and Information Preservation Program (NDIIPP) and the National Science Digital Library (NSDL).
I might also note that the book is available for free as a PDF.
So far I've read the introduction by the editors and the abstract from Leslie's paper, and the book looks like a high-quality read from cover to cover, with articles based on actual digital library experience. It takes a pragmatic approach to sustaining digital library initiatives, looking beyond technical concerns towards the more challenging social and economic ones. To some extent, we are getting pretty good at preserving bits and relationships between collections of bits; it remains to be seen how good we will be at preserving the preservation systems themselves.
Jythons and Javas and bears, oh my!
It's hard to believe but I've been at the new job for six months already, a full half-year come the 29th. Some days it seems like I've been here forever; others like I'm still a rank newb. I haven't written terribly much about what I've been up to (but I assure you I've been busy). Let me rectify that.
The Transfer Problem
Two of the projects I've been working on relate to a fairly general problem that we like to call "transfer," which revolves around, well, transferring files to and fro. Sounds simple. Is simple. That is, until you start thinking about preservation and accounting for a highly heterogeneous network with idiosyncratic nodes, esoteric storage software, and differential firewall rules. And that's where it gets interesting (and problematic). Continue reading…
Digital preservation for archivists
At long last, the paper that Ron Jantz and I wrote for the Journal of Archival Organization has been published in a special double issue. It's titled "Digital Archiving and Preservation: Technologies and Processes for a Trusted Repository" and is intended to be a fairly nitty-gritty piece on digital preservation (in the context of trusted repositories) for archivists. The abstract:
This article examines what is implied by the term "trusted" in the phrase "trusted digital repositories." Digital repositories should be able to preserve electronic materials for periods at least comparable to existing preservation methods. Our collective lack of experience with preserving digital objects and consensus about the reliability of our technological infrastructure raises questions about how we should proceed with digital-based preservation practices, an emerging role for academic libraries and archival institutions. This article reviews issues relating to building a trusted digital repository, highlighting some of the issues raised and possible solutions proposed by the authors in their work of implementing and acculturating a digital repository at Rutgers University Libraries.
This special double issue of JAO will also be released as a monograph, "Archives and the Digital Library."
Thanks to editors Bill Landis, Robin Chandler, Tom Frusciano, and Caryn Radick for seeing this through. And of course to Ron Jantz for getting me interested in this crazy stuff at a time when I had no direction or interest in my career.
Identifier Persistence: Fundamentals
A friend and former colleague asked if I would comment on a chapter in her upcoming book on digital rights management and I agreed. The chapter is about identification and authenticity of web resources. Throughout my review of the chapter, I kept coming back to a couple of very basic notions that underlie any effort to provide persistent identifiers for web resources. These notions are, to my mind, central to identifier persistence, and any other concerns rely upon this foundation:
- Identifier persistence requires an organizational commitment. Persistence cannot be ensured by a few renegades in the skunk-works, nor can it be mandated from on high without the support of those who manage the identifiers or produce web resources. All individuals involved in the life-cycle of web resources must be committed to persistence in perpetuity if true persistence of identifiers is to be achieved.
- No technology, no standard, no identifier scheme, no information architecture will get you persistence. Whether you choose native URIs, Handles, DOIs, PURLs, ARKs, UUIDs, or XRIs, you will never achieve identifier persistence without active management of your identifiers and web resources. This requires the aforementioned organizational commitment since such management cannot occur without sufficient resources. Management of web resources and identifiers requires time and due diligence and those don't come for free.
And, at the risk of being reductive, that's about it. Once you've got an organizational commitment and a person or team to manage your identifiers and web resources, the rest of the decisions are secondary. If you like semantically meaningful URLs that redirect, choose Handles; if you prefer opaque identifiers, go with ARKs; if you don't want to run your own software, consider PURLs. At that point, it really doesn't matter which scheme you choose, as long as its characteristics match your organization's values. You've already done the heavy lifting; rest easy.
