Sustaining digital libraries
About a month ago, I read on my colleague's blog that the Emory University Digital Library published a new book on sustaining digital libraries. I've finally started reading it and figured I would post a note here.
The articles of this monograph provide resources for digital library stakeholders who seek to better understand how to effectively evolve such efforts from short-term projects to long-term sustainable programs. The monograph includes contributions from leaders in major digital libraries that have made such transitions or which are systematically considering the question of programmatic sustainability, including representatives from the National Digital Infrastructure and Information Preservation Program (NDIIPP) and the National Science Digital Library (NSDL).
I might also note that the book is available for free as a PDF.
So far I've read the introduction by the editors and the abstract from Leslie's paper, and the book looks like a high-quality read from cover to cover, with articles based on actual digital library experience. It takes a pragmatic approach to sustaining digital library initiatives, looking beyond technical concerns towards the more challenging social and economic ones. To some extent, we are getting pretty good at preserving bits and relationships between collections of bits; it remains to be seen how good we will be at preserving the preservation systems themselves.
Jythons and Javas and bears, oh my!
It's hard to believe but I've been at the new job for six months already, a full half-year come the 29th. Some days it seems like I've been here forever; others like I'm still a rank newb. I haven't written terribly much about what I've been up to (but I assure you I've been busy). Let me rectify that.
The Transfer Problem
Two of the projects I've been working on relate to a fairly general problem that we like to call "transfer," which revolves around, well, transferring files to and fro. Sounds simple. Is simple. That is, until you start thinking about preservation and accounting for a highly heterogeneous network with idiosyncratic nodes, esoteric storage software, and differential firewall rules. And that's where it gets interesting (and problematic).
Digital preservation for archivists
At long last, the paper that Ron Jantz and I wrote for the Journal of Archival Organization has been published in a special double issue. It's titled "Digital Archiving and Preservation: Technologies and Processes for a Trusted Repository" and is intended to be a fairly nitty-gritty piece on digital preservation (in the context of trusted repositories) for archivists. The abstract:
This article examines what is implied by the term "trusted" in the phrase "trusted digital repositories." Digital repositories should be able to preserve electronic materials for periods at least comparable to existing preservation methods. Our collective lack of experience with preserving digital objects and consensus about the reliability of our technological infrastructure raises questions about how we should proceed with digital-based preservation practices, an emerging role for academic libraries and archival institutions. This article reviews issues relating to building a trusted digital repository, highlighting some of the issues raised and possible solutions proposed by the authors in their work of implementing and acculturating a digital repository at Rutgers University Libraries.
This special double issue of JAO will also be released as a monograph, "Archives and the Digital Library."
Thanks to editors Bill Landis, Robin Chandler, Tom Frusciano, and Caryn Radick for seeing this through. And of course to Ron Jantz for getting me interested in this crazy stuff at a time when I had no direction or interest in my career.
Identifier Persistence: Fundamentals
A friend and former colleague asked if I would comment on a chapter in her upcoming book on digital rights management and I agreed. The chapter is about identification and authenticity of web resources. Throughout my review of the chapter, I kept coming back to a couple of very basic notions that underlie any effort to provide persistent identifiers for web resources. These notions are, to my mind, central to identifier persistence, and any other concerns rely upon this foundation:
- Identifier persistence requires an organizational commitment. Persistence cannot be ensured by a few renegades in the skunk-works, nor can it be mandated from on high without the support of those who manage the identifiers or produce web resources. All individuals involved in the life-cycle of web resources must be committed to persistence in perpetuity if true persistence of identifiers is to be achieved.
- No technology, no standard, no identifier scheme, no information architecture will get you persistence. Whether you choose native URIs, Handles, DOIs, PURLs, ARKs, UUIDs, or XRIs, you will never achieve identifier persistence without active management of your identifiers and web resources. This requires the aforementioned organizational commitment since such management cannot occur without sufficient resources. Management of web resources and identifiers requires time and due diligence and those don't come for free.
And, at the risk of being reductive, that's about it. Once you've got an organizational commitment and a person or team to manage your identifiers and web resources, the rest of the decisions are secondary. If you like semantically meaningful URLs that redirect, choose Handles; if you prefer opaque identifiers, go with ARKs; if you don't want to run your own software, consider PURLs. At that point, it really doesn't matter which scheme you choose, as long as its characteristics match your organization's values. You've already done the heavy lifting; rest easy.
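To make the "active management" point concrete, here is a minimal sketch of what every one of those schemes boils down to: a lookup table from identifier to current location that somebody has to keep up to date. The `Resolver` class, the identifier string, and the example.org URLs are all made up for illustration; this is not how any real Handle, ARK, or PURL service is implemented.

```python
from typing import Dict


class Resolver:
    """Toy in-memory resolver: persistent identifier -> current URL."""

    def __init__(self) -> None:
        self._table: Dict[str, str] = {}

    def register(self, identifier: str, url: str) -> None:
        """Mint an identifier once; re-minting would break the persistence promise."""
        if identifier in self._table:
            raise ValueError(f"{identifier} is already registered")
        self._table[identifier] = url

    def update(self, identifier: str, new_url: str) -> None:
        """Re-point an existing identifier after its resource moves."""
        if identifier not in self._table:
            raise KeyError(identifier)
        self._table[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        """Return the URL currently on record, whether or not it still works."""
        return self._table[identifier]


resolver = Resolver()
pid = "ark:/99999/fk4example"  # made-up identifier for illustration
resolver.register(pid, "https://example.org/old/report.pdf")

# The resource moves. Until a person records the move, the identifier
# still resolves to the stale location.
stale = resolver.resolve(pid)

# The manual step that no identifier scheme performs for you.
resolver.update(pid, "https://example.org/new/report.pdf")
current = resolver.resolve(pid)

print(stale)
print(current)
```

The code is trivial on purpose. The only line that keeps the identifier alive is the `update` call, and that call happens only if someone in the organization knows the resource moved and has the time and mandate to record it.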
Princeton, meet Google
Google and Princeton University went public twenty minutes ago in announcing their arrangement to digitize roughly one million of Princeton's public domain works as part of the ongoing Google Books initiative. Princeton is one of a growing number of academic institutions in the United States — including private institutions like Harvard and Stanford and state universities in Virginia, Texas, and California — that have worked with Google on this. For more information, see the post on the Google Books blog or the official announcement.
Many of the details have yet to be worked out, but I am hopeful this will bode well for our users here at Princeton and for the broader community.
The need for digital preservation just grows and grows.
