RESTful Fedora?
Matt Zumwalt of MediaShelf, LLC has been hard at work thinking about how to make Fedora RESTful. There is now a proposal on the Fedora wiki based on a PDF he sent to the fedora-commons-developers list.
It's an interesting proposal. I've read over the PDF version quickly but it does bear a closer read.
Whether SOAP or REST is more appropriate for a Fedora API is something I'm not sure about, though I don't mean to imply it's an either/or situation.
Digital preservation for archivists
At long last, the paper that Ron Jantz and I wrote for the Journal of Archival Organization has been published in a special double issue. It's titled "Digital Archiving and Preservation: Technologies and Processes for a Trusted Repository" and is intended to be a fairly nitty-gritty piece on digital preservation (in the context of trusted repositories) for archivists. The abstract:
This article examines what is implied by the term "trusted" in the phrase "trusted digital repositories." Digital repositories should be able to preserve electronic materials for periods at least comparable to existing preservation methods. Our collective lack of experience with preserving digital objects and consensus about the reliability of our technological infrastructure raises questions about how we should proceed with digital-based preservation practices, an emerging role for academic libraries and archival institutions. This article reviews issues relating to building a trusted digital repository, highlighting some of the issues raised and possible solutions proposed by the authors in their work of implementing and acculturating a digital repository at Rutgers University Libraries.
This special double issue of JAO will also be released as the monograph "Archives and the Digital Library."
Thanks to editors Bill Landis, Robin Chandler, Tom Frusciano, and Caryn Radick for seeing this through. And of course to Ron Jantz for getting me interested in this crazy stuff at a time when I had no direction or interest in my career.
Fedora marches forward
I was pleased to see the note that Sandy Payette sent to the fedora-users mailing list earlier today, updating the community on the Fedora 2.2 release date. Version 2.2 is going to include a bunch of features, some of which have been long-awaited and are quite, well, sexy. Some of the highlights:
- Database support has been extended to include Postgres, which should make all the MySQL-haters happy
- Fedora may now be deployed via a .war file in an existing servlet container, such as Tomcat, rather than requiring its very own Tomcat server
- A Lucene- or Zebra-backed search service has been included, which is more robust than the previous search service that used the built-in Dublin Core-populated database
These are but a few of the enhancements, and I can't wait to put it through its paces when it's released on January 19th.
For a more complete set of feature enhancements, read Sandy's message to the fedora-users list.
Now if we can come together as a community and work on some more UIs, and get them used in some high-profile projects, many of the gripes against Fedora may be silenced. It's still not a perfect product, but what is? That it uses XML as a storage format and exposes its functions via web-services APIs and allows use of any metadata schema, in my humble opinion, puts it head and shoulders above many other library repository solutions. And for that, it's at least worth consideration.
Why Fedora? More answers to the Fedora users survey
I noticed this response to the Fedora users survey on Peter Murray's blog, and figured I'd post one of my own. Since my previous employer did not use Fedora, and I haven't begun my new job yet, I'll be posting about our use of Fedora at Rutgers, The State University of New Jersey.
The Jester's Case for Fedora
Peter Murray has written a series of pieces about the Fedora digital repository system over at the Disruptive Library Technology Jester blog.
In the first piece, On the Need for a General Purpose Digital Object Repository, Peter argues that having a unified repository simplifies management of information systems or "silos." For instance, there needn't be duplication of workflows or synchronization of content if an organization's repositories, digital libraries, electronic journals, course management systems and so on are all built atop a robust institutional repository. A unified repository is useful if one desires a search across previously disparate digital projects or collections, if one wishes to eliminate redundancies in coding, or if one intends to have a particular object, collection of objects, or part of an object shared across different systems (e.g., a journal article repurposed in a course management system and deposited into an open archive). With an open, flexible repository like Fedora, such a configuration is possible, assuming your organization, unit, or consortium has someone to devote to managing and customizing the repository.
An advantage of using the Fedora system, as outlined in Why Fedora? Because You Don't Need Fedora, is that due to modular design and adherence to more or less open standards, one is not necessarily wedded to Fedora for the foreseeable future. Items in a Fedora repository are serialized as XML objects, either in the Fedora-METS or FOXML format. While some of this information is copied into a relational database system and an RDF triplestore for speed and convenience, it is all intact within the serialized XML objects which reside in a predictable directory hierarchy on the local filesystem. There are at least three advantages to this design:
- Should Fedora experience a catastrophic system glitch, one may rebuild the entire system via a built-in utility (cleverly named "fedora-rebuild") that goes through the objects on the filesystem and restocks the database and triplestore. And assuming that the administrator of the system is worth his salt, there should be regular full backups of the filesystem, so the entire repository should be rebuildable. As Peter notes, a simple copy of the filesystem on which the XML objects reside is a fine practice in a larger digital preservation strategy.
- If one decides to move away from Fedora to the Next Best Thing™, it should be relatively simple to migrate content from Fedora into the new system because of Fedora's storage of all objects (and associated metadata, files, and disseminators) to the filesystem as serialized XML. All one needs, perhaps, is a set of funky XSLT scripts to massage the objects into a format that works with the new system and voila. (That is a gross oversimplification, but the point remains that open standards, simple file operations, and XML markup do make for more orderly migrations than black boxes, complex datastores, and loose coupling of information.)
- Having one's objects stored as XML on the filesystem also opens up opportunities to glue tools that act on those files into the repository infrastructure. One example is an XML-aware search engine (such as amberfish, Lucene, or Zebra). Since you've got low-level access to these files, it would be fairly simple to tack on a search & indexing system that is independent of your choice of repository.
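To make the "it's all just XML on the filesystem" point concrete, here is a minimal sketch in Python of what a rebuild or an external indexer boils down to: walk the directory tree, parse each serialized object, and pull out what you need. This is not how fedora-rebuild itself is implemented. The sample document is a deliberately simplified stand-in rather than a complete FOXML record, the namespace URIs are what I assume FOXML and Dublin Core use (check them against your own objects), and the directory layout, file names, and titles are made up for illustration.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed namespace URIs for FOXML and Dublin Core; verify against real objects.
NS = {
    "foxml": "info:fedora/fedora-system:def/foxml#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Simplified stand-in for a serialized object (NOT a complete FOXML record).
SAMPLE = """<foxml:digitalObject xmlns:foxml="info:fedora/fedora-system:def/foxml#"
    xmlns:dc="http://purl.org/dc/elements/1.1/" PID="demo:{n}">
  <foxml:datastream ID="DC">
    <foxml:datastreamVersion ID="DC.0">
      <foxml:xmlContent>
        <dc:title>{title}</dc:title>
      </foxml:xmlContent>
    </foxml:datastreamVersion>
  </foxml:datastream>
</foxml:digitalObject>
"""


def build_index(object_root):
    """Walk object_root and map each object's PID to its Dublin Core titles."""
    index = {}
    for path in sorted(Path(object_root).rglob("*.xml")):
        root = ET.parse(path).getroot()
        pid = root.get("PID")
        if pid is None:
            continue  # not an object file we recognize; skip it
        index[pid] = [el.text or "" for el in root.iterfind(".//dc:title", NS)]
    return index


def demo():
    """Write two sample objects into a hypothetical directory layout and index them."""
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "objects" / "2007"
        store.mkdir(parents=True)
        titles = ["Rutgers yearbook", "Oral history interview"]
        for n, title in enumerate(titles, start=1):
            target = store / f"demo_{n}.xml"
            target.write_text(SAMPLE.format(n=n, title=title), encoding="utf-8")
        return build_index(tmp)


if __name__ == "__main__":
    for pid, found in demo().items():
        print(pid, found)
```

The same loop is where a migration would hook in: instead of collecting titles, each parsed object would be transformed and handed to the new system.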
The third piece, Thinking about Our Fedora Disseminators, highlights Fedora as a repository system that's put real emphasis on digital preservation. While other repository systems allow for preservation of an object and its metadata, Fedora grants one the ability to preserve the behavior of digital objects and the datastreams thereof, a potential approach to the issue of format migration/emulation. Through a dissemination abstraction (the "behavior definition") one might apply the same abstract behaviors to items in different formats, saving one the time of defining redundant behaviors. My explanation is rather vague and incomplete, so I would encourage you to read Peter's third piece in detail. The point is that "for each record, the application simply asked the repository to deliver a thumbnail of the object. And the repository, regardless of media type, delivered one."
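The thumbnail example can be sketched as an analogy in Python. This is not Fedora's disseminator API; every name here (DigitalObject, get_thumbnail, the per-format functions, the media types chosen) is hypothetical, and the "thumbnails" are just descriptive strings. The point is only that the caller asks for one abstract behavior and the repository picks the format-specific mechanism.

```python
from dataclasses import dataclass


@dataclass
class DigitalObject:
    """Hypothetical stand-in for a repository object with one datastream."""
    pid: str
    media_type: str


# Hypothetical format-specific mechanisms behind one abstract behavior.
def image_thumbnail(obj):
    return f"scaled-down image for {obj.pid}"


def audio_thumbnail(obj):
    return f"waveform icon for {obj.pid}"


def document_thumbnail(obj):
    return f"first-page preview for {obj.pid}"


# The "behavior definition": one behavior name, many bound mechanisms.
THUMBNAIL_MECHANISMS = {
    "image/tiff": image_thumbnail,
    "audio/x-wav": audio_thumbnail,
    "application/pdf": document_thumbnail,
}


def get_thumbnail(obj):
    """Deliver a thumbnail for obj without the caller knowing its media type."""
    mechanism = THUMBNAIL_MECHANISMS.get(obj.media_type)
    if mechanism is None:
        raise LookupError(f"no thumbnail mechanism bound for {obj.media_type}")
    return mechanism(obj)


if __name__ == "__main__":
    records = [
        DigitalObject("demo:1", "image/tiff"),
        DigitalObject("demo:2", "audio/x-wav"),
        DigitalObject("demo:3", "application/pdf"),
    ]
    # The application asks the same question of every record.
    for record in records:
        print(get_thumbnail(record))
```

When a format is migrated, only the entry in the mapping changes; the application's call stays the same, which is roughly the preservation argument Peter is making.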
Taken together, Peter's pieces make a strong case for Fedora as a fine back-end for a unified, multi-purpose repository. Unlike other repository systems that focus more on the front-end, Fedora focuses on being the plumbing, the "digital library operating system" as Ron Jantz calls it. Were I not already a Fedora enthusiast, I would find it quite difficult not to consider Fedora (or something like it, such as LANL's aDORe Archive) at MPOW. Now if someone can send me some hints on drumming up institutional support…
