The Jester's Case for Fedora
Peter Murray has written a series of pieces about the Fedora digital repository system over at the Disruptive Library Technology Jester blog.
In the first piece, On the Need for a General Purpose Digital Object Repository, Peter argues that a unified repository simplifies the management of separate information systems, or "silos." For instance, workflows need not be duplicated and content need not be synchronized if an organization's repositories, digital libraries, electronic journals, course management systems and so on are all built atop a robust institutional repository. A unified repository is useful if one desires a search across previously disparate digital projects or collections, if one wishes to eliminate redundancies in coding, or if one intends to have a particular object, collection of objects, or part of an object shared across different systems (e.g., a journal article repurposed in a course management system and deposited into an open archive). With an open, flexible repository like Fedora, such a configuration is possible, assuming your organization, unit, or consortium has someone to devote to managing and customizing the repository.
An advantage of using the Fedora system, as outlined in Why Fedora? Because You Don't Need Fedora, is that due to its modular design and adherence to more or less open standards, one is not necessarily wedded to Fedora for the foreseeable future. Items in a Fedora repository are serialized as XML objects, either in the Fedora-METS or FOXML format. While some of this information is copied into a relational database system and an RDF triplestore for speed and convenience, it is all intact within the serialized XML objects, which reside in a predictable directory hierarchy on the local filesystem. There are at least three advantages to this design:
- Should Fedora experience a catastrophic system glitch, one may rebuild the entire system via a built-in utility (cleverly named "fedora-rebuild") that goes through the objects on the filesystem and restocks the database and triplestore. And assuming that the administrator of the system is worth his salt, there should be regular full backups of the filesystem, so the entire repository should be rebuildable. As Peter notes, a simple copy of the filesystem on which the XML objects reside is a fine practice in a larger digital preservation strategy.
- If one decides to move away from Fedora to the Next Best Thing™, it should be relatively simple to migrate content from Fedora into the new system, because Fedora stores all objects (and associated metadata, files, and disseminators) on the filesystem as serialized XML. All one needs, perhaps, is a set of funky XSLT scripts to massage the objects into a format that works with the new system, and voilà. (That is a gross oversimplification, but the point remains that open standards, simple file operations, and XML markup do make for more orderly migrations than black boxes, complex datastores, and loose coupling of information.)
- Having one's objects stored as XML on the filesystem also opens up opportunities to see how tools which act thereupon might be glued into the repository infrastructure. One such example might be for an XML-aware search engine (such as amberfish, Lucene, or Zebra). Since you've got low-level access to these files, it would be fairly simple to tack on a search & indexing system that is independent of your choice of repository.
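The rebuild and indexing ideas in the list above can be illustrated with a rough sketch. This is not Fedora's rebuild utility or any part of its API; it only shows how little is needed to walk a tree of serialized XML objects and pull out something searchable. The function name `index_objects`, the assumption that each file's root element carries a `PID` attribute, and the choice to index Dublin Core titles are all made up for illustration, so adjust them to whatever your objects actually contain.

```python
import os
import xml.etree.ElementTree as ET

# Dublin Core element namespace, used here to find <dc:title> elements.
DC_NS = "http://purl.org/dc/elements/1.1/"


def index_objects(root_dir):
    """Walk root_dir and map each object's identifier to its DC titles.

    Assumptions (hypothetical, for illustration only):
    - every object is one XML file somewhere under root_dir;
    - the root element has a PID attribute (the file name is used if not).
    """
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                root = ET.parse(path).getroot()
            except ET.ParseError:
                continue  # skip anything that is not well-formed XML
            pid = root.get("PID", name)
            titles = [el.text for el in root.iter("{%s}title" % DC_NS) if el.text]
            index[pid] = titles
    return index
```

Calling `index_objects("/path/to/objects")` returns a plain dictionary that could be handed to whatever search engine you prefer. The same loop is where a migration script would apply its transformations instead of collecting titles.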
The third piece, Thinking about Our Fedora Disseminators, highlights Fedora as a repository system that's put real emphasis on digital preservation. While other repository systems allow for preservation of an object and its metadata, Fedora grants one the ability to preserve the behavior of digital objects and the datastreams thereof, a potential approach to the issue of format migration/emulation. Through a dissemination abstraction (the "behavior definition") one might apply the same abstract behaviors to items in different formats, saving one the time of defining redundant behaviors. My explanation is rather vague and incomplete, so I would encourage you to read Peter's third piece in detail. The point is that "for each record, the application simply asked the repository to deliver a thumbnail of the object. And the repository, regardless of media type, delivered one."
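To make the thumbnail example a little more concrete, here is a toy sketch of the idea, with the caveat that it has nothing to do with how Fedora actually implements behavior definitions or disseminators. The class `ToyRepository`, the `THUMBNAILERS` table, and the strings they return are all hypothetical. The only point is that the calling application asks for "a thumbnail" by identifier and never has to know the media type.

```python
from typing import Callable, Dict

# Hypothetical table: media type -> function that describes a thumbnail.
# A real system would return image bytes; strings keep the sketch short.
THUMBNAILERS: Dict[str, Callable[[str], str]] = {
    "image/jpeg": lambda pid: f"scaled-down image for {pid}",
    "application/pdf": lambda pid: f"first-page render for {pid}",
    "audio/mpeg": lambda pid: f"generic audio icon for {pid}",
}


class ToyRepository:
    """Stand-in repository that knows each object's media type."""

    def __init__(self, media_types: Dict[str, str]) -> None:
        self._media_types = media_types  # object identifier -> media type

    def get_thumbnail(self, pid: str) -> str:
        """Return a thumbnail for pid, whatever its media type is."""
        media_type = self._media_types[pid]
        make_thumbnail = THUMBNAILERS.get(
            media_type, lambda p: f"placeholder icon for {p}"
        )
        return make_thumbnail(pid)


repo = ToyRepository({"demo:1": "image/jpeg", "demo:2": "application/pdf"})
for pid in ("demo:1", "demo:2"):
    # The application makes the same call for every record.
    print(repo.get_thumbnail(pid))
```

Adding support for a new format means adding one entry to the table; the application code that requests thumbnails does not change.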
Taken together, Peter's pieces make a strong case for Fedora as a fine back-end for a unified, multi-purpose repository. Unlike other repository systems that focus more on the front-end, Fedora focuses on being the plumbing, the "digital library operating system" as Ron Jantz calls it. Were I not already a Fedora enthusiast, I would find it quite difficult not to consider Fedora (or something like it, such as LANL's aDORe Archive) at MPOW. Now if someone can send me some hints on drumming up institutional support…
Trackbacks
[...] Michael J. Giarlo wrote a very nice summary of my FEDORA trilogy (only three parts so far; I think there are more good things to say about FEDORA, and besides, I like Douglas Adams' concept of what a trilogy should be), and added a piece that I hadn't considered: [...]
For anyone reading Michael's excellent summary, in the name of full disclosure I should point out that a piece of "Why Fedora? Because You Don’t Need Fedora" had to be revisited with "Fedora, Objects, Datastreams, Filesystems, and a Correction". The correction came after Michael posted his summary of the articles. Be sure to read both (the former will point you to the latter) to get the full picture.
I'm not sure whether it is of interest to people reading this blog, but the Public Library of Science's new high-volume, highly inclusive, highly interactive journal, PLoS ONE, is being launched on a Fedora-based publishing platform. There is a lot of common cause between PLoS ONE and institutional repositories, so it would be great if the two were working to similar standards.