Exploring curation micro-services
As far as I'm concerned, the most exciting developments this year in repositories and digital curation have come out of the California Digital Library. It has been impossible not to notice their papers and presentations. Put simply, their idea is that digital curation is enabled by "micro-services" built upon well-known abstractions such as the filesystem. The benefits are obvious: filesystem tools are ubiquitous and cross-platform, and there are strong market forces to ensure the filesystem persists. The idea is radically simple and straightforward, though many questions remain about such a paradigm. I'll return to those later.
If you have not yet taken a look at CDL's curation micro-service specifications, most of which may be printed on as few as one or two sheets of paper, see the Digital Library Building Blocks.
My co-workers in the LC Repository Development Center have been chatting about these specs on and off throughout the year. After months of procrastinating, I finally read all of the specs on Thursday; it's wonderful that you can do so in the course of one reading session, I might add. Yesterday a bunch of us RDCers got together to chat (informally) about the specs: what they're for, how they work, and how they interact with one another. I learn by doing, by examples, so I combed through each of the specs in advance of our meeting and tried to construct a minimal repository[1] based on micro-services.
Continue reading…
Notes
- Perhaps it's more in line with the specs to refer to this space as "a managed filesystem that drives repository and curation services," given the CDL philosophy that preservation is not a place/repository. But it's easier to say "repository," so there you go. [↩]
Command-line shuffle
Being a nerd, I tend to like the command-line. When I'm working on my laptop at home, I tend to like listening to music. Before I discovered that mplayer had a really convenient shuffle idiom, I would invoke it thusly (to listen to all my Pavement tracks in shuffle mode):
export IFS=$'\n' for track in $(find /mnt/upnp/MediaTomb/Audio/Artists/Pavement -name \*.mp3 | ~/bin/shuffle.py); do mplayer $track; done
And the wee shuffle script I whipped together looks like this:
#!/usr/bin/env python # shuffle.py import sys import random args = list(sys.stdin) random.shuffle(args) sys.stdout.writelines(args)
And here's the convenient shuffle idiom that renders my arg-shuffling script somewhat useless:
find /mnt/upnp/MediaTomb/Audio/Artists/Pavement -name \*.mp3 | mplayer -playlist - -shuffle -loop 0
I2: Survey results
I wrote in June that the I2 subgroup surveyed "repository managers to determine the current practices and needs of the repository community regarding institutional identifiers. Results from the survey will inform a set of use cases that will be shared with the community, and that are expected to drive the development of a new standard for institutional identifiers."
The survey closed in July, and the subgroup spent August writing a report on the survey results. That report is now final and it's available to the public. Feedback may be sent to our (woefully underutilized) public i2info mailing list, left as a comment on this post, or e-mailed to me privately which I can forward to our internal list.
The next step is to build upon the report to draw yet more conclusions from the data — there's an awful lot there — and flesh out some repository use cases for institutional identifiers. The I2 core group is moving quickly towards finalizing identifier metadata elements so that a standard may be drafted, and I think having some use cases documented will help drive the standard in a direction the community can get behind.
Onward and upward.
