I2: Requirements

Posted by Michael Giarlo on June 07, 2009

[Series]

The I2 IR scenario subgroup approached the issue of institutional identifiers in repositories by first brainstorming about the various issues, problems, and sticking points that make identifiers in this space (and elsewhere) such a complex topic. Folks on the subgroup are repository managers or are otherwise involved with or knowledgeable about the repository space, so the brainstorming exercise yielded a good number of concerns.

The purpose of the exercise was to enumerate concerns and issues that could inform a draft survey to be administered to repository managers and experts around the globe in different organizational contexts: libraries, subject disciplines, archives, historical societies, etc. The purpose of the survey is to get an idea of the use cases and constraints around institutional identifiers in these different repository contexts, the assumption being that we ought to have requirements grounded in real world usage before we go off building a standard.

I will note here that the subgroup has worked up a draft survey that has just recently been reviewed by a small group of folks who know about survey design, and we hope to administer the survey to the aforementioned Reporati this week[1]. Which is to say that I don't yet have a strong grasp of the use cases out there in the wild, and this series should be construed as my own premature cognitive fumblings. But let's assume for now that what we learn from the survey results matches our initial brainstorming exercise.

Here is a slightly modified and boiled down version of the concerns and issues the subgroup came up with for a potential institutional identifier standard, which resembles a set of minimum requirements:

  1. Should be agnostic to type of institution, e.g., libraries, museums, personal collections, historical societies
  2. Should handle varying institutional granularity, e.g., institution-level, campus-level, division-level, unit-level
  3. Should handle linking among institutions and subordinate units
  4. Should express different sorts of relationships among these institutions and units
  5. Should relate to existing relevant identifiers and registries
  6. Should be globally unique
  7. Should be actionable
  8. Should enable retrieval of metadata sufficient to identify the institution, which may vary widely by institution
  9. Should accommodate changes as institutions come and go and re-organize and be able to relate defunct institutions to new ones
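To make the list a little more concrete, here is a minimal sketch of a record and registry shaped by these nine concerns. It is an illustration only, not the subgroup's design or any proposed standard: the class names, field names, relationship vocabulary ("part_of", "succeeded_by"), and the example.org identifiers are all assumptions invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    kind: str       # hypothetical vocabulary, e.g. "part_of", "succeeded_by" (req. 4)
    target_id: str  # identifier of the related institution or unit (req. 3)


@dataclass
class Institution:
    identifier: str  # assumed to be an HTTP URI: unique (req. 6) and actionable (req. 7)
    name: str
    kind: str        # free text, so no institution type is privileged (req. 1)
    level: str       # e.g. "institution", "campus", "division", "unit" (req. 2)
    active: bool = True  # defunct records are kept, not deleted (req. 9)
    same_as: list = field(default_factory=list)        # other registries' identifiers (req. 5)
    relationships: list = field(default_factory=list)  # Relationship objects (reqs. 3, 4)
    metadata: dict = field(default_factory=dict)       # varies by institution (req. 8)


class Registry:
    """In-memory stand-in for whatever service would resolve identifiers."""

    def __init__(self):
        self._records = {}

    def register(self, inst):
        if inst.identifier in self._records:
            raise ValueError(f"duplicate identifier: {inst.identifier}")
        self._records[inst.identifier] = inst

    def resolve(self, identifier):
        return self._records[identifier]

    def units_of(self, parent_id):
        """Return every record that declares itself part of parent_id."""
        return [
            inst for inst in self._records.values()
            if any(r.kind == "part_of" and r.target_id == parent_id
                   for r in inst.relationships)
        ]

    def current(self, identifier):
        """Follow succeeded_by links from a defunct record to its successor."""
        seen = set()
        inst = self.resolve(identifier)
        while not inst.active:
            seen.add(inst.identifier)
            successor = next(
                (r.target_id for r in inst.relationships if r.kind == "succeeded_by"),
                None,
            )
            if successor is None or successor in seen:
                break  # no successor recorded, or a cycle in the data
            inst = self.resolve(successor)
        return inst


# Hypothetical data: a defunct college absorbed by a university with one library unit.
registry = Registry()
registry.register(Institution(
    "https://example.org/i2/old-college", "Old College", "college", "institution",
    active=False,
    relationships=[Relationship("succeeded_by", "https://example.org/i2/new-university")],
))
registry.register(Institution(
    "https://example.org/i2/new-university", "New University", "university", "institution",
    same_as=["urn:example:legacy-registry:12345"],
))
registry.register(Institution(
    "https://example.org/i2/new-university/library", "University Library", "library", "unit",
    relationships=[Relationship("part_of", "https://example.org/i2/new-university")],
))

print(registry.current("https://example.org/i2/old-college").name)  # New University
```

Keeping relationships as typed links rather than a fixed parent field is one way to cover requirements 3, 4, and 9 with a single mechanism, though a real system would need an agreed vocabulary for the link types.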

I doubt the list is exhaustive; I am almost certain we will uncover all sorts of tangly and esoteric use cases that add requirements. I expect it. Why would we be gathering to discuss the need for an institutional identifier if it were a solved problem or a simple one? [2]

Nevertheless, looking at the above list, the task we've taken on starts to feel less onerous. And thinking about identifier systems constrained by the list of concerns, the mind starts to cook up all sorts of possible solutions. I'll share one in the next post in this series, a strawman proposal of sorts, and how it addresses each of these requirements.

Notes
  1. We will also x-post to repo-related mailing lists, and some of us may blog or tweet about it. My inclination is to cast as wide a net as possible so as not to miss important use cases. We can always scope things out later on, but it's useful to be inclusive at this point lest our own assumptions carry the group forward.
  2. The cynical among you might have interesting answers to this question.


I2: Background

Posted by Michael Giarlo on May 19, 2009

[Series]

This is the first in a series of posts about institutional identifiers[1].

In my last post, I alluded to some documentation that I've written. That was somewhat misleading, which will soon be apparent, but I liked the parallel construction I had going, and I am but a slave to orderliness.

For about the past six months, I have been working with a NISO group looking into how institutions are identified within information systems:

The I2 (Institutional Identifiers — pronounced "I 2") working group will build on work from the Journal Supply Chain Efficiency Improvement Pilot (http://www.journalsupplychain.com/), which demonstrated the improved efficiencies of using an institutional identifier in the journal supply chain. The NISO working group will develop a standard for an institutional identifier that can be implemented in all library and publishing environments. The standard will include definition of the metadata required to be collected with the identifier and what uses can be made of that metadata. …

The I2 group is split into a few subgroups which have been charged with looking into how institutional identifiers are used in particular scenarios. These scenarios are e-resources, repositories and e-learning systems, and library resource workflows. The scenario names pain me a bit, but so be it; this is our industry, and there are bigger windmills to tilt at.

I am currently co-chairing the subgroup looking at repositories and e-learning, and apparently I am its "tech lead." I don't want to get caught up on names and roles and titles, though; this series isn't about those at all. I'm just setting the scene and explaining why my head's in this space and laying bare my stake in the issue.

The remainder of this series will provide a bit more detail on the issues around institutional identifiers, share how the repository subgroup is grappling with identifier issues and engaging the repository community to assess needs, propose an approach for an identifier system that may meet said needs, and explore what seems to be the thorniest issue[2].

Notes
  1. I offer that very tentatively, knowing what a spectacular failure my last attempt at a series was.
  2. Hint: management. I know, "duh," right?


Cataloging and institutional repositories

Posted by Michael Giarlo on February 09, 2009

While doing some reading for a little talk my colleague, Ed Summers, and I are giving at code4lib 2009, I came across a paragraph that sparked a crazy thought. So crazy that it's not crazy at all. So not crazy that I am sure other people have thought of it. But nonetheless, here I am writing about it just in case.

From Sarah Currier's paper on SWORD (emphasis mine):

One of the most frequently cited barriers to academics depositing their teaching materials into repositories is the keystroke-count involved in logging into a repository, uploading the resource, creating metadata, perhaps selecting a licence, and publishing the resource. It was a quick win, therefore, to create a drag-and-drop desktop tool to allow a single keystroke deposit of resources, including multiple resources in one action. For a repository that supports automatic metadata generation, administrative metadata can be created at the point of entry to the repository without the user needing to create any.

And I wondered how many repositories supported automatic metadata generation. I wondered how many repositories supported automatic generation of rich metadata. And lastly I wondered, might this be a more or less natural role for catalogers: augmenting stub metadata records or doing original cataloging for institutional repository deposits? Especially at a time when many of them are being reclassified as acquisitions specialists or digital projects managers?
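As a rough illustration of what a "stub" record might hold, here is a minimal sketch of automatic administrative metadata generation at deposit time. It is not how SWORD or any particular repository does it; the function name, the field names, and the `needs_cataloging` flag are assumptions made up for this example. Everything a machine can determine is filled in, and the descriptive fields are left empty for a cataloger.

```python
import hashlib
import mimetypes
import os
from datetime import datetime, timezone


def stub_metadata(path, depositor):
    """Build a minimal administrative record for a deposited file."""
    with open(path, "rb") as fh:
        checksum = hashlib.sha256(fh.read()).hexdigest()

    # guess_type works from the filename extension only; it may return None.
    mime_type, _encoding = mimetypes.guess_type(path)

    return {
        # Administrative metadata: derivable without human input.
        "filename": os.path.basename(path),
        "size_bytes": os.path.getsize(path),
        "mime_type": mime_type or "application/octet-stream",
        "sha256": checksum,
        "depositor": depositor,
        "deposited_at": datetime.now(timezone.utc).isoformat(),
        # Descriptive metadata: left blank for a cataloger to augment.
        "title": None,
        "creators": [],
        "subjects": [],
        "needs_cataloging": True,
    }
```

A cataloging queue could then be as simple as a query for records where `needs_cataloging` is still true, which is where the question of scale raised below comes in.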

Potential issues and questions:

  • Author ignorance: Maybe catalogers are already doing this and I'm a moron?
  • Scale: Is it realistic to expect to be able to "keep up" with repository deposits?
  • Granularity: Does cataloging at the level of articles, and perhaps at even finer granularities, introduce challenges?
  • Duplication: If pre-prints are cataloged in the IR, for instance, will they need to be cataloged again later?
  • … there are others I thought of on my commute this morning but have since forgotten. Feel free to add comments.

I will admit here that I've been somewhat out of the (academic) institutional repository space for a while, and cataloging is something I don't share thoughts about very often because my exposure is limited to one course I took a couple of years ago.

I assume there's a body of research about this out there somewhere but I figured I'd post this anyway.

Molotovs away!

Posted by Michael Giarlo on December 23, 2008

Lest I be criticized for unfairly calling out former employers in my recent Burn the Walled Gardens rant, I share news that the Rutgers University Libraries have boldly ventured into the world of open source software: RUcore Open Source Development. Huzzah! Thanks to the molotov-hurling Shaun Ellis, a peacenik/code monkey/musician extraordinaire, for all of his work and for bringing this to my attention.

On the RUcore open source page you can get a list of ongoing projects, a release schedule, and a rationale for their licensing decisions (i.e., choosing GPL 3).

The first project to be released (as of 2008/12/19) is the METS-based bibliographic utility, OpenMIC:

OpenMIC is an open source, web-based cataloging tool that can be used as a standalone application or integrated with other repository architectures by a wide range of organizations. It provides a complete metadata creation system for analog and digital materials, with services to export these metadata in standard formats.

  • Low overhead and infrastructure requirements
  • Events-based model for management and rights documentation
  • Mapping and import from standard and in-house formats
  • Unicode and CJK vernacular character support

OpenMIC is a core application for the Moving Image Collections (MIC) initiative developed at the Rutgers University Libraries with funding from the Library of Congress.

I look forward to following along as Rutgers releases yet more of the tools they have developed as part of their impressive digital library infrastructure. It will be even more interesting to hear what their model will be for taking patches / commits from the broader open source community. These things do take time, even though I failed to show an appreciation for that in my original rant, but I am reminded (by Jonathan Rochkind) that it's better to take the time and get it right. I cringe a bit to say that, knowing full well how things tend to languish in committees and fall victim to analysis paralysis in academia; surely there is some middle ground? There are some very talented and experienced folks at Rutgers, so I will be excited to see them take a leadership role in this space.

Go, Scarlet Knights!

ORE plugin updated

Posted by Michael Giarlo on July 25, 2008

I've been using my time at RepoCamp today to get the OAI-ORE plugin for WordPress validating again. I'm having some trouble using the validator, so I say that with some diffidence. But the latest code, which is now checked in to the WordPress plugins svn repo, ought to be close to, if not fully conformant with, the 0.9 version of the ORE spec.

I'm not sure the plugin is really useful; it's just an Atom feed of all posts and pages in a WP instance. I can think of some ways to make it more useful, such as allowing blog authors to create their own aggregations that pull in content from outside the particular instance. I am certain that others can come up with even better uses. I'm open to suggestions.

Thanks to Jay Datema for prodding me a bit, if indirectly.