Is MARC a data model?
I posted a status update to Twitter, identi.ca, and Facebook late last night hoping to suss out two questions:
- Is MARC a data model?
- But really: what qualifies something as a data model?
I'd poked around looking for clues to the latter and was left cold by the long Wikipedia entry. Maybe I've been doing the micro-blog thing for too long and my ability to parse information that comes in greater-than-140-character chunks has been damaged. Plus I like learning from examples, and what better example for the library geek than MARC?
The feedback I received was pretty impressive, though the responses didn't all agree with one another. I found it an interesting example of crowdsourcing, so to speak. As each response came in, I would read it, cross-check it against, e.g., Wikipedia articles for accuracy, and revise my own answers to the above questions. I'm homing in on an answer to the former question. The latter question is still a bit murky.
I thought I'd share the responses, too. Responses from Twitter are included in full w/ links to the original. Responses from quasi-public Facebook have been anonymized. You can see my replies interspersed as well and watch the evolution of the (admittedly short) discussion. After the jump:
@bangpound: @mjgiarlo MARC is a markup language. It makes no declarations about how data is stored only how it's formatted.
@ranginui: @mjgiarlo a piece of crap, cue neil young and crazy horse
@anarchivist: @mjgiarlo not a data model, it's a transmission format
@vphill: @mjgiarlo I've heard that said about MARC too, let me know if you get an answer
A container for a data model, such as AACR2
@mjgiarlo: @bangpound, @anarchivist, @vphill: So. let's see: MARC21 bib is a profile of a serialization/transmission format w/ AACR2 as the data model?
@anarchivist: @mjgiarlo wouldn't even assume AACR2 if I was you.
@mjgiarlo: @anarchivist: Okay. Something says "authors go in 100; contributors go in 700," though, right? Is that not a data model? Sorry if dense.
MARC is not a data model (and neither is AACR2) in the sense that neither of them explicitly describes entities and relationships among entities. The relationships in these two non-relational frameworks are implicit, and the semantics of the model must be supplied in the end by the people who use these frameworks. RDA/FRBR is a move toward an actual data model — it makes some relationships explicit and can properly be represented in an Entity-Relationship diagram (with all those relationship words that explicitly express the semantics — words like, for example, "is realized through" or "is embodied in" or "is exemplified by"), but even RDA/FRBR does not fully express all of the relationships/semantics and must be translated into an actual data model in order to be implemented — librarians have been irresponsible, in my opinion, in refusing to learn about relational database concepts, mostly because of their slavish adherence to the old flat-file style that MARC represents.
@gmcharlt: @mjgiarlo MARC is many things at once, which is part of the problem. Not just transmission standard; embodies current cataloging worldview
@edsu: @mjgiarlo i think there are aspects of data modeling in Z39.2 & ISO 2709, and certainly in MARC21 ; that said, i think @gmcharlt is right.
So, based on all the responses I've gotten (on Facebook, on Twitter, around the office), here's my current thinking:
- MARC means more than one thing.
- One meaning of MARC is MARC the binary format. A format is not a data model.
- Another meaning of MARC is, e.g., MARC21 Bibliographic.
- MARC21 Bibliographic is a profile of MARC, which is serialized in the MARC binary format.
- MARC21 Bibliographic defines semantics for fields and subfields and indicators, which makes it feel like a data model. This gets at some of the assumptions I've internalized about data models.
- The MARC21 Bibliographic data model thus has well-defined entities, but otherwise is a poor data model, primarily because:
  - It does not have well-defined relationships between the entities;
  - It conflates different conceptual models, such as the FRBR Group 1 entities, and also mixes FRBR Group 1 entities with Group 2 and 3 entities.
- I'm not sure where this leaves AACR2, but it feels like it just fell out of the discussion.
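The split between "MARC the binary format" and "MARC21 Bibliographic the profile" in the bullets above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a full MARC implementation: it handles only data fields (no 00X control fields), the function names `build_record` and `parse_record` are my own, and the sample record is invented. The carrier code knows about leaders, directories, and delimiters; only the small lookup table at the end knows what a 100 or a 700 means.

```python
# Control characters used by the ISO 2709 / MARC binary carrier.
FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator
SUBFIELD = b"\x1f"    # subfield delimiter


def build_record(fields):
    """Serialize (tag, indicators, [(code, value), ...]) tuples (hypothetical helper)."""
    directory = b""
    data = b""
    for tag, indicators, subfields in fields:
        body = indicators.encode("ascii")
        for code, value in subfields:
            body += SUBFIELD + code.encode("ascii") + value.encode("utf-8")
        body += FIELD_END
        # Directory entry: 3-char tag, 4-digit field length, 5-digit start offset.
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    base = 24 + len(directory) + 1  # leader + directory + directory terminator
    total = base + len(data) + 1    # plus the record terminator
    # 24-byte leader: record length at 00-04, base address of data at 12-16.
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    assert len(leader) == 24
    return leader + directory + FIELD_END + data + RECORD_END


def parse_record(raw):
    """Recover the field tuples from the carrier (hypothetical helper)."""
    base = int(raw[12:17])
    directory = raw[24:base - 1]  # skip the directory terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start:base + start + length - 1]  # drop field terminator
        indicators = body[:2].decode("ascii")
        subfields = [
            (chunk[:1].decode("ascii"), chunk[1:].decode("utf-8"))
            for chunk in body[2:].split(SUBFIELD)[1:]
        ]
        fields.append((tag, indicators, subfields))
    return fields


# A tiny slice of MARC21 Bibliographic semantics; the carrier never consults it.
MARC21_BIB_LABELS = {
    "100": "Main entry, personal name",
    "245": "Title statement",
    "700": "Added entry, personal name",
}

# Invented sample data, for illustration only.
sample = [
    ("100", "1 ", [("a", "Doe, Jane")]),
    ("245", "10", [("a", "An example title")]),
    ("700", "1 ", [("a", "Roe, Richard")]),
]

raw = build_record(sample)
parsed = parse_record(raw)
for tag, indicators, subfields in parsed:
    print(tag, MARC21_BIB_LABELS.get(tag, "(no meaning at the carrier level)"), subfields)
```

Note that nothing in `build_record` or `parse_record` would change if the tags meant something else entirely, which is the sense in which the format is not a data model. Note also that nothing here says how the person in 100 relates to the person in 700.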
I'd be pleased if the discussion continued. If nothing else, it really satisfies my curiosity and gets my brain going (which is useful on a Monday morning).
Nice post. Your summary seems pretty much spot on to me. You got me thinking (not for the first time) about the definition of a "data model". Like you, I entered a maze of twisty little Wikipedia pages. 15 minutes later I found myself gazing slack-jawed at Codd's paper where he defined "Data Model" for the first time. The Intarwebs, they are truly astounding.
It's interesting to me that Codd himself had to convince people that a data model should be considered distinct from its physical implementation. This seems like the same sort of conflation that happens when people consider the MARC transmission format to be a data model.
Kinda interesting to see how people argued (and presumably still do) about what a data model is.
MARC is not a data model. It is a transmission standard. Even the punctuation used within MARC is not dictated by MARC, but rather typically by ISBD. Other data models and punctuation standards beyond AACR2 and ISBD also use MARC for transmission (how else do we get those German records into OCLC?). It's a carrier, not a model.
MARC was never supposed to be a data model. The problem is we, as librarians, have "MARC on the brain" (as described by Diane Hillmann). MARC now means multiple things, several of which it was never designed to be (such as a content/data model/standard). In addition, MARC is frequently used interchangeably with AACR2, which is not correct. Until we separate MARC from how we think about our data, it will be difficult for libraries and librarians to move forward into a world of linked data and true data models. 'Cause the data is good, it's just the carrier (MARC) that's so restrictive and flat.
@Ed: Thanks for the reference. That's helpful for grounding. Here's the link (PDF) for others if they're interested:
@Shana: I'd be interested in hearing if my understanding of the issues, the bullet-points down near the bottom of my post, sounds about right to you. I tried to tease apart the different senses of MARC and also concluded that AACR2 may be beside the point.
MARC21 actually includes five different formats for different purposes: bibliographic, authority, holdings, classification, and community information. Ultimately it is a syntax, with a communications piece and a data piece.
There was a session at ALA Annual in Chicago (July 2009) called "The Future of MARC" which was immensely helpful in teasing out what MARC actually is and all the differences, in particular Rebecca Guenther's and Diane Hillmann's presentations. There's a recording available: http://library.csun.edu/mwoodley/Future_MARC10Jul2009.mp3
There are lots of limitations to MARC, of which you've mentioned a couple. I think you're on the right track. It's a tricky issue to tease out, mainly because we use MARC to mean multiple different things in the library world.
As for AACR2, it should fall out of the discussion as it is a separate piece from MARC. We use MARC to "hold" AACR2 data, but they are fundamentally different (AACR2 is a set of cataloging rules/guidelines defining the data, MARC is a carrier for the data defined by those rules).
I wish we could disambiguate MARC:
"MARC now means multiple things, several of which it was never designed to be (such as a content/data model/standard)."
Is anyone seriously working on that? Is there any sense in which RDA or the DCMI work will contribute to such a separation?
One thing to not lose in this discussion is that not only do we mean two things when we say MARC — AACR2 and the transmission format — but that the latter influences and restricts the former.
What I don't know is how many (incredibly smart) library folks are restricted in their ability to think about our data because they're internally influenced by the format of tag/ind1/ind2/subfield/string-value, although it's hard to look at RDA and think that (at least on average) our capacity to conceptualize new data models is unfettered by the limitations of the MARC data format.
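To make the contrast concrete, here is a minimal Python sketch of the two shapes being discussed: the flat tag/ind1/ind2/subfield/string-value tuple, and a model in which the FRBR Group 1 relationships ("is realized through", "is embodied in", "is exemplified by") are explicit. The class and attribute names are assumptions chosen for illustration, not taken from any standard's schema, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List

# The flat shape: tag, ind1, ind2, subfield code, string value.
# Any relationship to other fields is implicit and supplied by the reader.
flat_field = ("245", "1", "0", "a", "An example title")


@dataclass
class Item:
    barcode: str


@dataclass
class Manifestation:
    publisher: str
    # A manifestation "is exemplified by" items.
    exemplified_by: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    language: str
    # An expression "is embodied in" manifestations.
    embodied_in: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    # A work "is realized through" expressions.
    realized_through: List[Expression] = field(default_factory=list)


def items_of(work: Work) -> List[Item]:
    """Walk the explicit relationships from a work down to its items."""
    return [
        item
        for expression in work.realized_through
        for manifestation in expression.embodied_in
        for item in manifestation.exemplified_by
    ]


# Invented sample data, for illustration only.
work = Work(
    title="An example title",
    realized_through=[
        Expression(
            language="eng",
            embodied_in=[
                Manifestation(
                    publisher="Example Press",
                    exemplified_by=[Item(barcode="0001")],
                )
            ],
        )
    ],
)

print(items_of(work))
```

With the flat tuple, a question like "which physical copies belong to this work?" has to be answered by convention outside the data; with the explicit model it is a traversal.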