<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Exploring curation micro-services</title>
	<atom:link href="http://lackoftalent.org/michael/blog/2009/09/27/exploring-curation-micro-services/feed/" rel="self" type="application/rss+xml" />
	<link>http://lackoftalent.org/michael/blog/2009/09/27/exploring-curation-micro-services/</link>
	<description>The occasional rambling of a digital library artisan</description>
	<lastBuildDate>Mon, 22 Feb 2010 22:08:57 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: inkdroid &#8250; ipres, iipc, pasig roundup/braindump</title>
		<link>http://lackoftalent.org/michael/blog/2009/09/27/exploring-curation-micro-services/comment-page-1/#comment-93935</link>
		<dc:creator>inkdroid &#8250; ipres, iipc, pasig roundup/braindump</dc:creator>
		<pubDate>Fri, 13 Nov 2009 12:49:59 +0000</pubDate>
		<guid isPermaLink="false">http://lackoftalent.org/michael/blog/?p=504#comment-93935</guid>
		<description>[...] notion of curation micro-services, and how they enable digital preservation efforts at CDL. Several folks in my group at LC have been taking a close look at the CDL specifications recently, so getting to [...]</description>
		<content:encoded><![CDATA[<p>[...] notion of curation micro-services, and how they enable digital preservation efforts at CDL. Several folks in my group at LC have been taking a close look at the CDL specifications recently, so getting to [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian Kennison</title>
		<link>http://lackoftalent.org/michael/blog/2009/09/27/exploring-curation-micro-services/comment-page-1/#comment-93839</link>
		<dc:creator>Brian Kennison</dc:creator>
		<pubDate>Thu, 22 Oct 2009 23:57:27 +0000</pubDate>
		<guid isPermaLink="false">http://lackoftalent.org/michael/blog/?p=504#comment-93839</guid>
		<description>This is a follow-up to my post above. 

I believe in the concepts that the CDL specs represent; I just didn&#039;t know how I could make them work. Well, I think I&#039;m a little closer. I&#039;ve been looking at Ed Summers&#039;s &quot;dflat.py&quot; that you mention in this post and also Ben O&#039;Steen&#039;s &lt;a href=&quot;http://oxfordrepo.blogspot.com/2009/10/python-in-pairtree.html&quot; rel=&quot;nofollow&quot;&gt;pairtree.py&lt;/a&gt;, and they have helped me get a better understanding of how these specs function. 

My big realization was that, if I can construct a really good metadata record made up from all the parts of the dflat, I can store objects in the repository, collect that metadata in the tool of my choice, and provide access to those objects from an arbitrary set of URLs. 

Case in point: I&#039;m using IndexData&#039;s &lt;a href=&quot;http://www.indexdata.com/zebra&quot; rel=&quot;nofollow&quot;&gt;Zebra&lt;/a&gt; for several projects. Zebra is lightweight and stores the complete XML, but lets you retrieve the data in any format you can write an XSLT stylesheet for, and provides an SRU interface. I use it as an XML database with an SRU interface. 

So what I&#039;m thinking is, I&#039;ll store my assets in a pairtree (I need to build the ingest using Ed and Ben&#039;s tools), and then point Zebra to the tree and let it index it (only looking for XML metadata files). I&#039;ll then create a public site (I&#039;m using Zope with xslt-methods) that contains a &quot;splash page&quot; for the collection, a query page (SRU) [or combine the two], and then write XSLT to handle the results of the query. The public site, while serving the assets from the pairtree repository, is essentially independent of the asset store and vice versa (just what I want). 

I think I&#039;m going to create a pairtree site for each collection (good for humans) but harvest the metadata into a central Zebra index. This way each collection can have its own look and feel, but the collections can be federated for search. 

The critical thing here is getting everything you need to present (but nothing more) into that metadata record. I&#039;ve been looking at &lt;a&gt;UNT&#039;s&lt;/a&gt; model, but I think you just have to do what&#039;s right for you and then convert to the standard models (DC, OAI-DC, LOM, etc.). Is METS still the standard, or do we use OAI-ORE?

While I don&#039;t want to add too many smarts to a pairtree, one thing that might be nice is if you could ask the pairtree for all its metadata files in an Atom or OAI format. That way, like the current &lt;a href=&quot;http://omeka.org/&quot; rel=&quot;nofollow&quot;&gt;Omeka&lt;/a&gt; software that can display content from a feed, you can present this data as needed.

FWIW (for what it&#039;s worth),

--Brian</description>
		<content:encoded><![CDATA[<p>This is a follow-up to my post above. </p>
<p>I believe in the concepts that the CDL specs represent; I just didn&#039;t know how I could make them work. Well, I think I&#039;m a little closer. I&#039;ve been looking at Ed Summers&#039;s &#034;dflat.py&#034; that you mention in this post and also Ben O&#039;Steen&#039;s <a href="http://oxfordrepo.blogspot.com/2009/10/python-in-pairtree.html" rel="nofollow">pairtree.py</a>, and they have helped me get a better understanding of how these specs function. </p>
<p>My big realization was that, if I can construct a really good metadata record made up from all the parts of the dflat, I can store objects in the repository, collect that metadata in the tool of my choice, and provide access to those objects from an arbitrary set of URLs. </p>
<p>Case in point: I&#039;m using IndexData&#039;s <a href="http://www.indexdata.com/zebra" rel="nofollow">Zebra</a> for several projects. Zebra is lightweight and stores the complete XML, but lets you retrieve the data in any format you can write an XSLT stylesheet for, and provides an SRU interface. I use it as an XML database with an SRU interface. </p>
<p>So what I&#039;m thinking is, I&#039;ll store my assets in a pairtree (I need to build the ingest using Ed and Ben&#039;s tools), and then point Zebra to the tree and let it index it (only looking for XML metadata files). I&#039;ll then create a public site (I&#039;m using Zope with xslt-methods) that contains a &#034;splash page&#034; for the collection, a query page (SRU) [or combine the two], and then write XSLT to handle the results of the query. The public site, while serving the assets from the pairtree repository, is essentially independent of the asset store and vice versa (just what I want). </p>
<p>I think I&#039;m going to create a pairtree site for each collection (good for humans) but harvest the metadata into a central Zebra index. This way each collection can have its own look and feel, but the collections can be federated for search. </p>
<p>The critical thing here is getting everything you need to present (but nothing more) into that metadata record. I&#039;ve been looking at <a>UNT&#039;s</a> model, but I think you just have to do what&#039;s right for you and then convert to the standard models (DC, OAI-DC, LOM, etc.). Is METS still the standard, or do we use OAI-ORE?</p>
<p>While I don&#039;t want to add too many smarts to a pairtree, one thing that might be nice is if you could ask the pairtree for all its metadata files in an Atom or OAI format. That way, like the current <a href="http://omeka.org/" rel="nofollow">Omeka</a> software that can display content from a feed, you can present this data as needed.</p>
<p>FWIW (for what it&#039;s worth),</p>
<p>&#8211;Brian</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Giarlo</title>
		<link>http://lackoftalent.org/michael/blog/2009/09/27/exploring-curation-micro-services/comment-page-1/#comment-93731</link>
		<dc:creator>Michael Giarlo</dc:creator>
		<pubDate>Sat, 03 Oct 2009 02:47:11 +0000</pubDate>
		<guid isPermaLink="false">http://lackoftalent.org/michael/blog/?p=504#comment-93731</guid>
		<description>@Brian: You ask some good questions.  I&#039;d be eager to see the CDL folks&#039; response.  I understand they&#039;re working on a response, but are otherwise busy with a few local conferences for the next week or so.

Also, I don&#039;t think Pairtree makes any assumptions about your identifiers, so using semantic ids (or &quot;reasonable URLs&quot; as you put it :) ) should be possible.  You shouldn&#039;t need to use ARKs or opaque ids.  

The filesystem is slightly less opaque than a database with opaque identifiers because you can still visually scan your store of objects and see your identifiers surfaced in the directory listing, the theory being that filesystem tools will likely outlive particular DB servers or schemas.  

There are other specs for surfacing other information in the directory listing, such as NAMASTe, so that your object store is less opaque, but I&#039;m less convinced of the value of some of these other specs.  Still, it&#039;s a compelling pattern, and I will be poking at all the specs to see how it all plays out.

I don&#039;t think you have to access only by identifier.  The great thing about the micro-services is that they are, at least in theory, decoupled.  So you can choose the ones that work for you and ignore the ones that don&#039;t.   What&#039;s neat about the specs is that there&#039;s some overlap, and you have multiple options for storing a &quot;digital object&quot; on the filesystem, so if you don&#039;t like identifier-based access and you do want versioning, you can choose to model objects in DFlat w/ ReDDs rather than Pairtree.  

Mix, match, rinse, lather, repeat!

Are there any areas in the CDL specs where you see an opportunity to make good use of XML?

Thanks for writing, Brian.</description>
		<content:encoded><![CDATA[<p>@Brian: You ask some good questions.  I&#039;d be eager to see the CDL folks&#039; response.  I understand they&#039;re working on a response, but are otherwise busy with a few local conferences for the next week or so.</p>
<p>Also, I don&#039;t think Pairtree makes any assumptions about your identifiers, so using semantic ids (or &#034;reasonable URLs&#034; as you put it :) ) should be possible.  You shouldn&#039;t need to use ARKs or opaque ids.  </p>
<p>The filesystem is slightly less opaque than a database with opaque identifiers because you can still visually scan your store of objects and see your identifiers surfaced in the directory listing, the theory being that filesystem tools will likely outlive particular DB servers or schemas.  </p>
<p>There are other specs for surfacing other information in the directory listing, such as NAMASTe, so that your object store is less opaque, but I&#039;m less convinced of the value of some of these other specs.  Still, it&#039;s a compelling pattern, and I will be poking at all the specs to see how it all plays out.</p>
<p>I don&#039;t think you have to access only by identifier.  The great thing about the micro-services is that they are, at least in theory, decoupled.  So you can choose the ones that work for you and ignore the ones that don&#039;t.   What&#039;s neat about the specs is that there&#039;s some overlap, and you have multiple options for storing a &#034;digital object&#034; on the filesystem, so if you don&#039;t like identifier-based access and you do want versioning, you can choose to model objects in DFlat w/ ReDDs rather than Pairtree.  </p>
<p>Mix, match, rinse, lather, repeat!</p>
<p>Are there any areas in the CDL specs where you see an opportunity to make good use of XML?</p>
<p>Thanks for writing, Brian.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian Kennison</title>
		<link>http://lackoftalent.org/michael/blog/2009/09/27/exploring-curation-micro-services/comment-page-1/#comment-93714</link>
		<dc:creator>Brian Kennison</dc:creator>
		<pubDate>Tue, 29 Sep 2009 23:19:07 +0000</pubDate>
		<guid isPermaLink="false">http://lackoftalent.org/michael/blog/?p=504#comment-93714</guid>
		<description>Michael,

I&#039;ve been looking at this too.

I think most people know that your assets are going to outlive (hopefully) your software. I&#039;ve been running Zope, DSpace, TKL (IndexData), and Omeka sites among others, and while they are all good, I keep thinking &quot;what&#039;s going to happen when you reach the end of the road&quot;? To me this is why the CDL  things are so interesting.

I&#039;m just small time, and the folks at CDL, along with yourself and the co-workers at LOC, have a whole lot more experience than me, so I don&#039;t always understand what&#039;s going on ;-)

Some of the questions I&#039;m looking at right now are:
* Do you serve out of your DFlat? Why not? 
* If you think of the OAIS reference model, is this the AIP? 
* Do you build every app this way? 

I&#039;m thinking you have/build lots of &quot;feeder apps&quot; (think BibApp) that have their own models but support harvesting (OAI, Atom, etc.), and then on the harvest you store in this model. On top of this file system model you can build other structures (RDB, XML, Fedora, whatever suits your fancy) that support access to this store. 

I&#039;m having trouble with pairtrees and identifiers (ARK). If (big if) I serve out of this structure, I want reasonable URLs (wcsu/library/archives/MS044/whatever.xml), so how do I use these numbers (010203945065)? OK, unless I make those numbers &quot;intelligent&quot; (01=college, 02=division, 03=format, you get the idea), the storage on the file system is just as opaque as if you used a database. I understand the need for unique identifiers, but do we have to access &quot;only&quot; by identifier? (The alternative is to use a resolution service, I guess, but I don&#039;t know.) 

It seems to me that the CDL folks don&#039;t like XML (probably for good reason), but I&#039;m leaning toward an XML AIP, storing assets on the filesystem (accessible by humans), AND having the ability to serve from that store. 

We&#039;ll see....

--Brian</description>
		<content:encoded><![CDATA[<p>Michael,</p>
<p>I&#039;ve been looking at this too.</p>
<p>I think most people know that your assets are going to outlive (hopefully) your software. I&#039;ve been running Zope, DSpace, TKL (IndexData), and Omeka sites among others, and while they are all good, I keep thinking &#034;what&#039;s going to happen when you reach the end of the road&#034;? To me this is why the CDL  things are so interesting.</p>
<p>I&#039;m just small time, and the folks at CDL, along with yourself and the co-workers at LOC, have a whole lot more experience than me, so I don&#039;t always understand what&#039;s going on ;-)</p>
<p>Some of the questions I&#039;m looking at right now are:<br />
* Do you serve out of your DFlat? Why not?<br />
* If you think of the OAIS reference model, is this the AIP?<br />
* Do you build every app this way? </p>
<p>I&#039;m thinking you have/build lots of &#034;feeder apps&#034; (think BibApp) that have their own models but support harvesting (OAI, Atom, etc.), and then on the harvest you store in this model. On top of this file system model you can build other structures (RDB, XML, Fedora, whatever suits your fancy) that support access to this store. </p>
<p>I&#039;m having trouble with pairtrees and identifiers (ARK). If (big if) I serve out of this structure, I want reasonable URLs (wcsu/library/archives/MS044/whatever.xml), so how do I use these numbers (010203945065)? OK, unless I make those numbers &#034;intelligent&#034; (01=college, 02=division, 03=format, you get the idea), the storage on the file system is just as opaque as if you used a database. I understand the need for unique identifiers, but do we have to access &#034;only&#034; by identifier? (The alternative is to use a resolution service, I guess, but I don&#039;t know.) </p>
<p>It seems to me that the CDL folks don&#039;t like XML (probably for good reason), but I&#039;m leaning toward an XML AIP, storing assets on the filesystem (accessible by humans), AND having the ability to serve from that store. </p>
<p>We&#039;ll see&#8230;.</p>
<p>&#8211;Brian</p>
]]></content:encoded>
	</item>
</channel>
</rss>
