Command-line shuffle
Being a nerd, I tend to like the command-line. When I'm working on my laptop at home, I tend to like listening to music. Before I discovered that mplayer had a really convenient shuffle idiom, I would invoke it thusly (to listen to all my Pavement tracks in shuffle mode):
export IFS=$'\n'
for track in $(find /mnt/upnp/MediaTomb/Audio/Artists/Pavement -name \*.mp3 | ~/bin/shuffle.py); do mplayer "$track"; done
And the wee shuffle script I whipped together looks like this:
#!/usr/bin/env python
# shuffle.py
import sys
import random

args = list(sys.stdin)
random.shuffle(args)
sys.stdout.writelines(args)
And here's the convenient shuffle idiom that renders my arg-shuffling script somewhat useless:
find /mnt/upnp/MediaTomb/Audio/Artists/Pavement -name \*.mp3 | mplayer -playlist - -shuffle -loop 0
Validating ORE from the Command-line
I've been periodically poking at getting Linked Data/RDF views hooked into the World Digital Library web application, following Ed Summers' lead from his work on Chronicling America. The RDF views also use the OAI-ORE vocabulary to express aggregations — in WDL, an item is an aggregation of its constituent files. The goal is to provide a semantically rich and holistic representation of a WDL item (identifier, constituent files, metadata, translations, and so on).
The ORE format is a new one for me so it's hard to say whether the output of my dev branch is valid ORE or not. Plus I'm a sucker for validators. Turns out Rob Sanderson has developed a Python library for validating ORE, and this little snippet is what I've been using to validate the ORE. I didn't put much effort into making it readable, so much as banging something functional out so I can meet deadlines, so mea culpa and all that. But without further hemming and hawing, the code:
# validate.py
import sys
from foresite import *

rem = RdfLibParser().parse(ReMDocument(sys.argv[1]))
aggr = rem.aggregation
n3 = RdfLibSerializer('n3')
rem2 = aggr.register_serialization(n3)
print rem2.get_serialization(n3).data
Most of this code is naively copied and pasted from Rob's excellent Foresite documentation.
I invoke it thusly: python validate.py {URL}
And the output:
@prefix _27: <http://www.semanticdesktop.org/ontologies/nfo#>.
@prefix _28: <http://localhost/en/item/1/id#>.
@prefix _29: <http://localhost/en/item/1/>.
@prefix bibo: <http://purl.org/ontology/bibo/>.
@prefix dc: <http://purl.org/dc/elements/1.1/>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix ore: <http://www.openarchives.org/ore/terms/>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rdfs1: <http://www.w3.org/2001/01/rdf-schema#>.
_28:ResourceMap a ore:ResourceMap;
dc:format "text/rdf+n3";
dcterms:created "2009-07-31T14:23:31Z";
dcterms:modified "2009-07-31T14:23:31Z";
ore:describes _29:id.
_29:id a bibo:Image,
ore:Aggregation;
dcterms:DDC "973";
dcterms:alternative "Antietam, Maryland. Allan Pinkerton, President Lincoln, and Major General John A. McClernand"@en;
dcterms:created "1862年10月3日"@zh,
"3 de octubre de 1862"@es,
"3 de outubro de 1862"@pt,
"3 octobre 1862"@fr,
"3 октября 1862 года"@ru,
"October 3, 1862"@en,
" ٣ آكتوبر، ١٨٦٢"@ar;
dcterms:creator "Gardner, Alexander"@en,
"Gardner, Alexander"@es,
"Gardner, Alexander"@fr,
"Gardner, Alexander"@pt,
"Гарднер, Александр"@ru,
"جاردنر, أليكسندر"@ar,
"加德纳, 亚历山大"@zh;
... (and so on and so forth)
dcterms:title "Antietam, Maryland. Allan Pinkerton, President Lincoln, and Major General John A. McClernand: Another View"@en,
"Antietam, Maryland. Allan Pinkerton, el Presidente Lincoln y el General Principal John A. McClernand: Otra visión"@es,
"Antietam, Maryland. Allan Pinkerton, le président Lincoln et le général-major John A. McClernand: Autre vue"@fr,
"Antietam, Maryland. Allan Pinkerton, Presidente Lincoln e Major-General John A. McClernand: Outra Vista"@pt,
"Антитэм, штат Мэриленд. Аллан Пинкертон, президент Линкольн и генерал-майор Джон А. Макклернанд: Другой снимок"@ru,
"أنتينام، ميريلاند ألان بينكرتون، الرئيس لينكولن، واللواء جون أ. ماكليرناند: منظر آخر"@ar,
"安蒂特姆,马里兰州 艾伦·平克顿、林肯总统和少将约翰·A ·马克克拉南: 另一个视角"@zh;
ore:aggregates <http://localhost/static/c/1/reference/04326u_thumb_item.gif>,
<http://localhost/static/c/1/service/04326u.tif>;
ore:isDescribedBy <http://localhost/en/item/1/item.rdf>;
rdfs:seeAlso <http://hdl.loc.gov/loc.wdl/dlc.1>.
<http://localhost/static/c/1/reference/04326u_thumb_item.gif> a _27:FileDataObject;
dcterms:format "image/gif";
_27:fileSize "34531"^^<http://www.w3.org/2001/XMLSchema#long>.
<http://localhost/static/c/1/service/04326u.tif> a _27:FileDataObject;
dcterms:format "image/tiff";
_27:fileSize "1301614"^^<http://www.w3.org/2001/XMLSchema#long>.
ore:Aggregation rdfs1:isDefinedBy <http://www.openarchives.org/ore/terms/>;
rdfs1:label "Aggregation".
ore:ResourceMap rdfs1:isDefinedBy <http://www.openarchives.org/ore/terms/>;
rdfs1:label "ResourceMap".
You might pick up on some warts I have yet to fix, but there you go.
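One thing a quick script can surface in output like this is a prefix bound to an unexpected namespace. In the listing above, rdfs is bound to the standard RDF Schema URI (2000/01) while rdfs1 is bound to a 2001/01 URI. Here's a rough sketch of such a check in plain Python 3. The function names and the little table of known namespaces are my own invention for illustration and are not part of Foresite or any ORE tooling.

```python
import re

# Well-known namespace URIs to compare against (a small, hand-picked set).
KNOWN_NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ore": "http://www.openarchives.org/ore/terms/",
}

# Matches lines such as: @prefix ore: <http://www.openarchives.org/ore/terms/>.
PREFIX_LINE = re.compile(r"@prefix\s+([^:\s]*):\s*<([^>]*)>\s*\.")


def parse_prefixes(n3_text):
    """Return {prefix: namespace URI} for every @prefix line in the text."""
    return dict(PREFIX_LINE.findall(n3_text))


def suspicious_prefixes(n3_text):
    """Flag auto-numbered prefixes (e.g. rdfs1) whose base name is a known
    prefix but whose URI differs from the well-known one."""
    flagged = []
    for prefix, uri in sorted(parse_prefixes(n3_text).items()):
        base = prefix.rstrip("0123456789")
        expected = KNOWN_NAMESPACES.get(base)
        if base != prefix and expected is not None and uri != expected:
            flagged.append((prefix, uri, expected))
    return flagged


sample = """\
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rdfs1: <http://www.w3.org/2001/01/rdf-schema#>.
@prefix ore: <http://www.openarchives.org/ore/terms/>.
"""

for prefix, uri, expected in suspicious_prefixes(sample):
    print("%s is bound to %s (expected something like %s)" % (prefix, uri, expected))
```

This only looks at the prefix declarations, so it is a sanity check on the serialization and no substitute for actually validating the ORE.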
WDL metadata mapping, and, parsing TEI in Python
Context
Early on in the effort to develop the first public version of the World Digital Library web application, we developed a (non-public) Django-based cataloging application where Library of Congress catalogers could manage metadata for WDL items. Management in this sense includes creation of records, editing of records, versioning of edits, mapping of source records, and some light workflow for assignment of records to individual catalogers and for hooking into translation processes[1].
I worked primarily on the source record mapping tools. They take a number of formats as input and are called by the cataloging application to map metadata from these formats into the WDL domain model. Several of these formats, though not all, are XML-based and thus easily dealt with in Python via the etree module in the lxml package.
Dan recently kicked off a new R&D project for evaluating (any) metadata against any number of metadata profiles, mapping into a generic data dictionary, the goal being to determine how feasible it would be to develop a toolset for aiding remediation of metadata across any number of digital collections. I have been working on this project with Dan, and got started by seeing how generalizable the WDL metadata mapping tools are. Turns out they're fairly generalizable once you tweak the various format-specific mapping rules to map into the generic data dictionary model rather than the WDL model (around 15 elements, and somewhere between Dublin Core and MODS in terms of specificity but flatly structured like DC).
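To give a flavor of what "format-specific mapping rules" can look like, here's a minimal sketch, not the actual WDL code. It uses the standard library's xml.etree.ElementTree (swapped in for lxml.etree so it runs without extra installs), and the rule table, target field names, and sample record are all made up for illustration.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical rule table: source element -> field in a flat target model.
# Retargeting the mapper means swapping this table, not the code below.
DC_RULES = {
    DC + "title": "title",
    DC + "creator": "creator",
    DC + "date": "date_created",
}


def map_record(xml_text, rules):
    """Walk every element in the record and collect the text of those
    covered by a rule into a flat {field: [values]} dict."""
    record = {}
    for element in ET.fromstring(xml_text).iter():
        field = rules.get(element.tag)
        text = (element.text or "").strip()
        if field and text:
            record.setdefault(field, []).append(text)
    return record


sample = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Antietam, Maryland</dc:title>
  <dc:creator>Gardner, Alexander</dc:creator>
  <dc:date>1862-10-03</dc:date>
  <dc:rights>not covered by any rule</dc:rights>
</record>"""

print(map_record(sample, DC_RULES))
```

Elements without a rule are silently skipped here. A real remediation tool would probably want to report them instead.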
Some of the test data I am working with now, that has nothing to do with WDL, is SGML-based TEI 2 markup. The closest I worked with on WDL was TEI P5 for manuscript description which is serialized in XML. Turns out my TEI mapping rules from before blew up on this TEI 2 stuff, as lxml.etree (naturally) wasn't digging the non-XML input. I googled around a bit for how best to parse TEI (or any SGML) in Python and then discovered it's actually simple as pie.
Code
If you've got the BeautifulSoup module installed[2]:
>>> from BeautifulSoup import BeautifulSoup
>>> tei = open('foo.sgm').read()
>>> BeautifulSoup(tei).findAll('title')[0].string
u'[Memorandum to Dr. Botkin]: a machine readable transcription.'
If not, the lxml.html module works too:
>>> from lxml import html
>>> h = html.parse(open('foo.sgm'))
>>> h.xpath('//title')[0].text
'[Memorandum to Dr. Botkin]: a machine readable transcription.'
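If neither module is handy, the same trick (lean on a forgiving HTML-style parser instead of a strict XML one) can be sketched with nothing but the Python 3 standard library. This is an illustration under that assumption, not what the mapping tools actually use. TitleCollector and collect_titles are names I made up, and the sample string is a trimmed-down piece of the TEI header.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the text content of every <title> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.titles = []
        self._buffer = None  # list of text chunks while inside <title>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._buffer is not None:
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None


def collect_titles(sgml_text):
    """Feed SGML-ish markup to the lenient parser and return its titles."""
    parser = TitleCollector()
    parser.feed(sgml_text)
    parser.close()
    return parser.titles


sample = (
    '<tei2><teiheader type="text"><filedesc><titlestmt>'
    '<amid type="aggitemid">wpa0-07010101</amid>'
    "<title>[Memorandum to Dr. Botkin]: a machine readable transcription.</title>"
    "</titlestmt></filedesc></teiheader></tei2>"
)

print(collect_titles(sample)[0])
```

It's more typing than the two one-liners above, but it has no dependencies.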
Data
And here's what the sample data looks like:
<!doctype tei2 public "-//Library of Congress - Historical Collections (American Memory)//DTD ammem.dtd//EN" [
<!entity % images system "07010101.ent"> %images;
]>
<tei2>
<teiheader type="text" date.created="1994/03/15" date.updated="2002/04/05" status="updated" creator="National Digital Library Program , Library of Congress">
<filedesc>
<titlestmt>
<amid type="aggitemid">wpa0-07010101</amid>
<title>[Memorandum to Dr. Botkin]: a machine readable transcription.</title>
<amcol><amcolname>Life Histories from the Folklore Project, WPA Federal Writers' Project, 1936-1940; American Memory, Library of Congress.</amcolname><amcolid type="aggid"></amcolid>
</amcol>
<respstmt>
<resp>Selected and converted.</resp>
<name>American Memory, Library of Congress.</name>
</respstmt></titlestmt>
<publicationstmt>
<p>Washington, DC, 1994.</p>
<p>Preceding element provides place and date of transcription only.</p>
<p>For more information about this text and this American Memory collection, refer to accompanying matter.</p>
</publicationstmt>
<sourcedesc>
<lccn></lccn>
<sourcecol>U.S. Work Projects Administration, Federal Writers' Project (Folklore Project, Life Histories, 1936-39); Manuscript Division, Library of Congress.</sourcecol>
<copyright>Copyright status not determined; refer to accompanying matter.</copyright></sourcedesc>
</filedesc>
<encodingdesc>
<projectdesc><p>The National Digital Library Program at the Library of Congress makes digitized historical materials available for education and scholarship.</p></projectdesc>
<editorialdecl><p>This transcription is intended to have an accuracy of 99.95 percent or greater and is not intended to reproduce the appearance of the original work.
The accompanying images provide a facsimile of this work and represent the appearance of the original.</p></editorialdecl>
<encodingdate>1994/03/15</encodingdate>
<revdate>2002/04/05</revdate>
</encodingdesc>
</teiheader>
<text type="manuscript">
<body>
<div>
<pageinfo>
<controlpgno entity="I07010101">0001</controlpgno>
<printpgno></printpgno></pageinfo>
<p>Memorandum to Dr. Botkin from G. B. Roberts, May 26, 1941</p>
<p>Subject: Alabama Material</p>
<p>This material has not yet been accessioned and has only <del rend="overstrike">beeen</del> been roughly classified as life histories, folklore, and miscellaneous data and copy save in the case of the 2 ex-slave items and the essay on Jesse Owens, each of which was recommended.</p>
<p>Total no. of items recommended: 3 (14 pp.) <handwritten>In progress</handwritten></p></div></body></text></tei2>
Notes
[1] Catalogers cataloged stuff in the English language, but every metadata record needed to be translated into the other six WDL languages (the remaining five official U.N. languages plus Portuguese): Spanish, Russian, French, Arabic, Chinese, and Portuguese.
[2] And you are but one sudo easy_install BeautifulSoup away from that.
JSON and the Blarghonauts, or, Firefox and Pretty-Printing FAIL
There must be a better way of viewing pretty-printed JSON from Firefox than this. (EDIT: Hail, JSONovich!)
#!/usr/bin/env python
# ~/bin/jsonhandler.py
# Take some JSON from a file or stdin, format it, output to a tempfile,
# open in EDITOR
from __future__ import with_statement
import os
import sys
import simplejson
import tempfile

EDITOR = "/usr/bin/gedit"

if __name__ == "__main__":
    if len(sys.argv) == 2:
        # if invoked as jsonhandler.py {FILE}
        json = open(sys.argv[1])
    else:
        # if JSON is piped in (e.g., from Firefox,
        # or cat {FILE} | jsonhandler.py)
        json = sys.stdin
    json = simplejson.load(json)
    # the with_statement is kind of gratuitous but I like it
    with open(tempfile.mktemp('.json'), 'w') as jsonfile:
        simplejson.dump(json, jsonfile, indent=4)
    # all of that and gedit doesn't even highlight JSON
    # I have emacs highlighting JSON but this generates a "stdin is not
    # a tty" error, so EDITOR is not set to emacs
    # xemacs works a little better, but I need to click:
    # "Options > Syntax Highlighting > In this buffer" every time, despite
    # saving to custom.el, so EDITOR is not set to xemacs
    # Very annoying!
    os.system("%s %s" % (EDITOR, jsonfile.name))
And then I set ~/bin/jsonhandler.py as the action for application/json in Edit | Preferences | Applications.
Yuck. Help?
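For what it's worth, the formatting half of that script can be tidied up a bit on Python 3, where json ships in the standard library. The sketch below is a hedged rewrite of just that half: pretty_print_to_tempfile is a name I made up, and it uses tempfile.NamedTemporaryFile with delete=False rather than mktemp, since mktemp only hands back a name and leaves a window for something else to claim it. Launching the editor is left out.

```python
import json
import tempfile


def pretty_print_to_tempfile(raw_json, indent=4):
    """Parse raw_json, write an indented copy to a new .json temp file,
    and return that file's path."""
    data = json.loads(raw_json)
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as handle:
        json.dump(data, handle, indent=indent, ensure_ascii=False)
        handle.write("\n")
    # The file is closed (and flushed) here; only its name is needed now.
    return handle.name


path = pretty_print_to_tempfile('{"title": "Antietam", "files": [1, 2]}')
with open(path, encoding="utf-8") as formatted:
    print(formatted.read())
```

The returned path can then be handed to whatever editor or viewer you like. The file is deliberately not deleted, so cleaning it up is on the caller.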
Convert Windows shortcuts into Ubuntu shortcuts
[Update: Feel free to grab the code via bzr with bzr branch http://lackoftalent.org/bzr/shortcut_converter.]
Here's another entry in the "dumb little scripts that work for me and may or may not be helpful to other folks" department…
I use both Windows and Ubuntu at home, gradually transitioning from the former to the latter. I've accumulated a bunch of Windows URL shortcuts, mostly things I wanted to read once so instead of bookmarking them, I dragged their links to my desktop. This creates .URL files which are simple little plain-text two-liners. It turns out that on Ubuntu, and probably similar *nix systems, web shortcuts are also simple little plain-text files. These files have the .desktop extension (though you won't see the extension by looking at the desktop).
I wanted a way to convert my .URL files to .desktop files so that I can just toss them on my Ubuntu desktop and double-click them the same way I would if I were on Windows. This cruddy little Python script does the trick.
#!/usr/bin/env python
# shortcut_converter.py
from __future__ import with_statement
import os.path
import sys

TEMPLATE = """[Desktop Entry]
Version=1.0
Encoding=UTF-8
Name=%(basename)s
Type=Link
URL=%(url)s
Icon=gnome-fs-bookmark
"""

def convert(f):
    """
    Takes a full filepath to a .URL file, converts it to a .desktop file
    in the same directory
    """
    print "Converting %s" % f
    (filepath, filename) = os.path.split(f)
    (basename, extension) = os.path.splitext(filename)
    with open(f) as urlfile:
        lines = [line.strip() for line in urlfile.readlines()]
    url = lines[1].split('URL=')[1]
    dtfname = os.path.join(filepath, '%s.desktop' % basename)
    with open(dtfname, 'w') as dtfile:
        print "Writing %s" % dtfile.name
        dtfile.write(TEMPLATE % locals())

if __name__ == '__main__':
    for arg in sys.argv[1:]:
        if os.path.isfile(arg) and arg[-3:].lower() == 'url':
            convert(arg)
        else:
            print "*** %s is not a URL file" % arg
I used scp to pull over all my .URL files and then invoked the script thusly:
python shortcut_converter.py *.URL
worksforme!
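The script assumes the URL lives on the second line of the .URL file, which held for my two-liners. A slightly less brittle variation, sketched here for Python 3, treats the file as INI-style text and reads the URL key by name with the standard configparser module. It assumes the file opens with an [InternetShortcut] section header, and url_shortcut_to_desktop is a hypothetical helper that is not part of the script above.

```python
import configparser

DESKTOP_TEMPLATE = """[Desktop Entry]
Version=1.0
Encoding=UTF-8
Name={name}
Type=Link
URL={url}
Icon=gnome-fs-bookmark
"""


def url_shortcut_to_desktop(url_text, name):
    """Return .desktop file contents for the given .URL file contents."""
    # interpolation=None keeps '%' in URLs (e.g. %20) from being
    # treated as configparser interpolation syntax.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(url_text)
    url = parser.get("InternetShortcut", "URL")
    return DESKTOP_TEMPLATE.format(name=name, url=url)


# A made-up shortcut with an extra key after the URL line.
sample = "[InternetShortcut]\nURL=http://example.org/a%20b?x=1\nIconIndex=0\n"
print(url_shortcut_to_desktop(sample, "Example"))
```

Reading by key means extra lines or a different key order no longer matter, at the cost of an error if the section header is missing.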
