Microsoft has the Power(set)
Powerset's Sr. Product Manager writes:
We’re excited to announce officially that Microsoft has signed an agreement to acquire Powerset.
…
With any startup, the challenge is to take the seeds of an idea and grow it into a viable company. At Powerset, we transformed our idea into a world-class semantic search platform, demonstrating the future of search with our Wikipedia search experience. But building a large-scale semantic search engine is expensive, requiring an engineering effort and computing resources beyond what most start-ups could ever imagine. Because our goals around improving search align so well, Powerset has decided to team up with Microsoft. We believe that this is the fastest way to bring our technology to market at a large scale.
Read more on Microsoft's Live Search blog.
It's not surprising to see Microsoft gobble up a company that has strived to be a Google-killer from its inception. It will be interesting to watch Microsoft continue battling Google and to see how this latest acquisition comes into play.
(Maybe I gave up on CompLing too soon, eh?)
Visionary visualization… from Microsoft?
Maybe I'm late to the party — nods to John Blyberg and Rob Styles — but damn(!), does Microsoft have some exciting visualization projects or what?
John and Rob wrote about Microsoft Surface, a hardware/software combination that allows for tactile manipulation of data. In Microsoft's own words:
Microsoft Surface represents a fundamental change in the way we interact with digital content.
With Surface, we can actually grab data with our hands, and move information between objects with natural gestures and touch.
Surface features a 30-inch tabletop display whose unique abilities allow for several people to work independently or simultaneously. All without using a mouse or a keyboard.
Don't take my word for it; go watch a demonstration video and be amazed.
But wait, open the video in another window, and keep reading before you lose interest in my uninteresting prose.
I noticed among the recently released TED talks a brilliant short presentation by Microsoft's Blaise Aguera y Arcas on their Photosynth project. My first impression was just the visual joy of the photo browse / pan / zoom interface. It's impressive in its own right. But what really tickles me about Photosynth is that it "takes a large collection of photos of a place or an object, analyzes them for similarities, and displays them in a reconstructed three-dimensional space."
For example, throw it at http://www.flickr.com/photos/tags/seattlepubliclibrary/ and it will construct a collage-like view of the Seattle Public Library. Zoom in and you'll see images at that level of zoom. Pan in three dimensions and (assuming there are enough photos to support the various views) you can virtually be in front of SPL. See for yourself:
Imagine browsing the stacks with this thing. Imagine finding a book in the stacks and being able to hook directly into its full text. Super cool!
Microsoft Surface is tentatively planned for a November release, which will be targeted for "retail and entertainment settings". It could be available to the public in a few years' time though it will likely cost a few times more than an average PC. [Gleaned from a Seattle PI article.] Photosynth is also not available yet, but you can track its progress on the Photosynth team blog. (Yes, there's a feed.)
[Disclosure: I lived in the shadow of Redmond not long ago, though I was not employed by Microsoft. I probably invest in Microsoft indirectly, but I haven't scoured my portfolio distributions in a while.]
L'informatique est morte. Vive l'informatique!
Is the discipline of computer science on its last legs? Neil McBride, a lecturer in the School of Computing at De Montfort University, advocates for great change in The death of computing. Citing greatly reduced CS enrollment figures in the UK, US, and Australia, and the growing disconnect between industry and academia, McBride argues that computer scientists need to reform the field from within or risk further marginalization and ultimate irrelevance. Though I can sympathize with his desire to revitalize the discipline, and I understand how perception of computer science might have suffered from the dot-com bust, his message amounts to little more than doomsaying and pointless nostalgia.
If it is true that computer scientists "look to games programming for [their] salvation," there is a great opportunity being missed. There are other options — exciting, pragmatic, and revolutionary options — computer scientists might investigate if they believe a wholesale rededication of their skills is needed for the betterment of their field. (To be sure, some already have begun this great work.) As a former student of information science and computational linguistics, I'm here as an interested observer to say that your skills are needed if we are to accomplish some of our loftiest goals. I humbly submit the following areas that could use your help:
- Information retrieval: Build smarter, faster algorithms for finding and organizing information. Instead of building a better bubble sort, figure out better ways to access and relate disparate bits of information. The Google guys made a couple bucks at this; why not cite their success, and point at the meteoric rise of Google, as evidence of the continuing and growing sexiness of computer science?
- Semantic web: Bring your knowledge to bear upon the growing semantic web discussion. If you could think up distributed computing, perhaps the challenge of distributed networks of semantically encoded data is ready for your insight.
- Natural language processing: Be the Google-killer by being the first to market with a usable natural language search tool. There is much research in NLP, but very little of it seems ready for end-users. Help make the keyword a thing of the past. Computational linguists would love to cultivate interdisciplinary connections with you folks.
Although McBride's article may fade into the background of the very frequent, if strident, cries that CS is dead, I am hopeful that interdisciplinary ties between computer science, information science, library studies, and linguistics will bring about practical innovation, not to mention a renewed sense of relevance and excitement for computer scientists.
OCLC report on students' perceptions of libraries
From http://www.oclc.org/reports/perceptionscollege.htm –
College Students’ Perceptions of Libraries and Information Resources examines the information-seeking habits and preferences of international college students. This report is a companion piece to the December 2005 OCLC Perceptions of Libraries and Information Resources report.
The 396 college students who participated in the survey range in age from 15 to 57 and are either undergraduate or graduate students. The college students were from all of the six countries included in the survey (Australia, Canada, India, Singapore, United Kingdom and the United States). Responses from U.S. 14- to 17-year-old participants have also been included to provide contrast and comparison with the college students, as these young people are potential college attendees.
With all-new graphs and additional analysis of how college student data compare to that of total respondents, this report is a subset of the original Perceptions report and provides findings from the online survey in an effort to learn more about:
- Library use
- Awareness and use of library electronic resources
- The Internet search engine, the library and the librarian
- Free vs. for-fee information
- The “Library” brand
This report looks at these questions from the point-of-view of college students and 14- to 17-year-olds. In the original study, we found that college students are more aware of and use libraries’ information resources more than other survey respondents. In addition, the more educated the respondents, the more they continue to use libraries after graduation. Awareness does not always translate into high usage.
Overall, respondents have positive, if outdated, views of the “Library.” Younger respondents—teenagers and young adults—do not express positive associations as frequently. These findings, and more, are valuable insights for anyone seeking to know more about the library usage and perceptions of college students and young people.
This subset of the original Perceptions report is appropriate for provosts, deans and academic library administration. Read the report online or order a print copy using the links at the right, then use our feedback form to tell us what you think.
I'm eager to print this puppy out and read through it. It may be eye-opening to read about what users actually think, rather than the typically confident proclamations about users' needs, delivered in the off-the-cuff, hand-wavy, evidence-bankrupt way that many (most?) of us tend toward. I'm as guilty as the next guy or gal, but that doesn't make it any more acceptable.
A comparative analysis of keyword extraction techniques [excerpt]
With widespread digitization of printed materials and steady growth of "born-digital" resources, there arise certain questions about access and discoverability. One such question is whether the full text of this content, produced by advanced optical character recognition (OCR) techniques, is sufficient as a descriptor of the content. Will the model of mass digitization and full-text searching enable users to find the information they need? Or will we need to continue employing the classification skills of highly qualified human beings in order to ensure information is discoverable? The latter model seems to have worked well for the library community, with trained indexers and catalogers summarizing documents according to established standards and widely used thesauri or controlled vocabularies. The predictability of these techniques has some obvious benefits, such as consistency across different systems, the ability to construct browse interfaces in addition to search ones, and reduction of common errors such as differences in case, punctuation, spelling, and so forth. The process of human classification has thus proven to be quite effective in our endeavors to organize information.
The question of whether we will continue to classify digital content in a similar manner ought to be asked. Is there any hope of keeping up with the dizzying pace at which documents are digitized? Classification is a costly, time-consuming process, requiring highly trained individuals to consume a large amount of information and summarize it. If the goal is to continue digitizing and making accessible information at the current rate, it is improbable that human catalogers and indexers will be able to keep up without sacrificing some of the quality that results from their considerable skills. Yet the goal of enhancing access and discoverability of digital content is one that ought to be pursued, and will likely not be realized through full-text searching alone. Indeed, why should we put so much time and effort into the process of digitization if it does not benefit our users?
Fortunately, the process of automatic extraction of keywords is one that has received much attention. As implied by the phrase, automatic keyword extraction is a process by which representative terms are systematically extracted from a text with either minimal or no human intervention, depending on the model. The goal of automatic extraction is to apply the power and speed of computation to the problems of access and discoverability, adding value to information organization and retrieval without the significant costs and drawbacks associated with human indexers. Research is taking place in numerous fields across the globe, and there is no clear frontrunner among the technologies and algorithms. This paper explores five approaches to keyword extraction, as presented in research papers, to demonstrate the different ways keywords may be extracted, to reflect commonalities between the approaches, and to evaluate the results thereof. Each paper is presented in a different section, for ease of organization.
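To make the idea of automatic keyword extraction concrete, here is a minimal sketch in Python of perhaps the simplest possible approach: counting term frequencies after dropping common function words. It is not one of the five approaches the paper examines; the function name `extract_keywords`, the tiny stopword list, and the sample text are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption); a real system would use a much fuller one.
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
})

def extract_keywords(text, top_n=5):
    """Return up to top_n of the most frequent non-stopword terms in text."""
    # Lowercase and keep alphabetic runs only, which sidesteps differences
    # in case and punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Ignore stopwords and very short tokens, then count what remains.
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(top_n)]

# Hypothetical sample text, invented for this example.
sample = ("Libraries digitize books. Digitized books need keywords, "
          "and keywords help users find books.")
print(extract_keywords(sample, top_n=2))  # ['books', 'keywords']
```

Even this toy version shows the trade-off at stake: it runs with no human intervention, but it treats "digitize" and "digitized" as unrelated terms and knows nothing of controlled vocabularies, which is the kind of gap more sophisticated approaches try to close.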
… Read the paper in its entirety.
