Who’s afraid of the Wolfram search?

I might be.

The Wolfram|Alpha “computational knowledge engine” has been generating buzz for some time, especially since Stephen Wolfram, its eccentric progenitor, announced that it would be going live in mid-May. Expect the twittering to reach a crescendo.

Since the Wolfram|Alpha (WA, let’s say) promises to answer questions typed into a simple text box, it’s being described in the press as a Google-killer. The idea, in an alpha nutshell, is that WA interprets a natural language query and then combs through a gigantic pile of databases, both public and licensed, in order to respond with an answer — rather than Google’s list of web pages that may or may not contain an answer.

Wolfram recently gave a demonstration of WA at Harvard’s Berkman Center. The whole presentation is posted, but you can get a quicker sense of what WA aims to do in this surprisingly murky collection of screenshots:

From this demo and other the-Wolfram-is-coming reviews blooming like tremulous flowers in the rain, WA looks to be a fancy calculator, an atlas on steroids, a deft collator of visualized data.

But is it more than that? Beyond looking up and presenting information, will it give us genuine and new answers? Will it represent a significant push beyond Google’s suddenly modest ambition to “organize the world’s information and make it universally accessible and useful”?

Wolfram himself seems to think so:

…what about all the actual knowledge that we as humans have accumulated?

A lot of it is now on the web—in billions of pages of text. And with search engines, we can very efficiently search for specific terms and phrases in that text.

But we can’t compute from that. And in effect, we can only answer questions that have been literally asked before. We can look things up, but we can’t figure anything new out.

So how can we deal with that? Well, some people have thought the way forward must be to somehow automatically understand the natural language that exists on the web. Perhaps getting the web semantically tagged to make that easier.

… I realized there’s another way: explicitly implement methods and models, as algorithms, and explicitly curate all data so that it is immediately computable.

Wolfram is known for making audacious claims about the power of computation; his massive boiling down of all complexity into relatively simple mathematical rules, A New Kind of Science, was a ‘surprise best seller’ on Amazon even though Wolfram posts all of it for free. The promise of a simple handle on an immensely complex world–frothing up into a good dose of post-religious hype–is irresistible. It’s quite congruent, when you think about it, with Google’s keyword-search doorway to the infinite.

But Google is best used to locate information, not to solve problems. Sure, if you type into its search field “square root of 81” it will offer you a quick answer atop the usual PageRank results. Google has dabbled, in fact, with calculator functions. This slippage between search and calculation, though, is what alarms me.

A pernicious information illiteracy takes root — the world of clear ascription of responsibility suffers another blow — anytime someone starts assigning oracular power to the Google search algorithm. “It says [fill in information claim here].” I’ve seen college students actually cite a Google search in research–not research on Google search, mind you, but research on a subject informed by something that the search dug up one night. Who wrote and published the data is unimportant: in the middle of that dreary night, “It says….”

At an extreme point, we reach the absurdity of Carol Beer in Little Britain, overriding every thought and instinct as she dabbles on the keyboard and announces, after desultory searches, “Computer says no…”

Of course any decent web calculator will draw on good data, and won’t be nearly as mechanistic or useless or funny as Carol. But even an amazing one — and WA promises to be amazing — shouldn’t be confused with actual intelligence; assembling and synthesizing only gets you so far. One of WA’s biggest cheerleaders, Twine founder Nova Spivack, makes a similar point:

Wolfram Alpha, at its heart is quite different from a brute force statistical search engine like Google. And it is not going to replace Google — it is not a general search engine: You would probably not use Wolfram Alpha to shop for a new car, find blog posts about a topic, or to choose a resort for your honeymoon. It is not a system that will understand the nuances of what you consider to be the perfect romantic getaway, for example — there is still no substitute for manual human-guided search for that. Where it appears to excel is when you want facts about something, or when you need to compute a factual answer to some set of questions about factual data.

Spivack’s distinction between (WA’s) computation and (Google’s) look-up is helpful, as is his concession that WA, as elegantly structured as it may be, will only be useful in presenting and recombining known facts. Wolfram himself, no stranger to hyperbole, may wish to characterize WA as generating new knowledge. But until it develops algorithms for context, nuance, interpretation, influence, critique, seriousness, incoherence–until it embraces all of human expression, in all of its messiness–it will never offer sufficient answers to questions more debatable than “What was the average rainfall in Boston last year?”–just as Wikipedia cannot extend beyond professed neutrality.

So my fear of WA, knowing little about how it actually will work and feel, is that it will offer a fancy dashboard of pseudo-expertise, subtly diverting human inquiry into what’s pre-known. This seems an old fear, a fear of robots, and maybe, like many old human fears, it will melt away in the light of new threats.

In any case, WA seems poised to offer a counterpoint to the semantic web, a different model of bringing structure to information to make search more responsive to the questions we ask. The road is strewn with various ‘natural language’ search disappointments (Ask Jeeves was deaf, Powerset seems blind to all but Wikipedia), and there’s reason to hope that Wolfram’s interpretation of natural language will be smarter, that it will process our questions and deliver them to large and various datasets. If it then answers authoritatively, though: caveat emptor.

Objects in mirror are closer than they –

The occasion of a little makeover for good old Clayfox (thanks Jai in New Delhi!) has me thinking back over all its incarnations, most of which have been slightly hideous. Without WordPress and its myriad of free themes, I hate to think of the garish rags that might be tricking out these musings.

The maturation of the web means that those of us who have no business attempting layouts, who agonize endlessly over colors and fonts, who last stumbled around CSS (and last opened Dreamweaver) sometime back in the first Bush II era — well, we can grab our look and feel from the rack and save our energies for, I don’t know, wondering if connectivity is impoverishing.

You may not care for this current incarnation — you may find it distracting or commercial-feeling (yet not a single thing to buy!) — but I like how it surfaces a little more of the content piled up around here. I’m also a little intrigued by the view/popular metrics, all of which started from scratch after the May Day theme switchover. It’s been my firm belief that only a select few check in with this site; now I’ll get a sense of what those few are looking at without bothering with the likes of Google Analytics.

Since nothing is quite as self-indulgent as a blogger blogging about his blog, indulge me further, rare and wonderful reader, in a little amble through the Wayback….

***

Clayfox 2005-recently– For its second outing as a blog (the first was a very brief and forgettable foray in the late ’90s), Clayfox embraced WordPress and adopted a theme called VeryPlainText that kept things, well, somewhat clean. The author of VeryPlainText graciously tweaked his code in response to my request that my “pages” could be commented upon, just like “posts.” We had a little conversation about whether “pages” were meant to be static & impervious to comments — and I saw his point — yet the Kapaga page had to register carping & complaints. The “CLAYFOX” header was generated dynamically from Flickr images tagged with their respective letters — an effect that seemed quite clever, 2.0, variety-inducing, and colorful on top of the veryplainness. Then the javascript that I swiped for this stopped working, so the letter images became static and predictable. Anyway, say hello to a Clayfox that is no more:

***

Clayfox 2004-5– Making up for previous wretched excesses (see below), I was going for a clean look in the last days of hand-coding the whole site. A fritzed-out fox carried over earlier iconography, but otherwise this was demure signaling indeed:

***

Clayfox 2002-3–Oh the Wayback Machine is pitiless; even if it can’t quite capture every tiled iteration of gradient, it still grabs enough of the Clayfox home page at this awkward stage to recall its crazy insouciance, its Fireworks firewords. Streaks evoke an even earlier atrocity, the months when the home page actually had snowflakes trickling across it.

***

Clayfox 1998-2000–And finally on our nostalgia tour, we see a little infant site that really didn’t have a home page to speak of, just a series of handmade course webpages, hand-coded. We see electric blue text against a darker blue background, oh yes. I was actually proud of the fox/navigation in the header: like browser buttons, you see, except they were in the webpage! Each one had to be linked to a ‘next’ and ‘back’ page.

***

I think we can agree that the years between 1998 and now have been kind to Clayfox, or at least have helped make it into something more presentable. The design sins you see before you in this look back persist in some fashion, doubtlessly, on the site. Clayfox wouldn’t be itself, somehow, without some awkward badinage of simplicity, flashiness, and underengaged interactive widgets. There’s strange fun in all that–I can’t explain it to myself, but the site has been intermittently compelling enough to keep alive all these years. Just wait until it hits puberty.

Time rendered moot

Are you partial to absurd lists? So is Time Magazine! This bastion of old media has been developing a “World’s Most Influential” franchise over the past few years, addressing or cultivating some mysterious need to rank Vladimir Putin against Miley Cyrus on a fuzzy scale of “influence.” You can watch a Time editor fumble for a rationale for the whole enterprise, but really why bother.

This year’s list does pack a punch though, even if it makes a complete hash of Time’s list fetish. Time threw the list open to online readers with a poll that got relentlessly, ingeniously hacked. Despite Time’s best efforts, a person called “moot” ended up topping the poll as the world’s most influential person, heading a list that defined and maintained across days of voting a mysterious acrostic: “Marblecake also the game.” This phrase means something to tittering hackers clustered around a bulletin board called 4chan.

Unable to run a real poll online, Time is now trying to laugh the whole thing off: “To put the magnitude of the upset in perspective, it’s worth noting that everyone moot beat out actually has a job.” Be that as it may, it’s worth further noting that “everyone moot beat out” was deliberately positioned on the list by “moot,” who did a fine job, actually, of endangering the jobs of hapless Time employees.

Of particular interest in this embarrassment is the testing of reCAPTCHA, the defense against spam comment submission once used by this website & still in use all over the web, including at Time’s ill-fated poll. The blog Music Machinery has been tracking Time’s losing struggle to shore up its poll against a flood of bogus submissions, and has a particularly detailed rundown of hackers’ manipulations of reCAPTCHA.

As I described a while ago, reCAPTCHA provides two words for a person to recognize and type: an image of a ‘control’ word that has been identified by consensus, along with another image of an ‘unknown’ word. It’s a clever way to check if a captcha interpreter is trustworthy and then apply her interpretation to an ‘unknown’ word, and actually harness a comment/poll submission utility for text digitization projects.

In this instance, according to Music Machinery, the hackers tried to distinguish the ‘control’ word and match that, then flood reCAPTCHA with fake interpretations of the ‘unknown’ word (every ‘unknown’ word was interpreted as ‘penis,’ heh heh), creating a bogus consensus around ‘unknown’ words that would turn them into zombie ‘control’ words. An overwhelmed and standardized control, in turn, would facilitate autovoting.
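For the curious, the two-word scheme and the attack on it can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the ToyCaptcha class, the PROMOTE_AFTER threshold, and the simple most-votes rule are all hypothetical, and nothing here claims to be how reCAPTCHA is actually built.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching readings promote an 'unknown'
# word to 'control' status. The real service's rule is not known to me.
PROMOTE_AFTER = 3


class ToyCaptcha:
    def __init__(self, controls):
        # image id -> agreed-upon text for 'control' words
        self.controls = dict(controls)
        # image id -> tally of readings submitted for 'unknown' words
        self.readings = defaultdict(Counter)

    def submit(self, control_id, control_guess, unknown_id, unknown_guess):
        """Return True if the submitter typed the control word correctly."""
        if self.controls.get(control_id) != control_guess.lower():
            return False  # failed the control: the other reading is ignored
        tally = self.readings[unknown_id]
        tally[unknown_guess.lower()] += 1
        reading, votes = tally.most_common(1)[0]
        if votes >= PROMOTE_AFTER:
            # Consensus reached: the 'unknown' word becomes a 'control' word.
            self.controls[unknown_id] = reading
        return True


captcha = ToyCaptcha({"img-1": "morning"})

# The attack: answer the control honestly, always give the same bogus reading.
for _ in range(PROMOTE_AFTER):
    captcha.submit("img-1", "morning", "img-2", "bogus")

# The poisoned word is now a 'control' with a predictable answer.
poisoned = captcha.controls["img-2"]
```

Once enough identical bogus readings pile up, the toy system trusts them, which is the "zombie control word" effect described above: an autovoter that meets img-2 later already knows what to type.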

In the end, again according to the Music Machinery narrative, all this business of distinguishing control words in reCAPTCHAs was enough of a speed bump that the hackers resorted to “brute force”: i.e., interpreting both reCAPTCHA words and voting as frantically as they could by hand, with the help of some basic productivity utilities. This took a grimly dedicated team of devoted voters interpreting two reCAPTCHAs and casting votes over 200 times per hour, for 40 or more hours while the poll was still open.

So what are we left with? Time embarrassed, reCAPTCHA tested, and a real contest, after all, for influence.

Xciting connections

In the perfect world we never seem to live in, migration of scholarship to the web would mean endlessly networked citations. It would mean new metrics for gauging the impact of any given publication, substantiating tenure/promotion and grant proposals with hard evidence. It would give us new tools to map the interplay of research in an interdisciplinary age. Machines would be prosthetic connectors of our truest thoughts.

Citation mapping is a step towards this promise. Academics have been diligently appending to their research footnotes and endnotes of attributions all along; the hooks are there, all we need to do is link them up. Easier said than done, of course, as the Tower of Babel still smolders. Citation formats and database structures vary; the semantic web is under construction; too often software used to generate citations (MS Office, EndNote, Zotero & the like) is disconnected from the end version of an article, meaning that the article has to be OCR’d and citations re-interpreted. For these and other reasons, as this recent D-Lib article enumerating problems with citation counts points out, “the rates of citation data accuracy and completeness are not precise enough to make fair assessments.”

That’s not stopping efforts to corral citations into paths of discovery, and as usual the science data managers are out in front. Thomson Reuters’ Web of Science, in particular, has been innovating bibliometric analysis and visualization; its Citation Mapping Tool debuted last summer. The tool ‘maps’ articles into generations, allowing you to travel back and forth between cited and citing. Here’s a visualization of how one article cites others:

As this review notes, the tool is far from exhaustive, thanks to database quirks and variation of records across journals. Exporting a citation map is underwhelming at present: you can download it as a flat image, but there is no way to harvest the data into data management. The tool presents some color coding options, so you can sort out ‘types’ of references, but designation of these codes again relies on consistency across fields that cannot be taken for granted.
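The ‘generations’ idea itself is simple enough to sketch, and a sketch also shows what a harvestable export might look like. A minimal Python example, assuming a made-up citation table: the article labels and the invert and generations functions are my own inventions for illustration, not anything Web of Science exposes.

```python
from collections import deque

# Hypothetical data: article -> list of articles it cites.
CITES = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}


def invert(cites):
    """Build the reverse map: article -> list of articles citing it."""
    cited_by = {article: [] for article in cites}
    for article, refs in cites.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(article)
    return cited_by


def generations(start, links, depth):
    """Breadth-first walk: map each reachable article to its generation."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == depth:
            continue  # do not expand past the requested generation
        for neighbor in links.get(current, []):
            if neighbor not in seen:
                seen[neighbor] = seen[current] + 1
                queue.append(neighbor)
    return seen


backward = generations("A", CITES, depth=2)         # what A cites, and so on
forward = generations("D", invert(CITES), depth=2)  # what cites D, and so on
```

The result is a plain dictionary of article to generation number, the sort of thing a flat image cannot give you. It also makes the drawback plain: every link counts the same, whatever the reason for the citation.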

But perhaps the biggest drawback to this or any version of simple citation mapping is its inability to reflect conceptual relationships. Citations, after all, are made to a variety of sources for a variety of reasons, not all of them equally germane to what an article is about. An article may cite something it’s refuting, or may be cluttered with window-dressing references, or may go out of its way to cite the work of mentors or colleagues more out of a sense of politesse than necessity. Until this variation of citation quality is somehow addressed, along with improved metadata standardization and database interoperation, it seems doubtful that citation mapping can, in the words of the WOS mapping reviewer, “represent, and make access to, the historical progress of human inquiry, including its interdisciplinary aspects.”

***

Time to take another tack? As a recent NYT summary noted, data scientists at Los Alamos have come up with a new mapping of the connections between various disciplines. These connections are charted by tracking logs of click-throughs by researchers moving between journals. The project, detailed in PLoS, is seeking a more accurate way to measure and represent research interconnections than the more traditional citation mapping.

The PLoS report lists advantages of clickstream data: it is immediate information (versus the years that citation data can take to fall into place), it is based on private and actual navigation activity (versus the various motives for citation mentioned above). The report also notes a drawback to relying on clickstreams: “User interactions with scholarly web portals are shaped by many constraints, including citation links, search engine results, and user interface features.” It’s the same infrastructure problem haunting citation mapping.
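To make the contrast with citation counting concrete, here is a rough Python sketch of how click-throughs could be tallied into journal-to-journal connections. The log format, the journal labels, and the transition_counts function are assumptions of mine; the PLoS authors’ actual pipeline is surely more elaborate.

```python
from collections import Counter

# Hypothetical log: (session id, journal visited), in time order per session.
LOG = [
    ("s1", "journal-physics"),
    ("s1", "journal-chemistry"),
    ("s1", "journal-physics"),
    ("s2", "journal-literature"),
    ("s2", "journal-chemistry"),
    ("s3", "journal-physics"),
    ("s3", "journal-chemistry"),
]


def transition_counts(log):
    """Count moves from one journal to a different one within a session."""
    last_seen = {}
    edges = Counter()
    for session, journal in log:
        previous = last_seen.get(session)
        if previous is not None and previous != journal:
            edges[(previous, journal)] += 1
        last_seen[session] = journal
    return edges


edges = transition_counts(LOG)
```

Each weighted pair becomes a line on the map. Note that the tally records only that a reader moved from one journal to another, not why, so a jump prompted by a search result or a sidebar link counts the same as one prompted by genuine intellectual kinship.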

In any case, the map of click-through connections is quite fun to look at – it’s color-coded by discipline. Humanities sort out to the middle, which is good and proper. Behold what the PLoS authors call a “first-ever glimpse of this terra incognita”:

Who would not sing for Lycidas?

It’s late January, another semester is gearing up, and yet once more I’m preparing another round of Lit Hum — must be time for Stanley Fish to say something risible about the humanities.

Last year around this time, Fish reveled in the inutility of it all: “To the question ‘of what use are the humanities?’, the only honest answer is none whatsoever.”
In a NY Times blog post published today (“The Last Professor”) he declares, “Except in a few private wealthy universities (functioning almost as museums), the splendid and supported irrelevance of humanist inquiry for its own sake is already a thing of the past.”

Universities, you see, are now dominated by a “business model” that has irreversibly devalued the life of the mind:

The best evidence for this is the shrinking number of tenured and tenure-track faculty and the corresponding rise of adjuncts, part-timers more akin to itinerant workers than to embedded professionals. In this latter model, the mode of delivery – a disc, a computer screen, a video hook-up – doesn’t matter so long as delivery occurs. Insofar as there are real-life faculty in the picture, their credentials and publications (if they have any) are beside the point, for they are just “delivery people.”

And they’re “delivering” to students who couldn’t care less about the humanistic tradition; they’re clocking time, really just wanting “information and skills necessary to gain employment,” thankyouverymuch.

The devaluation in Fish’s latest post of students, “itinerant workers,” technology, “delivery people,” even museums — all this is too execrable to merit much debate, though we could generously posit that debate is what Fish wants. (For a more trenchant indictment of university “business models” I suggest Marc Bousquet’s 2002 The ‘Informal Economy’ of the Information University). It’s probably a waste of time to dwell on Fish’s mugging for the NYT, a late-career prance undaunted by flops (his 2007 screed against Starbucks was plausibly recognized by Ron Rosenbaum as the worst op-ed ever).

What pushes Fish’s recent fulmination past annoying and into painful, though, is the post’s conclusion:

People sometimes believe that they were born too late or too early…. I feel that I have timed it just right, for it seems that I have had a career that would not have been available to me had I entered the world 50 years later. Just lucky, I guess.

Lucky to have had a powerhouse career, and so lucky to be coming to an end of it just as, generally, the “life of the mind” has left the building. If Fish is representative of a mode of academic privilege — not just tenured, but superstar professor/critic/administrator blazing through several universities — then he’s embarrassing more than himself. What is it about his lucky career that makes him so future-indifferent? There’s no elegy, even, just a smug old man farting.

***

Fish’s career continues to be much discussed. I suspect he’ll be remembered less for what he thought than what he did — stocking Duke University’s English department with itinerant (that word again) superstars. As this Lingua Franca post-mortem outlines, outside evaluators of the Fish Duke fiefdom cut through the glitter to find a department “without anything we would be disposed to describe as an undergraduate or a graduate curriculum.” A similar indifference to actual pedagogy runs through Fish’s later comments-catching announcements of the death of the humanities.

When as a tender young grad student I took up Fish’s Is There a Text in this Class I was drawn in — but even then something didn’t seem right. What sticks in my memory after all these years is Fish’s reading of John Milton’s Lycidas, particularly the lines,

He must not float upon his wat’ry bier
Unwept….
(13-14)

Fish wanted to pay attention to reader response — an exciting emphasis for me at the time, New Critical scales falling from my eyes. Could a poem really depend on its relationship with me? Yet Fish’s depiction of the “reader’s experience” came to seem, well, forced. Apparently the “reader” comes to the end of line 13 expecting “perceptual closure”: that poor drowned shepherd Lycidas just can’t be left floating out there in the water; according to Fish, “there is now an expectation that something will be done about this unfortunate situation, and the reader anticipates a call to action, perhaps even a program for the undertaking of a rescue mission.”

Then, Fish would have it, “the reader” goes on to line 14, “Unwept,” and now learns that “nothing will be done,” “the only action taken will be the lamenting of the fact that no action will be efficacious, including the actions of speaking and listening to this lament.”

Say what? Here was enjambment on steroids, certainly not the way I experienced the lines. This “reader” seemed quite idiosyncratic to me — and I experienced the same disappointment I had just experienced when, reading Calvino’s If On a Winter’s Night a Traveler, it became quite clear that “you” was not me, but rather just another character in a novel.

What strikes me now is the consistency of Fish’s defeatism: the raised expectations, the dashing of same. If, as Paul Alpers once put it, Fish was “dogmatically relativistic,” the Fishean notion of “interpretive communities” began to seem simply dogmatic. We live in a wilderness of imposed interpretation:

the choice is never between objectivity and interpretation but between an interpretation that is unacknowledged as such and an interpretation that is at least aware of itself. It is this awareness that I am claiming for myself.

Bully for you, Mr. Fish. This fixation on mediation (“critical activity is constitutive of its object”) has somehow now shrunk into an arthritic shrug at university “business models” and the death of humanities. Tenure, that meretricious patronage, is as lost as Lycidas, as dead as Daphnis. Meanwhile the hungry sheep look up and are not fed. Pastures new, anyone?

Google Images come to Life

How did you experience the American Century? Much of it, for me, was framed through Life Magazine. It was always a pleasure to leaf through Life’s photos in issues collected by my grandparents — vibrant, propagandistic, king-sized.

TV news killed the big tent photo circus off, and frozen pop images of America shrank and segregated down to People, Newsweek, Playboy, Rolling Stone, etc. But the Google juggernaut has just announced a revival — that is, digitization of all Life images, distributed through Google Images. Already 20% of the Life photo corpus is online.

The usual Google scanning tradeoffs apply. The good news: sudden and profuse availability, serendipitous discovery of previously sequestered nuggets within the course of one search. The bad news: search reduced to the blunt satisfaction of keyword searching (looking for all Life photos of Julie Christie taken by Paul Schutzer in 1966? Easy to find some, hard to find all.) Google Images has taught us to work under these conditions; we approach it looking for anything pertinent, happy to sift through unrelated dreck as long as we find treasure.

But it’s a model that frays and sputters when a full corpus is set within it, and we start wishing for authoritative and complete trajectories through it. Want to undertake a complete analysis of, say, images of war in Life down through time? That seems tantalizingly possible, but in actuality you’ll have to wait for more serious cataloging. Until then, we have a fun little trick to limit a keyword search to Life images — in the Google search box, type source:life and, sure, roll your eyes.

Then there’s the ever-uneasy question of use. Am I breaking any rules by posting a Life photo on this blog? Is it ok to post a small version of the photo, but not the large watermarked ‘full size’? As of this writing, there is no clear guidance for re-use provided by Google; clearly they have brokered a deal with TimeLife, which hopes to sell prints of these photos to rediscoverers of them, but of course they will be a tiny fraction of the cutting and pasting crowd. Even so this could be a win-win, a simple version of Google’s recent dramatic and complex agreement with publishers.

Still, photos are easier to swipe and recontextualize than text content. And by scattering these photos into Google Images stripped of their original context, Google and Life are clearly championing fragmentation, the free-floating repositionings of a captured moment, Life as clipart.

Hearkening back to those grandparent-collected magazines, though, I’m sorry that a fuller scan of the photos in situ wasn’t undertaken. Without complete scans of the classic Life issues, we won’t even have digital access to all the photographs in those big pages, no matter what Google claims. Many of the most amazing ones festooned advertisements: housewives daring Frigidaires, impossibly air conditioned Cadillacs, reassuring insurance, Kodak inviting you to capture your own life….

Still, it’s churlish not to celebrate the wide release of Great Photos into the digital wilderness, and I look forward to seeing how they actually fare in a Flickr world. And I wonder: is National Geographic next?

‘O little cloud the Virgin said, I charge thee to tell me…’

Every once in a while Clayfox drifts into the tag clouds. And yet its heart has never quite followed. Maybe that’s because most often those clouds don’t prove to be so very informative after all.

Let’s review: tag clouds are a way to visualize the frequency of application of (usually uncontrolled) keywords to a corpus of stuff by a number of people. In many — even most — cases I wouldn’t call these taggers a ‘community’, unless we water down the definition of ‘community’ to a collection of people who have signed up for an online service. Even within the context of one academic tagging experiment, that can be thin or lumpy tea….
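For anyone who has never looked behind one, the mechanics amount to counting and scaling. A minimal Python sketch, assuming invented tagging data; the cloud_sizes function and the linear point-size scaling are my own simplification, since real services may weight tags differently.

```python
from collections import Counter

# Hypothetical data: (user, item, tag) triples from a tagging service.
TAGGINGS = [
    ("ann", "photo1", "wedding"),
    ("bob", "photo2", "wedding"),
    ("cat", "photo3", "wedding"),
    ("ann", "photo4", "beach"),
    ("bob", "photo5", "beach"),
    ("cat", "photo6", "japan"),
]


def cloud_sizes(taggings, min_pt=10, max_pt=30):
    """Map each tag to a font size proportional to how often it was applied."""
    counts = Counter(tag for _user, _item, tag in taggings)
    low, high = min(counts.values()), max(counts.values())
    span = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        tag: min_pt + (count - low) * (max_pt - min_pt) // span
        for tag, count in counts.items()
    }


sizes = cloud_sizes(TAGGINGS)
```

Notice that the user and the item drop out as soon as the counting starts. Who tagged what, and how consistently, is exactly the information the cloud throws away.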

Even populous and richly tagged environments like Flickr can puff up clouds that seem, well, rather vaporous. Look at the cloud of “all time most popular tags,” and what is revealed?


It seems that when taking digital pictures with NIKONS and CANONS Flickrites gravitate to WEDDINGS and PARTIES, they focus on FRIENDS and FAMILY, they like to TRAVEL on VACATION to the BEACH or to places like CALIFORNIA and FRANCE and JAPAN. Well, well, blow me over with a feather.

Even as a means of self-portrayal, tag clouds come up short, at least to an unstrategic tagger like myself. I use and love del.icio.us, but the cloud that it serves up of my tagging activity has never been of more interest than, say, an alphabetical list of my tags. And I’ve never really discovered much about anyone else by scanning a cloud of their del.icio.us tags. Have you?

I’m willing to be convinced that appending tag clouds can be a smart search engine strategy. Perhaps this is their real utility: providing another way for the machines to read us.

***

But I’m not anti-cloud, far from it. I just happen to think that clouds are a lot more interesting to human beings when they are of words in a text, rather than of tags applied to objects. Tag clouds open up all kinds of blurry mysteries: who’s doing the tagging? how canny or consistent are the taggers? what is the extent of the corpus being tagged? But a word cloud of a given text can be as revelatory as word mining — a re-mapping of a document to bring out its frequencies, its quirks, its long tails.

And word clouds, at least those generated on the addictive new Wordle, can be quite beautiful as well. I can imagine students really learning from them, or at least investigating the vocabulary field of, say, a poem from new angles.

As an example, I’ve created word clouds of two poems by William Blake: the introduction to Songs of Innocence, and the introduction to Songs of Experience. Compare them below, and you’ll quickly see that the Innocence poem is more repetitious, aural, interactive, while the world of the Experience poem is more dispersed, visual, occupied by distances. You could get all that by reading the poems themselves, without any scrambling of their words and plumping up of their frequencies. But word clouds are a way of remapping a fixed world of meaning, visually exploring it, an engaging thing to do even if they drive you back, in the end, into fresh appreciation for syntax and line structure and the very contexts they explode. Enjoy!

Innocence
William Blake word cloud - innocence

Experience
William Blake word cloud - experience
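If you would rather count than squint, the raw material of a word cloud is easy to produce yourself. A small Python sketch, assuming the opening two stanzas of the Innocence introduction as I have them (punctuation varies by edition) and a stop-word list I made up for the occasion. This is not how Wordle works internally, just the frequency table any such cloud starts from.

```python
import re
from collections import Counter

# Opening stanzas of Blake's introduction to Songs of Innocence,
# typed from memory; quotation marks omitted for simplicity.
INNOCENCE = """
Piping down the valleys wild,
Piping songs of pleasant glee,
On a cloud I saw a child,
And he laughing said to me:
Pipe a song about a Lamb!
So I piped with merry cheer.
Piper, pipe that song again;
So I piped: he wept to hear.
"""

# A hypothetical stop-word list, chosen by hand for this excerpt.
STOP_WORDS = {"the", "of", "on", "a", "i", "and", "he", "to", "me",
              "so", "with", "that", "about"}


def word_counts(text):
    """Lowercase the text, split it into words, and count the non-stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if word not in STOP_WORDS)


counts = word_counts(INNOCENCE)

# Crude grouping of the pipe/piping/piped/piper family by shared prefix.
pip_family = sum(n for word, n in counts.items() if word.startswith("pip"))
```

Even in eight lines the piping family accounts for seven words, which is the repetitious, aural quality the Innocence cloud makes visible at a glance.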

Changing the subject

Who is this woman, and why is she crying?

Mrs. Belmont at gunmen’s trial (LOC)

This photo, from a collection of early news photos housed at the Library of Congress, is part of an experiment that has that venerable institution dipping a toe into the Web 2.0 waters. Compare the photo on LC’s own website, versus on Flickr.

By publishing some of its holdings into Flickr, where items can be annotated by anyone, LC is taking seriously what you often hear now but rarely see yet: in a digital environment, libraries have to move beyond providing access and into facilitating use.

Access has been traditionally provided by libraries by the application of pre-determined, hierarchical subjects; that’s what allows physical objects to be sorted and found. It’s a system that puts the onus on one cataloger to master a relatively fixed universe of related subjects, and apply this system to an object so said object can be placed and later found in its correct place.

On the web, of course, objects are easily replicated, dispersed, recontextualized. They can be represented in any number of places, found through any number of pathways and connections. They travel unpredictably across an increasingly read-write landscape, wherein someone just might improve and embellish the guess of that lonely cataloger about what an object is ‘about,’ making it thereby more discoverable. Accommodation to an endless amount of comment and annotation seems a nascent effect of the dynamically networked use of objects.

But back to the photo: how has being Flick’d out of LC’s precincts improved our sense of its subject? Somebody had scrawled a title, “Mrs. Belmont at gunmen’s trial,” and the LC record left it at that. Just a few days after it appeared in Web 2.0-land, commenters had connected the photo to a Wikipedia entry about Alva Erskine Belmont –a rather remarkable socialite and promoter of the women’s suffrage movement–as well as another photo in the same LC collection documenting the sensational Rosenthal murder of 1912.

Wikipedia, blog postings, tags, and comments are bringing this photo to life on Flickr, giving us a better sense of its context and content. But lest we get carried away with the wisdom of crowds, we should also acknowledge a misogynistic annotation on the photo in Flickr: “dr_ass2001” has taken it upon himself to draw a square around Ms. Belmont’s head and write, “Stop crying, you moron.”

***

So will LC be modifying its records based on the annotations these digitized photos catch in Flickr? Their FAQs about the project demur:

The Library will decide what to do with data added through Flickr once the pilot is over. Because resources to update catalog records are limited, the Library cannot promise to incorporate contributed data into its own records.

Still, on Flickr pages such as that housing Ms. Belmont, an LC librarian has promised to alter records based on contributed information; and as of this writing, a search for ‘flickr’ in LC’s Prints and Photographs online catalog calls up 127 instances of metadata being added or altered as a result of the “Flickr community project, 2008.”

So what are the criteria for bringing information contributed through this “community project” into LC’s more authoritative catalog? How much time and effort are LC librarians putting into that crosswalk? It will be interesting to learn answers. As a member of RLG Programs observed three months into this experiment:

Social tagging in this framework doesn’t mean letting others catalog your collections for you – it really means offering up materials for a conversation which you have to follow closely to extract the bits worth bringing back.

“Conversation” seems to be the operative word here — but until LC makes its activities in this experiment a little more transparent, it’s rather like a conversation held in a confessional booth. In any event, the move towards opening up cataloging into a conversation with the public over the web is certainly a paradigm shift. Web 2.0 endeavors like LibraryThing have for years now facilitated the interplay of LC Subject Headings and free-form annotation. But now here’s LC itself, the very mortar of brick and mortar libraries, striking up conversation.

***

This has implications that range into epistemology. A recent article by David Pimentel traces the implications of treating knowledge-making as conversational: “the nature of knowledge is increasingly viewed as an iterative process, with each individual attempting to make sense of the world s/he encounters.” We live in a world increasingly impatient with indexing done by professionals, “inevitably limited to one individual’s perceptions of an information object at one particular moment in time.”

A conversational world, growing out of Gordon Pask’s Conversation Theory, Pimentel reminds us, is one of “participants communicating and seeking a shared agreement, or mutual understanding.” What is correct is formulated by participants in this communication, not some “external absolute.”

As Pimentel suggests in passing, an iterative and unfixed arena of exchange is of increasing importance in a world so often formulated as heterogeneous or interdisciplinary–the only way, perhaps, to “unif[y] theories and concepts across disciplines.” To be sure, most any uncontrolled conversation contains trivial or inane or erroneous noise, and crowd-tagging experiments seem especially full of that. It may be the price to pay for being able to talk at all in an environment that is still often known for the big stern Shushhhhh.

A post on Flickr that accompanied the launch of this LC experiment last January was cheerfully titled “Many hands make light work.” I doubt the LC librarians trolling the comments on the two photo collections so far released onto Flickr would agree–but assuredly, many hands make different work, and perhaps more interesting work all around.

Librarians get to come into a closer and more collaborative relationship with users of the objects they collect. Those ‘users’ (or patrons?) are able to participate in the detective work that is so often at the heart of subject identification, perhaps gaining a stake in culture as a result. The collection gets marked with new pathways through it, becoming less of a sterile pile and more of an ongoing seeding of discourse.

***

The very first aim of the pilot though, as outlined in the “Many hands” post, has less to do with rethinking cataloging or conversational theory or anything like that, and more to do with publicity: “to increase exposure to the amazing content currently held in the public collections of civic institutions around the world.” Indeed, if you look through the LC collection on Flickr, a goodly number of comments are, shall we say, merely appreciative:

Comments on an LC photo in Flickr

Like so much else about this pilot, this mere enthusiasm expressed for objects that have been online for many years –as if they have just now been made accessible–is striking. If LC had simply switched on annotation tools on their own site, I doubt that so much enthusiasm and activity would have arisen around these photographs.

The trick seems to have been to bring these objects to Flickr, a “major gravitational hub” that is “driven by network effects,” to borrow terms from Lorcan Dempsey. The willingness of LC, no slouch itself when it comes to gravitational hubs, to open up a dialog with a very different kind of hub is heartening, less for the new exposure it can bring to the vast collections of august institutions (though that’s always valuable) than for the dynamic friction that is bound to arise from the commingling of authority and the crowd.

Though the immediate impulse is to breathe a vast sigh of relief that Mrs. Belmont has been released from the gloomy dungeon of LC’s sterile, unchanging gallery and is now facing a new public on Flickr, I suspect the ultimate value of such liberation will be renewed appreciation for the thin skein of metadata so laboriously pieced together by specialists over the years that can now be embroidered, tested, interrogated. From what little I now know of Alva, I think she would value the old standards, even while pushing for new ways of living.

Where English is going

With some notable exceptions, the willingness of English Departments to seriously engage with current communication technology has advanced “one funeral at a time,” to quote one voice in the wilderness. Denial, nostalgia, tenure pressure: all part of the tweedy sluggishness. Meantime the hungry sheep look up and are not fed.

But I’ve found a video that stirs hope — at least for the Rutgers English Department. In it Richard Miller, its chair, posits that training students to express themselves in the communication channels they actually inhabit should be a core concern of English Departments.

Because we live in a read-write world, it is essential that the English Department provide training to our students about how to live in this world. This is a world which has radically defined what authority means, what expertise means, and how you define labor.

Exactly, and how refreshing to hear an authoritative voice engaging with a Board of Governors in this clear, simple, true way.

Miller’s video pitch defensively emphasizes the traditional publication prowess of Rutgers faculty, and it announces the advent of Web 2.0 as if to Rip Van Winkle. But then the video warms up to its convictions–or at least mine….

When professors of English start thinking like this, can Spring be far behind?

Life in the taggregate

From its earliest days, the promise of the Semantic Web has been to bring networked computers closer to the forms and priorities of human inquiry. This promise depends on mark-up language that gives data some structure, and frameworks that bring such structure into recognizable relationships. As a May 2001 Scientific American piece by Tim Berners-Lee and colleagues put it, “for the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.”

Automated reasoning! This dream may be coming to life in e-science, with its highly structured and interoperable datasets, but in many other contexts the idea of a Semantic Web sits uneasily with the younger and more popular kid on the block, the Participatory Web. Web 2.0 environments amass a lot of data and, more importantly, a lot of information about this data generated by humans downright impervious to the need of machines for identifiable and consistent structure. Such tags are generally free-form, non-hierarchical, not expressing relationships in a predictable and consistent way; they dance to “folksonomy” not “taxonomy”; they are blithely untethered to “ontologies,” to any URI-based language standards.

Nevertheless there is intriguing thought out there about the potential interplay of the Semantic Web and Web 2.0. The Tagcommons site lays out Use Cases that envision sharing tags across databases, and sketches out some functional requirements to make that interoperability happen. Tom Gruber, in particular, has argued energetically for “collective intelligence systems” built from syntheses of structured data and social software; his travel-review site RealTravel uses a “snap-to-grid” model to disambiguate and structure user-supplied tags.
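For the curious, here is a rough idea of what “snapping” a free-form tag to a grid might look like. This is only a minimal sketch and not RealTravel’s actual implementation, which I have not seen: the controlled vocabulary `CANONICAL_PLACES`, the function name `snap_to_grid`, and the similarity cutoff are all assumptions made for illustration. It relies on Python’s standard `difflib` for fuzzy matching.

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary (the "grid"); invented for this sketch.
CANONICAL_PLACES = [
    "paris, france",
    "paris, texas",
    "harlem, new york",
    "berkeley, california",
]

def snap_to_grid(raw_tag, vocabulary=CANONICAL_PLACES, cutoff=0.8):
    """Return the canonical entries a free-form tag might refer to."""
    tag = raw_tag.strip().lower()

    # Index the vocabulary by bare place name, keeping every entry that shares it.
    by_name = {}
    for entry in vocabulary:
        by_name.setdefault(entry.split(",")[0], []).append(entry)

    # An exact name match may still be ambiguous ("paris" has two candidates).
    if tag in by_name:
        return by_name[tag]

    # Otherwise fall back to the closest spelling, if any is similar enough.
    close = get_close_matches(tag, list(by_name), n=1, cutoff=cutoff)
    return by_name[close[0]] if close else []
```

An ambiguous tag such as “Paris” comes back with more than one candidate, which is the moment a site could ask the user which one was meant; a misspelling such as “harlm” is nudged to the nearest known place; an unrelated tag returns nothing and stays free-form.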

And now in Yahoo! Research Berkeley labs, algorithms are starting to take into account aggregate patterns in order to sift out meaning from vast oceans of community-generated tags despite all their unstructured messiness or, as computer scientists like to say, despite all their “noise.” It’s a matter of inference and cluster analysis. Case in point: the photo-sharing site Flickr’s new experiments in extracting “practical information about the world” from the snapshots and tags poured into it by the great unwashed. The report “How flickr helps us make sense of the world: context and content in community-contributed media collections,” describes a layered process of tag and image analysis–one that can be conducted entirely by machines–that identifies representational tags as well as place and event semantics.
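To make “inference and cluster analysis” a little less abstract, here is a toy version of the first layer: finding the tags most representative of a place. It is a simplified sketch under my own assumptions, not the pipeline described in the report. Photos are binned into a crude latitude/longitude grid instead of being properly clustered, and each tag is scored by how often it occurs in a cell, weighted down when it also turns up in many other cells. The function name `representative_tags`, the `cell_size` parameter, and the scoring formula are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

def representative_tags(photos, cell_size=0.05, top_n=2):
    """photos: iterable of (lat, lon, tags). Returns {grid_cell: [top tags]}."""
    cells = defaultdict(Counter)
    for lat, lon, tags in photos:
        # Crude stand-in for clustering: bin each photo into a grid cell.
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        cells[cell].update(set(tags))  # count each tag once per photo

    # In how many cells does each tag appear at all?
    cell_freq = Counter(tag for counts in cells.values() for tag in counts)
    n_cells = len(cells)

    result = {}
    for cell, counts in cells.items():
        # Frequent here, rare elsewhere: a tag that "represents" this cell.
        scores = {
            tag: count * math.log(1 + n_cells / cell_freq[tag])
            for tag, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        result[cell] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return result
```

Fed a handful of photos tagged “harlem, nyc” uptown and “timessquare, nyc” in midtown, the ubiquitous “nyc” sinks and the local names rise to the top of their respective cells. No human has declared what either neighborhood is called; the machine infers it from the pile.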

What does all this do for us? For one thing, it can improve a search through piles of community-contributed materials; my search for “Harlem” stands a better chance of coming up with the most representative picture of the neighborhood, or a set of iteratively varied views of the neighborhood, or even a conglomeration of views for a composite view. I could determine the most visited place in the neighborhood, or the scenes of important events. Yahoo!’s researchers are even thinking about automatic tagging of photos, or suggestions for tags, that are generated by visual content abetted by contextual and geographical cues.

Here are a couple of spins of Yahoo! Labs’ TagMaps:

Flickr World Browser Harlem

^ TagMap’s World Browser analyzes Flickr tags to locate “Harlem” on a map and offer a set of representative photos (on the right). Harlem seems pushed to the west, and the chicken picture is a little odd, but this machine-generated guess seems viable enough.

TagMap World Browser Paris

^ A search for ‘Paris’ in TagMap’s World Browser whisks us to a city in the middle of France, not Texas, and avoids any pictures of over-photographed heiresses. See: machines have taste too.

Teasing meaning out of cacophony, evaluating ‘where what & when’ through dumb processing of inconsistent human traces: it’s not hard to sense an artificial intelligence awakening here with its own priorities, despite the human decision (conscious or not) to ignore machine-oriented information conventions. What is the ultimate effect of algorithms trained to crunch through the idiosyncratic and identify the representational? Could such aggregate processing of unstructured data fuel a general regression to the mean, as alchemist Jonah Bossewitch muses? As a Trekkie (or is it Trekker?) might say, streaming into yet another convention, resistance is futile.

The fear of human conglomeration coming into sudden sentience is nothing new, of course. I just re-read Frankenstein with a set of fresh young readers, and alarmist correlations of that good old story to an improbably persistent, flexible, and collective-mashed form of AI doubtlessly come too easily to me now. But I do sometimes wonder whether we too will wake up from our most logocentric tagging idylls to sense senseless and unblinking eyes, watching us in the dark and hungry for more.