'Why, I'm Posterity -- and so are you.'

Who’s afraid of the Wolfram search?

Posted: May 5th, 2009 | Author: Mark Phillipson | Filed under: Library musings

I might be.

The Wolfram|Alpha “computational knowledge engine” has been generating buzz for some time, especially since Stephen Wolfram, its eccentric progenitor, announced that it would be going live in mid-May. Expect the twittering to reach a crescendo.

Since the Wolfram|Alpha (WA, let’s say) promises to answer questions typed into a simple text box, it’s being described in the press as a Google-killer. The idea, in an alpha nutshell, is that WA interprets a natural language query and then combs through a gigantic pile of databases, both public and licensed, in order to respond with an answer — rather than Google’s list of web pages that may or may not contain an answer.

Wolfram recently gave a demonstration of WA at Harvard’s Berkman Center. The whole presentation is posted, but you can get a quicker sense of what WA aims to do in this surprisingly murky collection of screenshots:

From this demo and other the-Wolfram-is-coming reviews blooming like tremulous flowers in the rain, WA looks to be a fancy calculator, an atlas on steroids, a deft collator of visualized data.

But is it more than that? Beyond looking up and presenting information, will it give us genuine and new answers? Will it represent a significant push beyond Google’s suddenly modest ambition to “organize the world’s information and make it universally accessible and useful”?

Wolfram himself seems to think so:

…what about all the actual knowledge that we as humans have accumulated?

A lot of it is now on the web—in billions of pages of text. And with search engines, we can very efficiently search for specific terms and phrases in that text.

But we can’t compute from that. And in effect, we can only answer questions that have been literally asked before. We can look things up, but we can’t figure anything new out.

So how can we deal with that? Well, some people have thought the way forward must be to somehow automatically understand the natural language that exists on the web. Perhaps getting the web semantically tagged to make that easier.

… I realized there’s another way: explicitly implement methods and models, as algorithms, and explicitly curate all data so that it is immediately computable.

Wolfram is known for making audacious claims about the power of computation; his massive boiling down of all complexity into relatively simple mathematical rules, A New Kind of Science, was a ‘surprise best seller’ on Amazon even though Wolfram posts all of it for free. The promise of a simple handle on an immensely complex world–frothing up into a good dose of post-religious hype–is irresistible. It’s quite congruent, when you think about it, with Google’s keyword-search doorway to the infinite.

But Google is best used to locate information, not to solve problems. Sure, if you type into its search field “square root of 81” it will offer you a quick answer atop the usual PageRank results. Google has dabbled, in fact, with calculator functions. This slippage between search and calculation, though, is what alarms me.

A pernicious information illiteracy takes root — the world of clear ascription of responsibility suffers another blow — anytime someone starts assigning oracular power to the Google search algorithm. “It says [fill in information claim here].” I’ve seen college students actually cite a Google search in research–not research on Google search, mind you, but research on a subject informed by something that the search dug up one night. Who wrote and published the data is unimportant: in the middle of that dreary night, “It says….”

At an extreme point, we reach the absurdity of Carol Beer in Little Britain, overriding every thought and instinct as she dabbles on the keyboard and announces, after desultory searches, “Computer says no…”

Of course any decent web calculator will draw on good data, and won’t be nearly as mechanistic or useless or funny as Carol. But even an amazing one — and WA promises to be amazing — shouldn’t be confused with actual intelligence; assembling and synthesizing only gets you so far. One of WA’s biggest cheerleaders, Twine founder Nova Spivack, makes a similar point:

Wolfram Alpha, at its heart is quite different from a brute force statistical search engine like Google. And it is not going to replace Google — it is not a general search engine: You would probably not use Wolfram Alpha to shop for a new car, find blog posts about a topic, or to choose a resort for your honeymoon. It is not a system that will understand the nuances of what you consider to be the perfect romantic getaway, for example — there is still no substitute for manual human-guided search for that. Where it appears to excel is when you want facts about something, or when you need to compute a factual answer to some set of questions about factual data.

Spivack’s distinction between (WA’s) computation and (Google’s) look-up is helpful, as is his concession that WA, as elegantly structured as it may be, will only be useful in presenting and recombining known facts. Wolfram himself, no stranger to hyperbole, may wish to characterize WA as generating new knowledge. But until it develops algorithms for context, nuance, interpretation, influence, critique, seriousness, incoherence–until it embraces all of human expression, in all of its messiness–it will never offer sufficient answers to questions more debatable than “What was the average rainfall in Boston last year?”–just as Wikipedia cannot extend beyond professed neutrality.

So my fear of WA, knowing little about how it actually will work and feel, is that it will offer a fancy dashboard of pseudo-expertise, subtly diverting human inquiry into what’s pre-known. This seems an old fear, a fear of robots, and maybe, like many old human fears, it will melt away in the light of new threats.

In any case, WA seems poised to offer a counterpoint to the semantic web, a different model of bringing structure to information to make search more responsive to the questions we ask. The road is strewn with various ‘natural language’ search disappointments (Ask Jeeves was deaf, Powerset seems blind to all but Wikipedia), and there’s reason to hope that Wolfram’s interpretation of natural language will be smarter, that it will process our questions and deliver them to large and various datasets. If it then answers authoritatively, though: caveat emptor.


The silence of the cyberlambs

Posted: October 20th, 2007 | Author: Mark Phillipson | Filed under: Academia

It’s taken long enough, but Clayfox has shaken off summer dreams to engage with a little edu-dystopia, 2007-style.

Michael L. Wesch, the Kansas State University anthropology prof who brought the YouTube-fueled world a much-referenced little primer on Web 2.0 some time back, has had his students produce a new video, this one a decidedly grim picture of the college classroom grandly titled A Vision of Students Today. The jaunty electronica is back (CC-friendly Tryad), but this time it’s frosting a world of disjunction and guilt.

Behold sallow college students flashing sign after sign of disengagement with a scene of education that may as well be some boring corner of the moon: blandly self-absorbed, at any rate, in creaky rhythms and technologies and communication patterns dating from 1840-something, tagged as Death-in-Life by Marshall McLuhan forty years back already & still death-in-living.

They’re ignored and distracted, these laptop-toting prisoners of the Havisham lecture hall; they’re indebted, claustrophobic, self-loathing, and lazy. Their lives are being drained away by Facebook twittering, while off in the lectern distance some dork scratches at a chalkboard and impervious-anyway book spines sit uncracked. And oh, the fluorescence, the fluorescence…

Tragic, no? I’m struck by the ways our young victims express and don’t express themselves in this YouTube cri de coeur. It’s a Vision of Students Today that’s clearly filtered through Alienation, Adolescent 101; one suspects that Catcher in the Rye is a rare one of the eight books these kids have managed to find time to read (or not…). Did you glimpse that Google Doc, that hub, presumably, for planning the video? “200 students made 367 edits to this document.” Collective expression in action! And… action!

And yet we hear no voices. Instead, here’s the tour of a sterile wilderness of signs–some scrawled on furniture, several displayed by kids fixing the camera with a look of bale. Sometimes a sign is two-sided; it says one thing, then its holder flips it over to counter or complicate. One turn of the screw. But that’s as deep as it gets: the flipped succession of surface statements.

I’m sure these students recognized themselves as doing something provocative, challenging norms, goading the world to rethink the process of college education. It’s a start, but just a start, a register of sad: using collaborative communication to hunker down in oversized sweatshirts behind slogans that say, with variation: We don’t get you (flip over) you don’t get us.

Let’s hope that the next YouTube sensation from Wesch (who clearly knows how to make ’em) shows students in a more active mode, trusting themselves with a subject beyond dysfunction.


Mining the machines

Posted: March 15th, 2006 | Author: Mark Phillipson | Filed under: Academia, Libraryworld

Last year at the ARL symposium called Managing Digital Assets, I smiled inwardly to think of the grumbling likely to be kicked off by observations such as this by Donald Waters of the Mellon Foundation:

…what unites our interest in digitization and open access in a digital world is that the material becomes ‘processable,’ or subject to computational processing. That is, the growth in the market of readers is not among groups of humans, but of machines, which are programmed to index, manipulate, mine, aggregate, decompose, and build up scholarly and other forms of content by algorithm. It is this machine ‘processability’ that makes digitized objects and open access materials most valuable to scholars.

Protest, fume, rail against the subjection of your most exquisitely developed thought to the dumb imperatives of ones and zeros — Waters is absolutely right. You want influence? Or, more to the point, you want to avoid obliteration in the vast digital swamp? You’d better know how to demarcate, classify, and optimize your work for machine crunching — or find someone who does. And pray that the stewards of such crunching, the information managers you never thought about, have your best interests in mind.

All this occurred to me while reading a new D-Lib piece by Daniel Cohen, director of research projects at the very creative Center for History and New Media at George Mason University. Cohen also spoke at that ARL session, and at the time he sold me on Firefox Scholar. His new article, “From Babel to Knowledge: Data Mining Large Digital Collections”, offers two nice examples of humanist-friendly manipulation of machine “processability.”

First: Syllabus Finder. Where was this godsend when I was inefficiently wandering around the chaff of the web, trying to crib ideas for my own syllabi? It’s a very sensible, very needed genre-based search tool. First, it defines “document classification” through a very simple dictionary of keywords endemic to syllabi (“assignment,” “office hours,” etc.). This classification is fed into Google through its API service, along with the search query, for optimized searches. The results can then be further refined through more automated analysis or combined with other search results.
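For the curious, the keyword-dictionary idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Cohen's code: apart from “assignment” and “office hours,” the keyword list is my own guess, and the function names, the 0.5 threshold, and the OR-style query string are all hypothetical (the real tool talks to Google's API, which isn't reproduced here).

```python
import re

# Hypothetical dictionary of words endemic to syllabi. Only "assignment" and
# "office hours" come from the post; the rest are illustrative guesses.
SYLLABUS_KEYWORDS = [
    "assignment",
    "office hours",
    "syllabus",
    "readings",
    "prerequisites",
    "grading",
]


def syllabus_score(text, keywords=SYLLABUS_KEYWORDS):
    """Return the fraction of dictionary keywords that appear in the text."""
    lowered = text.lower()
    hits = sum(
        1
        for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", lowered)
    )
    return hits / len(keywords)


def looks_like_syllabus(text, threshold=0.5):
    """Classify a document as a syllabus if enough keywords are present.

    The threshold is an arbitrary assumption, not a tuned value.
    """
    return syllabus_score(text) >= threshold


def build_query(topic, keywords=SYLLABUS_KEYWORDS, max_terms=3):
    """Combine the user's topic with a few genre keywords into one search string."""
    genre_terms = " OR ".join(f'"{kw}"' for kw in keywords[:max_terms])
    return f"{topic} ({genre_terms})"
```

Calling `build_query("Byron")` gives a search string that carries the genre hint along with the topic, and `looks_like_syllabus` can then be run over whatever comes back as a crude post-filter.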

I gave it a spin, using canonical writers from the Romantic era as search terms. To my happy surprise, good old Ashes Sparks & Hypertext, a six-year-old syllabus for a seminar I taught back in the day at UC Berkeley, kept showing up, and at or near the top of results. #1 for Coleridge, #2 for Byron, #1 for Wordsworth, #2 for Blake, #4 for Hemans. Yeah, baby! But we drop down to #14 for Keats, alas, and as for Shelley, he just kept coming up as a “fatal error,” an “Uncaught SoapFault exception.” So Syllabus Finder is a little buggy but, dare we say it, a little poetic too. Maybe we’re just overly pleased by taking the silver for Byron:

Ashes Sparks is the second syllabus listed for Byron

I don’t know what to make of the way this tool seems to like the Ashes Sparks syllabus — certainly I indulged in no optimization — no thought about how the thing would be retrieved. The only distinguishing feature of that document, really, is that it’s been online steadily for six years. It’s just one of those Google-blessed mysteries. Perhaps cannier post-processing could promote syllabi more deserving of prominence. But Syllabus Finder works pretty well–I’d recommend it to a fledgling (and not-so-fledgling) instructor. As Cohen puts it, it does a surprisingly good job at achieving its modest goal – on most topics for every ten documents it retrieves, about nine are syllabi – and it has thus far found and catalogued over 600,000 syllabi, synthesizing a collection of course materials considerably larger than any created or maintained by a professional organization, educational institution, or library, or by any other effort on the web to aggregate syllabi.

A second and more complex treat today from the George Mason wizards: H-Bot. This is an automated historical fact finder that can field natural language queries. (Or at least ones that begin with ‘what’ or ‘when’ or ‘who’; it’s not ready to handle where, which, how, or why.) The algorithm here is “question answering,” which involves the identification of relevant documents, some natural language processing (to interpret queries), and statistical/linguistic analysis of retrieved documents. (In addition to the D-Lib article, there’s more on H-Bot here.)
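To make the “statistical analysis of retrieved documents” step a bit more concrete, here is a toy Python sketch of one way a “when” question might be answered: pull every full date out of a pile of retrieved snippets and let the most frequently mentioned one win. This is my own guess at the flavor of the technique, not H-Bot's actual algorithm; the function name, the date pattern (English month names, “Month D, YYYY” only), and the one-vote-per-snippet rule are all assumptions.

```python
import re
from collections import Counter

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Matches dates written like "April 30, 1945" (an assumed, deliberately narrow format).
DATE_PATTERN = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")


def answer_when(snippets):
    """Return the date mentioned in the most snippets, or None if no date is found."""
    votes = Counter()
    for snippet in snippets:
        # Count each date once per snippet so one repetitive page cannot dominate.
        votes.update(set(DATE_PATTERN.findall(snippet)))
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Fed a handful of snippets where most mention the same date and one mentions a different date, the majority date is returned. The more snippets, the less any single sloppy page matters.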

Playing with H-Bot is fun. When did Hitler die? The answer in an eyeblink, as the Germans say: April 30, 1945. When did Gandhi die? Here’s a quirk:

Fun with H-Bot

Well sure, but that wasn’t the Gandhi I meant. Interestingly, here’s what happens when I ask the same question but tell H-Bot not to “check trusted websites first”:

Fun with H-Bot

Here’s a case when the unfiltered swamp actually answered my question — or read my mind — better than “trusted websites.” Quantity over quality? Very sensibly, H-Bot demurs when I ask “Is God dead?” or “When did God die?” (“I’m sorry. I cannot provide any answer on that.”) But ask it “Who is God?” and H-Bot serves up a perky little answer:

Fun with H-Bot

Simple-minded? Sure. But viable. Arguments will rage, hairs will split, blood will spill, but our dumb machines have given us an efficient pulse of information in the midst of the cacophony, delivered by strategic sifting of great gobs of data.

Which brings us to a final point that Cohen makes about machine data-mining: “Quantity may make up for a lack of quality.” Even the most ardent humanist can’t deny: when it comes to information, we’ve got a whole lot of quantity these days. It’s how we draw from such quantity that counts.