Life in the taggregate

Friday, November 23, 2007

From its earliest days, the promise of the Semantic Web has been to bring networked computers closer to the forms and priorities of human inquiry. This promise depends on mark-up language that gives data some structure, and frameworks that bring such structure into recognizable relationships. As a May 2001 Scientific American piece by Tim Berners-Lee and colleagues put it, “for the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.”

Automated reasoning! This dream may be coming to life in e-science, with its highly structured and interoperable datasets, but in many other contexts the idea of a Semantic Web sits uneasily with the younger and more popular kid on the block, the Participatory Web. Web 2.0 environments amass a lot of data and, more importantly, a lot of information about this data, much of it in the form of tags generated by humans downright impervious to the need of machines for identifiable and consistent structure. Such tags are generally free-form, non-hierarchical, not expressing relationships in a predictable and consistent way; they dance to “folksonomy” not “taxonomy”; they are blithely untethered to “ontologies,” to any URI-based language standards.

Nevertheless there is intriguing thought out there about the potential interplay of the Semantic Web and Web 2.0. The Tagcommons site lays out Use Cases that envision sharing tags across databases, and sketches out some functional requirements to make that interoperability happen. Tom Gruber, in particular, has argued energetically for “collective intelligence systems” built from syntheses of structured data and social software; his travel-review site RealTravel uses a “snap-to-grid” model to disambiguate and structure user-supplied tags.
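For the curious, here is one way a “snap-to-grid” step might look in miniature. This is a sketch of the general idea only, not RealTravel's actual method: the place list, the `snap_to_grid` name, and the similarity cutoff are all assumptions made up for illustration. A free-form tag is normalized and then snapped to the nearest term in a small controlled vocabulary, or left alone if nothing is close enough.

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary; a real site would hold far more terms.
PLACES = ["paris", "harlem", "san francisco", "new york"]


def snap_to_grid(tag, vocabulary=PLACES, cutoff=0.8):
    """Return the closest vocabulary term for a free-form tag, or None."""
    # Normalize the messy human input before comparing.
    normalized = tag.strip().lower().replace("_", " ")
    # Keep only the single best match at or above the similarity cutoff.
    matches = get_close_matches(normalized, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A tag like `"San_Francisco"` lands on `"san francisco"`, while a tag with no plausible neighbor returns `None` and stays pure folksonomy. The cutoff is the interesting knob: set it too low and the grid starts swallowing tags that meant something else.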

And now in Yahoo! Research Berkeley labs, algorithms are starting to take into account aggregate patterns in order to sift out meaning from vast oceans of community-generated tags despite all their unstructured messiness — or, as computer scientists like to say, despite all their “noise.” It’s a matter of inference and cluster analysis. Case in point: the photo-sharing site Flickr’s new experiments in extracting “practical information about the world” from the snapshots and tags poured into it by the great unwashed. The report “How flickr helps us make sense of the world: context and content in community-contributed media collections,” describes a layered process of tag and image analysis–one that can be conducted entirely by machines–that identifies representational tags as well as place and event semantics.
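To make “aggregate patterns” a little more concrete, here is a minimal sketch of one way to pick representative tags for an area. It is not the Yahoo! researchers' algorithm. It assumes photos have already been grouped into named areas (the `cell` labels below are hypothetical) and it scores each tag with a simple TF-IDF-style weight: a tag counts for more when many photos in the area carry it, and for less when it turns up in every area.

```python
import math
from collections import Counter, defaultdict


def representative_tags(photos, top_n=2):
    """photos: iterable of (cell, tags) pairs; cell is any hashable area label."""
    cell_counts = defaultdict(Counter)
    for cell, tags in photos:
        # Count each tag at most once per photo.
        cell_counts[cell].update(set(tags))

    n_cells = len(cell_counts)
    # In how many areas does each tag appear at all?
    cells_with_tag = Counter()
    for counts in cell_counts.values():
        cells_with_tag.update(counts.keys())

    result = {}
    for cell, counts in cell_counts.items():
        # Frequency within the area, damped by how widespread the tag is.
        scores = {
            tag: freq * math.log(n_cells / cells_with_tag[tag])
            for tag, freq in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[cell] = ranked[:top_n]
    return result
```

Under this scoring a tag like “vacation” that shows up everywhere gets a weight of zero, which is roughly what we mean by noise, while a tag concentrated in one area floats to the top.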

What does all this do for us? For one thing, it can improve a search through piles of community-contributed materials; my search for “Harlem” stands a better chance of coming up with the most representative picture of the neighborhood, or a set of iteratively varied views of the neighborhood, or even a conglomeration of views for a composite view. I could determine the most visited place in the neighborhood, or the scenes of important events. Yahoo!’s researchers are even thinking about automatic tagging of photos, or suggestions for tags, that are generated by visual content abetted by contextual and geographical cues.

Here are a couple of spins of Yahoo! Labs’ TagMaps:

Flickr World Browser Harlem

^ TagMap’s World Browser analyzes Flickr tags to locate “Harlem” on a map and offer a set of representative photos (on the right). Harlem seems pushed to the west, and the chicken picture is a little odd, but this machine-generated guess seems viable enough.

TagMap World Browser Paris

^ A search for ‘Paris’ in TagMap’s World Browser whisks us to a city in the middle of France, not Texas, and avoids any pictures of over-photographed heiresses. See: machines have taste too.

Teasing meaning out of cacophony, evaluating ‘where what & when’ through dumb processing of inconsistent human traces: it’s not hard to sense an artificial intelligence awakening here with its own priorities, despite the human decision (conscious or not) to ignore machine-oriented information conventions. What is the ultimate effect of algorithms trained to crunch through the idiosyncratic and identify the representational? Could such aggregate processing of unstructured data fuel a general regression to the mean, as alchemist Jonah Bossewitch muses? As a Trekkie (or is it Trekker?) might say, streaming into yet another convention, resistance is futile.

The fear of human conglomeration coming into sudden sentience is nothing new, of course. I just re-read Frankenstein with a set of fresh young readers, and alarmist correlations of that good old story to an improbably persistent, flexible, and collective-mashed form of AI doubtlessly come too easily to me now. But I do sometimes wonder whether we too will wake up from our most logocentric tagging idylls to sense senseless and unblinking eyes, watching us in the dark and hungry for more.

The communal LOR

Thursday, January 18, 2007

In our last episode, we beat up a bit on the notion of “learning object repositories” (LORs), wondering whether the well-meaning assemblage of modular bits and pieces of educational materials was actually a frustration of coherent teaching. Educational practices, after all, are still grounded in settings and customs that predate the digital on-demand world. We speak of courses, of curricula, of graduation; we cling on to learning as an unfolding, progressive narrative. And progressive narratives seem to be exactly what free-floating clusters of learning objects lack.

Haunted as I am by S.T. Coleridge’s Ancient Mariner and that ghostly character’s pseudo-progressive travails, I can’t help thinking of decontextualized learning objects as similar to the unearthly sounds that rise out of the mouths of his dead crew and swirl unfixedly about:

Around, around, flew each sweet sound,
Then darted to the Sun;
Slowly the sounds came back again,
Now mix’d, now one by one.

Sometimes a-dropping from the sky
I heard the skylark sing;
Sometimes all little birds that are,
How they seem’d to fill the sea and air
With their sweet jargoning!

And now ’twas like all instruments,
Now like a lonely flute;
And now it is an angel’s song,
That makes the Heavens be mute.

It ceased…

The Rime of the Ancient Mariner is heuristic to the core; it teaches us to teach through many spectacularly negative examples. Disconnection from community, the poem suggests, leads to a horror-mirror world of isolation: a world teeming with elements snapped off from the teleology of cause & effect. The Mariner butchers the bird, obeying some unexplained private impulse, and dooms himself to a world where wind is heard but not felt, or felt but not heard — and the same goes for companionship, morality, religion, expiation. Very dissatisfying. Those free-floating supernatural sounds — all that “sweet jargoning” — are momentarily marvelous, even Heavens-eclipsing — and yet they’re unreliable and of dubious value, to say the least. They don’t advance the plot; they just cease.

The Mariner’s original sin: ignoring community (which was, after all, so strongly fostered by that unlucky albatross). It’s a pretty trenchant sin; even after any amount of penance, he seems doomed to repeat it. He poaches the Wedding Guest, blocking this unwilling auditor from entering a communal wedding celebration (the poor Guest protests, to no effect, “The guests are met, the feast is set: / May’st hear the merry din….”), and forcing the Guest, instead, to listen to a hard-luck story having little to do with its auditor, superficial appearances notwithstanding (“That moment that his face I see, / I know the man who must hear me…”).

Dore Mariner

And what in mute Heaven’s name does any of this have to do with learning object repositories? It seems that we’re learning the Mariner’s lesson all over again. The most thoughtful study that I’ve read about the uptake and implementation of LORs is the recent report “Community Dimensions of Learning Object Repositories,” funded by the Joint Information Systems Committee (JISC). The gist of this report is evident directly from its title: however energetically you go about building a constellation of durable, interoperable, reusable, and sharable chunks of teaching & learning materials, it won’t mean a thing unless you tailor it to the cultural norms and expectations of a user community. As the report observes in its rather British way, “pedagogical, social, and organisational factors have not been at the forefront in LOR development to date.”

A community shares goals, interests, practices; it draws on commonly available tools; it shares understanding of processes and concepts. The JISC study lines up and sets marching some hard questions bound to make any repository-builder squirm: What is the purpose of the LOR — ie, how does it serve its community? Who are key stakeholders in that community? In what broader context does that community operate? A LOR project that starts by grappling with such large questions stands a better chance of being organized by pedagogical goals and activities, rather than all the content it can cram into its great maw just because — like the Mariner knocking an albatross down out of the sky — it can.

Treating teachers as one big community is in many ways an absurdity, of course — we operate within a dizzying array of conditions and expectations, and with varying allegiances to vastly different sponsoring institutions. Nevertheless, it is at least a good step to consider how a LOR addresses whatever generalizations you may wish to venture about teachers as a community. This borders on a truism, but then again how many LORs truly meet an actual teacher halfway? The JISC report hazards a few claims about teachers and the way they behave:

  • They have a very problematic relationship with metadata. Descriptive metadata can fail them when they’re hunting in the dark for objects. When submitting an object to an LOR, they’re not trained & often not helped in the fine art of quality metadata appendage. More on this issue here, btw
  • They often prefer to create their own learning objects, rather than patch someone else’s in. On the scale of teacherly chores — grading, planning, meeting, exhorting, reviewing — creation of new materials for one’s class is actually on the fun side, one of the best ways to stand out and inspire, to make your class into a unique event. Even if you’re not so handy with making new things, by dipping into the well of pre-made pieces you risk “loss of educational narrative,” as the JISC report puts it (and how many teachers got into the business because of their assemblage skills anyway?). Educational narrative may be more important to individual-obsessed humanists than object-oriented scientists, the report notes in passing.
  • Teachers like incentives just like anyone else, and an LOR would do well to supply some. They could be in the form of recognition or perhaps an even more tangible reward for contribution, or proof that use of material from the LOR will make a teacher more effective. If the LOR is keyed to the goals of the institution that pays said teacher, that’s a fine reason to use it.
  • Despite all impediments, teachers, bless ‘em, are a persistently open-minded lot, at least according to the JISC report: “In general the interviewees have a positive attitude to reuse, and most have stated that they are willing to keep trying to reuse material, despite the difficulties they have faced.” This is a suggestion that LORs have some time to wake up to the willing worlds around them in all their glorious particularity.

And let’s close, on that brighter note, by nodding towards LORs that do seem engaged with the communities that use them, on some level at least.

The granddaddy of LORs, LC’s American Memory Project, set an early standard by layering its gigantic offerings with a “Learning Page… especially for teachers” : a collection of “teacher created, classroom tested lesson plans… [to] jumpstart your use of primary sources,” a rundown of curricular themes, various strategies to promote critical thinking, and professional development materials.

The National Science Digital Library corrals its resources for various imagined players: K12 Teachers, Librarians, NSDL Community Members (you know who you are), University Faculty, and First Time Users. Each of these groups has customized “pathways” through the library, as well as a fistful of fairly active blogs grouped by audience category.

Finally, the December issue of D-Lib describes a geoscience LOR named “Teach the Earth” built by the Science Education Resource Center at Carleton College; the article is encouragingly titled “Digital Library as Network and Community Center: A Successful Model for Contribution and Use.” The authors state, flat out:

A successful educational digital library is as much a social process as a technical problem. It requires creation of a culture that fosters contribution to and use of the library. We have addressed creation of this culture by working with NSF-funded projects focused on the professional development of geoscience faculty as teachers. Each of these projects partnered with SERC to create its project website. They seek two primary services in this partnership: 1) tools, resources and experts that assist them in creating high quality project websites and 2) placement of their resources in a network that enhances dissemination and use of their work. We created a win-win situation that yields rapid production of content for the library and facilitates use, by allowing our partners the flexibility to meet their own project goals while contributing to the overarching digital library.

Let’s see: professional development, support of individual projects with an eye towards incorporation, maintenance of a consistent level of quality, enhancement of dissemination and recognition of work — sounds like a happy LOR to me, one that engages its users, rather than stunning them.

The SERC authors claim that a full 25% of all geoscience faculty in the US (the audience it bothered to target) now use Teach the Earth: now that’s uptake!

Learning object(ions)

Thursday, January 4, 2007

The pendulum has certainly swung far away from the early days of digital learning happytalk, which was all objects all the time. In them dotgone days, “strategic futurists” such as Wayne Hodgins proclaimed that “the ability to learn and apply the right stuff faster is the only sustainable competitive advantage there is for any of us” — and the way to win was to call up that stuff, those digital learning objects, pronto. The “learnativity revolution” would be powered by gobs and gobs of “terrific resources” marked up by Learning Objects Metadata, dressed up for discovery. Powering all this (remember when ‘powering’ was a verb?): the Lego (TM) metaphor, as touched on by a 2002 D-Lib article called “Metadata Principles and Practicalities”

In a modular metadata world, data elements from different schemas as well as vocabularies and other building blocks can be combined in a syntactically and semantically interoperable way. Thus, application designers should be able to benefit from significant re-usability as they gather existing modules of metadata and ‘snap’ them together much as individual Lego™ blocks can be assembled into larger structures.

Legos at SXSW

Though futurist Hodgins (a co-author of the D-Lib piece) is avowedly “wandering and pondering as he scours the world for trends and technologies most of us will not see for the next 18 months to 10 years,” an anxious world is still waiting for the followup to “Into the Future: A Vision Paper” (2000), in which “the rules of Newtonian physics have been superseded by those of Learnativity, where the gravitational pull of creating new knowledge determines and shapes the actions of everything within.” The process, as described in this Vision, is at once entropic and plastic:

Breaking knowledge down into information objects, the smallest useful chunks of information, frees it to be used again. Think of this as creating and assembling Lego™ blocks. Whether you’re assembling a bridge or a house or a spaceship, you use the same Lego™ to form a “learning object.”

The notion that newly created digital objects can upend physics may seem to belong to the discard pile next to sock puppets and Netscape 4.0. And yet the Legoland learning world haunts us still. We have a deeper sense of how hard it is to transform (let alone revolutionize) education with modular resources, but the web brims with learning object repositories that are palpably yearning to be engaged by actual teachers.

Every once in a while, a teacher even urges their use to colleagues, such as this 2006 endorsement by a Professor of Geomorphology writing in Ariadne:

Reusable educational objects (REO) or reusable learning objects (I prefer the wider term) are becoming an area of interest in education, especially in Higher Education. This stems from the ideas of reusability from ‘mass’ e-learning in the USA and from there developed the Sharable Content Object Reference Model (SCORM) as well as some resources such as MERLOT (Multimedia Educational Resource for Learning and Online Teaching). This tends to have full resources such as a slide set or a Web page. Lecturers should try this as there may well be all sorts of useful material available within the archive, often free.

There is a lot of faith packed up here — in a preferred definition of a ‘learning object’ (a definition that tends to crumble when you push on it), in the value of reuse and mass broadcast, in the existence of “all sorts of useful material” to be unearthed within an archive (for free!). All the more reason to wonder and ponder the extent of actual use of learning object repositories. Are current offerings honoring the enthusiasm of our good professor of Geomorphology? If not, is there something fundamentally flawed in the idea of freely recontextualizable learning objects?

I recently took a quick sip of MERLOT (“a free and open resource designed primarily for faculty and students of higher education”), the learning object resource singled out by the good prof, and found it to be… rather flat. Though it offers ‘peer review’ filters and advanced searching, MERLOT failed me when I came into it with a specific agenda: to find a peer-reviewed resource that would supplement teaching of William Wordsworth’s poetry. No results found. Was that too specialized? Then how about something about landscape in art or literature? How about anything at all involving the keyword ‘landscape’? Finally, one peer-reviewed result found: oddly enough, an FTP tutorial (author unknown, section 508 non-compliant).

When I approached MERLOT without an agenda — that is, in ‘browse’ mode — I was again underwhelmed. Looking to see how available resources might be engaged, I picked through assignments posted on the site, and found one rather expansively called The British Empire. The gist of this assignment: go to an outside website, read sections of it, and write a 5-7 page essay. This outside website itself warns: “This site is not a rigourous academic site! I’m sure there are plenty of mistakes and oversights on my part; for which I apologise in advance! My interest in the subject is purely that of a personal journey of discovery….”

After a few disappointments like this, the sun was setting on my hope that MERLOT had much to offer me. To be sure, like our Geomorphology prof, the site has nothing but the best intentions. Its solicitation of assignments and personal collections offers some way into the “15818 materials” (as of this writing) somewhat chaotically gathered. In other words, there’s effort to bring the wisdom of learning communities to bear on these bits and pieces– to encourage peer review, share insight, suggest deployment. ‘Gold level’ users of the site (rated by submitted materials, comments, assignments, and collections) would surely attest to MERLOT’s value.

But the effort seems limited by the objects model embraced by past futurists. “Materials” are gathered, and activity is to follow: the activity of wrestling them into actual curricula in a meaningful way. Put it this way: I would have to be a fairly passive teacher if I were satisfied with the results and suggestions I unearthed on MERLOT. I would have to be willing to suspend the gravitational pull of my own course — sacrifice context, really — in order to incorporate an object impervious to what came before in my class and what would follow: a second-handedly endorsed learning resource with priorities and emphases that may be disconnected — even inimical — to my own.

***

At the heart of the idea of “learning objects,” then, is a belief in modularity, as if teaching were so much recombination. If you’re in a really dark mood, you might consider the model of replaceable parts as emblematic of the “Information University” vividly deplored by Marc Bousquet a few years ago. In the nightmare Information University, labor is made up of so many interchangeable parts, available on-demand and easily replaced:

Constrained to manifest itself as data, labor appears when needed on the management desktop–fully trained, ‘ready to go out of the box,’ and so forth–and after appearing upon administrative command, labor in this form should ideally instantly disappear.

Who would consent to work this way? Replacements for the tenured class, of course, that market-immune anachronism that is vanishing like so many glaciers:

Dispensing with the skilled professoriate is accompanied by the installation of a vast cadre of differently-skilled workers–graduate students, part-time faculty, technology specialists, writing consultants, and so forth.

Just the sort of workers lacking the training and time and perspective, I would suggest, to assemble a coherent and effective pedagogy out of a massive pile of Legos™.

On activating digital collections

Monday, November 13, 2006

I was on the verge of crafting a blog entry expressing fears and reservations about Second Life when it occurred to me that skepticism has gotten too much play here of late. I’m really not so grumpy. To try to prove that, I’ll slap down here a few paragraphs from a mini-manifesto I’ve been working on lately. It’s lumpy and unfinished — but it’s hopeful.

***

The digitization of learning objects does not, in itself, foster study of them. Even the richest digital library teaches little if it is not selectively engaged by pedagogical context and activity. Conversely, a learning environment that fails to incorporate available resources best suited to its purposes courts hermeticism and limitation. We should be committed to activating digital collections — exploring mutually beneficial relationships between collections and learning environments — through the integration of digitized material with new modes of study and dissemination.

The best digital learning tools may well draw on discrete digital collections in different ways — often within the same environment. This is no surprise: just as no one pedagogical application could exhaust the possibilities of a robust digital library, it is often the case that no one collection satisfies the evolving or multiple purposes of a sophisticated learning environment.

At a university, such tools should be conceptualized in consultation with faculty, consultations that focus on teaching methods and goals. Identification of relevant collections to draw upon (at an institution’s library and from the wider world) is an important follow-up to this impetus, akin to identification of the software that will run the project. Just as some innovative projects now run on proprietary as well as open source software, and may mix a number of microapps into one environment, so should we draw on a range of collections that best engage the purpose of a given project.

Existing digital collections, more often than not, consist of material restricted for certain uses or limited to certain audiences, in compliance with licensing and codification. Projects that engage diverse existing collections are likely to require special permissions and/or an access architecture of no little complexity (material variously available to various populations): negotiating this variety can be difficult, but it nevertheless ensures that pedagogical goals, and not the restrictions of any one collection, shape a learning environment.

In addition to existing digital collections, heretofore unpublished or uncollected assets may vastly improve and distinguish learning environments. This ‘dark’ material may be created and owned by individual faculty members, or held in reserve by public or private enterprises; its use may be open to negotiation, which takes no small amount of time and effort. In some cases, onerous restrictions or the simple lack of relevant material may drive the crafters of educational environments into active production of new assets for a given project: videotaping new interviews, recording new performances, capturing new creations. This active creation also requires a lot of resources, but the advantage here is that content can be produced with permissions and licensing optimal for a learning environment.

The heterogeneous provenance of collections means that any producer of digital learning tools has an active interest in understanding and promoting standards of interoperability. We also have a stake in open access movements: a collections landscape less hedged by restriction is a landscape that will offer a fuller array of elements for the tools we build. Whenever possible, our projects should be made open for access and use beyond any conceptualized engagement; this maximizes the often extensive investment of an organization in any given project, and inspires the holders of potentially useful collections to match our lead.

Finally, the fungible quality of digital material means that it is often transformed through incorporation into a learning environment. As it is used, it changes, through recontextualization, annotation, or other user modifications. A project that begins by drawing on discrete collections may thus become a unique collection itself, reflective of assignment-related engagements of a given community. Instructors may shape materials in a certain way, or supplement them over the course of a term. Student work may be archived in the project and in turn made available for future iterations of the project or outside use of it. Evidence of active study may thus consist of transformation of material in the environment; it could also be fresh material generated or uploaded by students. Many of the most interesting educational environments will in this way prove to be ‘two-way’ collection arenas, necessitating thoughtful policies about ‘outputs’ as well as ‘inputs.’

***

For an exemplification of some of these points, I invite you to take a little tour of the Havel at Columbia site. Here is a digital melange of:

  • Columbia University Libraries holdings (special collections, institutional archive, media holdings)
  • Donated commercial material (documentary film selections, CNN news archives)
  • Donated privately owned material
  • Material purchased for the project (CORBIS & Getty images, video archives)
  • Material created for the project (video interviews conducted with Lou Reed, George Soros, et al.)
  • Campus events videotaped during Havel’s residency
  • User ‘notebooks’ used to assemble and annotate assets into multimedia essays and demonstrations

Unavoidably, some of this material is restricted to students and instructors at Columbia. But whenever possible, we’ve opened things up for universal access, and encouraged those participating in this project to do the same.

The next step, I can almost hear you thinking, would be to release the collection of publicly accessible material on this site under a CC license. We’ll get there, I’m sure of it. See? Hopeful!

The end of EndNote?

Thursday, September 7, 2006

You’ve wrangled that paper to a plausible conclusion — a bit of sleep is just around the corner — but hold on, not so fast, you’re Sisyphus after all. Citation formatting is a special curse, the inane labor at the end of hard work that holds all your effort hostage. Never does it seem less true that it’s the thought that counts.

The best portrait of this frustration that I know is Louis Menand’s New Yorker article from three years back, “The End Matter; The Nightmare of Citation.” (And no, I won’t properly cite it.) Menand mobilizes here a full sense of the tyranny that must be endured in the construction of endnotes —

Every error is an error of substance, a betrayal of ignorance and inexperience, the academic equivalent of the double dribble. That the decorums of citation are the arbitrary residue of ancient pedantries whose raisons d’etre are long past reconstructing does not reduce the penalties for nonconformity.

Surely technology should free us from such tiresome finish-line ambushes. And yet, as Menand observes,

The notion that the personal computer has eliminated the bone-crushing inefficiency of the typewriter, and turned composing The End Matter into a drive in the word-processing park, belongs to the myth that all work on a computer is “fun”-one of the Digital Age’s cruellest jokes.

Microsoft Word, as Menand observes, is too often a baffling mess when it comes to foot/endnote generation, plaguing you with random formatting and automatically generated annoyances. Too many options: the exhausted citer just wants to be faultless and to be done.

EndNote — which is a plug-in in my version of MS Word — might seem to be a lifesaver. Indeed, many of us have been happy to sit through earnest training in this and similar tools, entranced by the promise of metadata pulled down from a network, stored in a local database, and spit back out, effortlessly, into formatted endnotes. Oh, you wanted APA 5th, not Turabian? Hold on just a sec - (click, click) - here you go! Choose a style, any style: here are 1012 to choose from!
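The promise is easy to sketch, which is part of its charm. The toy below is not EndNote's code and its two styles are only loosely modeled on author-date and notes conventions; they are not faithful to APA, Turabian, or any real manual. The `Record` fields and the sample data in the usage note are hypothetical. The point is just the shape of the thing: store the metadata once, then render it through whichever style function you pick.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One stored citation; field names are assumptions for this sketch."""
    author_last: str
    author_first: str
    title: str
    publisher: str
    year: int


def author_date(r):
    # Simplified author-date rendering (not a real style manual's rules).
    return f"{r.author_last}, {r.author_first[0]}. ({r.year}). {r.title}. {r.publisher}."


def notes_style(r):
    # Simplified notes rendering (again, illustrative only).
    return f"{r.author_first} {r.author_last}, {r.title} ({r.publisher}, {r.year})."


# Switching styles is just a lookup into this table.
STYLES = {"author-date": author_date, "notes": notes_style}


def format_citation(record, style):
    """Render one record in the named style."""
    return STYLES[style](record)
```

Given a made-up record such as `Record("Doe", "Jane", "An Example Title", "Example Press", 2006)`, calling `format_citation(rec, "notes")` and then `format_citation(rec, "author-date")` is the (click, click) moment. Notice, too, where the trouble creeps in: if a field arrives empty or mangled from the source, every style faithfully reproduces the flaw.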

And yet, in my personal experience, EndNote endnotes are chock full of flaws. I’m not here to assign blame — maybe it was an incomplete OPAC record, maybe the library filter was off, maybe EndNote dropped a field — at the end of the day (rather, the night), citations are liable to look like nothing in that overstuffed, unloved red style manual (which is all but impervious, anyway, to the need to cite digital sources). Back to fixing, fretting, fudging. Only EndNote is liable to overwrite your corrections: surprise!

And yet the dream of escaping such frustrations through technology won’t die — and shouldn’t. It seems only fair that our Babylonian predicaments be ameliorated, at least somewhat, by computers–our vast interconnected ever-churning never-complaining prostheses.

George Mason’s Center for History & New Media (a seemingly ever-inventive group) has had a promising tool chugging down the pike for some time that offers a new glimmer of hope. It manages citations and other research information in a web environment. When first I heard about it, they were calling this tool Firefox Scholar – now it’s been rebranded to Zotero: a term loosely based on the Albanian word for acquiring/mastering. Whatever – let’s trust that this promising project will prove to be less obscure than such an etymology.

From what I can tell from the description of Zotero, bennies include:

  • Ability to capture & store PDFs, files, images, links, web pages in a browser platform.
  • A range of organization options, including folders & tagging & ‘smart’ collections.
  • iTunes-like interface.
  • Spotlight-like search-as-you-type.

…and, most relevant here:

  • Ability to sniff out a citation on a web page & capture it to your library
  • Citation export.

Zotero works with Firefox to sense when you are visiting a page with full bibliographic data (like an OPAC) and offers a little book icon; click it, and citation material comes flying into your computer.

Zotero in a Firefox browser bar

Since suddenly there’s a profusion of browser-based store-organize-share tools (SOS?) for scholars, Zotero will be all the more valuable if it can be jiggered to play with academic social software like Connotea or the aforeglimpsed CiteULike – and, while we’re dreaming, if it can feed stored items into networked repositories. Since it’s free and open source, one can imagine any kind of evolution for this “next generation research tool.”

Will researching and citing on the web actually get a little easier? We’ll see – Zotero is in private beta now, but should be in public beta by the end of the month.

Dear PennTags

Wednesday, June 14, 2006

Please don’t take this the wrong way. It’s not you, it’s me. It’s just that I was so excited to meet you — I had so many preconceptions, I had heard so much about you. And then when I actually met you, you seemed kind of standoff-ish and, I admit, sort of different from what I thought you’d be. But I still like you — don’t get me wrong.

When I first heard about you I thought: finally! A way for scholars to tag up an OPAC as well as electronic journals — a tool enabling social discovery by a defined community swimming through carefully selected resources. In short, I thought you’d be more sophisticated and more focused than del.icio.us. I thought: finally, it will be easy for a specific class or a set group of scholars to sift together through premium resources: collaborative discovery centered on the information source most unique to Penn, the Penn library.

But when we actually met you were so confusing (and I’m not alone in thinking so). Your home page hit me right off the bat with pictures of birds and a big tagcloud, a cloud that seemed more random than representative:

PennTags

What does it mean that Lauder_Institute_Area_Studies dwarfs united_states? I think it means that you haven’t gotten around enough to render a representative or even very interesting snapshot of the Penn community — so until you do, I suggest you don’t wear this raw data on your sleeve.

I know your type — you’re enamored of presenting data as it comes into your system — makes you seem extra dynamic. But until you get more play, you’re not delivering useful information with your overall clouds and ‘latest tagged’ lists. In fact, I doubt such look-ma-it’s-web2.0 features will ever be that useful to anyone, however big you get.

I guess my point is, first impressions are important — so you should use your home page to introduce yourself, rather than show off. I finally found my way to the “About” page (tiny button, my friend! why so shy?), a page that finally addresses the question, “What is PennTags”? And here you got kind of weird. You started pretending that del.icio.us doesn’t even exist. Or, to put it another way, you said almost nothing about yourself that couldn’t be said about del.icio.us. You bragged:

Have you ever bookmarked a web page and then can’t find it again in your mass of bookmarks? The beauty of PennTags is that it allows you to organize your bookmarks/resources exactly the way you want and it lets you share them with others. It’s both personal and portable.

Well ok, but I thought your beauty, PennTags, would be that you would be different from del.icio.us — that instead of letting anyone tag anything just ‘out there’ on the open web, you’d let a defined community — namely, Penn and sub-communities within Penn — tag things that are available by virtue of being at Penn. Otherwise, why reinvent the wheel? Ignoring the popular kid & just pretending to be him won’t impress many who are likely to be drawn to you in the first place.

Jumping into some of your posts, though, I found that your users are in fact using you as I thought they might — they are tagging your library’s catalog records, and they are tagging articles available in your library’s database, as well as outside websites. Following these links put me on quite different adventures.

When the item tagged is in the OPAC

OPAC tagging is pretty darn sweet — and you pulled this off with Voyager, no less. When I clicked on a post referring to a book on Godard, I didn’t get to access the book (obviously), but I was routed to its catalog record, and I found that the user-contributed tag and summary had made the trip with me, and appeared in a yellow box right in the OPAC:

PennTags

After seeing this trick, PennTags, I started to warm to you. People who know nothing about you or about tagging or even about bookmarking are bound to wonder what these yellow notes are, showing up at the bottom of OPAC records — maybe you’ll recruit more users this way, and get smarter. At the very least, you’re giving library records a sense of life; any way to enliven the OPAC with user contributions is a-ok with me.

But I wonder how you’ll manage any significant success — imagine ten such yellow PennTag records clinging onto a record in the catalog. You’ll have to be careful to keep a balance between authoritative metadata and folksonomy, between succinct official catalog records and long contributed summations.

When the item tagged is in a journal database

What about when someone posts and tags a journal article in you? I clicked on such a record, and, not to my surprise, got dumped at a Penn database log-in screen — which means that if I were affiliated with Penn, I’d go right to the article. Since I’m not, I see nothing — no user summations, no fun yellow boxes. This raises the question again of who is using PennTags, and for what purpose. Frankly, I felt ignored by you here. If you are of, by, & for people behind Penn’s walls, then perhaps you should live behind that wall too — it’s not particularly interesting, for someone who can’t get at resources, to see how they’re being tagged.

That said, clicking on the title of another posted article, a JSTOR title, took me — much to my surprise — right into the article; I was ushered straight in thanks to my own institution. That experience started me dreaming again, PennTags, about an OpenURL world, filled with cross-institutional tagging of academic assets. At the very least it renewed my hope that I might find you of use while waiting for my own library to get tagging off the ground.

When the item tagged is an outside website

Then there are the outside websites that are being posted and tagged in you, just as they’re tagged in del.icio.us. As you know, I think it’s redundant and a little silly to use you just for this purpose, but I’m also warming to the idea of tagging websites right alongside OPAC records and journal articles. You see, PennTags, I’m open to persuasion; you just haven’t taken the time to articulate the benefits of this mix. You’re actually allowing your users to bring resources into your library, in a way. Rather than reinventing a wheel, you’re melting a wall. That’s a big step, and it’s one to think about — not take for granted.

Yeah, inside/outside tagging has plenty of potential, no doubt about it, but here again I’m a little let down. Here’s the deal, PennTags: I think you could be a little more proactive about what academic tagging could or even should be. Could it be hierarchical? Might it be user-faceted? Are there ways to enforce best practices? By offering little firm guidance, you’re once again playing pseudo-del.icio.us, leaving everything up to an undifferentiated swamp.

But look around, PennTags: you operate in a world full of productive distinctions. You even list some, shyly — they get buried in a section called “More Tagging Tips”:

PennTags

How hard would it be to invite your users to think along these lines, gently, somewhere in the tagging process? Can tagging evolve to something beyond a single ‘fill in whatever you want’ open field? I know you don’t want to come across as bossy or prescriptive or — god forbid — librarian-like, but I wonder if just a couple of criteria particularly useful to your academic community (say Topic and Relevance) could be quietly promoted, just as del.icio.us already subtly promotes tagging uniformity through ‘recommended tags.’

The thing to keep your eye on is use: how these tags are used by actual populations, in actual classes or other sub-groupings, for actual purposes. I find it pretty weird that you’re asking people to think about tagging with an uncle in mind — unless this is an uncle at Penn. Relevance is a subjective and fairly meaningless call against a wide-open horizon (where many uncles live), but within the context of english242 students working collectively on a presentation about Keats’s illness, say, “Relevance” becomes a powerful way of characterizing a resource.

Imagine, too, if you allowed any kind of distinction among users — how interestingly instructors and students, say, could interact within a classroom framework as what they are (in the institution’s eye) through you. Or professors and research assistants. Or members of a class and those outside the class. Or librarians. Or alumni. These distinctions shape the day-to-day life of your campus, and though I suspect you imagine yourself to be leveling the playing field in exciting new ways, you don’t have to dumb the field down that much. Nor do user distinctions need to control the way people use you. Building them in would only help when it becomes desirable to browse or subscribe to the tagging work of a certain subset of the campus community. Here’s your advantage over del.icio.us: you operate in a circumscribed world organized around definable purposes, roles, means, events.

I think you’d be even cooler if you presented yourself as not just another collective knowledge base, but as the way that only Penn could make the knowledge of the world work for definable ends. That’s why I think your most promising feature is ‘Projects’. Right now you only allow one owner to post to a given project, but maybe in the future you’ll loosen up and let many users work on a given project — and maybe even specified classes of users. Then, I suspect, the RSS functionality you’ve already built in would start to be useful not merely to the curious, but to a much more involved user-base: the tasked.

Well, PennTags, you can guess by the way I’ve gone on here that I actually am pretty attracted to you, and I look forward to seeing how you mature. You’re raising awareness of tagging in academic settings — and you’re not just sitting around wondering about what that might mean — you’re actually putting tags into motion. That’s the only way any of us is really going to learn how this 2.0 phenom might work for us. So — way to be, & keep in touch.

Your PennPal,
Mark

By indirections find resources out

Tuesday, June 6, 2006

OCLC’s recent report College Students’ Perceptions of Libraries and Information Resources resonates a bit with the Al Gore slideshow movie I saw this weekend: it deploys lots of slick graphs and charts to frame information that can only be received with dismay.

The almost 400 students surveyed by OCLC think of commercial search engines as a perfect fit for their lifestyle and their needs, and they turn to them first whenever looking for information. The respondents respect the libraries, and feel that they can find quality information through them, but they almost never delve into library websites first to find information. Their instant ‘brand’ identification for libraries is ‘book.’

In short, libraries seem to exist as a point of last resort in the minds of many college students — a complicated, confusing, sometimes outdated facility to be approached for information only when Google fails. The pull-quotes in the OCLC report are inflected with grammatical errors, just to rub salt in the wounds. Rampant illiteracy or OCLC sabotage? You decide:

OCLC survey

OCLC survey

OCLC survey

Hidebound notions of what academic libraries are actually doing these days make it all the more important to find new ways to expose services. The LibX Firefox Extension, for example, embeds links to library resources in a variety of more user-friendly websites (their screenshots show little logos popping up in Amazon and Google searches, as well as New York Times book reviews). LibX is another one of these nifty localizing extensions that Firefox has inspired — and it works with COinS.

A less technical way of exposing those expensive electronic library services is to take particular note of how students actually learn about them, according to the OCLC study. Have a look with me at this chart, which breaks down the ways college students (and broader populations, for comparison’s sake) find out about electronic information sources *besides* through search engines:

OCLC survey

Librarians themselves are way down on the chart — and they rate even lower for the non-college crowd. So what’s at the top? ‘Friends’ and ‘Links’: more reasons to make it easy for students to create, store, and share links to library resources. But look at who’s coming in third–beating out other media, advertising, and my cousin who works for CNN: Teachers. Teachers, way above librarians. While librarians are increasingly framing themselves as teachers — the ‘instructional librarian’ is a familiar role and position by now — such data suggests we think of teachers as front-line librarians, or at least librarian-proxies.

Consider, too, this chart showing “Cross-referencing Sources to Validate Information”:

OCLC survey

Though it’s hard to see in this small version, the chart shows that college students (in green) and the general population (in orange) validate the information they find on sites most often by comparing other websites with similar information (80-82%). But in second place, at least for the college crowd, here comes our unexpected resource champ, the Teacher, with an impressive 78%. That source of information validation beats out checking library materials (64%) and checking with a librarian (36%).

Given their relatively exalted position on the information food chain, teachers need all the training and support they can get from librarians. We should throw out the assumption that just because someone wrote a dissertation, he knows all about how to use library resources and can pass on this wisdom to students. The ground is changing too fast, and the unsupported instructor will not have time to keep up. That’s not his job–it’s the librarian’s.

Case in point: a European history and philosophy librarian mentioned to me the other day that Blackwell Synergy is becoming a significant point of access to important journals in his areas. And perhaps you thought of this database (if you thought of it at all) as focused on science?

The point is, in a healthy educational environment, a teacher will be backed up with well-selected electronic resources that are ever one click away in the course management system, tended and manicured by librarians. This is indirect, ongoing training – for teachers as well as for their students – in the use of resources, delivered at the point where it’s most needed. Such targeted support could actually minimize class disruption (no need for librarians to come point out where resources are, if they’re already being well-delivered), while letting students hold on to the fantasy (which they evidently need in these perilous times) that the library is all about books.

LibraryThings

Thursday, May 18, 2006

If it once took a special type of person to be a library cataloguer — one comfortable in back offices & around heavy rule books, methodical, perhaps quiet — now everyone wants to get in on the action. The rise of self-cataloguing has been one of the more inexorable effects of digital media. The discovery within cataloguing of social connections now seems to be another.

Of course long before all this web stuff we were being trained to collect content in various forms, and value assemblages as inherent identifiers of taste. Siva Vaidhyanathan’s recent presentation at Columbia’s Correcting Course forum carries this age-old ritual into my lifetime; he talks about a mass paperback industry that marketed (unread? unreadable?) books as class identification… VHS tapes marooned on shelves, monuments of their owner’s cinematic pleasures… the fine art of mixed tapes, now supplanted of course by playlists….

The fetishistic Mac application Delicious Library wraps a collection database into a pretty package so… so… well, so you can have a virtual representation of all your books — all your video games — all your DVDs, right on the hard drive of your computer. Scan the item’s UPC barcode with a webcam, and presto, metadata from Amazon flies right into your own library database — including cover art. Awesome, right?

Ok it’s actually fairly purposeless. You can assign items ratings, and you can designate their location in actual space, but I doubt many are actually relying on Delicious Library to find stuff. If you lend out an item to a friend, you can track it with DL — but really, if you’re lending out more than you can remember & your friends can’t be trusted to return things, well, maybe a policy change is in order. And DL’s symbiosis with Amazon’s API is worrisome — Amazon-hosted One-Click Shopping recommendations are just a click away.

But describe Delicious Library to someone, and it’s possible that they’ll turn cataloguer right in front of your eyes: huh, my things in a database….

Delicious Library

^ Finding Nemo and other treasures: virtual shelving in Delicious Library

LibraryThing — straight outta Portland, Maine, btw — is a web app significantly tastier than its desktop cousin because it networks people’s collections. LibraryThing still invites you to play with representations of your books on virtual shelves for yourself — but now you’re doing your assembling among & amid a myriad of intersecting libraries. Now metadata is up for grabs, unregulated by Amazon or any other detached entity: social tagging comes to the fore. You can hear the 2.0 pitch — it’s del.icio.us for books! — and lo, tagging abounds.

But just around books — LibraryThing valiantly resists the siren call of other media in favor of bibliomania. It links its bibliographic records to OCLC’s Find a Library as well as Amazon and library OPACs via the good old Z39.50 client-server protocol, and hosts discussion of titles among those who share it in their libraries.

In short, if you love books, LibraryThing seems an unrigged communal playpen, as well as a self-inventory tool. It provides branching recommendations based on mutual ownership, not Amazonian purchases. It presents clouds of a book’s common tags unseeded by commerce. It offers RSS subscriptions for any given tag, so you can track books as they come in, as collections rather than products.
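For the curious, recommendation by mutual ownership is easy to sketch. What follows is only an illustration under my own assumptions, not LibraryThing’s actual method: the three libraries, their owners, and the function name `recommend` are all made up for the example.

```python
from collections import Counter

# Hypothetical data: each user's library as a set of titles.
libraries = {
    "ann": {"Tristram Shandy", "Rasselas", "Pamela"},
    "bob": {"Tristram Shandy", "Pamela", "Clarissa"},
    "cal": {"Rasselas", "Candide"},
}

def recommend(title, libraries):
    """Rank other titles by how many libraries hold them alongside `title`."""
    counts = Counter()
    for books in libraries.values():
        if title in books:
            # Every other book on the same shelf gets one vote.
            counts.update(books - {title})
    # Most co-owned first; ties broken alphabetically for stable output.
    return sorted(counts, key=lambda t: (-counts[t], t))

print(recommend("Tristram Shandy", libraries))
```

The point of the sketch is that nothing here depends on purchases: the only signal is which books sit together in people’s collections.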

LibraryThing Screenshot

^Adding to my library in LibraryThing: I enter in a title, and LT checks it against a bibliographic database of my choosing. And I choose LC! No snappy webcam scan, alas, though barcodes are acceptable identifiers.

LibraryThing screenshot

^Now that I’ve added my book to LibraryThing, I can see how others have tagged and rated it. Looks like some people don’t care for literary theory, and yet they own the book. Go figure. This title hasn’t been reviewed yet in LibraryThing, but many have.

LibraryThing screenshot

^My so-far small library (the books on my desk right now).

Most intriguing of all, LibraryThing has recently added Library of Congress subjects into the mix. The premise is that user-created tags can coexist with library-tended subject headings, that folksonomy can play off of controlled hierarchy. At times, tags and subject headings coincide. In other instances, they hardly overlap at all. LibraryThing has only just embarked on this odd tango, and who knows where it will lead — but at the very least it should generate some intriguing friction.

LibraryThing screenshot

^Exploring the tag “literary theory” on LibraryThing. I see heavy users of this tag, works most often tagged by the term, and the latest books into the system so tagged (and I can subscribe to the tag via RSS). I also see related LC Subject Headings, in case I feel like faceted browsing.

Already user-tags are sitting up a little straighter and paying more attention to themselves. Discussion on LibraryThing’s metablog, Thingology, has been spurred by subject headings to characterize — dare I say categorize — tags. Discussants find tags to fall into recognizable camps: personal location notes (“living room,” “office”), personal use tags (“read,” “damaged,” “study”), broadcast opinion tags (“excellent,” “lame”), and personal subject tags (anything in the uncontrolled descriptive universe). The haphazard felicities of user-tag surfing are getting measured right up against the precision of subject headings.
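Those four camps could even be roughed out in code. This is a toy sketch, assuming small hand-made word lists: the camp names come from the discussion above, but the extra vocabulary (“bedroom,” “unread,” “favorite”) and the function name `tag_camp` are my own inventions.

```python
# Hypothetical word lists; only some of these tags appear in the discussion.
LOCATION_TAGS = {"living room", "office", "bedroom"}
USE_TAGS = {"read", "damaged", "study", "unread"}
OPINION_TAGS = {"excellent", "lame", "favorite"}

def tag_camp(tag):
    """Assign a tag to one of the four camps, defaulting to subject."""
    t = tag.strip().lower()
    if t in LOCATION_TAGS:
        return "personal location"
    if t in USE_TAGS:
        return "personal use"
    if t in OPINION_TAGS:
        return "broadcast opinion"
    # Anything unrecognized falls into the uncontrolled descriptive universe.
    return "personal subject"

for tag in ["Living Room", "read", "lame", "literary theory"]:
    print(tag, "->", tag_camp(tag))
```

A real system would need far longer lists, or some statistical help, since the subject camp is open-ended by definition.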

All this driven by Tim Spalding, a web developer, not a librarian. Or is he? Should we settle for patron?

The means of conception

Monday, March 27, 2006

Nothing odd will do long. ‘Tristram Shandy’ did not last.
- Samuel Johnson

Wrong! — I gleefully thought, way back when I was slogging through an eighteenth-century literature class in college — bored silly by Johnson’s lumbering, moralizing, pseudo-Oriental Rasselas, and, in contrast, completely delighted by Laurence Sterne’s goofy carnival of the mind, Tristram Shandy. Wrong, you fat old authoritative Dr. Johnson, because here I am 220 years later savoring every Rabelaisian joke, every self-conscious pratfall, every typographic stunt of Tristram Shandy.

I had to admire the concision of the put-down, though. A quick slam of the sprawling, irresolute Shandy.

With the wisdom of age, I now am ready to concede that Johnson was half-right: nothing odd does “do” for long. Especially online. I’ll circle back to that emphasis in a moment — but first, let me submit that Tristram Shandy is far from odd, considered rightly. Part of the thrill of reading it in 1980-something *cough* was seeing evidence of postmodern friskiness that actually pre-dated the United States. Tristram’s obsessions stretched reflexivity back into exotically distant realms of bygone minutiae (unlike the broad cardboard exoticism of Johnson’s Happy Valley). It seems that then, as well as now(-ish), conceptions were improbable, resolutions impossible; the world teemed with distraction, neurosis, and disordered influence; and authors invited readers to play games.

In fact, if we glance back at a couple of Tristram’s more infamous tricks, we might feel that Sterne’s techniques are getting less odd by the day. When our author despairs at describing the concupiscible Widow Wadman, and throws open his pages to the reader (here’s paper ready to your hand. — Sit down, Sir, paint her to your own mind—as like your mistress as you can—and unlike your wife as your conscience will let you…) — is this not collaborative authoring space?

Tristram Shandy blank page

And when the narrator, picking up momentum by way of a vegitable [sic] diet, sits down and charts out the loopy plot lines of the novel as it’s progressed so far, even dropping in anchor points so we can check his graph against designated passages — is this not, however tongue-in-cheek, metadata visualization, or a mapping of information flow?

Tristram Shandy plotlines

L–d! said my mother, what is all this story about? —-
A COCK and a BULL, said Yorick —- And one of the best of its kind, I ever heard.

Indeed, and though I haven’t read it (which is to hear it) for, well, many years, Tristram sticks with me–probably because I prefer open concoction to moralistic bullying, especially when it comes to narration. And this preference has had currency for a long time; Tristram Shandy has lasted just fine.

Yet Johnson’s other snap judgment — nothing odd will do long — seems to me all the more true in the virtual places we increasingly come crowding for intelligence. Which is not to say that there aren’t odd things online — far from it — surf randomly, and the web seems a veritable cacophony of twaddle diddle, tweddle diddle, –twiddle diddle, —- twoddle diddle, –twuddle diddle, —- prut-trut — krish –krash — krush. Not to mention diddle diddle, diddle diddle, diddle diddle — hum — dum — drum.

But nothing odd does much online: you can park the most esoteric idiosyncratic wonderfully strange material on the web, but if you want it to get discovered, if you want it to work, if you want it to have an effect — if you want others to conceive of it (a favorite Shandyword) — then you must enter into common language and assumptions. This is so obvious it’s practically a truism — and yet see how many times we learn the lesson, how difficult it is to get out of our own heads.

Two quick, fairly pedestrian examples: John Kupersmith’s wonderful Library Terms that Users Understand shows how befuddled users can be by the simplest failure of librarians to realize that words like “Index” or “Database” or “Serial” can mean next to nothing to my Uncle Toby, just wanting to know where to find that Popular Mechanics article. Or let’s say you’ve given an OPAC a cute acronym and now you invite my Uncle Toby to “search EUNICE!” My poor uncle Toby blush’d.

Or have a look at Dan Cohen’s equally simple but solid advice about climbing up in Google ranks. Search engine optimization has its share of murk to it, but the basic path to visibility is: don’t be odd. Use a domain name that describes your resource (“chinook” or “aeoleus” sound great — but what are you airing?), use keywords in file names (with mod_rewrites, if necessary), get linked by highly linked sites (meaning, be understandable, and get understood by a widely understood site).

If this all sounds like it leads to a world as flat and predictable as, well, Johnson’s Rasselas, that’s not what I meant, not at all. It’s just that you can’t be *merely* odd or unique if you want to *do*: you need the sophistication to hook into conventional terms, general assumptions, broadly shared expectations. This involves a double-motion that might as well be called self-consciousness. Tristram’s greatness is showing us how fun such contrivance can be. Sterne earns his pleasure (and ours too, he’s brought us jolting right along with him) when he sits back to marvel at himself, his magnificently clashing agendas: By this contrivance the machinery of my work is of a species by itself; two contrary motions are introduced into it, and reconciled, which were thought to be at variance with each other. In a word, my work is digressive, and it is progressive too, — and at the same time.

If it were all digression, Johnson would have been completely right about Tristram Shandy. But it is progressive too, which means that it sobers up just enough to realize, despite its irrepressible uniqueness, that above all things in the world, ’tis one of the silliest things in one of them, to darken your hypothesis by placing a number of tall, opake words, one before another, in a right line, betwixt your own and your readers conception.

Hogarth's frontispiece to Tristram Shandy

Mining the machines

Wednesday, March 15, 2006

Last year at the ARL symposium called Managing Digital Assets, I smiled inwardly to think of the grumbling likely to be kicked off by observations such as this by Donald Waters of the Mellon Foundation:

…what unites our interest in digitization and open access in a digital world is that the material becomes “processable,” or subject to computational processing. That is, the growth in the market of readers is not among groups of humans, but of machines, which are programmed to index, manipulate, mine, aggregate, decompose, and build up scholarly and other forms of content by algorithm. It is this machine “processability” that makes digitized objects and open access materials most valuable to scholars.

Protest, fume, rail against the subjection of your most exquisitely developed thought to the dumb imperatives of ones and zeros — Waters is absolutely right. You want influence? Or, more to the point, you want to avoid obliteration in the vast digital swamp? You’d better know how to demarcate, classify, and optimize your work for machine crunching — or find someone who does. And pray that the stewards of such crunching, the information managers you never thought about, have your best interests in mind.

All this occurred to me while reading a new D-Lib piece by Daniel Cohen, director of research projects at the very creative Center for History and New Media at George Mason University. Cohen also spoke at that ARL session, and at the time he sold me on Firefox Scholar. His new article, “From Babel to Knowledge: Data Mining Large Digital Collections”, offers two nice examples of humanist-friendly manipulation of machine “processability.”

First: Syllabus Finder. Where was this godsend when I was inefficiently wandering around the chaff of the web, trying to crib ideas for my own syllabi? It’s a very sensible, much-needed genre-based search tool. First, it defines “document classification” through a very simple dictionary of keywords endemic to syllabi (“assignment,” “office hours,” etc.). This classification is fed into Google through its API service, along with the search query, for optimized searches. The results can then be further refined through more automated analysis or combined with other search results.
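To make the keyword-dictionary idea concrete, here is a minimal sketch of that kind of genre test. It is an assumption-laden toy, not Cohen’s code: only “assignment” and “office hours” come from the description above, while the other keywords, the 0.4 threshold, and the function names are hypothetical, and the Google API step is left out entirely.

```python
# Keywords endemic to syllabi. The first two are named in the article;
# the rest are guesses added for illustration.
SYLLABUS_KEYWORDS = ["assignment", "office hours", "syllabus", "readings", "grading"]

def syllabus_score(text):
    """Return the fraction of syllabus keywords that appear in the text."""
    lowered = text.lower()
    hits = sum(1 for keyword in SYLLABUS_KEYWORDS if keyword in lowered)
    return hits / len(SYLLABUS_KEYWORDS)

def looks_like_syllabus(text, threshold=0.4):
    """Classify a document as a syllabus if enough keywords are present."""
    return syllabus_score(text) >= threshold

page = "English 242. Office hours: Tuesdays. Readings and assignment schedule below."
print(syllabus_score(page), looks_like_syllabus(page))
```

A document that merely mentions Byron scores zero; one that mentions Byron alongside office hours and assignments clears the bar, which is roughly the filtering effect described above.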

I gave it a spin, using canonical writers from the Romantic era as search terms. To my happy surprise, good old Ashes Sparks & Hypertext, a six-year-old syllabus for a seminar I taught back in the day at UC Berkeley, kept showing up — and at or near the top of results. #1 for Coleridge, #2 for Byron, #1 for Wordsworth, #2 for Blake, #4 for Hemans. Yeah, baby! But we drop down to #14 for Keats, alas, and as for Shelley, he just kept coming up as a “fatal error,” an “Uncaught SoapFault exception.” So Syllabus Finder is a little buggy — but, dare we say it, a little poetic too. Maybe we’re just overly pleased by taking the silver for Byron:

Ashes Sparks is the second syllabus listed for Byron

I don’t know what to make of the way this tool seems to like the Ashes Sparks syllabus — certainly I indulged in no optimization — no thought about how the thing would be retrieved. The only distinguishing feature of that document, really, is that it’s been online steadily for six years. It’s just one of those Google-blessed mysteries. Perhaps cannier post-processing could promote syllabi more deserving of prominence. But Syllabus Finder works pretty well–I’d recommend it to fledgling (and not-so-fledgling) instructors. As Cohen puts it, it does a surprisingly good job at achieving its modest goal – on most topics for every ten documents it retrieves, about nine are syllabi – and it has thus far found and catalogued over 600,000 syllabi, synthesizing a collection of course materials considerably larger than any created or maintained by a professional organization, educational institution, or library, or by any other effort on the web to aggregate syllabi.

A second and more complex treat today from the George Mason wizards: H-Bot. This is an automated historical fact finder that can field natural language queries. (Or at least ones that begin with ‘what’ or ‘when’ or ‘who’; it’s not ready to handle where, which, how, or why). The algorithm here is “question answering” — which involves the identification of relevant documents, some natural language processing (to interpret queries), and statistical/linguistic analysis of retrieved documents. (In addition to the D-Lib article, there’s more on H-Bot here.)
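One simple way to picture the statistical step for a “when” question: gather text snippets, pull out every year mentioned, and let the most frequent one win. The sketch below is my own guess at the flavor of the thing, not H-Bot’s actual algorithm; the snippets are invented stand-ins for retrieved documents, and `most_likely_year` is a hypothetical name.

```python
import re
from collections import Counter

# Matches four-digit years from 1000 through 2099.
YEAR_PATTERN = re.compile(r"\b(1\d{3}|20\d{2})\b")

def most_likely_year(snippets):
    """Return the year mentioned most often across snippets, or None."""
    counts = Counter()
    for snippet in snippets:
        counts.update(YEAR_PATTERN.findall(snippet))
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Made-up snippets standing in for documents a search might retrieve.
snippets = [
    "Hitler died on April 30, 1945 in Berlin.",
    "In 1945, with the war lost, Hitler took his own life.",
    "Born in 1889, he died in 1945.",
]
print(most_likely_year(snippets))
```

Stray dates like a birth year get outvoted as long as the retrieved pile is big enough, which is the sense in which sheer volume can stand in for careful reading.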

Playing with H-Bot is fun. When did Hitler die? The answer in an eyeblink, as the Germans say: April 30, 1945. When did Gandhi die? Here’s a quirk:

Fun with H-Bot

Well sure, but that wasn’t the Gandhi I meant. Interestingly, here’s what happens when I ask the same question but tell H-Bot not to “check trusted websites first”:

Fun with H-Bot

Here’s a case when the unfiltered swamp actually answered my question — or read my mind — better than “trusted websites.” Quantity over quality? Very sensibly, H-Bot demurs when I ask “Is God dead?” or “When did God die?” (“I’m sorry. I cannot provide any answer on that.”) But ask it “Who is God?” and H-Bot serves up a perky little answer:

Fun with H-Bot

Simple-minded? Sure. But viable. Arguments will rage, hairs will split, blood will spill, but our dumb machines have given us an efficient pulse of information in the midst of the cacophony, delivered by strategic sifting of great gobs of data.

Which brings us to a final point that Cohen makes about machine data-mining: “Quantity may make up for a lack of quality.” Even the most ardent humanist can’t deny: when it comes to information, we’ve got a whole lot of quantity these days. It’s how we draw from such quantity that counts.