The communal LOR

In our last episode, we beat up a bit on the notion of “learning object repositories” (LORs), wondering whether the well-meaning assemblage of modular bits and pieces of educational materials was actually a frustration of coherent teaching. Educational practices, after all, are still grounded in settings and customs that predate the digital on-demand world. We speak of courses, of curricula, of graduation; we cling to learning as an unfolding, progressive narrative. And progressive narratives seem to be exactly what free-floating clusters of learning objects lack.

Haunted as I am by S.T. Coleridge’s Ancient Mariner and that ghostly character’s pseudo-progressive travails, I can’t help thinking of decontextualized learning objects as similar to the unearthly sounds that rise out of the mouths of his dead crew and swirl unfixedly about:

Around, around, flew each sweet sound,
Then darted to the Sun;
Slowly the sounds came back again,
Now mix’d, now one by one.

Sometimes a-dropping from the sky
I heard the skylark sing;
Sometimes all little birds that are,
How they seem’d to fill the sea and air
With their sweet jargoning!

And now ’twas like all instruments,
Now like a lonely flute;
And now it is an angel’s song,
That makes the Heavens be mute.

It ceased…

The Rime of the Ancient Mariner is heuristic to the core; it teaches us to teach through many spectacularly negative examples. Disconnection from community, the poem suggests, leads to a horror-mirror world of isolation: a world teeming with elements snapped off from the teleology of cause & effect. The Mariner butchers the bird, obeying some unexplained private impulse, and dooms himself to a world where wind is heard but not felt, or felt but not heard — and the same goes for companionship, morality, religion, expiation. Very dissatisfying. Those free-floating supernatural sounds — all that “sweet jargoning” — are momentarily marvelous, even Heavens-eclipsing — and yet they’re unreliable and of dubious value, to say the least. They don’t advance the plot; they just cease.

The Mariner’s original sin: ignoring community (which was, after all, so strongly fostered by that unlucky albatross). It’s a pretty trenchant sin; even after any amount of penance, he seems doomed to repeat it. He poaches the Wedding Guest, blocking this unwilling auditor from entering a communal wedding celebration (the poor Guest protests, to no effect, “The guests are met, the feast is set: / May’st hear the merry din…”), and forcing the Guest instead to listen to a hard-luck story having little to do with him, superficial appearances notwithstanding (“That moment that his face I see, / I know the man who must hear me…”).

[Image: Doré’s illustration of the Ancient Mariner]

And what in mute Heaven’s name does any of this have to do with learning object repositories? It seems that we’re learning the Mariner’s lesson all over again. The most thoughtful study that I’ve read about the uptake and implementation of LORs is the recent report “Community Dimensions of Learning Object Repositories,” funded by the Joint Information Systems Committee (JISC). The gist of this report is evident directly from its title: however energetically you go about building a constellation of durable, interoperable, reusable, and sharable chunks of teaching & learning materials, it won’t mean a thing unless you tailor it to the cultural norms and expectations of a user community. As the report observes in its rather British way, “pedagogical, social, and organisational factors have not been at the forefront in LOR development to date.”

A community shares goals, interests, and practices; it draws on commonly available tools; it shares an understanding of processes and concepts. The JISC study lines up and sets marching some hard questions bound to make any repository-builder squirm: What is the purpose of the LOR, i.e., how does it serve its community? Who are the key stakeholders in that community? In what broader context does that community operate? A LOR project that starts by grappling with such large questions stands a better chance of being organized around pedagogical goals and activities, rather than around all the content it can cram into its great maw just because, like the Mariner knocking an albatross down out of the sky, it can.

Treating teachers as one big community is in many ways an absurdity, of course: we operate within a dizzying array of conditions and expectations, and with varying allegiances to vastly different sponsoring institutions. Nevertheless, it is at least a good step to consider how a LOR addresses whatever generalizations you may wish to venture about teachers as a community. This borders on a truism, but then again, how many LORs truly meet an actual teacher halfway? The JISC report hazards a few claims about teachers and the way they behave:

  • They have a very problematic relationship with metadata. Descriptive metadata can fail them when they’re hunting in the dark for objects. When submitting an object to an LOR, they’re not trained in, and often not helped with, the fine art of appending quality metadata.
  • They often prefer to create their own learning objects rather than patch someone else’s in. On the scale of teacherly chores (grading, planning, meeting, exhorting, reviewing), creating new materials for one’s class is actually on the fun side, one of the best ways to stand out and inspire, to make your class into a unique event. Even if you’re not so handy at making new things, by dipping into the well of pre-made pieces you risk “loss of educational narrative,” as the JISC report puts it (and how many teachers got into the business because of their assemblage skills anyway?). Educational narrative may matter more to individual-obsessed humanists than to object-oriented scientists, the report notes in passing.
  • Teachers like incentives just like anyone else, and an LOR would do well to supply some. Incentives could take the form of recognition, a more tangible reward for contribution, or proof that use of material from the LOR will make a teacher more effective. If the LOR is keyed to the goals of the institution that pays said teacher, that’s a fine reason to use it.
  • Despite all impediments, teachers, bless ’em, are a persistently open-minded lot, at least according to the JISC report: “In general the interviewees have a positive attitude to reuse, and most have stated that they are willing to keep trying to reuse material, despite the difficulties they have faced.” This suggests that LORs still have time to wake up to the willing worlds around them in all their glorious particularity.

And let’s close, on that brighter note, by nodding towards LORs that do seem engaged with the communities that use them, on some level at least.

The granddaddy of LORs, LC’s American Memory Project, set an early standard by layering its gigantic offerings with a “Learning Page… especially for teachers”: a collection of “teacher created, classroom tested lesson plans… [to] jumpstart your use of primary sources,” a rundown of curricular themes, various strategies to promote critical thinking, and professional development materials.

The National Science Digital Library corrals its resources for various imagined players: K12 Teachers, Librarians, NSDL Community Members (you know who you are), University Faculty, and First Time Users. Each of these groups has customized “pathways” through the library, as well as a fistful of fairly active blogs grouped by audience category.

Finally, the December issue of D-Lib describes a geoscience LOR named “Teach the Earth” built by the Science Education Resource Center at Carleton College; the article is encouragingly titled “Digital Library as Network and Community Center: A Successful Model for Contribution and Use.” The authors state, flat out:

A successful educational digital library is as much a social process as a technical problem. It requires creation of a culture that fosters contribution to and use of the library. We have addressed creation of this culture by working with NSF-funded projects focused on the professional development of geoscience faculty as teachers. Each of these projects partnered with SERC to create its project website. They seek two primary services in this partnership: 1) tools, resources and experts that assist them in creating high quality project websites and 2) placement of their resources in a network that enhances dissemination and use of their work. We created a win-win situation that yields rapid production of content for the library and facilitates use, by allowing our partners the flexibility to meet their own project goals while contributing to the overarching digital library.

Let’s see: professional development, support of individual projects with an eye towards incorporation, maintenance of a consistent level of quality, enhancement of dissemination and recognition of work — sounds like a happy LOR to me, one that engages its users, rather than stunning them.

The SERC authors claim that a full 25% of all geoscience faculty in the US (the audience it bothered to target) now use Teach the Earth: now that’s uptake!

Learning object(ions)

The pendulum has certainly swung far away from the early days of digital learning happytalk, which was all objects all the time. In them dotgone days, “strategic futurists” such as Wayne Hodgins proclaimed that “the ability to learn and apply the right stuff faster is the only sustainable competitive advantage there is for any of us,” and the way to win was to call up that stuff, those digital learning objects, pronto. The “learnativity revolution” would be powered by gobs and gobs of “terrific resources” marked up by Learning Objects Metadata, dressed up for discovery. Powering all this (remember when ‘powering’ was a verb?): the Lego™ metaphor, as touched on by a 2002 D-Lib article called “Metadata Principles and Practicalities”:

In a modular metadata world, data elements from different schemas as well as vocabularies and other building blocks can be combined in a syntactically and semantically interoperable way. Thus, application designers should be able to benefit from significant re-usability as they gather existing modules of metadata and ‘snap’ them together much as individual Lego™ blocks can be assembled into larger structures.

[Photo: Legos at SXSW]

Though futurist Hodgins (a co-author of the D-Lib piece) is avowedly “wandering and pondering as he scours the world for trends and technologies most of us will not see for the next 18 months to 10 years,” an anxious world is still waiting for the follow-up to “Into the Future: A Vision Paper” (2000), in which “the rules of Newtonian physics have been superseded by those of Learnativity, where the gravitational pull of creating new knowledge determines and shapes the actions of everything within.” The process, as described in this Vision, is at once entropic and plastic:

Breaking knowledge down into information objects, the smallest useful chunks of information, frees it to be used again. Think of this as creating and assembling Lego™ blocks. Whether you’re assembling a bridge or a house or a spaceship, you use the same Lego™ to form a “learning object.”

The notion that newly created digital objects can upend physics may seem to belong to the discard pile next to sock puppets and Netscape 4.0. And yet the Legoland learning world haunts us still. We have a deeper sense of how hard it is to transform (let alone revolutionize) education with modular resources, but the web brims with learning object repositories that are palpably yearning to be engaged by actual teachers.

Every once in a while, a teacher even urges their use on colleagues, as in this 2006 endorsement by a Professor of Geomorphology writing in Ariadne:

Reusable educational objects (REO) or reusable learning objects (I prefer the wider term) are becoming an area of interest in education, especially in Higher Education. This stems from the ideas of reusability from ‘mass’ e-learning in the USA and from there developed the Sharable Content Object Reference Model (SCORM) as well as some resources such as MERLOT (Multimedia Educational Resource for Learning and Online Teaching). This tends to have full resources such as a slide set or a Web page. Lecturers should try this as there may well be all sorts of useful material available within the archive, often free.

There is a lot of faith packed up here — in a preferred definition of a ‘learning object’ (a definition that tends to crumble when you push on it), in the value of reuse and mass broadcast, in the existence of “all sorts of useful material” to be unearthed within an archive (for free!). All the more reason to wonder and ponder the extent of actual use of learning object repositories. Are current offerings honoring the enthusiasm of our good professor of Geomorphology? If not, is there something fundamentally flawed in the idea of freely recontextualizable learning objects?

I recently took a quick sip of MERLOT (“a free and open resource designed primarily for faculty and students of higher education”), the learning object resource singled out by the good prof, and found it to be… rather flat. Though it offers ‘peer review’ filters and advanced searching, MERLOT failed me when I came into it with a specific agenda: to find a peer-reviewed resource that would supplement teaching of William Wordsworth’s poetry. No results found. Was that too specialized? Then how about something about landscape in art or literature? How about anything at all involving the keyword ‘landscape’? Finally, one peer-reviewed result found: oddly enough, an FTP tutorial (author unknown, Section 508 non-compliant).

When I approached MERLOT without an agenda — that is, in ‘browse’ mode — I was again underwhelmed. Looking to see how available resources might be engaged, I picked through assignments posted on the site, and found one rather expansively called The British Empire. The gist of this assignment: go to an outside website, read sections of it, and write a 5-7 page essay. This outside website itself warns: “This site is not a rigourous academic site! I’m sure there are plenty of mistakes and oversights on my part; for which I apologise in advance! My interest in the subject is purely that of a personal journey of discovery….”

After a few disappointments like this, the sun was setting on my hope that MERLOT had much to offer me. To be sure, like our Geomorphology prof, the site has nothing but the best intentions. Its solicitation of assignments and personal collections offers some way into the “15818 materials” (as of this writing) somewhat chaotically gathered there. In other words, there is an effort to bring the wisdom of learning communities to bear on these bits and pieces: to encourage peer review, share insight, suggest deployment. ‘Gold level’ users of the site (rated by submitted materials, comments, assignments, and collections) would surely attest to MERLOT’s value.

But the effort seems limited by the objects model embraced by past futurists. “Materials” are gathered, and activity is to follow: the activity of wrestling them into actual curricula in a meaningful way. Put it this way: I would have to be a fairly passive teacher to be satisfied with the results and suggestions I unearthed on MERLOT. I would have to be willing to suspend the gravitational pull of my own course (to sacrifice context, really) in order to incorporate an object impervious to what came before in my class and what would follow: a second-handedly endorsed learning resource with priorities and emphases that may be disconnected from, even inimical to, my own.

***

At the heart of the idea of “learning objects,” then, is a belief in modularity, as if teaching were so much recombination. If you’re in a really dark mood, you might consider the model of replaceable parts emblematic of the “Information University” vividly deplored by Marc Bousquet a few years ago. In the nightmare Information University, labor is made up of so many interchangeable parts, available on demand and easily replaced:

Constrained to manifest itself as data, labor appears when needed on the management desktop–fully trained, ‘ready to go out of the box,’ and so forth–and after appearing upon administrative command, labor in this form should ideally instantly disappear.

Who would consent to work this way? Replacements for the tenured class, of course, that market-immune anachronism that is vanishing like so many glaciers:

Dispensing with the skilled professoriate is accompanied by the installation of a vast cadre of differently-skilled workers–graduate students, part-time faculty, technology specialists, writing consultants, and so forth.

Just the sort of workers lacking the training and time and perspective, I would suggest, to assemble a coherent and effective pedagogy out of a massive pile of Legos™.

On activating digital collections

I was on the verge of crafting a blog entry expressing fears and reservations about Second Life when it occurred to me that skepticism has gotten too much play here of late. I’m really not so grumpy. To try to prove that, I’ll slap down here a few paragraphs from a mini-manifesto I’ve been working on lately. It’s lumpy and unfinished — but it’s hopeful.

***

The digitization of learning objects does not, in itself, foster study of them. Even the richest digital library teaches little if it is not selectively engaged by pedagogical context and activity. Conversely, a learning environment that fails to incorporate available resources best suited to its purposes courts hermeticism and limitation. We should be committed to activating digital collections — exploring mutually beneficial relationships between collections and learning environments — through the integration of digitized material with new modes of study and dissemination.

The best digital learning tools may well draw on discrete digital collections in different ways — often within the same environment. This is no surprise: just as no one pedagogical application could exhaust the possibilities of a robust digital library, it is often the case that no one collection satisfies the evolving or multiple purposes of a sophisticated learning environment.

At a university, such tools should be conceptualized in consultation with faculty, consultations that focus on teaching methods and goals. Identification of relevant collections to draw upon (at an institution’s library and from the wider world) is an important follow-up to this impetus, akin to identification of the software that will run the project. Just as some innovative projects now run on proprietary as well as open source software, and may mix a number of microapps into one environment, so should we draw on a range of collections that best engage the purpose of a given project.

Existing digital collections, more often than not, consist of material restricted for certain uses or limited to certain audiences, in compliance with licensing and codification. Projects that engage diverse existing collections are likely to require special permissions and/or an access architecture of no little complexity (material variously available to various populations): negotiating this variety can be difficult, but it nevertheless ensures that pedagogical goals, and not the restrictions of any one collection, shape a learning environment.

In addition to existing digital collections, heretofore unpublished or uncollected assets may vastly improve and distinguish learning environments. This ‘dark’ material may be created and owned by individual faculty members, or held in reserve by public or private enterprises; its use may be open to negotiation, which takes no small amount of time and effort. In some cases, onerous restrictions or the simple lack of relevant material may drive the crafters of educational environments into active production of new assets for a given project: videotaping new interviews, recording new performances, capturing new creations. This active creation also requires a lot of resources, but the advantage here is that content can be produced with permissions and licensing optimal for a learning environment.

The heterogeneous provenance of collections means that any producer of digital learning tools has an active interest in understanding and promoting standards of interoperability. We also have a stake in open access movements: a collections landscape less hedged by restriction is a landscape that will offer a fuller array of elements for the tools we build. Whenever possible, our projects should be made open for access and use beyond any conceptualized engagement; this maximizes the often extensive investment of an organization in any given project, and inspires the holders of potentially useful collections to follow our lead.

Finally, the fungible quality of digital material means that it is often transformed through incorporation into a learning environment. As it is used, it changes through recontextualization, annotation, or other user modifications. A project that begins by drawing on discrete collections may thus become a unique collection itself, reflective of the assignment-related engagements of a given community. Instructors may shape materials in a certain way, or supplement them over the course of a term. Student work may be archived in the project and in turn made available for future iterations of the project or for outside use. Evidence of active study may thus consist of transformation of material in the environment; it could also be fresh material generated or uploaded by students. Many of the most interesting educational environments will in this way prove to be ‘two-way’ collection arenas, necessitating thoughtful policies about ‘outputs’ as well as ‘inputs.’

***

For an exemplification of some of these points, I invite you to take a little tour of the Havel at Columbia site. Here is a digital melange of:

  • Columbia University Libraries holdings (special collections, institutional archive, media holdings)
  • Donated commercial material (documentary film selections, CNN news archives)
  • Donated privately owned material
  • Material purchased for the project (CORBIS & Getty images, video archives)
  • Material created for the project (video interviews conducted with Lou Reed, George Soros, et al.)
  • Campus events videotaped during Havel’s residency
  • User ‘notebooks’ used to assemble and annotate assets into multimedia essays and demonstrations

Unavoidably, some of this material is restricted to students and instructors at Columbia. But whenever possible, we’ve opened things up for universal access, and encouraged those participating in this project to do the same.

The next step, I can almost hear you thinking, would be to release the collection of publicly accessible material on this site under a CC license. We’ll get there, I’m sure of it. See? Hopeful!

The U of CitizendiUm

If you agree that Wikipedia presents more thorns than roses to academic experts, you have good company: one of Wikipedia’s two founders.

The split between Jimmy Wales and Larry Sanger has a certain Old Testament character: Wales (the Web 2.0 brother) reigns over the miraculous worldwide flourishing of the anonymously and communally edited encyclopedia that nobody predicted, while Sanger wanders in the web wilderness, in stubborn pursuit of distinctly pre-2.0 constructs of expertise.

Nupedia, Sanger’s original attempt to build an expert-authored online encyclopedia (and the predecessor of Wikipedia), crashed and burned. Now Sanger’s back with a similar idea: a “progressive fork” off of Wikipedia called Citizendium. His vision of harnessing “educated, thinking people who read about science or ideas regularly” into rival encyclopedia generation awaits you here.

In Sanger’s new scenario, regular Joes and Janes would be welcome to pitch in at Citizendium as long as they deferred to ‘editors’: subject-area specialists who “meet certain benchmark requirements–the same straight-up credentials that the offline world relies on.” These expert editors would claim the right to patrol topics by flashing credentials. If several editors with the right credentials claimed a topic, well, “the more the merrier”: disputes among them would be settled “by discipline-oriented editorial workgroups” that would be “staffed only by editors.”

Wikipedian anonymity is quite obviously out of the question here. If the world of Wikipedia is mythically flat — built by faceless if not selfless peers — Citizendium is stunningly hierarchical, as if brandishing one’s identity could settle most any question of authority. One can easily imagine, though, a “straight-up credentials” demolition derby: institutions impugned, publications trashed, countries belittled, research areas broadswiped. If the offline world relies on credentials, it also relies on heterogeneity, microclimates, and quite local constructs of authority.

Citizendium would begin by mirroring Wikipedia, and, presumably, refine this populist chaff into premium wheat. Expertise standing on the shoulders of undifferentiated pygmies, as it were. And since Citizendium content would be freely available under the GNU Free Documentation License, Wikipedia could in theory suck the refined content back into itself, without directly compromising on its disdain for egghead experts.

The reigning smackdown of Citizendium is Clay Shirky’s blog post last month entitled Larry Sanger, Citizendium, and the Problem of Expertise — a precise attack that drew a defensive response from Sanger. I generally agree with Shirky, who sees disaster looming in Sanger’s dream of a self-certifying expertocracy shorn of institutional context. Shirky’s concluding dismissal, however, gives me pause:

Sanger is an incrementalist, and assumes that the current institutional framework for credentialling experts and giving them authority can largely be preserved in a process that is open and communally supported. The problem with incrementalism is that the very costs of being an institution, with the significant overhead of process, creates a U curve — it’s good to be a functioning hierarchy, and its good to be a functioning community with a core group, but most of the hybrids are less fit than either of the end points.

Such categorization is ominous for any of us skating the half-pipe of that ‘U’: those of us, that is, applying social software to learning environments. Ours is a hierarchical world, yet we want to build communally supported processes: are we doomed to hybrid mush? Admittedly, even the most starry-eyed 2.0 prophets have trouble describing how communal software is to work its magic once it’s scooped out of the vast flickring seas and let loose within the tiny microclimate of a classroom. Yochai Benkler, for instance, says much about networked production of educational texts but little about peer production within a class (see his article “Common Wisdom: Peer Production of Educational Materials”).

If social software depends on scale — the happy fact of human diversity that guarantees that someone, somewhere, is bound to perform a necessary function — then what happens when your field is winnowed down to, say, eight bright-eyed students with the same major? If your software is thoughtlessly cribbed from a quite different environment, one that depends on scale or interconnection that is foreign or even inimical to a classroom, you’re courting failure. Shirky’s notion of situated software — “small, purpose-built apps” — is well worth bearing in mind in this respect.

Whatever the tool it’s using, customized or off-the-rack, a classroom exists in a microclimate that consists not just of a gaggle of students, however skilled and productively interactive — it also contains a super-entity, an authority akin to Sanger’s editor: the credentialed teacher (and plenty of other shadowy figures behind her — but we won’t go into that here). Whatever is peer-produced in such an environment will be some fairly complicated blend of authoritative fiat and collaborative discovery. It will be as forced as it is fortuitous — a provenance quite different from Wikipedia, but perhaps a bit like Citizendium. However quixotic Sanger’s dream of expertise within a collaborative framework may seem, and however displaced onto a grudge match with Wikipedia it may be, it is worth tracking from the curvy heart of the U.

Taking notes

Yo, can I borrow your notes?

Harkening back to the salad days of college, I seem to remember a free-floating faith in the power of someone else’s notes to fill in cracks of attendance & attention. I doubt that much significant learning took place in power-cramming sessions entirely reliant on someone else’s diligently indented transcription of wisdom. But I’m struck now, thinking back, by the instinct to herd together in such situations.

A study tool named stu.dicio.us has recently made its debut, promising del.icio.us-like value through aggregation of communal effort. Now maybe some stranger from West Virginia Tech will save you from the consequences of having slept through Chemistry. Or maybe that concept your prof seems so fond of has been dropped in another class somewhere, in a context just different enough to fuel your next paper. Or maybe you can meet that hottie on the far side of the lecture hall because you’ve done a search limited to your school and this class and lo & behold here you both are, believing in the power of networking your notes.

Sharing notes is not cheating, insists stu.dicio.us. Everyone should have every advantage possible in increasing individual knowledge. The site rather mysteriously claims to be created for students, by students, and is rather predictably in beta.

There are bugs, and slender participation makes any 2.0 service like this awkward at first, but give it time. After a little tour, I think that stu.dicio.us is actually more useful for its lightweight organizational tools. There’s a sortable todo function, handy even if you aren’t interested in checking peers’ todos. The basic Textile formatting for notes encourages precision, and auto-save is built in. You can use simple brackets to auto-link to Wikipedia, Google, or Google Scholar. You can upload files and access them whenever you want, as long as the service remains online. For those times when you can’t get online, stu.dicio.us offers an offline mode.

Here are a couple of screenshots. First, my fake schedule, with grades, notes, files, todos, and (sadly) no friends. This would be useful, I’d say, especially if it were within a course management environment:

[Screenshot: stu.dicio.us, my fake schedule page]

… and someone’s notes, which I found by doing a search for history and columbia:

[Screenshot: stu.dicio.us, someone’s notes]

Enlightening? I doubt it – but misery does love company – and if you’re casting around randomly for any mention of history in anyone’s notes, chances are that you’re feeling a bit miserable.

The end of EndNote?

You’ve wrangled that paper to a plausible conclusion — a bit of sleep is just around the corner — but hold on, not so fast, you’re Sisyphus after all. Citation formatting is a special curse, the inane labor at the end of hard work that holds all your effort hostage. Never does it seem less true that it’s the thought that counts.

The best portrait of this frustration that I know is Louis Menand’s New Yorker article from three years back, “The End Matter; The Nightmare of Citation.” (And no, I won’t properly cite it.) Menand mobilizes here a full sense of the tyranny that must be endured in the construction of endnotes —

Every error is an error of substance, a betrayal of ignorance and inexperience, the academic equivalent of the double dribble. That the decorums of citation are the arbitrary residue of ancient pedantries whose raisons d’etre are long past reconstructing does not reduce the penalties for nonconformity.

Surely technology should free us from such tiresome finish-line ambushes. And yet, as Menand observes,

The notion that the personal computer has eliminated the bone-crushing inefficiency of the typewriter, and turned composing The End Matter into a drive in the word-processing park, belongs to the myth that all work on a computer is “fun”-one of the Digital Age’s cruellest jokes.

Microsoft Word, as Menand notes, is too often a baffling mess when it comes to generating footnotes and endnotes, plaguing you with random formatting and automatically generated annoyances. Too many options: the exhausted citer just wants to be faultless and to be done.

EndNote — which is a plug-in in my version of MS Word — might seem to be a lifesaver. Indeed, many of us have been happy to sit through earnest training in this and similar tools, entranced by the promise of metadata pulled down from a network, stored in a local database, and spit back out, effortlessly, into formatted endnotes. Oh, you wanted APA 5th, not Turabian? Hold on just a sec – (click, click) – here you go! Choose a style, any style: here are 1012 to choose from!

And yet, in my personal experience, EndNote endnotes are chock full of flaws. I’m not here to assign blame — maybe it was an incomplete OPAC record, maybe the library filter was off, maybe EndNote dropped a field — at the end of the day (rather, the night), citations are liable to look like nothing in that overstuffed, unloved red style manual (which is all but impervious, anyway, to the need to cite digital sources). Back to fixing, fretting, fudging. Only EndNote is liable to overwrite your corrections: surprise!

And yet the dream of escaping such frustrations through technology won’t die — and shouldn’t. It seems only fair that our Babylonian predicaments be ameliorated, at least somewhat, by computers–our vast interconnected ever-churning never-complaining prostheses.

George Mason’s Center for History & New Media (a seemingly ever-inventive group) has had a promising tool chugging down the pike for some time that offers a new glimmer of hope. It manages citations and other research information in a web environment. When I first heard about it, they were calling this tool Firefox Scholar; now it’s been rebranded as Zotero, a term loosely based on the Albanian word for acquiring/mastering. Whatever – let’s trust that this promising project will prove to be less obscure than such an etymology.

From what I can tell from the description of Zotero, bennies include:

  • Ability to capture & store PDFs, files, images, links, web pages in a browser platform.
  • A range of organization options, including folders & tagging & ‘smart’ collections.
  • iTunes-like interface.
  • Spotlight-like search-as-you-type.

…and, most relevant here:

  • Ability to sniff out a citation on a web page & capture it to your library.
  • Citation export.

Zotero works with Firefox to sense when you are visiting a page with full bibliographic data (like an OPAC) and offers a little book icon; click it, and citation material comes flying into your computer.

Zotero in a Firefox browser bar

Since suddenly there’s a profusion of browser-based store-organize-share tools (SOS?) for scholars, Zotero will be all the more valuable if it can be jiggered to play with academic social software like Connotea or the aforeglimpsed CiteULike – and, while we’re dreaming, if it can feed stored items into networked repositories. Since it’s free and open source, one can imagine any kind of evolution for this “next generation research tool.”

Will researching and citing on the web actually get a little easier? We’ll see – Zotero is in private beta now, but should be in public beta by the end of the month.

Give unto Wikipedia

Reading Roy Rosenzweig’s thoughtful appraisal of Wikipedia in the current Journal of American History (“Can History Be Open Source? Wikipedia and the Future of the Past”), I was particularly struck by this passage:

If Wikipedia is becoming the family encyclopedia for the twenty-first century, historians probably have a professional obligation to make it as good as possible. And if every member of the Organization of American Historians devoted just one day to improving the entries in her or his areas of expertise, it would not only significantly raise the quality of Wikipedia, it would also enhance popular historical literacy.

Let’s step back and marvel at another indication of the power and sudden inexorability of Wikipedia — can you imagine a distinguished historian feeling that he owed it to the world to improve the Encyclopedia Britannica, and urging colleagues to do their part too? For no credit and no money?

If historians and other academic experts should really be raising the quality of Wikipedia, that prompts the question of who their exertions would be for. An initial answer, I suspect, would be: not for each other, and not for their students. As Rosenzweig writes (in a peer-reviewed journal, of course, and not an encyclopedia),

Most readers of this journal have not relied heavily on encyclopedias since junior high school days. And most readers of this journal do not want their students to rely heavily on encyclopedias — digital or print, free or subscription, professionally written or amateur and collaborative — for research papers.

And so an obligation to Wikipedia seems outwardly directed, keyed to a general public’s understanding (that Cleaveresque ‘family’ using a family encyclopedia). This raises further questions. Are we seeing a technologically-enabled resurgence of the public intellectual? If so, what would it mean to take on this role in a communally edited space impervious to individual identity and, as Rosenzweig notes, suspicious of expertise?

Since an edifying or even identifiable relationship with Wikipedia users seems impossible, let’s posit that obligation to it is not primarily to a public, but really to a field of knowledge as it is represented in public. In other words, if the Wikipedia page on the American Revolution is becoming the de facto online summation of this event, and if historians don’t weigh in, their knowledge fails to apply where it’s most needed.

But I wonder how good academics generally are at writing encyclopedia articles. In many cases, it’s not at all the kind of work they do when researching or teaching — it’s not what their intellectual life is about. In general, encyclopedias have settled into the role of tended repositories of knowledge, not the active sites of inquiry that universities strive to be.

As Rosenzweig says, “Wikipedia (like encyclopedias in general) summarizes and reports the conventional and accepted wisdom on a topic but does not break new ground.” To get a sense of the progressive quiescence of encyclopedias, you could look at Wikipedia’s entry on Diderot’s Encyclopédie:

The Encyclopédie played an important role in the intellectual ferment leading to the French Revolution. “No encyclopaedia perhaps has been of such political importance, or has occupied so conspicuous a place in the civil and literary history of its century. It sought not only to give information, but to guide opinion,” wrote the 1911 Encyclopædia Britannica.

This reliance on a hundred-year-old hedged claim in another encyclopedia about the political impact of a 200+-year-old encyclopedia may seem abundantly timid, but it exists — at least today — in a Wikipedia article whose neutrality is nonetheless flagged as disputed. Wikipedia strives to resolve dispute, to traffic in the indisputable — while a university that lived by that principle would be a zombie campus, at best.

Whether or not you believe in the power of online collectivism, and whether or not you think that Wikipedia represents that collectivism, you have to hand it to it (them?): Wikipedia knows what it is and what it is not. It couldn’t be more explicit about its limitations: it accepts no original research, no original ideas. And it does not pretend to satisfy research; its founder, Jimbo Wales, reportedly offers this advice to students: “For God sake, you’re in college; don’t cite the encyclopedia.”

So, again, why might thoughtful and original academics pay particular attention to an environment that is in many ways alien to them — and even entertain notions of obligation to it? I have a few guesses, all of them broad, none of them substantiated:

Academic publishing is sluggish — Is there any write-up about Wikipedia that does not refer to its vast coverage, its low barrier to entry, and what Rosenzweig calls its “open-source mode of production and distribution”? Academics yearn to see their work actually get distributed in the world, and they are caught in increasingly sluggish and narrow channels of communication. Wikipedia actually publishes effort, instantly and in retrievable form, to an audience that can respond to it.

No doubt about it, academic publishing constricts the discourse it should support, but the invigoration of it in a digital environment will probably be quite different from the structure and dynamics of a wildly popular collaborative encyclopedia. Wikipedia may have the most to teach us through its stubborn emphasis on what it is not: are we listening? This is a world in which, as the entry on ‘expert’ tells us (today, at least), “an intellectual elite may or may not be correct about a particular issue in their field of expertise.” The “may or may not” ambivalence about expertise, the faith in correctness at all costs… not exactly the environment for nuance, originality, or intellectual leadership.

The academic star system is stifling — This is a corollary to the above point, because recognized stars get into print more often, or at least can lean on the rusty gears of publication. And stars are stars — let’s face it — they energize events, they get the grants, they make things happen. But I suspect many academics — even stars — are titillated by Wikipedia’s oft-noted indifference to expertise. By depersonalizing and flattening and opening the field of contribution, Wikipedia seductively suggests that truth will prevail on its own — no lollygagging on laurels here.

Whatever we think of laurels, it is indisputable that peer review, the basic engine of academic appraisal, depends on identification and reputation. Escaping the burdens of apprenticeship, labor-validation, review, and professional development may seem liberating, but a specified affiliation and whatever responsibility (or lack thereof) that implies are enabling conditions of academic discourse. A university can’t function without overt hierarchies–campus rituals are almost entirely organized around the individual’s passage through sanctified levels. Anonymity may prove surprisingly difficult for those whose sense of work is so deeply rooted in acknowledged position.

Neutrality is only fair — Wikipedia’s sternly enforced Neutral Point of View policy seems to offer respite from a world riddled with clashing theoretical frameworks. Humanists and scientists alike may feel that it’s exhausting to interpret morning, noon, and night — all the while moving practically through the world, negotiating its incoherencies. Wikipedia’s banishment of originality lightens the burden of this reconciliation; it sings the siren song of the incontestably evident.

The ban on spin attempts to keep things calm and cordial, but to what end? Wikipedia’s NPOV might seem related to the disinterested analysis beloved of academicians, but, as Rosenzweig points out, Wikipedian neutrality leads to a great deal of waffling and prim skirting of controversy. When it comes to the pursuit of knowledge, a polite series of self-cancelling on-the-other-hands proves a poor substitute for interpretive power and conviction. Poor and censorious. For a surprising little totalitarian chill, I recommend Wikipedia’s page about NPOV disputes: “there is a strong inductive argument that, if a page is in an NPOV dispute, it very probably is not neutral.”

Facts are simple, facts are good — A corollary, again, to the above point. Wikipedia leads us into a world of passive construction, where things have been proven, have been shown, have been accepted. Once all that messy agency is wiped out, we are left with qualified data in its proper place. Enjoy a small chuckle that the “Fact” entry in Wikipedia is today double-flagged as containing “disputed factual accuracy” and “original or unverified claims”. The fact remains that in Wikipedia, things are either proven or not, accepted or not, controversial or not — it’s an organized and binary landscape.

The pursuit of just the facts, ma’am, orients Wikipedia towards what’s been commonly agreed, but it can also lull thought to sleep. As a historian, Rosenzweig knows very well that “good historical writing requires not just factual accuracy but also a command of the scholarly literature, persuasive analysis and interpretations, and clear and engaging prose.” Let’s go back to that “Fact” entry in Wikipedia and partake of its droning tautology: “A fact that was once a fact and hence becomes disproven may once again become a fact if the factual evidence supporting its validity become increasingly factual in light of new and, ultimately, factual evidence.” ‘Nuff said.

Data is (are) cool — Though Rosenzweig gives props to the factual accuracy of Wikipedia — finding it to clock in somewhere in between the Encyclopedia Britannica and the prohibitively expensive American National Biography Online — you can sense in his article a purer enthusiasm for Wikipedia as object. Its open content can be exported for research — “downloaded, manipulated, and ‘data mined’… Wikipedia can therefore be used for other purposes.” One of these purposes might start to feel like research: measuring activity in a somewhat transparent online environment. As faddish tracking of Wikipedia contrails suggests, passage through it becomes an enticing reflection of its users — you can trace patterns and behaviors to your heart’s content.

But what is all this data telling you? Who do Wikipedia’s users represent? How much should we take Wikipedia’s ground rules as exemplary? Tautology looms: we’re studying Wikipedia to learn how Wikipedia works. Take a research paper like “Ambiguity and conflict in the Wikipedian knowledge production system” — here’s how it resolves: “Wikipedia is a fascinating topic of study and requires careful examination of its underlying social and cultural processes…. One of the most urgent items on the research agenda is to describe and explain the concrete processes by which knowledge and truth is produced and adjudicated.” What’s behind this compulsion — the requirement of examination, the urgency of such a research agenda? Could it be mirroring Wikipedia’s own faith in neutral truth-production?

Again this feeling of compulsion attending Wikipedia. Maybe you feel it too. If so, it’s probably too late to suggest that another wiki, another platform, another construct might better deliver your truth.

Dear PennTags

Please don’t take this the wrong way. It’s not you, it’s me. It’s just that I was so excited to meet you — I had so many preconceptions, I had heard so much about you. And then when I actually met you, you seemed kind of standoff-ish and, I admit, sort of different from what I thought you’d be. But I still like you — don’t get me wrong.

When I first heard about you I thought: finally! A way for scholars to tag up an OPAC as well as electronic journals — a tool enabling social discovery by a defined community swimming through carefully selected resources. In short, I thought you’d be more sophisticated and more focused than del.icio.us. I thought: finally, it will be easy for a specific class or a set group of scholars to sift together through premium resources: collaborative discovery centered on the information source most unique to Penn, the Penn library.

But when we actually met you were so confusing (and I’m not alone in thinking so). Your home page hit me right off the bat with pictures of birds and a big tagcloud, a cloud that seemed more random than representative:

PennTags

What does it mean that Lauder_Institute_Area_Studies dwarfs united_states? I think it means that you haven’t gotten around enough to render a representative or even very interesting snapshot of the Penn community — so until you do, I suggest you don’t wear this raw data on your sleeve.

I know your type — you’re enamored of presenting data as it comes into your system — makes you seem extra dynamic. But until you get more play, you’re not delivering useful information with your overall clouds and ‘latest tagged’ lists. In fact, I doubt such look-ma-it’s-web2.0 features will ever be that useful to anyone, however big you get.

I guess my point is, first impressions are important — so you should use your home page to introduce yourself, rather than show off. I finally found my way to the “About” page (tiny button, my friend! why so shy?), a page that finally addresses the question, “What is PennTags?” And here you got kind of weird. You started pretending that del.icio.us doesn’t even exist. Or, to put it another way, you said almost nothing about yourself that couldn’t be said about del.icio.us. You bragged:

Have you ever bookmarked a web page and then can’t find it again in your mass of bookmarks? The beauty of PennTags is that it allows you to organize your bookmarks/resources exactly the way you want and it lets you share them with others. It’s both personal and portable.

Well ok, but I thought your beauty, PennTags, would be that you would be different from del.icio.us — that instead of letting anyone tag anything just ‘out there’ on the open web, you’d let a defined community — namely, Penn and sub-communities within Penn — tag things that are available by virtue of being at Penn. Otherwise, why reinvent the wheel? Ignoring the popular kid & just pretending to be him won’t impress many who are likely to be drawn to you in the first place.

Jumping into some of your posts, though, I found that your users are in fact using you as I thought they might — they are tagging your library’s catalog records, and they are tagging articles available in your library’s database, as well as outside websites. Following these links put me on quite different adventures.

When the item tagged is in the OPAC

OPAC tagging is pretty darn sweet — and you pulled this off with Voyager, no less. When I clicked on a post referring to a book on Godard, I didn’t get to access the book (obviously), but I was routed to its catalog record, and I found that the user-contributed tag and summary had made the trip with me, and appeared in a yellow box right in the OPAC:

PennTags

After seeing this trick, PennTags, I started to warm to you. People who know nothing about you or about tagging or even about bookmarking are bound to wonder what these yellow notes showing up at the bottom of OPAC records are — maybe you’ll recruit more users this way, and get smarter. At the very least, you’re giving library records a sense of life; any way to enliven the OPAC with user contributions is a-ok with me.

But I wonder how you’ll manage any significant success — imagine ten such yellow PennTag records clinging onto a record in the catalog. You’ll have to be careful to keep a balance between authoritative metadata and folksonomy, between succinct official catalog records and long contributed summations.

When the item tagged is in a journal database

What about when someone posts and tags a journal article in you? I clicked on such a record, and, not to my surprise, got dumped at a Penn database log-in screen — which means that if I were affiliated with Penn, I’d go right to the article. Since I’m not, I see nothing — no user summations, no fun yellow boxes. This raises the question again of who is using PennTags, and for what purpose. Frankly, I felt ignored by you here. If you are of, by, & for people behind Penn’s walls, then perhaps you should live behind that wall too — it’s not particularly interesting, for someone who can’t get at resources, to see how they’re being tagged.

That said, clicking on the title of another posted article, a JSTOR title, took me — much to my surprise — right into the article; I was ushered straight in thanks to my own institution. That experience started me dreaming again, PennTags, about an OpenURL world, filled with cross-institutional tagging of academic assets. At the very least it renewed my hope that I might find you of use while waiting for my own library to get tagging off the ground.

When the item tagged is an outside website

Then there are the outside websites that are being posted and tagged in you, just as they’re tagged in del.icio.us. As you know, I think it’s redundant and a little silly to use you just for this purpose, but I’m also warming to the idea of tagging websites right alongside OPAC records and journal articles. You see, PennTags, I’m open to persuasion; you just haven’t taken the time to articulate the benefits of this mix. You’re actually allowing your users to bring resources into your library, in a way. Rather than reinventing a wheel, you’re melting a wall. That’s a big step, and it’s one to think about — not take for granted.

Yeah, inside/outside tagging has plenty of potential, no doubt about it, but here again I’m a little let down. Here’s the deal, PennTags: I think you could be a little more proactive about what academic tagging could or even should be. Could it be hierarchical? Might it be user-faceted? Are there ways to enforce best practices? By offering little firm guidance, you’re once again playing pseudo-del.icio.us, leaving everything up to an undifferentiated swamp.

But look around, PennTags: you operate in a world full of productive distinctions. You even list some, shyly — they get buried in a section called “More Tagging Tips”:

PennTags

How hard would it be to invite your users to think along these lines, gently, somewhere in the tagging process? Can tagging evolve to something beyond a single ‘fill in whatever you want’ open field? I know you don’t want to come across as bossy or prescriptive or — god forbid — librarian-like, but I wonder if just a couple of criteria particularly useful to your academic community (say Topic and Relevance) could be quietly promoted, just as del.icio.us already subtly promotes tagging uniformity through ‘recommended tags.’

The thing to keep your eye on is use: how these tags are used by actual populations, in actual classes or other sub-groupings, for actual purposes. I find it pretty weird that you’re asking people to think about tagging with an uncle in mind — unless this is an uncle at Penn. Relevance is a subjective and fairly meaningless call against a wide-open horizon (where many uncles live), but within the context of english242 students working collectively on a presentation about Keats’s illness, say, “Relevance” becomes a powerful way of characterizing a resource.

Imagine, too, if you allowed any kind of distinction among users — how interestingly instructors and students, say, could interact within a classroom framework as what they are (in the institution’s eye) through you. Or professors and research assistants. Or members of a class and those outside the class. Or librarians. Or alumni. These distinctions shape the day-to-day life of your campus, and though I suspect you imagine yourself to be leveling the playing field in exciting new ways, you don’t have to dumb the field down that much. Nor do user distinctions need to control the way people use you. Building them in would only help when it becomes desirable to browse or subscribe to the tagging work of a certain subset of the campus community. Here’s your advantage over del.icio.us: you operate in a circumscribed world organized around definable purposes, roles, means, events.

I think you’d be even cooler if you presented yourself as not just another collective knowledge base, but as the way that only Penn could make the knowledge of the world work for definable ends. That’s why I think your most promising feature is ‘Projects’. Right now you only allow one owner to post to a given project, but maybe in the future you’ll loosen up and let many users work on a given project — and maybe even specified classes of users. Then, I suspect, the RSS functionality you’ve already built in would start to be useful not merely to the curious, but to a much more involved user-base: the tasked.

Well, PennTags, you can guess by the way I’ve gone on here that I actually am pretty attracted to you, and I look forward to seeing how you mature. You’re raising awareness of tagging in academic settings — and you’re not just sitting around wondering about what that might mean — you’re actually putting tags into motion. That’s the only way any of us is really going to learn how this 2.0 phenom might work for us. So — way to be, & keep in touch.

Your PennPal,
Mark

By indirections find resources out

OCLC’s recent report College Students’ Perceptions of Libraries and Information Resources resonates a bit with the Al Gore slideshow movie I saw this weekend: it deploys lots of slick graphs and charts to frame information that can only be received with dismay.

The almost 400 students surveyed by OCLC think of commercial search engines as a perfect fit for their lifestyle and their needs, and they turn to them first whenever looking for information. The respondents respect the libraries, and feel that they can find quality information through them, but they almost never delve into library websites first to find information. Their instant ‘brand’ identification for libraries is ‘book.’

In short, libraries seem to exist as a point of last resort in the minds of many college students — a complicated, confusing, sometimes outdated facility to be approached for information only when Google fails. The pull-quotes in the OCLC report are inflected with grammatical errors, just to rub salt in the wounds. Rampant illiteracy or OCLC sabotage? You decide:

OCLC survey

OCLC survey

OCLC survey

Hidebound notions of what academic libraries are actually doing these days make it all the more important to find new ways to expose services. The LibX Firefox Extension, for example, embeds links to library resources in a variety of more user-friendly websites (their screenshots show little logos popping up in Amazon and Google searches, as well as New York Times book reviews). LibX is another one of these nifty localizing extensions that Firefox has inspired — and it works with COinS.
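For the curious, COinS (ContextObjects in Spans) is a simple convention: a page tucks an OpenURL-style citation, written as key-value pairs, into the title attribute of an empty span with the class Z3988, where a tool like LibX can find it. Below is a minimal Python sketch of that idea, not LibX’s actual code (which runs inside Firefox): it finds COinS spans, decodes their citation fields, and re-encodes them as a link to a library’s OpenURL resolver. The resolver address resolver.example.edu is a hypothetical placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode

# Hypothetical resolver address; a real library would supply its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


class CoinsParser(HTMLParser):
    """Collect the citation data from every <span class="Z3988"> (COinS) on a page."""

    def __init__(self):
        super().__init__()
        self.contexts = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        title = attr_map.get("title")
        if tag == "span" and "Z3988" in classes and title:
            # HTMLParser has already turned &amp; into &, so the title is a plain
            # query string; parse_qs decodes it into {key: [values]}.
            self.contexts.append(parse_qs(title))


def find_coins(html_text):
    """Return a list of decoded ContextObjects, one per COinS span found."""
    parser = CoinsParser()
    parser.feed(html_text)
    parser.close()
    return parser.contexts


def resolver_link(context, base=RESOLVER_BASE):
    """Re-encode a decoded ContextObject as an OpenURL pointing at the resolver."""
    return base + "?" + urlencode(context, doseq=True)


if __name__ == "__main__":
    page = (
        '<p>Cited on this page:</p>'
        '<span class="Z3988" title="ctx_ver=Z39.88-2004'
        '&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook'
        '&amp;rft.btitle=Lyrical+Ballads&amp;rft.au=Wordsworth"></span>'
    )
    for ctx in find_coins(page):
        print(ctx["rft.btitle"][0], "->", resolver_link(ctx))
```

The appeal of the convention is that the host page needs no knowledge of any particular library: the same span can be turned into a link for whichever resolver the reader’s own institution runs.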

A less technical way of exposing those expensive electronic library services is to take particular note of how students actually learn about them, according to the OCLC study. Have a look with me at this chart, which breaks down the ways college students (and broader populations, for comparison’s sake) find out about electronic information sources *besides* through search engines:

OCLC survey

Librarians themselves are way down on the chart — and they rate even lower for the non-college crowd. So what’s at the top? ‘Friends’ and ‘Links’: more reasons to make it easy for students to create, store, and share links to library resources. But look at who’s coming in third–beating out other media, advertising, and my cousin who works for CNN: Teachers. Teachers, way above librarians. While librarians are increasingly framing themselves as teachers — the ‘instructional librarian’ is a familiar role and position by now — such data suggests we think of teachers as front-line librarians, or at least librarian-proxies.

Consider, too, this chart showing “Cross-referencing Sources to Validate Information”:

OCLC survey

Though it’s hard to see in this small version, the chart shows that college students (in green) and the general population (in orange) validate the information they find on sites most often by comparing other websites with similar information (80-82%). But in second place, at least for the college crowd, here comes our unexpected resource champ, the Teacher, with an impressive 78%. That source of information validation beats out checking library materials (64%) and checking with a librarian (36%).

Given their relatively exalted position on the information food chain, teachers need all the training and support they can get from librarians. We should throw out the assumption that just because someone wrote a dissertation, he knows all about how to use library resources and can pass on this wisdom to students. The ground is changing too fast, and the unsupported instructor will not have time to keep up. That’s not his job–it’s the librarian’s.

Case in point: a European history and philosophy librarian mentioned to me the other day that Blackwell Synergy is becoming a significant point of access to important journals in his areas. And perhaps you thought of this database (if you thought of it at all) as focused on science?

The point is, in a healthy educational environment, a teacher will be backed up with well-selected electronic resources that are ever one click away in the course management system, tended and manicured by librarians. This is indirect, ongoing training – for teachers as well as for their students – in the use of resources, delivered at the point where it’s most needed. Such targeted support could actually minimize class disruption (no need for librarians to come point out where resources are, if they’re already being well-delivered), while letting students hold on to the fantasy (which they evidently need in these perilous times) that the library is all about books.

Mining the machines

Last year at the ARL symposium called Managing Digital Assets, I smiled inwardly to think of the grumbling likely to be kicked off by observations such as this by Donald Waters of the Mellon Foundation:

…what unites our interest in digitization and open access in a digital world is that the material becomes ‘processable,’ or subject to computational processing. That is, the growth in the market of readers is not among groups of humans, but of machines, which are programmed to index, manipulate, mine, aggregate, decompose, and build up scholarly and other forms of content by algorithm. It is this machine ‘processability’ that makes digitized objects and open access materials most valuable to scholars.

Protest, fume, rail against the subjection of your most exquisitely developed thought to the dumb imperatives of ones and zeros — Waters is absolutely right. You want influence? Or, more to the point, you want to avoid obliteration in the vast digital swamp? You’d better know how to demarcate, classify, and optimize your work for machine crunching — or find someone who does. And pray that the stewards of such crunching, the information managers you never thought about, have your best interests in mind.

All this occurred to me while reading a new D-Lib piece by Daniel Cohen, director of research projects at the very creative Center for History and New Media at George Mason University. Cohen also spoke at that ARL session, and at the time he sold me on Firefox Scholar. His new article, “From Babel to Knowledge: Data Mining Large Digital Collections”, offers two nice examples of humanist-friendly manipulation of machine “processability.”

First: Syllabus Finder. Where was this godsend when I was inefficiently wandering around the chaff of the web, trying to crib ideas for my own syllabi? It’s a very sensible, very needed genre-based search tool. It begins by defining “document classification” through a very simple dictionary of keywords endemic to syllabi (“assignment,” “office hours,” etc.). This classification is fed into Google through its API service, along with the search query, for optimized searches. The results can then be further refined through more automated analysis or combined with other search results.
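To make the classification step a little more concrete, here is a rough Python sketch of keyword-based genre detection. Only “assignment” and “office hours” come from Cohen’s description; the other marker terms, the threshold of three hits, and the query-building helper are illustrative assumptions, not Syllabus Finder’s real dictionary or its actual Google API calls.

```python
import re

# "assignment" and "office hours" come from Cohen's description; the other
# terms are illustrative guesses, not Syllabus Finder's actual dictionary.
SYLLABUS_TERMS = ["assignment", "office hours", "syllabus", "readings",
                  "midterm", "grading", "week 1"]


def syllabus_score(text, terms=SYLLABUS_TERMS):
    """Count how many distinct marker terms appear in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(
        1 for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    )


def looks_like_syllabus(text, threshold=3):
    """Classify a document as a syllabus if it hits at least `threshold` terms.

    The threshold of 3 is an arbitrary assumption for illustration.
    """
    return syllabus_score(text) >= threshold


def build_query(topic, terms=SYLLABUS_TERMS[:3]):
    """Hypothetical: bias a web search toward syllabi by OR-ing marker terms onto the topic."""
    marker_clause = " OR ".join(f'"{t}"' for t in terms)
    return f"{topic} ({marker_clause})"


if __name__ == "__main__":
    doc = ("English 125: Romantic Poetry. Office hours: Tuesdays 2-4. "
           "Readings for week 1: Coleridge. First assignment due Friday.")
    print(syllabus_score(doc), looks_like_syllabus(doc))
    print(build_query("Coleridge"))
```

Counting distinct terms rather than raw occurrences keeps a single page that repeats “assignment” twenty times from passing on that word alone; the classifier is deliberately dumb, which is roughly why a ninety-percent hit rate on such a modest goal is impressive.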

I gave it a spin, using canonical writers from the Romantic era as search terms. To my happy surprise, good old Ashes Sparks & Hypertext, a six-year-old syllabus for a seminar I taught back in the day at UC Berkeley, kept showing up — and at or near the top of results. #1 for Coleridge, #2 for Byron, #1 for Wordsworth, #2 for Blake, #4 for Hemans. Yeah, baby! But we drop down to #14 for Keats, alas, and as for Shelley, he just kept coming up as a “fatal error,” an “Uncaught SoapFault exception.” So Syllabus Finder is a little buggy — but, dare we say it, a little poetic too. Maybe we’re just overly pleased by taking the silver for Byron:

[Screenshot: Ashes Sparks is the second syllabus listed for Byron]

I don’t know what to make of the way this tool seems to like the Ashes Sparks syllabus — certainly I indulged in no optimization, no thought about how the thing would be retrieved. The only distinguishing feature of that document, really, is that it’s been online steadily for six years. It’s just one of those Google-blessed mysteries. Perhaps cannier post-processing could promote syllabi more deserving of prominence. But Syllabus Finder works pretty well; I’d recommend it to fledgling (and not-so-fledgling) instructors. As Cohen puts it, the tool “does a surprisingly good job at achieving its modest goal – on most topics for every ten documents it retrieves, about nine are syllabi – and it has thus far found and catalogued over 600,000 syllabi, synthesizing a collection of course materials considerably larger than any created or maintained by a professional organization, educational institution, or library, or by any other effort on the web to aggregate syllabi.”

A second and more complex treat today from the George Mason wizards: H-Bot. This is an automated historical fact finder that can field natural-language queries (or at least ones that begin with “what,” “when,” or “who”; it’s not ready to handle “where,” “which,” “how,” or “why”). The technique behind it is known as “question answering,” which involves identifying relevant documents, some natural language processing (to interpret queries), and statistical and linguistic analysis of the retrieved documents. (In addition to the D-Lib article, there’s more on H-Bot here.)
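Here is a toy version of the question-answering idea, restricted to “when” questions. It is an illustration under my own assumptions, not H-Bot’s actual method: it supposes the retrieval step has already returned a list of text snippets, pulls four-digit years out of them with a regular expression, and returns the year that turns up most often. The function names, the year-only answer format, and the sample snippets in the usage note are hypothetical.

```python
import re
from collections import Counter

# Question words H-Bot handled at the time, per the post;
# "where", "which", "how", and "why" are rejected here too.
SUPPORTED_QUESTION_WORDS = {"what", "when", "who"}

# Four-digit years from 1000 to 2099 (a simplifying assumption: no full dates).
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def question_type(question: str):
    """Return the leading question word if it is supported, else None."""
    words = question.strip().lower().split()
    if words and words[0] in SUPPORTED_QUESTION_WORDS:
        return words[0]
    return None


def answer_when(snippets):
    """Vote across retrieved snippets: the most frequently mentioned year wins."""
    counts = Counter()
    for snippet in snippets:
        counts.update(YEAR_PATTERN.findall(snippet))
    if not counts:
        return None
    year, _ = counts.most_common(1)[0]
    return year
```

Fed three made-up snippets about Hitler, two of which also mention his birth year, answer_when still returns "1945", because the death year appears in all three. The statistical step is simply letting the crowd of documents outvote the stray dates.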

Playing with H-Bot is fun. When did Hitler die? The answer in an eyeblink, as the Germans say: April 30, 1945. When did Gandhi die? Here’s a quirk:

[Screenshot: H-Bot’s answer to “When did Gandhi die?”]

Well sure, but that wasn’t the Gandhi I meant. Interestingly, here’s what happens when I ask the same question but tell H-Bot not to “check trusted websites first”:

[Screenshot: H-Bot’s answer to the same question, without checking trusted websites first]

Here’s a case where the unfiltered swamp actually answered my question — or read my mind — better than “trusted websites.” Quantity over quality? Very sensibly, H-Bot demurs when I ask “Is God dead?” or “When did God die?” (“I’m sorry. I cannot provide any answer on that.”) But ask it “Who is God?” and H-Bot serves up a perky little answer:

[Screenshot: H-Bot’s answer to “Who is God?”]

Simple-minded? Sure. But viable. Arguments will rage, hairs will split, blood will spill, but our dumb machines have given us an efficient pulse of information in the midst of the cacophony, delivered by strategic sifting of great gobs of data.

Which brings us to a final point that Cohen makes about machine data-mining: “Quantity may make up for a lack of quality.” Even the most ardent humanist can’t deny: when it comes to information, we’ve got a whole lot of quantity these days. It’s how we draw from such quantity that counts.
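One way to see why quantity can make up for quality: suppose, purely as a simplifying assumption, that each retrieved source is independently right with some probability p only a little better than a coin flip. The chance that a majority of sources gets it right then climbs as the number of sources grows. The sketch below is my illustration, not Cohen’s; it computes that probability exactly from the binomial distribution.

```python
from math import comb


def majority_correct(n: int, p: float) -> float:
    """Probability that a strict majority of n independent sources is right,
    when each source is right with probability p (n must be odd, so no ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd n so a strict majority always exists")
    need = n // 2 + 1
    # Sum the binomial probabilities of getting `need` or more correct sources.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))
```

With p = 0.6, a single source is right 60% of the time, three voting sources 64.8% of the time, and 101 sources well over 95% of the time. Web pages copy one another, so their errors are far from independent; treat these numbers as a best case.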