Life in the taggregate

Friday, November 23, 2007

From its earliest days, the promise of the Semantic Web has been to bring networked computers closer to the forms and priorities of human inquiry. This promise depends on mark-up language that gives data some structure, and frameworks that bring such structure into recognizable relationships. As a May 2001 Scientific American piece by Tim Berners-Lee and colleagues put it, “for the semantic web to function, computers must have access to structured collections of information and sets of inference rules that they can use to conduct automated reasoning.”

Automated reasoning! This dream may be coming to life in e-science, with its highly structured and interoperable datasets, but in many other contexts the idea of a Semantic Web sits uneasily with the younger and more popular kid on the block, the Participatory Web. Web 2.0 environments amass a lot of data and, more importantly, a lot of information about this data, chiefly tags, generated by humans downright impervious to machines’ need for identifiable and consistent structure. Such tags are generally free-form and non-hierarchical, and they do not express relationships in a predictable and consistent way; they dance to “folksonomy,” not “taxonomy”; they are blithely untethered to “ontologies” or to any URI-based language standards.

Nevertheless, there is intriguing thinking out there about the potential interplay of the Semantic Web and Web 2.0. The Tagcommons site lays out Use Cases that envision sharing tags across databases, and sketches some functional requirements to make that interoperability happen. Tom Gruber, in particular, has argued energetically for “collective intelligence systems” built from syntheses of structured data and social software; his travel-review site RealTravel uses a “snap-to-grid” model to disambiguate and structure user-supplied tags.

And now, in the Yahoo! Research Berkeley labs, algorithms are starting to take aggregate patterns into account in order to sift meaning out of vast oceans of community-generated tags despite all their unstructured messiness (or, as computer scientists like to say, despite all their “noise”). It’s a matter of inference and cluster analysis. Case in point: the photo-sharing site Flickr’s new experiments in extracting “practical information about the world” from the snapshots and tags poured into it by the great unwashed. The report “How flickr helps us make sense of the world: context and content in community-contributed media collections” describes a layered process of tag and image analysis, one that can be conducted entirely by machines and that identifies representational tags as well as place and event semantics.
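The Yahoo! report describes its own layered process, and the sketch below does not reproduce it. As a much simpler stand-in, assuming only that photos carry tags and coordinates, one crude signal that a tag names a place is that the photos carrying it cluster tightly on the map, while a tag like “sunset” is scattered across the globe. The function names, the 5 km threshold, and the sample coordinates are all hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def place_like_tags(observations, max_spread_km=5.0, min_photos=3):
    """Return tags whose photos sit, on average, within max_spread_km of their centroid.

    observations: iterable of (tag, latitude, longitude), one per tagged photo.
    The centroid is a plain average of coordinates, which is only reasonable
    for tags whose photos are close together (it breaks near the antimeridian).
    """
    coords = defaultdict(list)
    for tag, lat, lon in observations:
        coords[tag].append((lat, lon))

    place_tags = []
    for tag, points in coords.items():
        if len(points) < min_photos:
            continue  # too few photos to judge
        c_lat = sum(lat for lat, _ in points) / len(points)
        c_lon = sum(lon for _, lon in points) / len(points)
        spread = sum(haversine_km(c_lat, c_lon, lat, lon) for lat, lon in points) / len(points)
        if spread <= max_spread_km:
            place_tags.append(tag)
    return sorted(place_tags)


# Invented sample data: three "harlem" photos near each other, "sunset" scattered.
photos = [
    ("harlem", 40.811, -73.946), ("harlem", 40.815, -73.940), ("harlem", 40.808, -73.950),
    ("sunset", 40.70, -74.00), ("sunset", 48.85, 2.35), ("sunset", 35.68, 139.69),
]
print(place_like_tags(photos))  # ['harlem']
```

A real system would also need to handle tags that are local in several places at once (think “Paris”), which a single centroid cannot capture.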

What does all this do for us? For one thing, it can improve a search through piles of community-contributed materials; my search for “Harlem” stands a better chance of coming up with the most representative picture of the neighborhood, or a set of iteratively varied views of the neighborhood, or even a conglomeration of views for a composite view. I could determine the most visited place in the neighborhood, or the scenes of important events. Yahoo!’s researchers are even thinking about automatic tagging of photos, or suggestions for tags, that are generated by visual content abetted by contextual and geographical cues.

Here are a couple of spins of Yahoo! Labs’ TagMaps:

Flickr World Browser Harlem

^ TagMap’s World Browser analyzes Flickr tags to locate “Harlem” on a map and offer a set of representative photos (on the right). Harlem seems pushed to the west, and the chicken picture is a little odd, but this machine-generated guess seems viable enough.

TagMap World Browser Paris

^ A search for ‘Paris’ in TagMap’s World Browser whisks us to Paris, France, not Paris, Texas, and avoids any pictures of over-photographed heiresses. See: machines have taste too.

Teasing meaning out of cacophony, evaluating ‘where what & when’ through dumb processing of inconsistent human traces: it’s not hard to sense an artificial intelligence awakening here with its own priorities, despite the human decision (conscious or not) to ignore machine-oriented information conventions. What is the ultimate effect of algorithms trained to crunch through the idiosyncratic and identify the representational? Could such aggregate processing of unstructured data fuel a general regression to the mean, as alchemist Jonah Bossewitch muses? As a Trekkie (or is it Trekker?) might say, streaming into yet another convention, resistance is futile.

The fear of human conglomeration coming into sudden sentience is nothing new, of course. I just re-read Frankenstein with a set of fresh young readers, and alarmist correlations of that good old story to an improbably persistent, flexible, and collective-mashed form of AI doubtless come too easily to me now. But I do sometimes wonder whether we too will wake up from our most logocentric tagging idylls to sense senseless and unblinking eyes, watching us in the dark and hungry for more.

Archiving a tragedy

Thursday, May 3, 2007

Virginia Tech’s Center for Digital Discourse and Culture recently debuted The April 16 Archive, with some help from the prolific Center for History and New Media at George Mason,

…in order to support ongoing efforts of historians and archivists to preserve the record of this event by collecting first-hand accounts, on-scene images, blog postings, and podcasts.

It’s worth keeping an eye on this project as a model of user contributions, clustered around a contemporary and tragic event. How do we use new media to process such things? What does it enable us to capture and collect and learn?

So far the April 16 Archive is fairly bare-bones; it only accepts ‘images,’ ‘stories,’ and the vaguely termed ‘other files.’ As of now it’s impossible to search and hard to browse. There is some tagging, but the lumped-up organization makes you wish for other ways into the content, perhaps a map interface along the lines of CHNM’s last tragedy archive, the Hurricane Digital Memory Bank. A simple uploading interface provides a cut-and-paste field for Virginia Tech stories, or an upload field for files (maximum 5 MB). You can choose simply to contribute to the archive, or to have your contribution appear on the website (with or without your name). Submitters are told that they retain copyright in anything they contribute, and public use of any contribution is broadly barred without the permission of both the April 16 Archive and the original contributor. No CC options here.

The April 16 Archive FAQs take on the question of veracity: How do I know that the content of the April 16 Archive is factual? The answer here:

Every submission to the April 16 Archive–even those that are erroneous, misleading, or dubious–contributes in some way to the historical record. A misleading individual account, for example, could reveal certain personal and emotional aspects of the event that would otherwise be lost in a strict authentication and appraisal process.

Besides, this FAQ rather blithely continues,

…the April 16 Archive harvests metadata from every contributor–including name, email address, location, zip code, gender, age, occupation, date received–and suggests that these metadata be examined in relation to one another, in relation to the content of the submission, and in relation to other authenticated records. Sound research technique is the basis of sound scholarship.

After picking my way around the Archive for a little while, I’m struck by the number of images of Second Life memorials. I just don’t know what to think of such screen grabs. Collective therapy, sure – but an historical record of this tragedy? You tell me.

Second Life mourners

Taking notes

Tuesday, September 19, 2006

Yo, can I borrow your notes?

Harkening back to the salad days of college, I seem to remember a free-floating faith in the power of someone else’s notes to fill in cracks of attendance & attention. I doubt that much significant learning took place in power-cramming sessions entirely reliant on someone else’s diligently indented transcription of wisdom. But I’m struck now, thinking back, by the instinct to herd together in such situations.

A study tool named stu.dicio.us has recently made its debut, promising del.icio.us-like value through aggregation of communal effort. Now maybe some stranger from West Virginia Tech will save you from the consequences of having slept through Chemistry. Or maybe that concept your prof seems so fond of has been dropped in another class somewhere, in a context just different enough to fuel your next paper. Or maybe you can meet that hottie on the far side of the lecture hall because you’ve done a search limited to your school and this class and lo & behold here you both are, believing in the power of networking your notes.

Sharing notes is not cheating, insists stu.dicio.us. Everyone should have every advantage possible in increasing individual knowledge. The site rather mysteriously claims to be created for students, by students, and is rather predictably in beta.

There are bugs, and slender participation makes any 2.0 service like this awkward at first, but give it time. After a little tour, I think that stu.dicio.us is actually more useful for its lightweight organizational tools. There’s a sortable todo function – handy even if you aren’t interested in checking peers’ todos. The basic Textile formatting for notes encourages precision (see this testimony), and auto-save is built in. You can use simple brackets for auto-links to Wikipedia, Google, or Google Scholar. You can upload files and access them whenever you want, as long as the service remains online. For those times when you can’t get online, stu.dicio.us offers an offline mode.
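I don’t know the exact bracket syntax stu.dicio.us uses, so the sketch below invents one ([w:term], [g:term], [gs:term]) purely for illustration. It expands such brackets into Textile’s ordinary "label":url link form, which is roughly what an auto-link feature has to do behind the scenes before the note is rendered.

```python
import re
from urllib.parse import quote, quote_plus

# Hypothetical bracket prefixes; stu.dicio.us's real syntax may differ.
TARGETS = {
    "w": lambda term: "https://en.wikipedia.org/wiki/" + quote(term.replace(" ", "_")),
    "g": lambda term: "https://www.google.com/search?q=" + quote_plus(term),
    "gs": lambda term: "https://scholar.google.com/scholar?q=" + quote_plus(term),
}

# Matches [w:term], [g:term] or [gs:term]; the term runs up to the closing bracket.
LINK_PATTERN = re.compile(r"\[(w|g|gs):([^\]]+)\]")


def expand_links(text):
    """Rewrite bracketed shortcuts as Textile links of the form "label":url."""
    def to_textile(match):
        prefix, term = match.group(1), match.group(2).strip()
        return '"{}":{}'.format(term, TARGETS[prefix](term))
    return LINK_PATTERN.sub(to_textile, text)


print(expand_links("Read [w:Entropy] and [gs:ideal gas law] before Friday."))
```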

Here are a couple of screenshots. First, my fake schedule, with grades, notes, files, todos, and (sadly) no friends. This would be useful, I’d say, especially if it were within a course management environment:

stu.dicio.us

… and someone’s notes, which I found by doing a search for history and columbia:

stu.dicio.us

Enlightening? I doubt it – but misery does love company – and if you’re casting around randomly for any mention of history in anyone’s notes, chances are that you’re feeling a bit miserable.

Dear PennTags

Wednesday, June 14, 2006

Please don’t take this the wrong way. It’s not you, it’s me. It’s just that I was so excited to meet you — I had so many preconceptions, I had heard so much about you. And then when I actually met you, you seemed kind of standoff-ish and, I admit, sort of different from what I thought you’d be. But I still like you — don’t get me wrong.

When I first heard about you I thought: finally! A way for scholars to tag up an OPAC as well as electronic journals — a tool enabling social discovery by a defined community swimming through carefully selected resources. In short, I thought you’d be more sophisticated and more focused than del.icio.us. I thought: finally, it will be easy for a specific class or a set group of scholars to sift together through premium resources: collaborative discovery centered on the information source most unique to Penn, the Penn library.

But when we actually met you were so confusing (and I’m not alone in thinking so). Your home page hit me right off the bat with pictures of birds and a big tagcloud, a cloud that seemed more random than representative:

PennTags

What does it mean that Lauder_Institute_Area_Studies dwarfs united_states? I think it means that you haven’t gotten around enough to render a representative or even very interesting snapshot of the Penn community — so until you do, I suggest you don’t wear this raw data on your sleeve.
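A common way to draw a tag cloud is to interpolate each tag’s font size between a minimum and a maximum according to (often the logarithm of) how many times it has been used. Nothing in that arithmetic asks whether the counts are representative, which is why a young system’s cloud can look random. A minimal sketch; the pixel range and the counts are invented, and PennTags’s own formula may well differ.

```python
import math


def cloud_sizes(tag_counts, min_px=10, max_px=36):
    """Map raw tag counts to font sizes, interpolating on the log of each count."""
    logs = {tag: math.log(count) for tag, count in tag_counts.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = (hi - lo) or 1.0  # all counts equal: avoid dividing by zero
    return {tag: round(min_px + (value - lo) / span * (max_px - min_px))
            for tag, value in logs.items()}


# Invented counts: one busy tagger can outweigh a whole campus's casual use.
counts = {"Lauder_Institute_Area_Studies": 120, "film": 12, "united_states": 4}
print(cloud_sizes(counts))
# {'Lauder_Institute_Area_Studies': 36, 'film': 18, 'united_states': 10}
```

Counting distinct users per tag, rather than total posts, is one way to keep a single enthusiast from dominating the picture.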

I know your type — you’re enamored of presenting data as it comes into your system — makes you seem extra dynamic. But until you get more play, you’re not delivering useful information with your overall clouds and ‘latest tagged’ lists. In fact, I doubt such look-ma-it’s-web2.0 features will ever be that useful to anyone, however big you get.

I guess my point is, first impressions are important, so you should use your home page to introduce yourself rather than show off. I finally found my way to the “About” page (tiny button, my friend! why so shy?), a page that finally addresses the question “What is PennTags?” And here you got kind of weird. You started pretending that del.icio.us doesn’t even exist. Or, to put it another way, you said almost nothing about yourself that couldn’t be said about del.icio.us. You bragged:

Have you ever bookmarked a web page and then can’t find it again in your mass of bookmarks? The beauty of PennTags is that it allows you to organize your bookmarks/resources exactly the way you want and it lets you share them with others. It’s both personal and portable.

Well ok, but I thought your beauty, PennTags, would be that you would be different from del.icio.us — that instead of letting anyone tag anything just ‘out there’ on the open web, you’d let a defined community — namely, Penn and sub-communities within Penn — tag things that are available by virtue of being at Penn. Otherwise, why reinvent the wheel? Ignoring the popular kid & just pretending to be him won’t impress many who are likely to be drawn to you in the first place.

Jumping into some of your posts, though, I found that your users are in fact using you as I thought they might — they are tagging your library’s catalog records, and they are tagging articles available in your library’s database, as well as outside websites. Following these links put me on quite different adventures.

When the item tagged is in the OPAC

OPAC tagging is pretty darn sweet — and you pulled this off with Voyager, no less. When I clicked on a post referring to a book on Godard, I didn’t get to access the book (obviously), but I was routed to its catalog record, and I found that the user-contributed tag and summary had made the trip with me, and appeared in a yellow box right in the OPAC:

PennTags

After seeing this trick, PennTags, I started to warm to you. People who know nothing about you or about tagging or even about bookmarking are bound to wonder what these yellow notes showing up at the bottom of OPAC records are; maybe you’ll recruit more users this way, and get smarter. At the very least, you’re giving library records a sense of life; any way to enliven the OPAC with user contributions is a-ok with me.

But I wonder how you’ll manage any significant success — imagine ten such yellow PennTag records clinging onto a record in the catalog. You’ll have to be careful to keep a balance between authoritative metadata and folksonomy, between succinct official catalog records and long contributed summations.

When the item tagged is in a journal database

What about when someone posts and tags a journal article in you? I clicked on such a record and, not to my surprise, got dumped at a Penn database log-in screen, which means that if I were affiliated with Penn, I’d go right to the article. Since I’m not, I see nothing: no user summations, no fun yellow boxes. This raises the question, again, of who is using PennTags, and for what purpose. Frankly, I felt ignored by you here. If you are of, by, & for people behind Penn’s walls, then perhaps you should live behind that wall too; it’s not particularly interesting, for someone who can’t get at resources, to see how they’re being tagged.

That said, clicking on the title of another posted article, a JSTOR title, took me, much to my surprise, right into the article; I was ushered straight in thanks to my own institution. That experience started me dreaming again, PennTags, about an OpenURL world, filled with cross-institutional tagging of academic assets. At the very least it renewed my hope that I might find you of use while waiting for my own library to get tagging off the ground.
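That dream rests on a real standard. An OpenURL (ANSI/NISO Z39.88-2004) packs a citation into a query string that a library’s link resolver can interpret, so the same tagged citation could, in principle, route each reader through his or her own institution. Below is a minimal sketch of building one in the key/encoded-value format; the resolver address and the sample citation are hypothetical.

```python
from urllib.parse import urlencode


def article_openurl(resolver_base, journal, article_title, issn=None, year=None):
    """Build an OpenURL 1.0 (KEV) link for a journal article.

    resolver_base is the address of a library's link resolver; each
    institution runs its own, which is what makes the citation portable.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.jtitle": journal,
        "rft.atitle": article_title,
    }
    if issn:
        params["rft.issn"] = issn
    if year:
        params["rft.date"] = str(year)
    return resolver_base + "?" + urlencode(params)


# Hypothetical resolver and citation, for illustration only.
link = article_openurl("https://resolver.example.edu/openurl",
                       "Journal of Example Studies", "On Tagging", year=2006)
print(link)
```

Swapping in a different resolver_base is, in this sketch, all it takes to send the same citation through another library.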

When the item tagged is an outside website

Then there are the outside websites that are being posted and tagged in you, just as they’re tagged in del.icio.us. As you know, I think it’s redundant and a little silly to use you just for this purpose, but I’m also warming to the idea of tagging websites right alongside OPAC records and journal articles. You see, PennTags, I’m open to persuasion; you just haven’t taken the time to articulate the benefits of this mix. You’re actually allowing your users to bring resources into your library, in a way. Rather than reinventing a wheel, you’re melting a wall. That’s a big step, and it’s one to think about — not take for granted.

Yeah, inside/outside tagging has plenty of potential, no doubt about it, but here again I’m a little let down. Here’s the deal, PennTags: I think you could be a little more proactive about what academic tagging could or even should be. Could it be hierarchical? Might it be user-faceted? Are there ways to enforce best practices? By offering little firm guidance, you’re once again playing pseudo-del.icio.us, leaving everything up to an undifferentiated swamp.

But look around, PennTags: you operate in a world full of productive distinctions. You even list some, shyly — they get buried in a section called “More Tagging Tips”:

PennTags

How hard would it be to invite your users to think along these lines, gently, somewhere in the tagging process? Can tagging evolve into something beyond a single ‘fill in whatever you want’ open field? I know you don’t want to come across as bossy or prescriptive or (god forbid) librarian-like, but I wonder if just a couple of criteria particularly useful to your academic community (say, Topic and Relevance) could be quietly promoted, just as del.icio.us already subtly promotes tagging uniformity through ‘recommended tags.’
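Quietly promoting a couple of facets need not mean a new interface. One hypothetical convention, sketched below, lets users keep typing into the single open field while a prefix such as topic: or relevance: files the tag under a promoted facet; anything else stays free-form. The prefix syntax and the facet names are assumptions, not anything PennTags supports.

```python
from collections import defaultdict

# Hypothetical promoted facets, following the Topic/Relevance suggestion.
PROMOTED_FACETS = ("topic", "relevance")


def split_facets(raw_tags):
    """File 'facet:value' tags under promoted facets; keep everything else free-form."""
    facets = defaultdict(list)
    for tag in raw_tags:
        name, sep, value = tag.partition(":")
        if sep and value and name.lower() in PROMOTED_FACETS:
            facets[name.lower()].append(value)
        else:
            facets["free"].append(tag)
    return dict(facets)


print(split_facets(["topic:keats", "relevance:high", "english242", "tuberculosis"]))
# {'topic': ['keats'], 'relevance': ['high'], 'free': ['english242', 'tuberculosis']}
```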

The thing to keep your eye on is use: how these tags are used by actual populations, in actual classes or other sub-groupings, for actual purposes. I find it pretty weird that you’re asking people to think about tagging with an uncle in mind — unless this is an uncle at Penn. Relevance is a subjective and fairly meaningless call against a wide-open horizon (where many uncles live), but within the context of english242 students working collectively on a presentation about Keats’s illness, say, “Relevance” becomes a powerful way of characterizing a resource.

Imagine, too, if you allowed any kind of distinction among users: how interestingly instructors and students, say, could interact within a classroom framework as what they are (in the institution’s eye) through you. Or professors and research assistants. Or members of a class and those outside the class. Or librarians. Or alumni. These distinctions shape the day-to-day life of your campus, and though I suspect you imagine yourself to be leveling the playing field in exciting new ways, you don’t have to dumb the field down that much. Nor do user distinctions need to control the way people use you. Building them in would only help when it becomes desirable to browse or subscribe to the tagging work of a certain subset of the campus community. Here’s your advantage over del.icio.us: you operate in a circumscribed world organized around definable purposes, roles, means, events.

I think you’d be even cooler if you presented yourself not as just another collective knowledge base, but as the way that only Penn could make the knowledge of the world work for definable ends. That’s why I think your most promising feature is ‘Projects’. Right now you only allow one owner to post to a given project, but maybe in the future you’ll loosen up and let many users work on a given project, and maybe even specified classes of users. Then, I suspect, the RSS functionality you’ve already built in would start to be useful not merely to the curious, but to a much more involved user-base: the tasked.

Well, PennTags, you can guess by the way I’ve gone on here that I actually am pretty attracted to you, and I look forward to seeing how you mature. You’re raising awareness of tagging in academic settings — and you’re not just sitting around wondering about what that might mean — you’re actually putting tags into motion. That’s the only way any of us is really going to learn how this 2.0 phenom might work for us. So — way to be, & keep in touch.

Your PennPal,
Mark

LibraryThings

Thursday, May 18, 2006

If it once took a special type of person to be a library cataloguer — one comfortable in back offices & around heavy rule books, methodical, perhaps quiet — now everyone wants to get in on the action. The rise of self-cataloguing has been one of the more inexorable effects of digital media. The discovery within cataloguing of social connections now seems to be another.

Of course, long before all this web stuff we were being trained to collect content in various forms, and to value assemblages as inherent identifiers of taste. Siva Vaidhyanathan’s recent presentation at Columbia’s Correcting Course forum carries this age-old ritual into my lifetime; he talks about a mass paperback industry that marketed (unread? unreadable?) books as class identification… VHS tapes marooned on shelves, monuments of their owners’ cinematic pleasures…. the fine art of the mix tape, now supplanted of course by playlists….

The fetishistic Mac application Delicious Library wraps a collection database into a pretty package so… so… well, so you can have a virtual representation of all your books — all your video games — all your DVDs, right on the hard drive of your computer. Scan the item’s UPC barcode with a webcam, and presto, metadata from Amazon flies right into your own library database — including cover art. Awesome, right?

Ok it’s actually fairly purposeless. You can assign items ratings, and you can designate their location in actual space, but I doubt many are actually relying on Delicious Library to find stuff. If you lend out an item to a friend, you can track it with DL — but really, if you’re lending out more than you can remember & your friends can’t be trusted to return things, well, maybe a policy change is in order. And DL’s symbiosis with Amazon’s API is worrisome — Amazon-hosted One-Click Shopping recommendations are just a click away.

But describe Delicious Library to someone, and it’s possible that they’ll turn cataloguer right in front of your eyes: huh, my things in a database….

Delicious Library

^ Finding Nemo and other treasures: virtual shelving in Delicious Library

LibraryThing (straight outta Portland, Maine, btw) is a web app significantly tastier than its desktop cousin because it networks people’s collections. LibraryThing still invites you to play with representations of your books on virtual shelves for yourself, but now you’re doing your assembling among & amid a myriad of intersecting libraries. Now metadata is up for grabs, unregulated by Amazon or any other detached entity: social tagging comes to the fore. You can hear the 2.0 pitch (it’s del.icio.us for books!), and lo, tagging abounds.

But just around books: LibraryThing valiantly resists the siren call of other media in favor of bibliomania. It links its bibliographic records to OCLC’s Find a Library as well as to Amazon and to library OPACs via the good old Z39.50 client-server protocol, and hosts discussion of titles among those who share a given title in their libraries.

In short, if you love books, LibraryThing seems an unrigged communal playpen, as well as a self-inventory tool. It provides branching recommendations based on mutual ownership, not Amazonian purchases. It presents clouds of a book’s common tags unseeded by commerce. It offers RSS subscriptions for any given tag, so you can track books as they come in, as collections rather than products.
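Subscribing to a tag amounts to polling a feed and noticing what’s new. The sketch below assumes an ordinary RSS 2.0 feed with item, title, and link elements; it does not use LibraryThing’s actual feed address, and the sample feed is made up.

```python
import xml.etree.ElementTree as ET


def new_items(rss_text, seen_links):
    """Return (title, link) pairs not seen before, and remember them in seen_links."""
    root = ET.fromstring(rss_text)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and link not in seen_links:
            seen_links.add(link)
            fresh.append((title, link))
    return fresh


SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Books tagged: literary theory (sample)</title>
<item><title>Example Book A</title><link>http://example.org/work/1</link></item>
<item><title>Example Book B</title><link>http://example.org/work/2</link></item>
</channel></rss>"""

seen = set()
print(new_items(SAMPLE_FEED, seen))  # both items on the first poll
print(new_items(SAMPLE_FEED, seen))  # [] on the second poll
```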

LibraryThing Screenshot

^Adding to my library in LibraryThing: I enter in a title, and LT checks it against a bibliographic database of my choosing. And I choose LC! No snappy webcam scan, alas, though barcodes are acceptable identifiers.

LibraryThing screenshot

^Now that I’ve added my book to LibraryThing, I can see how others have tagged and rated it. Looks like some people don’t care for literary theory, and yet they own the book. Go figure. This title hasn’t been reviewed yet in LibraryThing, but many have.

LibraryThing screenshot

^My so-far small library (the books on my desk right now).

Most intriguing of all, LibraryThing has recently added Library of Congress subjects into the mix. The premise is that user-created tags can coexist with library-tended subject headings, that folksonomy can play off of controlled hierarchy. For some books, tags and subject headings coincide; for others, they hardly overlap at all. LibraryThing has only just embarked on this odd tango, and who knows where it will lead, but at the very least it should generate some intriguing friction.

LibraryThing screenshot

^Exploring the tag “literary theory” on LibraryThing. I see heavy users of this tag, works most often tagged by the term, and the latest books into the system so tagged (and I can subscribe to the tag via RSS). I also see related LC Subject Headings, in case I feel like faceted browsing.

Already user-tags are sitting up a little straighter and paying more attention to themselves. On LibraryThing’s metablog, Thingology, the arrival of subject headings has spurred discussants to characterize (dare I say categorize?) tags. They find tags falling into recognizable camps: personal location notes (“living room,” “office”), personal use tags (“read,” “damaged,” “study”), broadcast opinion tags (“excellent,” “lame”), and personal subject tags (anything in the uncontrolled descriptive universe). The haphazard felicities of user-tag surfing are getting measured right up against the precision of subject headings.
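To see how blunt such a sorting can be, here is a deliberately naive sketch that assigns a tag to one of the four camps by looking it up in word lists seeded only with the examples quoted from the Thingology discussion. The word lists and the function are hypothetical; anything unrecognized simply falls through to the personal subject camp, which is exactly where most of the interesting ambiguity lives.

```python
# Example words come from the camps named in the Thingology discussion; a
# usable classifier would need far longer lists, or a different method.
CAMPS = {
    "personal location": {"living room", "office"},
    "personal use": {"read", "damaged", "study"},
    "broadcast opinion": {"excellent", "lame"},
}


def classify_tag(tag):
    """Assign a tag to a camp; anything unrecognized counts as a personal subject tag."""
    normalized = tag.strip().lower()
    for camp, examples in CAMPS.items():
        if normalized in examples:
            return camp
    return "personal subject"


for t in ["Living Room", "damaged", "lame", "literary theory"]:
    print(t, "->", classify_tag(t))
```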

All this driven by Tim Spalding, a web developer, not a librarian. Or is he? Should we settle for patron?

Clipboards go social

Monday, March 13, 2006

Social bookmarking is swell, but suddenly it seems so limited, so 2005. Or so it seems to me after watching Dan Chudnov’s screencast unAPI and the Gates of the Dawn of Social Clipboards a couple of times. I can attest that it’ll get you thinking — even if, like me, your programming skills extend not much beyond the coffee maker.

You know about gates, you know about dawn, and you should know that APIs are blending web services in dynamic ways. unAPI (‘un’ pronounced as in “universal,” not as in “poor Syd Barrett, he’s un’appy”) is, as the term might suggest, a simple website API convention that allows a broad array of services to be syndicated and harvested. This is a lightweight, generic tool, unlike an API tailor-made to a service (like, say, the GoogleMaps API). More on unAPI here. Now, for some hurried idea of how unAPI enables social clipboarding, get comfortable and spend some quality minutes with the dchud screencast:

D’ja get that? Social bookmarking = a straitjacketed social clipboard, in which we share only URLs and tags. With something like unAPI, the straitjacket comes off, and the information we share gets richer and more varied. Click, drag, and toss into the communal pot objects that are linked to full bibliographic metadata; toss even whole images in. Once, in order to share information on the web, you had to code in HTML and FTP your creation up to a server. Then blogs, wikis, and various administration tools let you publish content through a web interface. Soon, it seems, you’ll be clicking and dragging web objects around directly. It’s a weird feeling: try it at a demo for Microsoft’s similar new experiment, Live Clipboard.
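What unAPI adds under the hood is small enough to sketch. As I understand the unAPI specification, a page advertises its server with a link element whose rel is "unapi-server", and marks each object with an abbr element of class "unapi-id" whose title holds the identifier; a client then asks the server with id=ID to list the formats available for that object, and with id=ID&format=NAME to fetch it in one of them. The sketch below only discovers identifiers and builds request URLs (no network calls); the server address, record id, and format names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urlencode


class UnapiScanner(HTMLParser):
    """Collect a page's unAPI server URL and the object identifiers it exposes."""

    def __init__(self):
        super().__init__()
        self.server = None
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "unapi-server":
            self.server = attrs.get("href")
        elif tag == "abbr" and "unapi-id" in (attrs.get("class") or "").split():
            self.ids.append(attrs.get("title"))


def unapi_url(server, identifier=None, fmt=None):
    """No id: formats the server supports; id only: formats for that object; id+format: the object."""
    params = {}
    if identifier is not None:
        params["id"] = identifier
    if fmt is not None:
        params["format"] = fmt
    return server + ("?" + urlencode(params) if params else "")


def format_names(formats_xml):
    """Pull the format names out of a server's <formats> response."""
    return [f.get("name") for f in ET.fromstring(formats_xml).iter("format")]


# Hypothetical page and server, for illustration only.
page = ('<html><head><link rel="unapi-server" type="application/xml" title="unAPI" '
        'href="http://example.org/unapi"></head><body>'
        '<abbr class="unapi-id" title="record-42">A catalogued book</abbr></body></html>')
scanner = UnapiScanner()
scanner.feed(page)
print(scanner.server, scanner.ids)
print(unapi_url(scanner.server, scanner.ids[0], "mods"))
```

The point of the design is that the client needs no service-specific knowledge: the same three kinds of request work against any server that follows the convention.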

Chudnov’s emphasis on the new social possibilities of clipboards seems typical of 2.0 library services. As he has written, “My professional mission as a librarian is this: Help people build their own libraries. That’s it. That’s all I care about.” Note the plural ‘people.’ If web objects can be readily swapped, studied, and shared, and if their harvesting and dissemination are conducted from beginning to end in networked spaces, then it’s easier than ever to see that ‘collection’ is molting ever more into a publicly driven and defined activity.

Librarians once spent time carefully assembling web links for their patrons, and what an onerous job — one plagued by link rot, bedeviled by the fluidity of the web. Social bookmarking is a welcome alternative to the professedly authoritative link collection because it leverages a vast range of expertise, instinct, and attention, while allowing for discovery and customization. A 2.0 librarian (for lack of a better term) will do everything he can to promote this kind of activity.

Similarly, digital collections were once mounted in standalone boxes, and left gathered in a corner of a library website. Social clipboarding is 2.0 collection because, once again, it drags assets out into the pale sunshine of use and interchange. The 2.0 librarian will do everything she can to ensure that a digital collection is easily discovered, harvested, tagged, swapped around, recontextualized, re-collected, and (whenever legal) re-published.

Such decentralized, user-driven, unpredictable shuffling of digital assets might seem to diminish the role of your library. You need not go there, you need not apply there for access, you need not be cognizant of the dimensions of its actual collection. But look at what’s going on behind the scenes, in terms of programming, standardization of conventions, preservation and exposure of assets. And in front of the scenes, you can bet that librarians will evolve ever more into consultants, offering strategies for the successful customization and manipulation of information. If APIs start scattering assets of all sorts onto communally shared clipboards, ‘collection’ takes another step towards the need-based, on-the-fly assemblage of information transforming our world (dare we say) into one big library.

Mmashamashsmashh

Wednesday, February 22, 2006

Oh to have been a fly on the wall at the just-wrapped Mashup Camp - a fly safely high up on the wall, because a) I’m no programmer and would likely be in the way, and b) its ‘geek dating’ program - a frenetic dance of speed demos and the “law of two feet” - sounds downright dangerous.

But I would have loved to buzz with the buzz, because it’s clear that the proliferation of web applications and reusable APIs is causing an explosion of tinkering, playing, discovering. As Web 2.0 guru Dion Hinchcliffe puts it, “The theory is that you can be much more valuable to the rest of the world if your software can be reused in unintended ways. In other words, don’t just provide a fully created end-product for one pre-intended use. Encourage others to use the good pieces of what you provide in new and innovative ways.” And thus the torrent of new services cobbled together from bits of preexisting web services, some of which are tracked by Mashup Feed.
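At its simplest, a mashup is a join: records from one service matched, on some shared key, to records from another. A minimal sketch, assuming one feed keyed by place name and a second (say, a geocoder’s output) giving coordinates for the same names; the data and field names are invented.

```python
def mash(readings, locations):
    """Join one feed (place -> reading) with another (place -> (lat, lon)) into map markers."""
    markers = []
    for place, reading in readings.items():
        if place not in locations:
            continue  # no coordinates, nothing to plot
        lat, lon = locations[place]
        markers.append({"place": place, "lat": lat, "lon": lon, "reading": reading})
    return markers


# Invented sample feeds.
weather = {"Mission": "sunny", "Outer Sunset": "fog", "Nowhere": "?"}
geocoded = {"Mission": (37.76, -122.42), "Outer Sunset": (37.75, -122.49)}
for marker in mash(weather, geocoded):
    print(marker)
```

Real mashups rarely get such clean keys; matching “Mission” in one feed to “Mission District” in another is where much of the tinkering goes.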

What can nontechnical end users expect from all this mashing? More customized information, and the power that goes with it, as data feeds get mixed for real-time information on weather, parking, airfare, restaurants, skiing, and general calamity.

A glance at David Schorr’s Weather Bonk confirms, at once, that the Mission is the only somewhat warm place in SF, and the GG Bridge is flowing pretty well at the moment:


Looking for more monetizable information? Flyspy is planning to bring to you a 30-day overview of airfares:

But no matter how clever or useful the mashup, it’s only as good as its datafeeds. Another mashup service, Cheap Gas, looks great until you notice that the gas prices you’re being quoted, contributed by ‘anonymous’ (maybe Eddy from Texaco down the street), dated from last summer:

Such flashy inaccuracy is bound to make people who are in the business of reliable information — for example, librarians — nervous. Many mashups are anarchic sandboxes, and who knows what use your data will be put to or what company it will be keeping or to what ends it will be mashed (that’s the point).
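One modest defense against feeds like Cheap Gas’s is to keep each data point’s timestamp and refuse to present anything older than some cutoff as current. The sketch below assumes each feed entry carries a 'reported' datetime; the two-day cutoff, the station names, and the prices are hypothetical.

```python
from datetime import datetime, timedelta


def split_by_freshness(entries, now, max_age=timedelta(days=2)):
    """Separate entries reported within max_age of now from stale ones."""
    fresh, stale = [], []
    for entry in entries:
        (fresh if now - entry["reported"] <= max_age else stale).append(entry)
    return fresh, stale


now = datetime(2006, 2, 22, 12, 0)
prices = [
    {"station": "Station A", "price": 2.29, "reported": datetime(2006, 2, 21, 9, 0)},
    {"station": "Station B", "price": 1.99, "reported": datetime(2005, 7, 15, 8, 0)},
]
fresh, stale = split_by_freshness(prices, now)
print([e["station"] for e in fresh])  # ['Station A']
print([e["station"] for e in stale])  # ['Station B']
```

Showing stale entries with their dates, rather than silently dropping them, is arguably the more librarian-like choice.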

As Tom Owad demonstrated a little while ago, pinpointing ‘subversive’ (yet acquisitive) persons is as easy as mashing up Amazon’s Wishlists with Yahoo People Search and Google Maps. Here’s a map of readers hoping someone buys them a shiny new copy of Orwell’s 1984:

And that’s all *legal* — just imagine what our government is up to.

Nevertheless, the rise of APIs may save libraries from the rusty chains of closed-box ILS packages and allow them to dream up a range of new community-oriented services. Certainly we should be glad that programmers plugged into the potential of libraries, such as the Superpatron, were doing the monster mashup this week.

Scanning Mashup Feed’s indexes… here are some mashups that strike me as library-intriguing, with pasted descriptive blurbs (i.e., I didn’t write ’em, because I didn’t try ’em all):

Google Maps API

  • Blosh: Blosh finds blogs mentioning locations and displays them on a map.
  • Boston RSS Alley: This map displays the locations of some of the companies and bloggers actively working with RSS in the Boston area.
  • Find the Landmark: Test your knowledge of US landmarks with an interactive, timer-based Google Maps game.
  • Flyr: Search Flickr for geotagged photos and then plot them on a Google Map. Nice nested map-within-a-map.
  • GeoWorldNews: The latest worldwide stories from the Washington Post plotted on a Google Maps satellite image.
  • Healthia: Use the Healthia doctor search to find doctors in the United States. 800,000 doctors listed.
  • History Timeline Wiki: A history plus geography wiki that allows readers to contribute items of historical interest and plot their locations. Initial dataset is US battles.
  • Libraries411: Find public libraries in the US and Canada. Data for more than 20,000 libraries available.
  • Maplandia: Comprehensive searchable gazetteer based on Google Maps. Reference guide has full world coverage.
  • Placeopedia: Geographically place Wikipedia articles on top of Google Maps:

Amazon API

  • Albumart.org: Uses the Amazon API and an Ajax-style UI to retrieve CD/DVD covers from the Amazon catalog.
  • O’Reilly Book Page: Mashup of Backpack and Amazon.com APIs to generate Backpack pages with Amazon.com book data.

Flickr API

  • flickr graph: Social network visualization using the Flickr API:

  • Flickr Related Tag Browser: Search and visualization tool that lets you surf Flickr’s tag space. Flickr tags are keywords used to classify images. Related tags shown based on clustered usage analysis.
  • Flickrscape: Enter a word and watch the Flickr photo stream. Click to interrupt the stream and try another word.
  • geobloggers: Google Maps + Flickr photos. It also consumes del.icio.us for geotagged bookmarks and Upcoming.org for US events, which it then geocodes.
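The “related tags based on clustered usage analysis” idea can be approximated, very roughly, by counting which tags show up together on the same photos. The Python sketch below uses plain pairwise co-occurrence counts as a stand-in; the blurb doesn’t say how the Related Tag Browser actually clusters, and the photo tag sets are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical photos, each represented by its set of user-supplied tags.
photos = [
    {"bridge", "goldengate", "sanfrancisco", "fog"},
    {"goldengate", "sanfrancisco", "sunset"},
    {"bridge", "goldengate", "fog"},
    {"mission", "sanfrancisco", "mural"},
]

def cooccurrence(tag_sets):
    """Count how many photos each unordered pair of tags appears on together."""
    counts = Counter()
    for tags in tag_sets:
        # Sorting makes each pair's key order consistent, e.g. ("bridge", "fog").
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts

def related_tags(tag, counts, top=3):
    """Rank other tags by how often they co-occur with `tag`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == tag:
            scores[b] += n
        elif b == tag:
            scores[a] += n
    return scores.most_common(top)

counts = cooccurrence(photos)
print(related_tags("goldengate", counts, top=4))
```

Raw counts favor very common tags (here, “sanfrancisco”); a real system would likely normalize them before clustering.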

del.icio.us API

  • Delancey: This nice del.icio.us enhancement allows you to see which of your del.icio.us bookmarks are used most frequently.
  • thumblicious: Use thumblicious to quickly preview the most popular sites bookmarked on del.icio.us via thumbnail screenshots.

Google API

  • Copyscape: A website plagiarism search tool that uses the Google Search API.
  • DoubleTrust: Shows the best search results from both Google and Yahoo in a new way. Also allows the user to alter his trust in either engine to bias combined rankings.
  • QTSaver: Uses Google and Yahoo APIs to extract microcontent from multiple sites and allows you to rearrange the excerpts.
  • SpellWeb: Compares relative popularity of spellings or concepts based on web frequency. An experiment in sidesifting the Web for useful patterns of information:

You get the idea… you probably get a thousand ideas. That’s the problem with mashups: too many ideas, too many variously commercial or incomplete datastreams, too much sheer buzz. But soon, perhaps within a fly’s lifespan, your library may truly catch on.

Sticking around

Thursday, February 16, 2006

Check out what’s new at that flagship of Library 2.0-ness — the plugged-in to plug-ins, blessed by superpatrons, interactively inventive Ann Arbor District Library: card catalogs!

Remember card catalogs? If you do, you’ll remember that uniquely tactile experience: the sliding out, the flipping through, the red-ink-mandated cross referencing, the peering & copying & replacing. You remember the yellowing card musk, the little codes and numbers, the misaligned typing of some librarian in some back office on some rainy afternoon in 1943.

There the cards were, so vulnerable in their long drawers, just waiting to be smudged by indifferent sticky fingers, scribbled across by any lunatic with an agenda, ripped out by any patron too lazy to copy down call numbers. Card catalog maintenance must have been a heck of a job, Brownie. And good riddance.

Yet cards are where the public touched the library, and maybe that’s why (shaking ourselves out of pre-OPAC reverie) we see the inventive John Blyberg, AADL’s lead developer, reviving catalog cards in a virtual setting. None of the fuss, none of the muss — and now you don’t have to feel bad about writing on the cards, or grabbing them for yourself.

Here’s a look — the AADL OPAC listing for a book on marginalia offers a link to a “Card catalog image” (near the top of the record):

Click the link, and here’s the generated card, bottom perforation and everything. Someone has already scrawled a message on the card: “Defacement is subjective.” You, or anyone, could add another scrawl by entering text in one of the three position fields and clicking on that very 2.0 button, “Add your marginalia!”:

For patrons with accounts, cards can be gathered into personal collections which can, in turn, be shared with other patrons:

Blyberg writes in his description of the project that it was “black-ops” (no committee, no proposal, no approval, no testing, no advertising, no muss, no fuss), so it remains a bit murky and provisional. Marginalia on a given card seem limited to three entries. A book can have several cards associated with it, and it’s not immediately clear how to look through all those cards. Also, I’m not sure whether or how cards gathered into one’s own collection can be inscribed by others.

If virtual card catalogs are merely proof-of-concept at this point, the concept reminds me a bit of a project that the Alchemical Muser and others were working on at Columbia’s CCNMTL called Plone Stickies. These Stickies initially allowed students to attach short notes to digital objects — but the fuller vision for them, I believe, involves client-side keyword tagging and community sharing.

What do virtual catalog cards and these stickies have in common, besides a general yellowness? They both draw on the desire to physically connect to thought-objects. As such objects recede into an intangible, fungible environment, it’s notable that the old means of tracking them, those flopping and curling and awkward apparatuses of identification, persist in collective memory and expand into markers of collectivity.

CiteI’dLike

Tuesday, February 7, 2006

If you were to invent del.icio.us for academics, how would it work? It would allow for bookmarking, tagging, and sharing. It would pull metadata from academic resource databases. It would allow me (the layprof) to organize collected essays and citations with a minimum of clickage. And it would do all these things in a browser, from on or off campus, independent of platform. In short, it would be quite like CiteULike.

This is a little story about my first pass into CiteULike, and if it’s not entirely a happy story, we should still bear in mind the possibilities, the promise, the 2.0ness of it all.

I learned about CiteULike only recently, and abjectly late: Richard Cameron designed it over a year ago. Sitting through some screencasts made by Tannis Morgan at UBC, I saw how this social bookmarking tool could be useful not only for tracking (through RSS) journal contents, specifically tagged articles, and other academics’ bookmarks, but also as a means to build a library of collected resources, available anywhere and to all.

Holy digital hotness! said I. I’ll try it for myself! And here’s where minor chords start to well up in the background.

Creating an account on CiteULike was child’s play; in ten seconds I was ready to bookmark and collect. Stunned a bit by the possibilities, and revived a bit by narcissism, I decided to start a collection with articles I’ve written. Tough luck, bucko. Though CiteULike lets you browse some 6,500 journals, this roundup doesn’t include the ones that have sponsored my thoughts. In fact, many of the journals seem to be science-related. As ever, the humanist is the redheaded stepchild of resource-sharing ventures.

That’s ok, said I. I’ll find some article that’s at least in my field. I saw that Nineteenth-Century Contexts was one of the proffered journals, and scanning a recent issue I found an article about Mary Shelley by Diane Long Hoeveler. Very good, said I. I’ll collect that:

Two links offered to let me ‘view the article online’. Excellent idea! But these links led me to publisher sites, one of which offered a “free sample,” while the other demanded $33.67 plus tax. Much disturbing mention of shopping carts. This will never do, said I. Since I am off campus, what I seem to need is a way for CiteULike to create paths into Bowdoin’s collections.

So I added the citation to the mysterious Hoeveler article to my own collection, tagging it in the process. Only one-word tagging, please.

A couple of cool features to notice here: I (or anyone) can track my collection through RSS. And metadata from this collection can be gussied up for EndNote with just one click (note how my tags turned into keywords in this EndNote record):
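The tags-to-keywords step in that EndNote export can be illustrated with a small Python sketch that writes a citation as RIS, a tagged plain-text format EndNote can import. It is an assumption that this resembles what CiteULike does internally; the record below is a hypothetical placeholder, not the Hoeveler citation, and only a handful of common RIS fields are covered.

```python
def to_ris(record):
    """Render a citation dict as one RIS journal-article record; each user tag becomes a KW line."""
    lines = ["TY  - JOUR"]
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    lines.append(f"TI  - {record['title']}")
    if record.get("journal"):
        lines.append(f"JO  - {record['journal']}")
    if record.get("year"):
        lines.append(f"PY  - {record['year']}")
    for tag in record.get("tags", []):
        lines.append(f"KW  - {tag}")
    lines.append("ER  - ")  # end-of-record marker
    return "\n".join(lines)

# Hypothetical placeholder record, not a real citation.
record = {
    "authors": ["Doe, Jane"],
    "title": "An Example Article",
    "journal": "Example Journal",
    "year": "2006",
    "tags": ["shelley", "frankenstein"],
}
print(to_ris(record))
```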

But the problem remained: how to actually connect to the article? I dug around in CiteULike’s FAQs and felt more assured that off-campus proxy access to articles would make those shopping carts disappear. For this functionality, CiteULike pointed me to a COinS Browser Extension written by Dan Chudnov at Yale.

In order to install this little extension, I had to first install Greasemonkey in my Firefox browser — not too difficult, but, trust me, we’ve lost the layprofs by now. The COinS extension allowed me to designate my own institution’s OpenURL resolver, and plug that resolver into OpenURL links now ‘discovered’ in my browser. That way, theoretically, one could click on a resource link on any site and actually access that resource through one’s own institution. You can see this in action here: note the new link that invites me to “Check availability @ Bowdoin”.
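The mechanism described here (a COinS span embedded in a page, rewritten into a link on one’s own OpenURL resolver) can be sketched in Python with the standard library. This is not Chudnov’s extension, which ran in the browser; the resolver URL and both sample citations are hypothetical. The second span carries only a date, and a small check reports which citation fields a resolver would never receive, since a resolver can only look up what the ContextObject actually carries.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs

class COinSParser(HTMLParser):
    """Collect the title attribute of every <span class="Z3988">, which holds a COinS ContextObject."""

    def __init__(self):
        super().__init__()
        self.contexts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "Z3988" in classes:
            # HTMLParser has already turned &amp; back into & in attribute values.
            self.contexts.append(attrs.get("title") or "")

def resolver_link(base_url, kev):
    """Point a KEV-encoded ContextObject at an institutional OpenURL resolver."""
    return f"{base_url}?{kev}"

def missing_fields(kev, required=("rft.atitle", "rft.jtitle", "rft.date")):
    """List the citation fields a resolver would not receive."""
    fields = parse_qs(kev)
    return [name for name in required if name not in fields]

# Hypothetical page markup: one full citation, one that carries only a date.
PAGE = """
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&amp;rft.atitle=An+Example+Article&amp;rft.jtitle=Example+Journal&amp;rft.date=2005"></span>
<span class="Z3988" title="ctx_ver=Z39.88-2004&amp;rft.date=2005"></span>
"""

RESOLVER = "https://resolver.example.edu/openurl"  # hypothetical base URL

parser = COinSParser()
parser.feed(PAGE)
for kev in parser.contexts:
    print(resolver_link(RESOLVER, kev))
    gaps = missing_fields(kev)
    if gaps:
        print("  missing:", ", ".join(gaps))
```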

But, alas, here’s what happened to me when I clicked that invitation to check availability@Bowdoin:

Note that none of the metadata for the article has been passed through except for the article’s date. At this point I had neither the time nor the skill nor the patience to figure out where the glitch was; I only knew that I was off campus and out of luck accessing an article I found on CiteULike.

Never give up, I told myself. With one last bit of inspiration, I decided to see whether the little bookmarklet that CiteULike distributes (“Post to CiteULike”, rather like del.icio.us’s “Remember this” bookmarklet) would work going the other way. That is, suppose I’m signed into Bowdoin’s databases, and I run across an article I’d like to post to CiteULike. That’s just a click of the button, right?

The FAQs warn me that automatic metadata export into CiteULike would only occur with supported databases, which are: AIP Scitation, Amazon, American Geophysical Union, American Meteorological Society, Anthrosource, Association for Computing Machinery (ACM) portal, BMJ, Blackwell Synergy, CiteSeer, HighWire, IEEE Xplore, IngentaConnect, IoP Electronic Journals, JSTOR, MathSciNet, MetaPress, NASA Astrophysics Data System, Nature, PLoS Biology, PubMed, PubMed Central, Science, ScienceDirect, SpringerLink, Usenix, Wiley InterScience, arXiv.org e-Print archive. (See what I mean about the humanities?) Well, JSTOR seemed my best bet, so I rooted around in Bowdoin’s library site until I found an article on Mary Shelley in JSTOR. Here was one from ELH: “Narratives of Seductions and the Seductions of Narrative: The Frame Structure of Frankenstein” (OK, I see what you mean about the humanities.)

When I clicked my bookmarklet to Post to CiteULike, here’s what happened:

Hmm… that really didn’t take the drudge out of drudgery, did it? I mean, yes, some barebones metadata is passed through, but all of it lands in the title field; I have a fair amount of tending, cutting, and pasting to do if I want this to be a real citation. If I feel like more work, I can download a PDF version of the article to my computer, then upload it into CiteULike so I can privately retrieve the article wherever I am. I can’t share the full text with other Mary Shelley aficionados, though: they have to try their own luck tunneling into their own publisher-paying institutions. Otherwise, you know, that’d be stealing.

I believe wholeheartedly that around the world, from within and without institutional walls, academics are happily collecting and sharing resources with CiteULike. I can see this happening minute by minute on the home page:

But at least right here & right now, I can’t fully play. And I feel swamped by “everyone”. How many of “everyone’s” tags link to articles I can understand, much less evaluate and collect?

Once the mechanics are ironed out, this would be my next wish for CiteULike: the creation of discipline-based communities, so I could track the tags of colleagues pondering British literature and feel less intimidated by clustering geophysicists.

Minding our own business

Tuesday, January 31, 2006

We need no special issue of Techne to tell us that digital technology comes bundled with a host of political implications. We know that we’re newly vulnerable to tracking, that Google is noting our every search; we know that hackers and spies skulk through networks; we know that access, permissions, and digital rights policy is set by administrators answerable to… well, not us.

Graham Longford’s contribution to the special issue of Techne, “Pedagogies of Digital Citizenship and the Politics of Code,” enumerates the ways technological citizenship (his words in italics) has devolved. Unsurprisingly, it’s the standard postlapsarian plot, the dark invasion. It’s a colonization of cyberspace by proprietary code and various legislative initiatives designed to protect it; it’s a major renegotiation of the terms and conditions of cybercitizenship as embodied in the design of the early Internet. What can redeem us and restore the early design? Pray for it: open source, with its reconfiguration of existing protocol technologies.

I was a bit surprised, though, to see rounded up among the usual compromisers of digital freedom — privacy and rights-eroding identifiers such as cookies, autofill, and DRM — a less obvious villain: customization. You’d think that self-managed customization of web services would put some power back into the hands of end-users, but Longford’s having none of it.

Why not? Here are ways, according to his essay, that the proliferation of web portals through which users gain access to information and services customized to their specific needs and interests impinges on the nature of on-line citizenship:

  • pseudo-personalizable tools: customization options available to users through processes that are far from neutral, such as menus that support only certain kinds of activities on the web (shopping, sports, MSM breaking news, shopping, horoscopes, weather, shopping…)
  • the promotion of passivity, since users are encouraged to assume a posture of waiting for information to be brought to them
  • the creation of a self-edited ‘Daily Me’ delivered to… electronic doorsteps; your choices wall off the infinitude that is life: web portals and customization tools enculturate [sic] users into certain kinds of habits, conduct and expectations that condition their use and experience of the web, with the potential for spillover into the off-line world.
  • and, extending the last point, the inculcation of entitlement, the co-option of the web in favor of consumer empowerment and personal fulfillment rather than as a means to negotiate difference and overcome intolerance.

Longford and his sources [1] may have a point or two here, but these “impingements” seem tallied in a pre-RSS world. We’re no longer hostage to portal menus (though a Google toolbar might seduce you into surrendering); managing your own diet of feeds seems as much of a hunt, an active gathering and tending — and perhaps even a means of self-broadcasting — as it does a process of consumption.

Moreover, inveighing against customization — and defining the web, instead, as best used to confront difference — seems largely blind to the needs of actual, day-to-day work online. Who could get anything done with someone constantly tugging at one’s sleeve, like an unmanageable child, to look at something else, look at something else? There are times to cast one’s eye broadly over the world — to tear into a good international paper, or far-flung novel, or obscure recording, or whatever. But if one is seriously tracking developments in a field, one needs to be able to track. Maybe it’s time to use another term for this process, now that XML-based technology is allowing us to more efficiently harvest information for ourselves: not “customization,” but “cultivation.”
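The “cultivation” described above, pulling only what one is tracking out of a stream of feeds, can be sketched in a few lines of Python with the standard library’s XML parser. The feed and keywords are hypothetical, and plain keyword matching on item titles is just the simplest possible filter, not a description of any particular feed reader.

```python
import xml.etree.ElementTree as ET

# A hypothetical RSS 2.0 feed, inlined so the example is self-contained.
FEED = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>New OpenURL resolver features</title><link>http://example.org/1</link></item>
<item><title>Celebrity horoscopes</title><link>http://example.org/2</link></item>
<item><title>Tagging in the library catalog</title><link>http://example.org/3</link></item>
</channel></rss>"""

def cultivate(feed_xml, keywords):
    """Return (title, link) for items whose title mentions any keyword, case-insensitively."""
    root = ET.fromstring(feed_xml)
    wanted = [k.lower() for k in keywords]
    picked = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        if any(k in title.lower() for k in wanted):
            picked.append((title, item.findtext("link", default="")))
    return picked

for title, link in cultivate(FEED, ["openurl", "tagging"]):
    print(title, link)
```

The point of the sketch is who holds the keyword list: the reader, not the portal.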

The application of such activity to an academic library environment is far from settled, or even defined. MyLibrary, an open source package allowing library users to configure their own resource lists, is a prominent first step, and as far as I can tell, the jury is out on its effectiveness. Lehigh deems its implementation successful, while NC State has issued a rather melancholy five-years-down-the-road report on the limits of MyLibrary: students, at least undergraduates, won’t use this tool much unless it’s tied into course requirements, i.e., a CMS.

Perhaps the specific problem with MyLibrary is that it was developed early, in the shadow of that first wave of menu-driven, static customization. Here’s a mock-up of its newest, 3.0 interface — not a whisper of RSS, not a hint of tagging here:


Helping patrons purposely chart their way through an ever-increasing universe of digital information is exactly what libraries should be doing, and ‘cultivation’ tools are the way to do it. Since it is open source, MyLibrary may well evolve into something more feed-based, more dynamic, more immediately useful; if not, another personalization tool will step into the breach.

Treating all patrons alike, enforcing a one-size-fits-all approach to the web, may correspond to a fantasy of global equality and universal dialogue. But in fact, if we are not to be bewildered or distracted by what’s out there — if we are to really apply the tradition of academic specialization to the web — we need to put these tools to work for our individually defined pursuits.

We may deplore, along with Rousseau, the unnatural fact of individualized labor; we may even agree with Wendell Berry (“The Unsettling of America”) that “the disease of modern culture is specialization… the abdication to specialists of various competencies and responsibilities that were once personal and universal.” The web’s ever-growing reach understandably feeds universalist fantasies.

And yet if you’re going to get work done in this environment, if you’re living among practical limitations of time and attention and self-cultivation, a Platonic digital citizenship seems more viable: “‘This, then,’ I said, ‘my friend, if taken in a certain sense appears to be justice, this principle of doing one’s own business.’” (Republic, 433b)

[1] Lifted from Longford’s bibliography, some critics of customization:

Luke, Robert. 2002. “Habit@online: Web Portals as Purchasing Ideology.” Topia: A Canadian Journal of Cultural Studies 8 (Fall): 61-89.

Nakamura, Lisa. 2002. Cybertypes: Race, Ethnicity, and Identity on the Internet. New York: Routledge.

Patelis, Korinna. 2000. “E-Mediation by America Online.” In Preferred Placement: Knowledge Politics on the Web, ed. Richard Rogers. Maastricht: Jan van Eyck Editions, 49-63.

Sunstein, Cass. 2001. Republic.com. Princeton, NJ: Princeton Univ. Press.