Xciting connections

In the perfect world we never seem to live in, migration of scholarship to the web would mean endlessly networked citations. It would mean new metrics for gauging the impact of any given publication, substantiating tenure/promotion and grant proposals with hard evidence. It would give us new tools to map the interplay of research in an interdisciplinary age. Machines would be prosthetic connectors of our truest thoughts.

Citation mapping is a step towards this promise. Academics have been diligently appending footnotes and endnotes of attributions to their research all along; the hooks are there, and all we need to do is link them up. Easier said than done, of course, as the Tower of Babel still smolders. Citation formats and database structures vary; the semantic web is under construction; and too often the software used to generate citations (MS Office, EndNote, Zotero & the like) is disconnected from the end version of an article, meaning that the article has to be OCR’d and its citations re-interpreted. For these and other reasons, as this recent D-Lib article enumerating problems with citation counts points out, “the rates of citation data accuracy and completeness are not precise enough to make fair assessments.”

That’s not stopping efforts to corral citations into paths of discovery, and as usual the science data managers are out in front. Thomson Reuters’ Web of Science, in particular, has been innovating in bibliometric analysis and visualization; its Citation Mapping Tool debuted last summer. The tool ‘maps’ articles into generations, allowing you to travel back and forth between cited and citing articles. Here’s a visualization of how one article cites others:

As this review notes, the tool is far from exhaustive, thanks to database quirks and variation of records across journals. Exporting a citation map is underwhelming at present: you can download it as a flat image, but there is no way to harvest the underlying data into a data management tool. The tool presents some color coding options, so you can sort out ‘types’ of references, but designation of these codes again relies on consistency across fields that cannot be taken for granted.
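For readers who think in code, the idea of sorting articles into generations of cited and citing works can be sketched as a breadth-first walk over a citation graph. This is only an illustration under stated assumptions, not how the Web of Science tool is actually built: the article labels, the `cites` dictionary, and the function names `citation_generations` and `invert` are all hypothetical.

```python
from collections import deque


def citation_generations(links, start, max_depth=2):
    """Breadth-first walk: map each reachable article to its generation.

    `links` maps an article to the articles it points at. Generation 0 is
    the starting article, generation 1 its direct neighbors, and so on.
    """
    generations = {start: 0}
    queue = deque([start])
    while queue:
        article = queue.popleft()
        depth = generations[article]
        if depth == max_depth:
            continue  # do not expand beyond the requested generation
        for neighbor in links.get(article, ()):
            if neighbor not in generations:
                generations[neighbor] = depth + 1
                queue.append(neighbor)
    return generations


def invert(links):
    """Turn a 'cites' mapping into a 'cited by' mapping."""
    cited_by = {}
    for article, refs in links.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(article)
    return cited_by


# Hypothetical toy data: A cites B and C, both of which cite D; E cites A.
cites = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "E": ["A"]}

backward = citation_generations(cites, "A")          # what A builds on
forward = citation_generations(invert(cites), "A")   # what builds on A
```

Walking the same graph in both directions is what lets a reader move between cited and citing works. Note that the sketch treats every citation as equal and assumes clean, consistent records.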

But perhaps the biggest drawback to this or any version of simple citation mapping is its inability to reflect conceptual relationships. Citations, after all, are made to a variety of sources for a variety of reasons, not all of them equally germane to what an article is about. An article may cite something it’s refuting, or may be cluttered with window-dressing references, or may go out of its way to cite the work of mentors or colleagues more out of a sense of politesse than necessity. Until this variation of citation quality is somehow addressed, along with improved metadata standardization and database interoperation, it seems doubtful that citation mapping can, in the words of the WOS mapping reviewer, “represent, and make access to, the historical progress of human inquiry, including its interdisciplinary aspects.”


Time to take another tack? As a recent NYT summary noted, data scientists at Los Alamos have come up with a new mapping of the connections between various disciplines. These connections are charted by tracking logs of click-throughs by researchers moving between journals. The project, detailed in PLoS, is seeking a more accurate way to measure and represent research interconnections than the more traditional citation mapping.

The PLoS report lists advantages of clickstream data: it is immediate information (versus the years that citation data can take to fall into place), and it is based on private and actual navigation activity (versus the various motives for citation mentioned above). The report also notes a drawback to relying on clickstreams: “User interactions with scholarly web portals are shaped by many constraints, including citation links, search engine results, and user interface features.” It’s the same infrastructure problem haunting citation mapping.
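To make the clickstream idea concrete, here is a minimal sketch of counting click-throughs between journals from session logs. It assumes each session has already been reduced to an ordered list of journal names; the journal names, the `sessions` data, and the function `journal_transitions` are invented for illustration, and the Los Alamos project's actual pipeline may well differ.

```python
from collections import Counter


def journal_transitions(sessions):
    """Count how often readers move from one journal to another.

    `sessions` is a list of click sequences, each an ordered list of the
    journals a single reader visited. Consecutive visits to the same
    journal are ignored, since they are not moves between journals.
    """
    counts = Counter()
    for session in sessions:
        for source, target in zip(session, session[1:]):
            if source != target:
                counts[(source, target)] += 1
    return counts


# Hypothetical sessions, one list per reader visit.
sessions = [
    ["Physics Journal", "Chemistry Letters", "Biology Review"],
    ["Chemistry Letters", "Biology Review"],
    ["Physics Journal", "Physics Journal", "Chemistry Letters"],
]

transitions = journal_transitions(sessions)
```

The resulting counts can serve as edge weights in a map of journals, with heavier edges marking the paths readers take most often.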

In any case, the map of click-through connections is quite fun to look at – it’s color-coded by discipline. Humanities sort out to the middle, which is good and proper. Behold what the PLoS authors call a “first-ever glimpse of this terra incognita”:

