January 2010 Issue • Volume 38 • Issue 1


Clickstream Mapping of Scientific Activity—
Opportunity and Caution

by Donald Janelle and Michael Goodchild, Center for Spatial Studies and the Center for Spatially Integrated Social Science


What are the social and scientific implications of having every keystroke logged into a longitudinal archive? This may sound preposterous and invasive, but take a look at Bollen, Van de Sompel, et al. (2009). These researchers are affiliated with digital library and mathematical modeling teams at the Los Alamos National Laboratory and the Santa Fe Institute. They assembled nearly a billion interactions from the user logs of leading scholarly web portals. These included Thomson Reuters' Web of Science, Elsevier's Scopus, JSTOR, Ingenta, the University of Texas, California State University, and several health institutions.

From this corpus, they reconstructed the article-to-article and journal-to-journal sequential moves that web users made in 2006 and 2007. From these moves they built a stochastic model of search transitions from one journal to another. The authors used careful data assembly and validation methods, and they employed network analysis approaches for visualization. With these tools they demonstrate how mapping connectivities across knowledge space can reveal patterns of interaction and clusters of journals (disciplines) that traditional approaches have obscured. The authors lay claim to the ". . . first ever map of science derived from scholarly log data" (p. 2).
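
For readers unfamiliar with how a "stochastic model of search transitions" can be estimated from logs, the following is a minimal Python sketch of the general idea. It is not the authors' actual pipeline. It assumes each user session has already been reduced to an ordered list of journal names. It then counts consecutive journal-to-journal moves and normalizes them into first-order transition probabilities, P(next journal | current journal).

The function name `transition_probabilities`, the toy journal labels, and the choice to drop repeat clicks within the same journal are all illustrative assumptions. The original study's handling of these details may differ.

```python
from collections import Counter, defaultdict


def transition_probabilities(sessions):
    """Estimate first-order transition probabilities P(next | current)
    from sessions, each an ordered list of journal names (hypothetical input format)."""
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each consecutive pair (current -> next) within one session;
        # pairs never span two different sessions.
        for current, nxt in zip(session, session[1:]):
            # Assumption for this sketch: repeat clicks within the same journal are ignored.
            if current != nxt:
                counts[current][nxt] += 1
    probs = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        # Normalize outgoing counts so each journal's row sums to 1.
        probs[current] = {nxt: n / total for nxt, n in nexts.items()}
    return probs


# Hypothetical toy sessions; these are placeholders, not real log data.
sessions = [
    ["J. Sociology", "Demography", "J. Sociology"],
    ["J. Sociology", "Demography"],
    ["Demography", "Econometrica"],
]
P = transition_probabilities(sessions)
print(P["J. Sociology"])  # {'Demography': 1.0}
print(P["Demography"])    # {'J. Sociology': 0.5, 'Econometrica': 0.5}
```

Note that a journal with no recorded outgoing moves (here, "Econometrica") has no row in the result. Any downstream network analysis has to decide how to treat such journals.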

Traditional maps of scientific activity have relied largely on citation data within discipline groupings, such as the sciences, engineering, social sciences, and humanities. Clickstream research goes a step further, with the potential for new insights into broader levels of knowledge exchange. The interesting news for sociology and the other social sciences is that this mapping assigns social science and humanities journals higher levels of centrality than citation analyses have traditionally accorded them. Nonetheless, traditional citation resources have a distinct advantage: they carry richer contextual information about interactions (e.g., authors' names, institutions, disciplines, references cited, and citations across time). Thus, clickstream data are likely to complement rather than supplant citation information.
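
The essay does not say which centrality measure underlies this finding. To make the notion of "centrality" in a transition network concrete, here is a hedged Python sketch using PageRank. PageRank is one common measure for stochastic transition models, but it is swapped in here for illustration and is not necessarily the measure Bollen et al. used.

The sketch runs power iteration over a dict-of-dicts model of the form {source journal: {target journal: probability}}. The function name `pagerank`, its parameters, and the toy journals are hypothetical.

```python
def pagerank(probs, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PageRank over a hypothetical transition model
    {source: {target: probability}}, where each non-empty row sums to 1.
    Journals with no outgoing transitions ("dangling" nodes) spread
    their score uniformly across all journals."""
    nodes = set(probs)
    for targets in probs.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Total score currently sitting on dangling journals.
        dangling = sum(rank[v] for v in nodes if not probs.get(v))
        # Teleportation share plus redistributed dangling score.
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Follow observed transitions, weighted by their probabilities.
        for src, targets in probs.items():
            for dst, p in targets.items():
                new[dst] += damping * rank[src] * p
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical toy model; not data from the clickstream study.
probs = {
    "J. Sociology": {"Demography": 1.0},
    "Demography": {"J. Sociology": 0.5, "Econometrica": 0.5},
    # "Econometrica" has no recorded outgoing transitions (a dangling node).
}
ranks = pagerank(probs)
for journal, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{journal}: {score:.3f}")  # Demography prints first (about 0.394 here)
```

Different centrality measures (betweenness, degree, PageRank) can rank the same journals differently. This is one reason comparisons with citation-based maps should state which measure is used.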

Applying visualization technologies to novel data sets can inform us about patterns and processes of knowledge development. This has become a strong area of development in information science. Journals such as Scientometrics and the Journal of Informetrics feature articles that use citation data to analyze knowledge production and to identify clusters of scientific activity. Applications for such mappings include the evaluation and fine-tuning of science policy by funding agencies. They also include organizational performance assessment by academic institutions and commercial enterprises. As the corpus of source documents broadens, the opportunities to use such mappings seem limitless; but, as a careful review of the clickstream paper illustrates, caution is advised.

The Analysis

In the clickstream analysis* by Bollen, Van de Sompel, et al., the researchers were limited to the levels of aggregation that the data proprietors made available (see http://www.plosone.org/article/info:doi/10.1371/journal.pone.0004803). Thus, nothing is known about the individuals who are searching the indexes. Key diagnostic clues (e.g., demographics, institution, geographical location, web domain) are missing. Without this context, there is no way to gauge how casual lay visitors to the portals, as opposed to scholarly researchers, influence the graphical portrayal of network outcomes. Other potential cleavages might include the linguistic and national origins of users. They might also include distinctions among corporate, public, academic, and personal agents and their corresponding motivations for conducting a literature search. Because these attributes cannot be observed, differences among such users become noise that may hinder the interpretation of scientific activity.

Many of the assumptions of the clickstream investigation need to be probed for their biasing potential. Knowledge trends now cycle from novel origins to routine practice to obsolescence over ever shorter periods. Yet this study offers only a snapshot of data aggregated over a single two-year period, which limits what can be inferred about process. Longitudinal strategies, an objective espoused by the authors, should yield more refined understandings of connectivity across journals and disciplines.

The authors are careful to point out that portraying the results as a two-dimensional map entails subjective choices. Another concern is the (possibly accidental) omission of key disciplines (e.g., mathematics) from the labeling of journal clusters. The Arts & Humanities Citation Index (included as part of the Web of Science) covers journals in history, philosophy, and the arts. Even so, it is not clear whether literary journals that feature poetry, creative non-fiction, and fiction were included. Nor is it clear why magazines and leading newspapers should or should not be included. The corpus of human documented knowledge is indeed expansive. Increasingly, the democratization of information via the web has blurred some of the distinctions between professional, scholarly, and lay media. Other questions, noted by the authors, include the uncertainty of user motivations in accessing portals to scholarly literature and the impacts that the design of web interfaces might have on user behavior.

Visualization methodologies are demonstrating strong capabilities to move scholarship beyond disciplinary silos and to forge new alliances for knowledge development and dissemination. These include cartographic renderings from geo-spatial analysis (e.g., geographical information systems) and graphic representations of spatio-temporal processes (e.g., through agent-based modeling). Even the distinction between little science and big science (de Solla Price, 1963) needs reevaluation in a world where, potentially, anyone can take part in creative dialogues to solve problems and to create new social and technological realities.

The authors of the clickstream map of science deserve recognition for demonstrating ingenious approaches to accessing weblogs and to ordering complex interactions into discernible patterns of scientific activity. We hope that this is a building block that will invite even more cogent methodologies. Such methods could uncover secrets to the processes of innovation, reveal structures that encourage or hinder knowledge development, and identify the salient paths of knowledge delivery. These dimensions of the emerging information society represent new areas for investigation where sociological sensitivities to understanding human processes and organizations are required.


Bollen, J., Van de Sompel, H., Hagberg, A., Bettencourt, L., Chute, R., Rodriguez, M. A., and Balakireva, L. 2009. "Clickstream Data Yields High-Resolution Maps of Science." PLoS ONE 4(3): e4803. (www.plosone.org/article/info:doi/10.1371/journal.pone.0004803).

de Solla Price, D. J. 1963. Little Science, Big Science. New York: Columbia Univ. Press.

*In the clickstream map of scientific activity (fig. 5) at www.plosone.org/article/info:doi/10.1371/journal.pone.0004803, circles represent individual journals; discipline names designate clusters of related journals.
