Note that I am no longer at CISTI and that I am now continuing this work at Carleton University - GN 2010 04 07



Semantic Journal Space of 2231 Journals

Scaled to Two Dimensions

The visualization requires Java in the browser.

Using a custom LuSql filter, for each of the 2231 journals, concatenate the full text of all of that journal's articles into a single document.

Using LuSql, create a Lucene index of all the journal documents (about 14 hours, highly multithreaded on a multicore machine, resulting in a 43 GB index).

Using Semantic Vectors BuildIndex, create a docvector index of all journal documents, with 512 dimensions (58 minutes, 3.4 GB index).

Using Semantic Vectors Search, find the cosine distance between all pairs of journal documents (8 minutes).

Build the journal-journal distance matrix.

Use R's multidimensional scaling (MDS) to scale the distance matrix to 2-D.

Build the visualization using Processing.
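To make the distance-matrix and scaling steps concrete, here is a minimal sketch in plain Python. It is not the code used in the project, which relied on Semantic Vectors and R. The function names (cosine_distance, distance_matrix, largest_eigenpair, classical_mds) are made up for illustration. It also assumes classical (Torgerson) MDS, which is what R's cmdscale computes; the steps above do not say which R function was actually used. The eigenvectors are found with a simple shifted power iteration, which is fine for a toy example but far too slow for a real 2231 x 2231 matrix.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cos(angle) between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    """Symmetric matrix of pairwise cosine distances (zero diagonal)."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = cosine_distance(vectors[i], vectors[j])
    return dist


def largest_eigenpair(matrix, rng, iterations=1000):
    """Algebraically largest eigenvalue and a unit eigenvector of a symmetric matrix.

    Runs power iteration on (matrix + shift * I). The shift is the largest
    absolute row sum, which bounds every eigenvalue in magnitude, so the
    shifted matrix has no negative eigenvalues for the iteration to lock onto.
    """
    n = len(matrix)
    shift = max(sum(abs(x) for x in row) for row in matrix)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(n)) + shift * vec[i]
               for i, row in enumerate(matrix)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # all-zero matrix: keep the current unit vector
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient on the unshifted matrix gives the eigenvalue itself.
    mv = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
    value = sum(a * b for a, b in zip(vec, mv))
    return value, vec


def classical_mds(dist, dims=2, seed=0):
    """Embed a symmetric distance matrix in `dims` dimensions (classical MDS)."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J. Because D is symmetric,
    # the column means equal the row means.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        value, vec = largest_eigenpair(b, rng)
        scale = math.sqrt(max(value, 0.0))  # non-positive eigenvalues add nothing
        for i in range(n):
            coords[i][k] = scale * vec[i]
        # Deflate: remove the component just found before looking for the next.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords
```

Given a list of document vectors (one per journal), classical_mds(distance_matrix(vectors)) returns one [x, y] pair per journal, which is the kind of input a 2-D plot needs. The embedding is only determined up to rotation and reflection, so two runs can produce mirrored maps that are equally valid.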

NB:

Built with Open Source Software.





Project Torngat is a research project here at NRC that uses the full text of journal articles to construct semantic journal maps. Among other things, article search results can be projected onto such a map to visualize the results and to support interactive exploration and discovery of related articles, terms and journals.

Starting with 5.7 million full-text articles from 2200+ journals (mostly science, technology and medical (STM)), and using LuSql, Lucene, Semantic Vectors, R, and Processing, a two-dimensional mapping of a 512-dimension semantic space was created, which revealed an excellent correspondence with the 23 human-created journal categories.

This initial work was undertaken to find a technique that would scale. Follow-up work is looking at integrating this with a search interface, and at evaluating whether better structure is revealed within semantic journal mappings of single categories.

This may be the first time such large-scale full text has been used in this fashion, without the help of article metadata.

Try out the prototype, which displays journals in the 2-D space.

How it was done is described in the steps listed above. All the above software is Open Source.

You can read more about it in the preprint.

Thanks to my collaborators, Alison Callahan and Michel Dumontier, Carleton University.