Abstract

Researchers publish papers to report their research results and thus contribute to a steadily growing corpus of knowledge. To avoid unintentionally repeating research and studies, researchers need to be aware of the existing corpus. For this purpose, they crawl digital libraries and conduct systematic literature reviews to summarize existing knowledge. However, such approaches have several issues: not all documents are available to every researcher, relevant results may not be found due to ranking algorithms, and manually assessing the quality of a document requires time and effort. In this paper, we provide an overview of the information that is publicly available in different digital libraries in computer science. Based on these results, we derive a taxonomy that describes the connections among these pieces of information and discuss their suitability for quality assessment. Overall, we observe that bibliographic data and simple citation counts are available in almost all libraries, while some libraries provide information that is unique to them. Some of this information may be used to improve automated quality assessment, but with limitations.