A systematic metadata for analysing scientific networks

One of the disciplines behind the science of science is the study of scientific networks. This work focuses on scientific networks as a social network having different nodes and connections. Nodes can be represented by authors, articles or journals while connections by citation, co-citation or co-authorship. One of the challenges in creating scientific networks is the lack of publicly available comprehensive data set. It limits the variety of analyses on the same set of nodes of different scientific networks.

To supplement such analyses we have worked on publicly available citation metadata from Crossref and OpenCitatons. Using this data a workflow to create scientific networks. Analysis of these networks gives insights into academic research and scholarship. Different techniques of social network analysis have been applied in the literature to study these networks. It includes centrality analysis, community detection, and clustering coefficient. We have used metadata of Scientometrics journal, as a case study, to present our workflow. We did a sample run of the proposed workflow to identify prominent authors using centrality analysis. This work is not a bibliometric study of any field rather it presents replicable Python scripts to perform network analysis. With an increase in the popularity of open access and open metadata, we hypothesise that this workflow shall provide an avenue for understanding scientific scholarship in multiple dimensions.

A systematic workflow from data fetching to analysis. A series steps are required to apply centrality analysis on the author collaboration and authorcitation networks. Utilising the article citation network, available as citation index, these networks get created. All scripts were executed on Windows Server machine having

Quad-Core AMD Opteron (TM) Processor 6272 with 128 GB RAM installed. The initial processing of data requires heavy computation and memory once. Later, the data are converted to a compressed binary format using libraries for processing large networks. It can run on any standard laptop machine. Below, we provide details of the workflow to create scientific networks. Although the case study is limited to data of SCIM, we have made the process automated. This automation helps applying the same script for other journals with minimum changes. a researcher from any field will be able to analyse the scientific networks. Its application can be vast, from identifying reviewers for a manuscript (based on article’s references) to a graduate student finding a supervisor (through collaboration network). The time it takes to completely execute the workflow scripts are well under an hour, barring the two time intensive steps. First, saving citation index as a binary file which needs to be done only once. Second, downloading Crossref DOI files for individual nodes of ego-centered network can be optimised with a local copy. The workflow provides a means for fast and interactive analysis. Since using graphical tools is easier than executing the scripts so a future application of this study is to create a front-end tool. A web-based portal is also under construction where the user may be able to select the date range along with other filters, and the system will initiate the scripts at the back-end. This way the researchers who are not familiar with programming can also benefit. It would enhance the capability and usefulness of this workflow. Techniques for author name disambiguation and partial counting have not been included. For effective analysis these need to be incorporated in future.

You can send your manuscript at https://bit.ly/2GFUS3A

Media Contact:

Lina James

Managing Editor

Mail Id: computersci@scholarlypub.com

American Journal of Computer Science and Information Technology

Whatsapp number: + 1-504-608-2390

May 11, 2021

Author

Adina Bernice