Fine-tuning SciBERT to enable ASJC-based assessments of the disciplinary orientation of research collections

  • Michael Gusenbauer*
  • Jochen Endermann
  • Harald Huber
  • Simon Strasser
  • Andreas-Nizar Granitzer
  • Thomas Ströhle

* Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Subject classification is essential for navigating scientific literature, yet the influential All Science Journal Classification (ASJC) has limited practical applicability. Its limitations stem from reliance on an incomplete source list restricted to Scopus content, and from journal-level classifications that often misrepresent individual documents. The most significant recent development in ASJC-based classification is OpenAlex, but it narrows the framework by reducing the number of categories and enforcing single-label assignments, both of which diminish classification accuracy. In response, this study introduces the first open, multi-label implementation of the ASJC taxonomy that more accurately classifies individual documents, including those published in general science or interdisciplinary journals. We develop a fine-tuned SciBERT model for multi-label classification across 307 ASJC subjects, trained on a large-scale Crossref dataset using title, abstract, and source title metadata. On a Crossref test set with full metadata, the model achieves a weighted F1-score of 0.892 on the 307 subjects and 0.934 on their 26 parent subjects. It maintains respectable performance (0.532 and 0.694, respectively) even without the source title information that ASJC classification relies upon. Our fine-tuning strategy includes selective metadata omission to mitigate overfitting and data augmentation for underrepresented categories. In addition, we introduce a tailored label-averaging method that enables assessment of the disciplinary orientation and comparison of individual documents and larger collections, such as researcher portfolios, institutions, and entire databases. To promote transparency, reproducibility, and further research, we openly release our model via Hugging Face (https://huggingface.co/asjc-classification), providing ready-to-use ASJC-based subject classification.
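
The abstract does not spell out the label-averaging method, so the following is only a minimal sketch of one plausible reading, not the authors' procedure: each document's multi-label scores are normalized to shares that sum to 1, and those shares are then averaged over a collection to give a disciplinary profile. The function names `normalize` and `collection_profile` and all scores are hypothetical and chosen for illustration only.

```python
from typing import Dict, List


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one document's label scores so that they sum to 1."""
    total = sum(scores.values())
    if total == 0:
        return {label: 0.0 for label in scores}
    return {label: value / total for label, value in scores.items()}


def collection_profile(documents: List[Dict[str, float]]) -> Dict[str, float]:
    """Average normalized label shares over all documents in a collection.

    Assumption: every document contributes equal weight, regardless of
    how many labels it carries. The paper's actual weighting may differ.
    """
    if not documents:
        return {}
    totals: Dict[str, float] = {}
    for doc in documents:
        for label, share in normalize(doc).items():
            totals[label] = totals.get(label, 0.0) + share
    count = len(documents)
    return {label: total / count for label, total in totals.items()}


# Made-up scores for three documents (illustrative, not model output).
docs = [
    {"Artificial Intelligence": 0.9, "Library and Information Sciences": 0.6},
    {"Library and Information Sciences": 0.8},
    {"Artificial Intelligence": 0.5, "Statistics and Probability": 0.5},
]

profile = collection_profile(docs)
for label, share in sorted(profile.items(), key=lambda item: -item[1]):
    print(f"{label}: {share:.3f}")
```

Because every document is normalized before averaging, the resulting profile also sums to 1, which makes profiles of collections of different sizes directly comparable under this assumption.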

Original language: English
Number of pages: 38
Journal: Scientometrics
Publication status: Published - 01 Dec 2025

Fields of science

  • 502015 Innovation management
  • 502 Economics

JKU Focus areas

  • Sustainable Development: Responsible Technologies and Management
  • Digital Transformation
