Abstract
The medical field is evolving rapidly due to advances in Artificial Intelligence, which allows machines to perform cognitive tasks and achieve specific objectives using data as input. As a result, there is growing demand for text-mining methods that extract useful insights from vast volumes of medical text. However, applying Natural Language Processing techniques in medicine faces several challenges, including the need to adapt to medical terminology and the differences between general-purpose and medical corpora. Deep learning approaches have made advances in text mining feasible, but they still face challenges such as the difficulty of scaling efficiently and the lack of domain-specific data.
This thesis is based on a real use case at Bloom Diagnostics GmbH, a digital health start-up that offers home access to blood tests and health advice. To validate the suggestions that users receive based on their blood tests, the medical team must review a vast number of relevant scientific records. This requires a systematic literature search, which is manual and time-consuming. To accelerate the literature search process, the thesis proposes a general concept for building a recommender system for research papers in the medical field and focuses on ranking and re-ranking passages by their relevance to natural language questions.
Various NLP models, including BM25, BERT, and BioBERT, are compared to determine the most efficient setting. Unlike BM25 and BERT, which can already perform ranking and re-ranking tasks, BioBERT must first be fine-tuned separately on medical data so that it is comparable with the other models. The models are compared in different combinations and settings, and the results are evaluated using various criteria. The main contribution of this thesis is a solution that helps researchers save time by recommending the most relevant passages returned by these techniques. The recommender system, which focuses on recommending passages relevant to natural language questions, is built on the current literature search web application for medical researchers at Bloom Diagnostics.
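The rank-then-re-rank idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: the tokenizer, the parameter values `k1=1.5` and `b=0.75`, the function names `bm25_rank` and `rerank`, and the sample passages are all assumptions made for illustration. The `scorer` argument is a hypothetical stand-in for a neural relevance model such as a fine-tuned BERT or BioBERT re-ranker.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; a real system would
    # likely use a tokenizer suited to medical text).
    return text.lower().split()


def bm25_rank(query, passages, k1=1.5, b=0.75):
    """Return (passage_index, score) pairs sorted by BM25 score, best first."""
    if not passages:
        return []
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = tokenize(query)

    scored = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant commonly used with BM25.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / max(avgdl, 1e-9))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scored.append((i, score))
    # Sort by descending score; ties are broken by original position.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def rerank(query, passages, ranked, scorer, top_k=10):
    """Re-score the top_k first-stage candidates with a second-stage scorer.

    `scorer(query, passage) -> float` is a hypothetical placeholder for a
    neural relevance model; it is not part of any specific library.
    """
    head = ranked[:top_k]
    rescored = [(i, scorer(query, passages[i])) for i, _ in head]
    return sorted(rescored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Made-up sample passages, for illustration only.
    sample = [
        "ferritin levels indicate iron stores in the blood",
        "vitamin d supports bone health",
        "the weather is sunny today",
    ]
    for index, score in bm25_rank("iron ferritin blood", sample):
        print(index, round(score, 3))
```

In a two-stage setup like this, the cheap lexical ranker narrows the candidate set and the more expensive model only scores the top `top_k` passages, which is one common way to keep neural re-ranking tractable on large collections.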
Original language | English |
---|---|
Supervisors/Reviewers | |
Publication status | Published - Aug 2023 |
Fields of science
- 102 Computer Sciences
- 102010 Database systems
- 102015 Information systems
- 102016 IT security
- 102025 Distributed systems
- 102027 Web engineering
- 102028 Knowledge engineering
- 102030 Semantic technologies
- 102033 Data mining
- 102035 Data science
- 509026 Digitalisation research
- 502050 Business informatics
- 502058 Digital transformation
- 503008 E-learning
JKU Focus areas
- Digital Transformation