Fine-Tuning Large Language Models for Ticket Classification at Doka GmbH

Selina Engelbrechtsmüller

Research output: Thesis › Master's / Diploma thesis

Abstract

As digitalization progresses, the number of IT applications in companies grows steadily, and with it the number of IT problems, which large companies typically handle through an IT ticket system. Doka GmbH also operates such a helpdesk, in which tickets are manually forwarded to the responsible department for processing. This manual routing is time-consuming and costly, and automating it requires automatic text classification. In the past, text classification was performed with traditional classifiers and later with deep neural networks (DNNs). With the advent of the Transformer architecture, Large Language Models (LLMs) such as GPT (published by OpenAI) and BERT (developed by Google) emerged. These models understand general text very well but often struggle with domain-specific text such as that found in IT tickets. For this reason, fine-tuned GPT and BERT models are created and compared for the five attributes that describe a ticket at Doka GmbH. Fine-tuning means training an LLM on a specific task and adjusting the model parameters to that task during training. The results show that GPT outperforms BERT in terms of accuracy in four out of five cases. To improve the performance of the fine-tuned BERT model, data augmentation techniques such as EDA (Easy Data Augmentation) and AEDA (An Easier Data Augmentation) are applied. These techniques use operations such as random insertion to generate additional labeled training data. The research shows that both techniques can improve the performance of the BERT model.
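The sketch below illustrates, in broad strokes, the kind of pipeline the abstract describes: fine-tuning a BERT model for ticket classification with Hugging Face Transformers, preceded by an EDA-style random-insertion augmentation step. It is not the thesis implementation; the checkpoint name, label set, example tickets, and hyperparameters are illustrative assumptions, and the augmentation duplicates a random word rather than inserting a synonym as in the original EDA method.

```python
# Illustrative sketch only: checkpoint, labels, data, and hyperparameters are
# assumptions, not the configuration used in the thesis.
import random

from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-german-cased"          # assumed checkpoint
LABELS = ["Network", "SAP", "Hardware", "Accounts", "Other"]  # hypothetical departments


def random_insertion(text: str, n: int = 1) -> str:
    """Simplified EDA-style random insertion: duplicate a random word at a
    random position to create an additional labeled training example."""
    words = text.split()
    for _ in range(n):
        if not words:
            break
        words.insert(random.randrange(len(words) + 1), random.choice(words))
    return " ".join(words)


# Hypothetical tickets; in practice these would come from the helpdesk export.
tickets = [
    {"text": "VPN connection drops every few minutes", "label": 0},
    {"text": "Cannot log in to SAP after password reset", "label": 1},
]
augmented = [{"text": random_insertion(t["text"]), "label": t["label"]} for t in tickets]
dataset = Dataset.from_list(tickets + augmented)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)


def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)


dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ticket-bert",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=dataset,
)
trainer.train()
```

In this sketch the augmented copies simply reuse the label of the original ticket, which is the core idea behind EDA and AEDA: inexpensive text perturbations yield extra labeled data without additional manual annotation.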
Original language: English
Supervisors/Reviewers
  • Schrefl, Michael, Supervisor
  • Straßer, Martin, Co-supervisor
Publication status: Published - Jun 2024

Fields of science

  • 102 Computer Sciences
  • 102010 Database systems
  • 102015 Information systems
  • 102016 IT security
  • 102025 Distributed systems
  • 102027 Web engineering
  • 102028 Knowledge engineering
  • 102030 Semantic technologies
  • 102033 Data mining
  • 102035 Data science
  • 509026 Digitalisation research
  • 502050 Business informatics
  • 502058 Digital transformation
  • 503008 E-learning

JKU Focus areas

  • Digital Transformation
