Abstract
Predicting stock prices and determining the optimal moments to buy or sell stocks is a long-standing challenge for investors. Advances in natural language processing (NLP) allow valuable insights to be extracted from unstructured text, and various studies have used news articles to predict the stock market, employing techniques such as lexicon-based sentiment analysis and topic modeling through Latent Dirichlet Allocation (LDA). These traditional approaches, however, do not consider the semantic relationships among words. Language models that use text embedding techniques, such as BERT, have gained popularity in the NLP field for their ability to consider the context of words.
This thesis evaluates the use of BERT-based topic modeling and sentiment analysis of financial news in the context of training a classifier to predict the direction of movement of the S&P 500 index. On the one hand, this thesis evaluates BERT-based models that consider semantic relationships among words, specifically FinBERT and BERTopic, in conjunction with various classification algorithms, including Logistic Regression, Support Vector Machine (SVM), and Random Forest, among others. On the other hand, to provide a benchmark, the method is applied with the same classification algorithms using traditional techniques for sentiment analysis and topic modeling that do not consider word context. The benchmark sentiment analysis relies on a lexicon-based approach utilizing the Loughran and McDonald dictionary, while the topic modeling employs Latent Dirichlet Allocation (LDA).
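To illustrate what a lexicon-based benchmark of this kind does, the following is a minimal sketch in Python. The word lists are toy placeholders and not entries taken from the Loughran and McDonald dictionary, and the scoring formula and the function name `lexicon_sentiment` are assumptions for illustration, not necessarily those used in the thesis. The point is that each word is counted in isolation, without regard to its context.

```python
import re

# Toy word lists for illustration only; these are NOT the actual
# Loughran and McDonald dictionary entries.
POSITIVE_WORDS = {"gain", "growth", "profit", "strong"}
NEGATIVE_WORDS = {"loss", "decline", "weak", "lawsuit"}


def lexicon_sentiment(text: str) -> float:
    """Return (pos - neg) / (pos + neg), or 0.0 if no lexicon word occurs."""
    # Lowercase the text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE_WORDS for token in tokens)
    neg = sum(token in NEGATIVE_WORDS for token in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


headline = "Strong profit growth offsets lawsuit loss"
score = lexicon_sentiment(headline)
print(score)
```

A score above zero means positive lexicon words outnumber negative ones. Because the method only counts matches, a phrase such as "no loss" would still register as negative, which is the limitation that context-aware models like FinBERT are meant to address.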
The comparison between the BERT-based method and the selected benchmark involves evaluating the accuracy, precision, sensitivity, and other classification metrics. Furthermore, the research explores the influence of several factors on prediction outcomes, including the size and frequency of training the topic model and the impact of utilizing only the headline versus the full article. The results indicate that BERT-based methods marginally outperform traditional approaches in predicting stock price direction. However, it has become apparent that relying solely on sentiment information and topic models derived from financial news may not suffice for accurately forecasting the S&P 500 index's direction.
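For reference, the three metrics named above can be computed from the counts of a binary confusion matrix. The sketch below assumes that label 1 denotes an upward move and label 0 a downward move; the labels, the sample data, and the function name `classification_metrics` are hypothetical and are not results from the thesis.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, and sensitivity (recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of predicted upward moves that were actually upward.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of actual upward moves that the classifier detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up labels for eight trading days (1 = index up, 0 = index down).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting all three matters for direction prediction because a classifier that always predicts "up" can reach a respectable accuracy in a rising market while its precision exposes the lack of discrimination.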
Original language | English |
---|---|
Supervisors/Reviewers | |
Publication status | Published - Jun 2024 |
Fields of science
- 102 Computer Sciences
- 102010 Database systems
- 102015 Information systems
- 102016 IT security
- 102025 Distributed systems
- 102027 Web engineering
- 102028 Knowledge engineering
- 102030 Semantic technologies
- 102033 Data mining
- 102035 Data science
- 509026 Digitalisation research
- 502050 Business informatics
- 502058 Digital transformation
- 503008 E-learning
JKU Focus areas
- Digital Transformation