Abstract
Many companies and organisations worldwide are legally responsible for collecting and reporting value-added tax (VAT) on their sales. Typically, tax reporting is done manually by accountants. The complexities in tax legislation make VAT reporting error-prone and time-consuming. As a consequence, tax is occasionally reported incorrectly. A potential way to improve compliance with tax regulations is a tax compliance system, i.e., a system that can automatically identify tax statements whose tax conditions are potentially reported incorrectly. The objective of this thesis is to design and implement a data-driven tax compliance system using machine learning (ML) techniques, specifically for VAT on both incoming and outgoing invoices. The purpose of the tax compliance system is to support accountants in identifying, and thereby preventing, incorrect VAT reporting. Using classification, the tax conditions of invoices can be predicted. The thesis is conducted in cooperation with BDO, a consulting company specialised in tax consultancy services, including VAT reporting. Factors that determine the VAT payable are identified by a VAT domain expert. Real-world enterprise resource planning (ERP) data is used to predict the combination of tax conditions, i.e., the tax code. A framework is presented in which tax code deviations are considered anomalies, that is, potentially incorrect tax codes. A tax code deviation is an invoice whose predicted tax code deviates from the tax code assigned by the accountant. The data is pre-processed and a classification model is built for outgoing invoices, which predicts tax codes with an accuracy of 98.9%. The classifier can learn from an ongoing stream of data and from human-generated feedback. This framework is implemented in a system in which the classification model actively attempts to identify incorrect VAT reporting by predicting the tax codes of a stream of invoice data. Tax code deviations are highlighted in the system, so that users can interactively verify the correct tax code and prevent incorrect VAT reporting. The definitive tax codes, as entered by the user, are continuously used to retrain the classifier. The resulting design artifact is this ML-based system for increasing VAT compliance.
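The deviation framework described above can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not name the classification model, so an incremental categorical naive Bayes classifier is assumed here as a stand-in, and the feature names (`country`, `kind`) and tax codes (`DOMESTIC`, `EU_SUPPLY`) are hypothetical placeholders.

```python
from collections import Counter, defaultdict
import math


class OnlineTaxCodeClassifier:
    """Incremental categorical naive Bayes (assumed stand-in, not the thesis model)."""

    def __init__(self):
        self.class_counts = Counter()               # tax code -> invoices seen
        self.feature_counts = defaultdict(Counter)  # (feature, tax code) -> value counts
        self.feature_values = defaultdict(set)      # feature -> distinct values seen

    def learn(self, invoice, tax_code):
        """Update the counts with one invoice and its confirmed tax code."""
        self.class_counts[tax_code] += 1
        for feature, value in invoice.items():
            self.feature_counts[(feature, tax_code)][value] += 1
            self.feature_values[feature].add(value)

    def predict(self, invoice):
        """Return the most probable tax code, or None before any training."""
        if not self.class_counts:
            return None
        total = sum(self.class_counts.values())
        best_code, best_score = None, -math.inf
        for code, n in self.class_counts.items():
            score = math.log(n / total)  # log prior of the tax code
            for feature, value in invoice.items():
                seen = self.feature_counts[(feature, code)][value]
                vocab = len(self.feature_values[feature]) + 1  # +1 for unseen values
                score += math.log((seen + 1) / (n + vocab))    # Laplace smoothing
            if score > best_score:
                best_code, best_score = code, score
        return best_code


def is_deviation(model, invoice, assigned_code):
    """A tax code deviation: the predicted code differs from the assigned one."""
    predicted = model.predict(invoice)
    return predicted is not None and predicted != assigned_code


# Hypothetical invoices; real ERP data would carry the expert-identified factors.
model = OnlineTaxCodeClassifier()
for _ in range(3):
    model.learn({"country": "NL", "kind": "goods"}, "DOMESTIC")
    model.learn({"country": "DE", "kind": "goods"}, "EU_SUPPLY")

incoming = {"country": "DE", "kind": "goods"}
flagged = is_deviation(model, incoming, "DOMESTIC")  # assigned code looks suspicious
confirmed_code = "EU_SUPPLY"                         # user verifies the correct code
model.learn(incoming, confirmed_code)                # feedback retrains the model
```

In this sketch the feedback loop is simply another call to `learn` with the user-confirmed tax code, which mirrors the idea of continuously retraining on definitive tax codes without refitting from scratch.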
Keywords: data mining, machine learning, tax compliance, value-added tax
| Original language | English |
|---|---|
| Supervisors/Reviewers | |
| Publication status | Published - Nov 2024 |
Fields of science
- 102 Computer Sciences
- 102010 Database systems
- 102015 Information systems
- 102016 IT security
- 102025 Distributed systems
- 102027 Web engineering
- 102028 Knowledge engineering
- 102030 Semantic technologies
- 102033 Data mining
- 102035 Data science
- 509026 Digitalisation research
- 502050 Business informatics
- 502058 Digital transformation
- 503008 E-learning
JKU Focus areas
- Digital Transformation