TY - GEN
T1 - The ABLoTS Approach for Bug Localization: is it replicable and generalizable?
AU - Niu, FeiFei
AU - Mayr-Dorn, Christoph
AU - Guez Assuncao, Wesley Klewerton
AU - Huang, LiGuo
AU - Ge, Jidong
AU - Luo, Bin
AU - Egyed, Alexander
PY - 2023
Y1 - 2023
N2 - Bug localization is the task of recommending source code locations (typically files) that probably contain the cause of a bug and hence need to be changed to fix it. To this end, information retrieval-based bug localization (IRBL) approaches have been adopted, which identify the most bug-prone files in the source code space. In current practice, state-of-the-art IRBL techniques combine different components, e.g., similar reports, version history, and code structure, to achieve better performance. ABLoTS is a recently proposed approach whose core component, TraceScore, utilizes requirements and traceability information between different issue reports, i.e., feature requests and bug reports, to identify buggy source code snippets, with promising results. To evaluate the accuracy of these results, obtain additional insights into the practical applicability of ABLoTS, and support more efficient and rapid future replication and comparison, we conducted a replication study of this approach on the original data set and on an extended data set. The extended data set includes 16 more projects comprising 25,893 bug reports and the corresponding source code commits. While we find that TraceScore, the core component of ABLoTS, produces comparable results on the extended data set, we also find that the ABLoTS approach no longer achieves promising results, due to an overlooked side effect of incorrectly choosing a cut-off date, which led to training data leaking into test data with significant effects on performance.
AB - Bug localization is the task of recommending source code locations (typically files) that probably contain the cause of a bug and hence need to be changed to fix it. To this end, information retrieval-based bug localization (IRBL) approaches have been adopted, which identify the most bug-prone files in the source code space. In current practice, state-of-the-art IRBL techniques combine different components, e.g., similar reports, version history, and code structure, to achieve better performance. ABLoTS is a recently proposed approach whose core component, TraceScore, utilizes requirements and traceability information between different issue reports, i.e., feature requests and bug reports, to identify buggy source code snippets, with promising results. To evaluate the accuracy of these results, obtain additional insights into the practical applicability of ABLoTS, and support more efficient and rapid future replication and comparison, we conducted a replication study of this approach on the original data set and on an extended data set. The extended data set includes 16 more projects comprising 25,893 bug reports and the corresponding source code commits. While we find that TraceScore, the core component of ABLoTS, produces comparable results on the extended data set, we also find that the ABLoTS approach no longer achieves promising results, due to an overlooked side effect of incorrectly choosing a cut-off date, which led to training data leaking into test data with significant effects on performance.
UR - https://www.scopus.com/pages/publications/85166293566
U2 - 10.1109/MSR59073.2023.00083
DO - 10.1109/MSR59073.2023.00083
M3 - Conference proceedings
T3 - Proceedings - 2023 IEEE/ACM 20th International Conference on Mining Software Repositories, MSR 2023
SP - 576
EP - 587
BT - 20th International Conference on Mining Software Repositories (MSR)
ER -