TY - GEN
T1 - Partial Pre-Post Code Tree: A Memory-Efficient Tree Structure for Conjunctive Rule Mining
AU - Huynh, Van Quoc Phuong
AU - Beck, Florian
AU - Fürnkranz, Johannes
PY - 2025/7/20
Y1 - 2025/7/20
N2 - State-of-the-art rule mining algorithms rely on summarizing the training set into efficient data structures which allow to quickly answer arbitrary conjunctive queries about the data. The key limitation of such techniques is their memory consumption. Pre-post code trees (PPC-trees) which are the basis of several efficient association and classification rule mining algorithms, are only constructed as an intermediate representation and subsequently converted into a much more efficient N-lists structure. In this paper, we introduce partial pre-post code trees (P3C-trees), which are based on the idea that partial trees are iteratively constructed, and immediately converted into N-lists. This tight integration of these phases allows to avoid the memory bottleneck of a full PPC-tree construction, and thus enables these algorithms to tackle the memory scalability problem posed by large-scale datasets. Our experiments with big datasets confirm that the memory used by P3C-tree is orders of magnitude smaller than the memory consumed by PPC-tree, and the generated N-lists are also more effective than alternative structures such as Tidset or Diffset. Moreover, the N-list construction can also be considerably sped up with the P3C-tree structure.
AB - State-of-the-art rule mining algorithms rely on summarizing the training set into efficient data structures which allow to quickly answer arbitrary conjunctive queries about the data. The key limitation of such techniques is their memory consumption. Pre-post code trees (PPC-trees) which are the basis of several efficient association and classification rule mining algorithms, are only constructed as an intermediate representation and subsequently converted into a much more efficient N-lists structure. In this paper, we introduce partial pre-post code trees (P3C-trees), which are based on the idea that partial trees are iteratively constructed, and immediately converted into N-lists. This tight integration of these phases allows to avoid the memory bottleneck of a full PPC-tree construction, and thus enables these algorithms to tackle the memory scalability problem posed by large-scale datasets. Our experiments with big datasets confirm that the memory used by P3C-tree is orders of magnitude smaller than the memory consumed by PPC-tree, and the generated N-lists are also more effective than alternative structures such as Tidset or Diffset. Moreover, the N-list construction can also be considerably sped up with the P3C-tree structure.
UR - https://www.scopus.com/pages/publications/105014317266
U2 - 10.1145/3690624.3709303
DO - 10.1145/3690624.3709303
M3 - Conference proceedings
VL - 1
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 565
EP - 576
BT - KDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
A2 - Sun, Yizhou
A2 - Chierichetti, Flavio
A2 - Lauw, Hady W.
A2 - Perlich, Claudia
A2 - Tok, Wee Hyong
A2 - Tomkins, Andrew
PB - ACM
CY - Toronto, ON, Canada
ER -