CW 258

L. Dehaspe and H. Toivonen
Frequent query discovery: a unifying ILP approach to association rule mining

Abstract

Discovery of frequent patterns has been studied in a variety of data mining (DM) settings. In its simplest form, known from association rule mining, the task is to find all frequent itemsets, i.e., to list all combinations of items that are found in a sufficient number of examples. A similar task in spirit, but at the opposite end of the complexity scale, is the Inductive Logic Programming (ILP) approach where the goal is to discover queries in first order logic that succeed with respect to a sufficient number of examples. We discuss the relationship of ILP to frequent pattern discovery. On one hand, our goal is to relate data mining problems to ILP. On another hand, we want to demonstrate how ILP can be used to solve both existing and new data mining problems. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered. From an ILP viewpoint, however, it can be argued that these settings are all well-controlled subtasks of the full ILP counterpart of the problem. We try to restore the blurred picture by describing the existing approaches using a unified database representation. With the representation, we relate also the DM settings to each other and propose some interesting new areas to be explored. We analyse some aspects of the gradual change in the trade-off between expressivity and efficiency, as one moves from the frequent set problem towards ILP.

report.pdf / mailto: L. Dehaspe