Those who work in data mining or predictive analytics are familiar with the CRISP-DM process. Metaphorically, if not literally, that process description is taped to our wall. Tom Khabaza’s Nine Laws of Data Mining should be taped up right next to it.
Khabaza has published those laws as a series of blog posts, here. For each law, he has provided a short name, followed by a one-sentence summary, supported by a few paragraphs of explanation.
The value of these laws is that they prepare us for what to expect as we do the work, and they remind us of what we should have expected on the occasions when we forget!
As I am a fan of brevity, I’m creating this post as a list of the single-sentence summaries. Occasionally I’ll add a short clarifying note. Here they are:
Tom Khabaza’s Nine Laws of Data Mining
- Business objectives are the origin of every data mining solution.
- Business knowledge is central to every step of the data mining process.
- Data preparation is more than half of every data mining process.
- The right model for a given application can only be discovered by experiment (aka “There is No Free Lunch for the Data Miner,” or NFL-DM).
- There are always patterns (aka “Watkins’ Law”).
- Data mining amplifies perception in the business domain.
- Prediction increases information locally by generalization.
- The value of data mining results is not determined by the accuracy or stability of predictive models. (Rather, their value is found in more effective action and improved business strategy.)
- All patterns are subject to change. (Thus, data mining is not a once-and-done kind of undertaking.)
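To make the fourth law a little more concrete, here is a minimal sketch in Python of what "discovering the right model by experiment" can look like. Nothing in it comes from Khabaza's posts: the synthetic data, the two toy models, and every function name (`make_data`, `fit_majority`, `fit_threshold`, `cross_val_accuracy`, `run_experiment`) are assumptions invented for illustration. The idea is only that each candidate is scored on held-out data, and the comparison, not intuition, decides which one to keep.

```python
import random
import statistics


def make_data(n, seed=0):
    """Hypothetical data: label is 1 when x > 0.5, with about 10% of labels flipped."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:  # label noise
            y = 1 - y
        rows.append((x, y))
    return rows


def fit_majority(train):
    """Baseline candidate: always predict the most common training label."""
    ones = sum(y for _, y in train)
    label = int(ones * 2 >= len(train))
    return lambda x: label


def fit_threshold(train):
    """Second candidate: predict 1 when x is above the midpoint of the class means.

    Assumes both classes appear in the training data and that class 1
    tends to have larger x, which holds for the synthetic data above.
    """
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    cut = (statistics.mean(xs0) + statistics.mean(xs1)) / 2
    return lambda x: int(x > cut)


def cross_val_accuracy(fit, rows, k=5):
    """Mean held-out accuracy over k interleaved folds."""
    scores = []
    for i in range(k):
        test = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        model = fit(train)
        correct = sum(model(x) == y for x, y in test)
        scores.append(correct / len(test))
    return statistics.mean(scores)


def run_experiment(n=200, seed=0):
    """Score every candidate the same way and return the results by name."""
    rows = make_data(n, seed)
    candidates = {"majority": fit_majority, "threshold": fit_threshold}
    return {name: cross_val_accuracy(fit, rows) for name, fit in candidates.items()}


if __name__ == "__main__":
    for name, acc in run_experiment().items():
        print(f"{name}: {acc:.2f}")
```

On a real project the candidates would be actual modeling techniques and, in keeping with the eighth law, the score would ideally reflect business value rather than raw accuracy. The shape of the experiment stays the same.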
These laws, as Khabaza points out, are not telling us what we should do. Rather they are “simple truths,” describing brute facts that give shape to the landscape in which data mining is done. Their truth is empirical, discovered and verified by those who’ve been doing the work. So it’s best to keep these truths in mind and adapt our efforts accordingly, lest we pay the price for failing to acknowledge reality as it is.
If you’re intrigued, and want to read further, view Khabaza’s full post here. His exposition of these points is more than worth the time!