The Scope of Data Mining

At SBS Data, data mining technology provides a basis for new products and for enhancements to existing offerings. For example, at DBIS, data mining tools can be used to automate more elements of the process of building risk models for a variety of markets. Data mining can present a Nielsen customer with the top ten most significant new buying patterns each week, or present an IMS customer with patterns of sales calls and marketing promotions that have significant impact within certain market niches.

Some of the most commonly used techniques in data mining are:

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution.
Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for classifying a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID). Both produce a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating two-way splits, while CHAID uses chi-square tests to create multi-way splits. CART typically requires less data preparation than CHAID.
Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbor technique.
Rule induction: The extraction of useful if-then rules from data based on statistical significance.
Data visualization: The visual interpretation of complex relationships in multidimensional data.
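
To make one of these techniques concrete, the nearest neighbor method can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function name knn_classify, the toy historical records, and the choices of Euclidean distance and a simple majority vote to combine the neighbors' classes are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(record, history, k=3):
    """Classify `record` by majority vote among its k most similar historical records.

    `history` is a list of (features, label) pairs. Similarity is assumed to be
    Euclidean distance; other measures could be substituted.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Order the historical records by distance to the new record and keep the k closest.
    nearest = sorted(history, key=lambda item: math.dist(record, item[0]))[:k]
    # Combine the neighbors' classes with a simple majority vote.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical dataset: two numeric features and a known class per record.
history = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
    ((4.8, 5.1), "high"),
]

print(knn_classify((1.1, 1.0), history, k=3))
print(knn_classify((5.1, 5.0), history, k=3))
```

With k = 1 the new record simply takes the class of its single closest match; larger values of k smooth out noisy individual records at the cost of blurring boundaries between classes.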