Decision Trees in Data Mining

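A decision tree is a structure made up of a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes an outcome of that test, and each leaf node holds a class label. The topmost node in the tree is the root node.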

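The decision tree for the concept buy_computer indicates whether a customer at a company is likely to buy a computer. Each internal node represents a test on an attribute. Each leaf node represents a class.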
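
The structure described above can be sketched in Python as follows. This is only an illustration: the Node class, the classify function, and the attributes and values in the sample tree (age, student, credit_rating) are assumptions made for the example and are not given in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One node of a decision tree.

    An internal node sets `attribute` and `children`; a leaf sets `label`.
    """
    attribute: Optional[str] = None   # attribute tested at this node
    children: Dict[str, "Node"] = field(default_factory=dict)  # outcome -> subtree
    label: Optional[str] = None       # class label (leaf nodes only)


def classify(node: Node, record: Dict[str, str]) -> Optional[str]:
    """Follow the branch matching each test outcome until a leaf is reached."""
    while node.label is None:
        node = node.children[record[node.attribute]]
    return node.label


# Hypothetical buy_computer tree; attributes and values are illustrative only.
tree = Node(attribute="age", children={
    "youth": Node(attribute="student", children={
        "no": Node(label="no"),
        "yes": Node(label="yes"),
    }),
    "middle_aged": Node(label="yes"),
    "senior": Node(attribute="credit_rating", children={
        "fair": Node(label="yes"),
        "excellent": Node(label="no"),
    }),
})

print(classify(tree, {"age": "youth", "student": "yes"}))  # prints "yes"
```

Classifying a record means starting at the root and following one branch per test, so the cost of a prediction is bounded by the depth of the tree.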

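Benefits of Decision Trees

It does not require any domain knowledge.
It is easy to comprehend.
The learning and classification steps of a decision tree are simple and fast.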

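Decision Tree Induction Algorithm

In 1980, a machine learning researcher named J. Ross Quinlan developed a decision tree algorithm known as ID3 (Iterative Dichotomiser). He later presented C4.5, the successor of ID3. Both ID3 and C4.5 take a greedy approach: there is no backtracking, and the tree is built in a top-down, recursive, divide-and-conquer manner.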

Generating a decision tree from the training tuples of data partition D
Algorithm: Generate_decision_tree

Input:
Data partition, D, which is a set of training tuples
and their associated class labels.
attribute_list, the set of candidate attributes.
Attribute_selection_method, a procedure to determine the
splitting criterion that best partitions the data
tuples into individual classes. This criterion includes a
splitting_attribute and either a split point or a splitting subset.

Output:
A decision tree

Method:
create a node N;
if tuples in D are all of the same class, C, then
   return N as a leaf node labeled with class C;
if attribute_list is empty then
   return N as a leaf node labeled
   with the majority class in D; // majority voting
apply Attribute_selection_method(D, attribute_list)
to find the best splitting_criterion;
label node N with splitting_criterion;
if splitting_attribute is discrete-valued and
   multiway splits allowed then  // not restricted to binary trees
   attribute_list = attribute_list - splitting_attribute; // remove splitting attribute
for each outcome j of splitting_criterion
   // partition the tuples and grow subtrees for each partition
   let Dj be the set of data tuples in D satisfying outcome j; // a partition
   if Dj is empty then
      attach a leaf labeled with the majority
      class in D to node N;
   else
      attach the node returned by
      Generate_decision_tree(Dj, attribute_list) to node N;
end for
return N;
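
The procedure above can be sketched in runnable Python. The sketch makes several assumptions that the pseudocode leaves open: information gain is used as the attribute selection method (the measure usually associated with ID3), all attributes are discrete-valued with multiway splits, tuples are plain dictionaries, and the tree is returned as nested (attribute, branches) pairs. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by partitioning on `attribute`."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def generate_decision_tree(rows, labels, attribute_list):
    """Return a class label (leaf) or an (attribute, {outcome: subtree}) pair."""
    if len(set(labels)) == 1:          # all tuples share one class
        return labels[0]
    if not attribute_list:             # no attributes left: majority voting
        return Counter(labels).most_common(1)[0][0]
    # Attribute selection method (assumed here: highest information gain).
    best = max(attribute_list, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attribute_list if a != best]  # remove splitting attribute
    branches = {}
    # Outcomes are taken from the data itself, so no partition is ever empty
    # and the "Dj is empty" case of the pseudocode cannot occur in this sketch.
    for outcome in sorted({row[best] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == outcome]
        sub_rows = [r for r, _ in subset]
        sub_labels = [l for _, l in subset]
        branches[outcome] = generate_decision_tree(sub_rows, sub_labels, remaining)
    return (best, branches)


# Toy training tuples (illustrative only).
rows = [
    {"student": "no", "credit": "fair"},
    {"student": "no", "credit": "excellent"},
    {"student": "yes", "credit": "fair"},
    {"student": "yes", "credit": "excellent"},
]
labels = ["no", "no", "yes", "yes"]

tree = generate_decision_tree(rows, labels, ["student", "credit"])
print(tree)  # ('student', {'no': 'no', 'yes': 'yes'})
```

In the toy data, splitting on student separates the classes perfectly while credit gives no gain, so the greedy choice is student and both branches end in leaves immediately.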

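Tree Pruning

Tree pruning is performed to remove anomalies in the training data caused by noise or outliers. Pruned trees are smaller and less complex.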

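Tree Pruning Approaches

There are two approaches to pruning a tree:

Pre-pruning - The tree is pruned by halting its construction early.
Post-pruning - This approach removes subtrees from a fully grown tree.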
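
As a rough illustration of pre-pruning, the check below decides whether to stop growing a branch and turn the current node into a leaf. The thresholds max_depth and min_tuples, their default values, and both function names are assumptions chosen for the example; the text does not prescribe a particular stopping rule.

```python
from collections import Counter


def should_stop(labels, depth, max_depth=3, min_tuples=5):
    """Pre-pruning test: halt growth when the node is pure, too deep, or too small."""
    return (len(set(labels)) <= 1
            or depth >= max_depth
            or len(labels) < min_tuples)


def leaf_label(labels):
    """Label a node that stops growing with the majority class of its tuples."""
    return Counter(labels).most_common(1)[0][0]


mixed = ["yes", "no"] * 5                # 10 tuples, two classes
print(should_stop(mixed, depth=0))       # False: keep splitting
print(should_stop(mixed, depth=3))       # True: depth limit reached
print(leaf_label(["yes", "yes", "no"]))  # yes
```

Post-pruning would instead grow the full tree first and then replace selected subtrees with leaves labeled in the same majority-class way.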

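Cost Complexity

The cost complexity of a tree is measured by the following two parameters:

The number of leaves in the tree
The error rate of the tree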
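
One common way to combine these two parameters is to add a per-leaf penalty to the error rate. The sketch below assumes that additive form and a weight called alpha; both are assumptions borrowed from the usual cost-complexity pruning formulation, since the text only names the two parameters. The numbers are made up for illustration.

```python
def cost_complexity(num_errors, num_tuples, num_leaves, alpha):
    """Error rate of the tree plus a penalty of `alpha` for every leaf."""
    error_rate = num_errors / num_tuples
    return error_rate + alpha * num_leaves


# Hypothetical full tree versus a pruned version of it.
full = cost_complexity(num_errors=2, num_tuples=100, num_leaves=10, alpha=0.01)
pruned = cost_complexity(num_errors=5, num_tuples=100, num_leaves=4, alpha=0.01)

print(pruned < full)  # True: the pruned tree has the lower cost complexity
```

With this weighting the pruned tree makes more errors but is preferred because it has far fewer leaves; a smaller alpha would shift the preference back toward the full tree.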
