A decision tree is a structure that consists of a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label. The topmost node in the tree is the root node.
The following decision tree is for the concept buy_computer; it indicates whether a customer at a company is likely to buy a computer or not. Each internal node represents a test on an attribute. Each leaf node represents a class.
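As a rough illustration of this structure, a decision tree node can be sketched as follows; the field names and the age-based example are assumptions made for illustration, not from the original text.

# A minimal sketch of the node structure described above (Python).
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    attribute: Optional[str] = None               # internal node: the attribute being tested
    branches: dict = field(default_factory=dict)  # branch = test outcome -> child node
    label: Optional[str] = None                   # leaf node: the class label it holds

# Root tests an attribute; each branch outcome leads to a leaf holding a class label.
root = Node(attribute="age", branches={
    "youth":  Node(label="no"),
    "senior": Node(label="yes"),
})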

Advantages of Decision Trees
- It does not require any domain knowledge.
- It is easy for people to comprehend.
- The learning and classification steps of a decision tree are simple and fast.
Decision Tree Induction Algorithm
A machine learning researcher named J. Ross Quinlan developed a decision tree algorithm in 1980. This algorithm is known as ID3 (Iterative Dichotomiser). Later, he presented C4.5, the successor of ID3. ID3 and C4.5 adopt a greedy approach. In these algorithms there is no backtracking; the trees are constructed in a top-down recursive divide-and-conquer manner.
Generating a decision tree from the training tuples of data partition D

Algorithm: Generate_decision_tree

Input:
- Data partition D, which is a set of training tuples and their associated class labels.
- attribute_list, the set of candidate attributes.
- Attribute_selection_method, a procedure to determine the splitting criterion that best partitions the data tuples into individual classes. This criterion includes a splitting_attribute and, possibly, either a split point or a splitting subset.

Output: A decision tree

Method:
create a node N;
if tuples in D are all of the same class C then
    return N as a leaf node labeled with class C;
if attribute_list is empty then
    return N as a leaf node labeled with the majority class in D; // majority voting
apply Attribute_selection_method(D, attribute_list) to find the best splitting_criterion;
label node N with splitting_criterion;
if splitting_attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
    attribute_list = attribute_list - splitting_attribute; // remove splitting_attribute
for each outcome j of splitting_criterion // partition the tuples and grow subtrees for each partition
    let Dj be the set of data tuples in D satisfying outcome j; // a partition
    if Dj is empty then
        attach a leaf labeled with the majority class in D to node N;
    else
        attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
end for
return N;
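The pseudocode above maps directly onto a short program. Below is a minimal sketch in Python that uses information gain (the measure ID3 uses) as the Attribute_selection_method; the dataset format, with each tuple as a dict carrying a "label" key, and the toy data values are assumptions made for illustration.

# A sketch of Generate_decision_tree with information gain as the
# attribute selection measure (assumed data layout: list of dicts).
from collections import Counter
from math import log2

def entropy(tuples):
    """Shannon entropy of the class labels in a partition."""
    counts = Counter(t["label"] for t in tuples)
    total = len(tuples)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(tuples, attribute):
    """Expected reduction in entropy from splitting on `attribute`."""
    total = len(tuples)
    remainder = 0.0
    for value in {t[attribute] for t in tuples}:
        subset = [t for t in tuples if t[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(tuples) - remainder

def generate_decision_tree(tuples, attribute_list):
    """Top-down recursive divide-and-conquer, as in the pseudocode."""
    classes = {t["label"] for t in tuples}
    if len(classes) == 1:                       # all tuples in the same class C
        return classes.pop()                    # leaf labeled with class C
    majority = Counter(t["label"] for t in tuples).most_common(1)[0][0]
    if not attribute_list:                      # attribute_list is empty
        return majority                         # leaf labeled by majority voting
    # Attribute_selection_method: pick the attribute with the highest gain.
    best = max(attribute_list, key=lambda a: information_gain(tuples, a))
    remaining = [a for a in attribute_list if a != best]  # remove splitting attribute
    node = {best: {}}                           # node N labeled with the splitting criterion
    for value in {t[best] for t in tuples}:     # each outcome j of the criterion
        dj = [t for t in tuples if t[best] == value]      # partition Dj
        node[best][value] = generate_decision_tree(dj, remaining) if dj else majority
    return node

# Usage with a toy buy_computer-style dataset (values are invented):
data = [
    {"age": "youth",  "income": "high", "label": "no"},
    {"age": "youth",  "income": "low",  "label": "no"},
    {"age": "senior", "income": "low",  "label": "yes"},
    {"age": "senior", "income": "high", "label": "yes"},
]
print(generate_decision_tree(data, ["age", "income"]))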
Tree Pruning
Tree pruning is performed in order to remove anomalies in the training data caused by noise or outliers. The pruned trees are smaller and less complex.
Tree Pruning Approaches
The tree pruning approaches are listed below; a code sketch of both follows the list:
- Pre-pruning - the tree is pruned by halting its construction early.
- Post-pruning - this approach removes a subtree from a fully grown tree.
Cost Complexity
The cost complexity is measured by the following two parameters (a worked sketch of the measure follows the list):
- Number of leaves in the tree
- Error rate of the tree
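One common way to combine these two parameters is the CART-style measure cost = error rate + alpha x number of leaves; the alpha trade-off constant and the example numbers below are assumptions for illustration, since the original text names only the two parameters themselves.

# A sketch of the cost-complexity measure built from the two parameters
# above (alpha is an assumed trade-off constant, not from the source).
def cost_complexity(error_rate, num_leaves, alpha):
    """Smaller is better: alpha penalizes each additional leaf."""
    return error_rate + alpha * num_leaves

# A fully grown tree vs. a pruned one: with alpha = 0.01, the pruned
# tree wins despite its slightly higher error rate.
print(cost_complexity(error_rate=0.05, num_leaves=40, alpha=0.01))  # 0.45
print(cost_complexity(error_rate=0.08, num_leaves=10, alpha=0.01))  # 0.18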