
Machine Learning Algorithms: A Walkthrough of the sklearn GBDT Source Code

In this post we explore scikit-learn's GBDT implementation.

Let's start from the calling side. Typically GBDT is used like this:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Initialize the model
gbdt = GradientBoostingRegressor(
    loss='ls',
    learning_rate=0.1,
    n_estimators=100,
    subsample=1.0,
    min_samples_split=2,
    min_samples_leaf=1,
    max_depth=3,
    init=None,
    random_state=None,
    max_features=None,
    alpha=0.9,
    verbose=0,
    max_leaf_nodes=None,
    warm_start=False
)

# Train
X = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [5, 6, 7, 8]])
y = np.array([1, 0, 1])
gbdt.fit(X, y)

# Predict
X_test = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [5, 6, 7, 8]])
pred = gbdt.predict(X_test)

for i in range(pred.shape[0]):
    print(pred[i])

Let's first look at the GradientBoostingRegressor() class.

The GradientBoostingRegressor class

A quick overview of this class:

We find that this class only implements three prediction-related interfaces and has no fit() method of its own, which means fit() comes entirely from the parent class.

Let's briefly go through these methods.

Initialization: __init__()

The initializer of GradientBoostingRegressor() looks like this:

class GradientBoostingRegressor(BaseGradientBoosting, RegressorMixin):
    def __init__(self, loss='ls', learning_rate=0.1, n_estimators=100,
                 subsample=1.0, criterion='friedman_mse', min_samples_split=2,
                 min_samples_leaf=1, min_weight_fraction_leaf=0.,
                 max_depth=3, min_impurity_decrease=0.,
                 min_impurity_split=None, init=None, random_state=None,
                 max_features=None, alpha=0.9, verbose=0, max_leaf_nodes=None,
                 warm_start=False, presort='auto'):

        super(GradientBoostingRegressor, self).__init__(
            loss=loss, learning_rate=learning_rate, n_estimators=n_estimators,
            criterion=criterion, min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            min_weight_fraction_leaf=min_weight_fraction_leaf,
            max_depth=max_depth, init=init, subsample=subsample,
            max_features=max_features,
            min_impurity_decrease=min_impurity_decrease,
            min_impurity_split=min_impurity_split,
            random_state=random_state, alpha=alpha, verbose=verbose,
            max_leaf_nodes=max_leaf_nodes, warm_start=warm_start,
            presort=presort)

As we can see, GradientBoostingRegressor() inherits from BaseGradientBoosting(), and its initializer does nothing but call the parent class BaseGradientBoosting()'s initializer.

We will look at BaseGradientBoosting() in detail in the next section.

Prediction: predict()

The prediction function is as follows:

def predict(self, X):
    """Predict regression target for X.

    Parameters
    ----------
    X : array-like or sparse matrix, shape = [n_samples, n_features]
        The input samples. Internally, it will be converted to
        ``dtype=np.float32`` and if a sparse matrix is provided
        to a sparse ``csr_matrix``.

    Returns
    -------
    y : array of shape = [n_samples]
        The predicted values.
    """
    X = check_array(X, dtype=DTYPE, order="C", accept_sparse='csr')
    return self._decision_function(X).ravel()

As we can see, the prediction function simply calls the parent class's _decision_function(X).
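Conceptually, that decision function just accumulates, on top of the initial estimator's prediction, the learning-rate-scaled output of every stage's tree. Here is a minimal, purely illustrative sketch of that aggregation (the real code delegates to a Cython helper; `gbdt` is assumed to be an already fitted GradientBoostingRegressor):

import numpy as np

def decision_function_sketch(gbdt, X):
    # start from the initial estimator (by default it predicts the training mean)
    score = gbdt.init_.predict(X).astype(np.float64).ravel()
    # estimators_ has shape (n_estimators, K); K == 1 for regression
    for stage in gbdt.estimators_:
        for tree in stage:
            score += gbdt.learning_rate * tree.predict(X)
    return score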

From this quick look, the main logic clearly lives in the parent class BaseGradientBoosting, so let's examine BaseGradientBoosting in detail next.

The BaseGradientBoosting class

A quick glance at this class:

There are quite a few methods. We will focus on fit.

Training: fit()

Straight to the code:

def fit(self, X, y, sample_weight=None, monitor=None):
    """Fit the gradient boosting model."""
    # ========== Validate the input =================
    # Check input
    X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'], dtype=DTYPE)
    n_samples, self.n_features_ = X.shape
    if sample_weight is None:
        sample_weight = np.ones(n_samples, dtype=np.float32)
    else:
        sample_weight = column_or_1d(sample_weight, warn=True)

    check_consistent_length(X, y, sample_weight)

    y = self._validate_y(y)

    random_state = check_random_state(self.random_state)
    self._check_params()
    # ============ Initialization ================
    if not self._is_initialized():
        # init state
        self._init_state()

        # fit initial model - FIXME make sample_weight optional
        self.init_.fit(X, y, sample_weight)

        # init predictions
        y_pred = self.init_.predict(X)
        begin_at_stage = 0
    else:
        # add more estimators to fitted model
        # invariant: warm_start = True
        if self.n_estimators < self.estimators_.shape[0]:
            raise ValueError('n_estimators=%d must be larger or equal to '
                             'estimators_.shape[0]=%d when '
                             'warm_start==True'
                             % (self.n_estimators,
                                self.estimators_.shape[0]))
        begin_at_stage = self.estimators_.shape[0]
        y_pred = self._decision_function(X)
        self._resize_state()
    # ============= Presort the features ============
    X_idx_sorted = None
    presort = self.presort
    # Allow presort to be 'auto', which means True if the dataset is dense,
    # otherwise it will be False.
    if presort == 'auto' and issparse(X):
        presort = False
    elif presort == 'auto':
        presort = True

    if presort == True:
        if issparse(X):
            raise ValueError("Presorting is not supported for sparse matrices.")
        else:
            X_idx_sorted = np.asfortranarray(np.argsort(X, axis=0),
                                             dtype=np.int32)
    # ============= Training ========================
    # fit the boosting stages
    n_stages = self._fit_stages(X, y, y_pred, sample_weight, random_state,
                                begin_at_stage, monitor, X_idx_sorted)
    # change shape of arrays after fit (early-stopping or additional ests)
    if n_stages != self.estimators_.shape[0]:
        self.estimators_ = self.estimators_[:n_stages]
        self.train_score_ = self.train_score_[:n_stages]
        if hasattr(self, 'oob_improvement_'):
            self.oob_improvement_ = self.oob_improvement_[:n_stages]

    return self

As we can see, the code has four main parts:

  • Input validation
  • Initialization
  • Feature presorting
  • Training

We won't go into the first three parts; let's focus on training. Training actually delegates to the _fit_stages() method, so let's look at it.

Training: _fit_stages()

The code:

def _fit_stages(self, X, y, y_pred, sample_weight, random_state,
                begin_at_stage=0, monitor=None, X_idx_sorted=None):
    """Iteratively fits the stages.

    For each stage it computes the progress (OOB, train score)
    and delegates to ``_fit_stage``.
    Returns the number of stages fit; might differ from ``n_estimators``
    due to early stopping.
    """
    # ============== Initialization =============================
    n_samples = X.shape[0]
    do_oob = self.subsample < 1.0
    sample_mask = np.ones((n_samples, ), dtype=np.bool)
    n_inbag = max(1, int(self.subsample * n_samples))
    loss_ = self.loss_

    # Set min_weight_leaf from min_weight_fraction_leaf
    if self.min_weight_fraction_leaf != 0. and sample_weight is not None:
        min_weight_leaf = (self.min_weight_fraction_leaf *
                           np.sum(sample_weight))
    else:
        min_weight_leaf = 0.

    if self.verbose:
        verbose_reporter = VerboseReporter(self.verbose)
        verbose_reporter.init(self, begin_at_stage)

    X_csc = csc_matrix(X) if issparse(X) else None
    X_csr = csr_matrix(X) if issparse(X) else None
    # =========== Iterate over the trees! =====================
    # perform boosting iterations
    i = begin_at_stage
    for i in range(begin_at_stage, self.n_estimators):
        # subsampling
        if do_oob:
            sample_mask = _random_sample_mask(n_samples, n_inbag,
                                              random_state)
            # OOB score before adding this stage
            old_oob_score = loss_(y[~sample_mask],
                                  y_pred[~sample_mask],
                                  sample_weight[~sample_mask])

        # fit next stage of trees
        y_pred = self._fit_stage(i, X, y, y_pred, sample_weight,
                                 sample_mask, random_state, X_idx_sorted,
                                 X_csc, X_csr)

        # track deviance (= loss)
        if do_oob:
            self.train_score_[i] = loss_(y[sample_mask],
                                         y_pred[sample_mask],
                                         sample_weight[sample_mask])
            self.oob_improvement_[i] = (
                old_oob_score - loss_(y[~sample_mask],
                                      y_pred[~sample_mask],
                                      sample_weight[~sample_mask]))
        else:
            # no need to fancy index w/ no subsampling
            self.train_score_[i] = loss_(y, y_pred, sample_weight)

        if self.verbose > 0:
            verbose_reporter.update(i, self)

        if monitor is not None:
            early_stopping = monitor(i, self, locals())
            if early_stopping:
                break
    return i + 1

To make it easier to read, here is the core of the iteration extracted:

for i in range(begin_at_stage, self.n_estimators):

    # fit next stage of trees
    y_pred = self._fit_stage(i, X, y, y_pred, sample_weight,
                             sample_mask, random_state, X_idx_sorted,
                             X_csc, X_csr)

    self.train_score_[i] = loss_(y, y_pred, sample_weight)

We can see that every round of the loop produces a prediction y_pred, which is fed back in on the next round. The core work happens in _fit_stage(), so let's look at that next.
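To see that feedback loop concretely, here is a tiny self-contained toy version of the same idea for squared-error loss (this is not sklearn's code, just a sketch): each round fits a small tree to the residual of the current y_pred and folds the shrunken prediction back into y_pred.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def toy_gbdt_fit(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    # initial prediction: the training mean (what init_ does by default)
    y_pred = np.full(len(y), np.mean(y), dtype=np.float64)
    trees = []
    for _ in range(n_estimators):
        residual = y - y_pred                      # negative gradient of 0.5*(y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)                      # fit the new tree to the residual
        y_pred += learning_rate * tree.predict(X)  # feed the prediction back in
        trees.append(tree)
    return trees, y_pred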

Training: _fit_stage()

def _fit_stage(self, i, X, y, y_pred, sample_weight, sample_mask,
               random_state, X_idx_sorted, X_csc=None, X_csr=None):
    """Fit another stage of ``n_classes_`` trees to the boosting model. """

    assert sample_mask.dtype == np.bool
    loss = self.loss_
    original_y = y
    # K is the number of classes (K = 1 for regression)
    for k in range(loss.K):
        if loss.is_multi_class:
            y = np.array(original_y == k, dtype=np.float64)

        residual = loss.negative_gradient(y, y_pred, k=k,
                                          sample_weight=sample_weight)

        # induce regression tree on residuals
        tree = DecisionTreeRegressor(
            criterion=self.criterion,
            splitter='best',
            max_depth=self.max_depth,
            min_samples_split=self.min_samples_split,
            min_samples_leaf=self.min_samples_leaf,
            min_weight_fraction_leaf=self.min_weight_fraction_leaf,
            min_impurity_decrease=self.min_impurity_decrease,
            min_impurity_split=self.min_impurity_split,
            max_features=self.max_features,
            max_leaf_nodes=self.max_leaf_nodes,
            random_state=random_state,
            presort=self.presort)

        if self.subsample < 1.0:
            # no inplace multiplication!
            sample_weight = sample_weight * sample_mask.astype(np.float64)

        if X_csc is not None:
            tree.fit(X_csc, residual, sample_weight=sample_weight,
                     check_input=False, X_idx_sorted=X_idx_sorted)
        else:
            tree.fit(X, residual, sample_weight=sample_weight,
                     check_input=False, X_idx_sorted=X_idx_sorted)

        # update tree leaves
        if X_csr is not None:
            loss.update_terminal_regions(tree.tree_, X_csr, y, residual, y_pred,
                                         sample_weight, sample_mask,
                                         self.learning_rate, k=k)
        else:
            loss.update_terminal_regions(tree.tree_, X, y, residual, y_pred,
                                         sample_weight, sample_mask,
                                         self.learning_rate, k=k)

        # add tree to ensemble
        self.estimators_[i, k] = tree

    return y_pred

For convenience, here are the important lines extracted:

def _fit_stage(self, i, X, y, y_pred, sample_weight, sample_mask,
               random_state, X_idx_sorted, X_csc=None, X_csr=None):
    """Fit another stage of ``n_classes_`` trees to the boosting model. """

    assert sample_mask.dtype == np.bool
    loss = self.loss_
    original_y = y
    for k in range(loss.K):
        # ======== compute the negative gradient of the loss (the residual)
        residual = loss.negative_gradient(y, y_pred, k=k,
                                          sample_weight=sample_weight)
        # ======== fit one tree this round
        # induce regression tree on residuals
        tree = DecisionTreeRegressor(
            criterion=self.criterion,
            splitter='best',
            max_depth=self.max_depth,
            min_samples_split=self.min_samples_split,
            min_samples_leaf=self.min_samples_leaf,
            min_weight_fraction_leaf=self.min_weight_fraction_leaf,
            min_impurity_decrease=self.min_impurity_decrease,
            min_impurity_split=self.min_impurity_split,
            max_features=self.max_features,
            max_leaf_nodes=self.max_leaf_nodes,
            random_state=random_state,
            presort=self.presort)

        # === note: the fitting target here is the negative gradient, i.e. residual!
        tree.fit(X, residual, sample_weight=sample_weight,
                 check_input=False, X_idx_sorted=X_idx_sorted)

        loss.update_terminal_regions(tree.tree_, X, y, residual, y_pred,
                                     sample_weight, sample_mask,
                                     self.learning_rate, k=k)
        # add tree to ensemble
        self.estimators_[i, k] = tree
    return y_pred

As you can see, this is the textbook "gradient boosting" procedure:

  • Compute the residual of the previous round, i.e. the negative gradient of the loss
  • Fit a new tree whose fitting target y is exactly that residual
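For the default squared-error loss ('ls'), for instance, the negative gradient is exactly the ordinary residual:

\[ L\big(y, F(x)\big) = \frac{1}{2}\big(y - F(x)\big)^2, \qquad -\frac{\partial L}{\partial F(x)} = y - F(x) \]

so negative_gradient() simply returns y - y_pred, and the new regression tree is fit to that difference.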

Now the overall process is clear. The remaining question is how a single tree is built, so let's look at the DecisionTreeRegressor class!

The DecisionTreeRegressor class

A quick glance at this class:

It has only two methods, and both just call the parent class's interfaces directly, so our focus shifts to the parent class BaseDecisionTree!

The BaseDecisionTree class

This class implements how a single tree is trained.

A quick glance:

We focus on fit. The fit method is fairly long; in brief, it boils down to:

def fit(self, X, y, sample_weight=None, check_input=True,
        X_idx_sorted=None):

    # .... many lines omitted ....

    # Build tree

    # criterion - the impurity criterion
    criterion = CRITERIA_REG[self.criterion](self.n_outputs_,
                                             n_samples)
    # SPLITTERS - the splitter
    SPLITTERS = SPARSE_SPLITTERS if issparse(X) else DENSE_SPLITTERS

    splitter = SPLITTERS[self.splitter](criterion,
                                        self.max_features_,
                                        min_samples_leaf,
                                        min_weight_leaf,
                                        random_state,
                                        self.presort)
    # the tree object
    self.tree_ = Tree(self.n_features_, self.n_classes_, self.n_outputs_)

    # If max_leaf_nodes is not given, build the tree depth-first;
    # otherwise build it with the best-first strategy.
    # Use BestFirst if max_leaf_nodes given; use DepthFirst otherwise
    if max_leaf_nodes < 0:
        builder = DepthFirstTreeBuilder(splitter, min_samples_split,
                                        min_samples_leaf,
                                        min_weight_leaf,
                                        max_depth,
                                        self.min_impurity_decrease,
                                        min_impurity_split)
    else:
        builder = BestFirstTreeBuilder(splitter, min_samples_split,
                                       min_samples_leaf,
                                       min_weight_leaf,
                                       max_depth,
                                       max_leaf_nodes,
                                       self.min_impurity_decrease,
                                       min_impurity_split)

    builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)

    self.n_classes_ = self.n_classes_[0]
    self.classes_ = self.classes_[0]

    return self

In other words, once the various parameters (impurity criterion, splitter, growth strategy, and so on) are fixed, the tree is grown either depth-first (DF) or best-first (BF). We will focus on depth-first building.

The DepthFirstTreeBuilder class

A quick scan:

We know the code above calls the build() function.

Building the tree

So let's take a closer look at build()... wait, the Python-side body is empty, just pass?! That won't do. No giving up; let's go hunt for it online.

It turns out this part is written in Cython, here: _tree.pyx

The code is quite long. First, recall the non-recursive way to write a depth-first traversal: use a stack, drill down the left side, and push the right side for later. This code follows exactly that idea. The overall structure:

# push root node onto stack
rc = stack.push(0, n_node_samples, 0, _TREE_UNDEFINED, 0, INFINITY, 0)
while not stack.is_empty():
    # pop the record on top of the stack
    stack.pop(&stack_record)

    # [use stack_record to decide whether the current node should keep splitting]

    # if it does split, push its two children

    if not is_leaf:
        # Push right child on stack
        rc = stack.push(split.pos, end, depth + 1, node_id, 0,
                        split.impurity_right, n_constant_features)
        # Push left child on stack
        rc = stack.push(start, split.pos, depth + 1, node_id, 1,
                        split.impurity_left, n_constant_features)
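As a plain-Python illustration of the same control flow (not the actual Cython builder; split_node here is a hypothetical callback standing in for the splitter and leaf checks):

def build_depth_first(n_node_samples, split_node, max_depth):
    """Stack-driven, depth-first node construction.
    Each stack record is (start, end, depth) over the node's sample range."""
    stack = [(0, n_node_samples, 0)]
    nodes = []
    while stack:
        start, end, depth = stack.pop()            # pop the top record
        pos, is_leaf = split_node(start, end, depth, max_depth)
        nodes.append((start, end, depth, is_leaf))
        if not is_leaf:
            stack.append((pos, end, depth + 1))    # push the right child first...
            stack.append((start, pos, depth + 1))  # ...so the left child is popped next (go deep left)
    return nodes

The two things this sketch leaves abstract, deciding whether a node should stop splitting and how the split position is chosen, are exactly the questions below.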

So there are two key questions:

  1. How do we decide whether the current node should keep splitting?
  2. What does a split produce?

Let's tackle them one at a time.

Splitting conditions

Let's look at the splitting conditions:

n_node_samples = end - start
splitter.node_reset(start, end, &weighted_n_node_samples)

is_leaf = (depth >= max_depth or
           n_node_samples < min_samples_split or
           n_node_samples < 2 * min_samples_leaf or
           weighted_n_node_samples < 2 * min_weight_leaf)

If the current node becomes a leaf here, it satisfies at least one of the following conditions:

  • The depth limit is reached: depth >= max_depth
  • Too few samples to split: n_node_samples < min_samples_split
  • Not enough samples to form two leaves: n_node_samples < 2 * min_samples_leaf
  • The sample weight sum is below the threshold: weighted_n_node_samples < 2 * min_weight_leaf

The splitting process

if not is_leaf:
    splitter.node_split(impurity, &split, &n_constant_features)
    is_leaf = (is_leaf or split.pos >= end or
               (split.improvement + EPSILON <
                min_impurity_decrease))

The code above says the node is also marked as a leaf if at least one of the following holds:

  • The split position is out of range: split.pos >= end
  • The impurity decrease brought by this split is too small: split.improvement + EPSILON < min_impurity_decrease

So the key is how split computes the improvement. Going back to the fit method of BaseDecisionTree, the splitter is chosen by SPLITTERS = SPARSE_SPLITTERS if issparse(X) else DENSE_SPLITTERS; that is, for dense (non-sparse) data the splitter comes from DENSE_SPLITTERS. So let's look at DENSE_SPLITTERS.

DENSE_SPLITTERS = {"best": _splitter.BestSplitter,
                   "random": _splitter.RandomSplitter}

We'll start with BestSplitter.

The BestSplitter class

Source: _splitter.pyx

We mainly look at the node_split() function, which searches for the best split point within samples[start:end]. I have simplified the function below. First, the feature array is partitioned into the following index intervals:

  • [:n_drawn_constants]: constant features that have already been drawn (a feature is "constant" if it takes a single value on this node's samples).
  • [n_drawn_constants:n_known_constants]: known constant features that have not been drawn yet.
  • [n_known_constants:n_total_constants]: newly discovered constant features.
  • [n_total_constants:f_i]: non-constant features that have not been drawn yet.
  • [f_i:n_features]: non-constant features that have already been drawn.
cdef int node_split(self, double impurity, SplitRecord* split,
                    SIZE_t* n_constant_features) nogil except -1:
    """Find the best split on node samples[start:end]"""
    # ... initialization ...
    f_i = n_features  # number of features still to consider
    while (f_i > n_total_constants and  # stop when too few features remain
           (n_visited_features < max_features or
            n_visited_features <= n_found_constants + n_drawn_constants)):  # at least one non-constant feature
        # draw a feature at random
        f_j = rand_int(n_drawn_constants, f_i - n_found_constants,
                       random_state)
        if f_j < n_known_constants:
            # f_j lies in [n_drawn_constants, n_known_constants), i.e. it is a known
            # constant feature that has not been drawn yet;
            # swap it to the end of the [:n_drawn_constants] interval
            tmp = features[f_j]
            features[f_j] = features[n_drawn_constants]
            features[n_drawn_constants] = tmp
            n_drawn_constants += 1

        else:
            # f_j lies in [n_known_constants, f_i - n_found_constants), i.e. a feature
            # not yet known to be constant; shift it into the [n_total_constants, f_i) interval
            f_j += n_found_constants
            current.feature = features[f_j]
            feature_offset = self.X_feature_stride * current.feature

            # ... sort the feature values and cache the ordering for reuse (code omitted) ...

            # search for the best split position
            p = start

            while p < end:
                p += 1
                if p < end:
                    current.pos = p
                    # Reject if min_samples_leaf is not guaranteed
                    # (either child would get too few samples)
                    if (((current.pos - start) < min_samples_leaf) or
                            ((end - current.pos) < min_samples_leaf)):
                        continue
                    # evaluate the criterion at this split position
                    self.criterion.update(current.pos)

                    # Reject if min_weight_leaf is not satisfied
                    if ((self.criterion.weighted_n_left < min_weight_leaf) or
                            (self.criterion.weighted_n_right < min_weight_leaf)):
                        continue
                    current_proxy_improvement = self.criterion.proxy_impurity_improvement()
                    # keep the split with the best improvement so far
                    if current_proxy_improvement > best_proxy_improvement:
                        best_proxy_improvement = current_proxy_improvement
                        # sum of halves is used to avoid infinite value
                        current.threshold = Xf[p - 1] / 2.0 + Xf[p] / 2.0

                        if ((current.threshold == Xf[p]) or
                                (current.threshold == INFINITY) or
                                (current.threshold == -INFINITY)):
                            current.threshold = Xf[p - 1]

                        best = current  # copy

The steps above are now clear: iterate over every candidate split point of every feature and keep the best one!
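As a plain-Python illustration of that search (ignoring the constant-feature bookkeeping, sample weights, and sort caching, and using variance reduction directly instead of the criterion machinery):

import numpy as np

def best_split(X, residual, min_samples_leaf=1):
    """Exhaustively try every midpoint of every feature's sorted values and
    keep the split with the largest variance (SSE) reduction."""
    n_samples, n_features = X.shape
    best = {"improvement": -np.inf, "feature": None, "threshold": None}
    for j in range(n_features):
        order = np.argsort(X[:, j])
        Xf, yf = X[order, j], residual[order]
        for pos in range(min_samples_leaf, n_samples - min_samples_leaf + 1):
            if Xf[pos - 1] == Xf[pos]:          # cannot split between equal values
                continue
            left, right = yf[:pos], yf[pos:]
            improvement = (len(yf) * np.var(yf)
                           - len(left) * np.var(left)
                           - len(right) * np.var(right))
            if improvement > best["improvement"]:
                best = {"improvement": improvement, "feature": j,
                        "threshold": (Xf[pos - 1] + Xf[pos]) / 2.0}
    return best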

Next, let's see how the quality of a split point is evaluated.

The Criterion class

As described in GBDT源码分析之二:决策树, this class defines and implements the splitting strategy of the tree structure.

_criterion.pyx mainly defines and implements the various impurity-computation classes, including (the table below is adapted from 机器学习搭便车指南–决策树):

The Criterion interface

It is an abstract class. It mainly defines the following interfaces:

  • update(new_pos): move the split point from the current pos to new_pos; effectively, the samples in [pos:new_pos] are moved from the right child to the left child.
  • node_impurity(): compute the impurity of the current node.
  • children_impurity(left, right): compute the impurity of the two child nodes.

And it implements the following methods:

  • impurity_improvement(): compute the impurity decrease of the current node, defined as \[\frac{N_t}{N}\left(Impurity - \frac{N_{tR}}{N_t}\times Impurity_{right} - \frac{N_{tL}}{N_t}\times Impurity_{left}\right)\] where N is the total number of samples, N_t the number of samples at the current node, and N_tR and N_tL the numbers of samples in the right and left child nodes.
  • proxy_impurity_improvement(): since the current node's own impurity is a constant while comparing candidate splits, there is a cheaper way to rank impurity decreases. This is the method used throughout the splitting loop; impurity_improvement() is only called when the absolute impurity decrease is needed. It is simply the simplification of the formula above: \[- N_{tR}\times Impurity_{right} - N_{tL}\times Impurity_{left}\]

A note on how Criterion represents the split point and the two children: the class keeps three internal indices, start, pos and end. Given the (sorted) sample index array samples, samples[start:pos] represents the left child after the split and samples[pos:end] the right child. A Criterion object is essentially empty at construction time and is only used while the tree is being built.
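A small numeric sketch of the two quantities (taking variance as the impurity and unit sample weights, so the N's are plain counts; this mirrors the formulas above rather than the Cython code):

import numpy as np

def impurity_improvement(y, start, pos, end):
    # full weighted impurity decrease for splitting samples[start:end] at pos
    node, left, right = y[start:end], y[start:pos], y[pos:end]
    N, N_t, N_tL, N_tR = len(y), len(node), len(left), len(right)
    return (N_t / N) * (np.var(node)
                        - (N_tL / N_t) * np.var(left)
                        - (N_tR / N_t) * np.var(right))

def proxy_impurity_improvement(y, start, pos, end):
    # drops every term that is constant for a fixed parent node, so it ranks
    # candidate split positions in exactly the same order as the full formula
    left, right = y[start:pos], y[pos:end]
    return -len(left) * np.var(left) - len(right) * np.var(right)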

The RegressionCriterion interface

This is the Criterion interface for regression trees. It mainly implements the following method:

  • update(): essentially maintains the \(N_{tR}\) and \(N_{tL}\) in the formula above, i.e. the sample counts on each side, as well as the sums of the target values y on each side.

The MSE class

Defines impurity as the mean squared error. It mainly implements/overrides the following methods:

  • node_impurity(): compute the impurity of the current node.
  • proxy_impurity_improvement()
  • children_impurity()
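A sketch of the MSE impurity itself, written the way the class effectively computes it from running sums (weighted second moment minus the squared weighted mean); the function names are mine, not the Cython ones:

import numpy as np

def mse_node_impurity(y, w):
    # weighted variance: sum(w*y^2)/sum(w) - (sum(w*y)/sum(w))^2
    mean = np.sum(w * y) / np.sum(w)
    return np.sum(w * y ** 2) / np.sum(w) - mean ** 2

def mse_children_impurity(y, w, pos):
    # impurities of the left (samples[:pos]) and right (samples[pos:]) children;
    # update() only needs to maintain the left-side sums, the right side being
    # the node totals minus the left totals
    return mse_node_impurity(y[:pos], w[:pos]), mse_node_impurity(y[pos:], w[pos:])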

Criterion summary

| Main class | Description |
| --- | --- |
| abstract Criterion | Defines the set of interfaces listed above |
| class ClassificationCriterion | Inherits Criterion; overrides node_value for classification trees |
| class Entropy | Inherits ClassificationCriterion; overrides the impurity methods per the entropy definition |
| class Gini | Inherits ClassificationCriterion; overrides the impurity methods per the Gini definition |
| class RegressionCriterion | Inherits Criterion; overrides node_value for regression trees |
| class MSE | Inherits RegressionCriterion; implements the regression-tree impurity computation described above |

Summary

Boosting: each round computes the negative gradient (residual) of the current predictions, fits a regression tree to it, and adds the tree's learning-rate-shrunk output back into the prediction.

Single tree: grown greedily (depth-first by default), repeatedly picking at each node the feature and threshold with the largest impurity decrease, subject to the depth, sample-count and weight constraints.