
Machine Learning Algorithms: A Walkthrough of the sklearn GBDT Source Code

In this post we explore scikit-learn's GBDT implementation.

Let's start from the calling side. Typically GBDT is used like this:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Initialize the model
gbdt = GradientBoostingRegressor(
    loss='ls',
    learning_rate=0.1,
    n_estimators=100,
    subsample=1.0,
    min_samples_split=2,
    min_samples_leaf=1,
    max_depth=3,
    init=None,
    random_state=None,
    max_features=None,
    alpha=0.9,
    verbose=0,
    max_leaf_nodes=None,
    warm_start=False
)

# Train
X = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [5, 6, 7, 8]])
y = np.array([1, 0, 1])
gbdt.fit(X, y)

# Predict
X_test = np.array([[1, 2, 3, 4], [2, 3, 4, 5], [5, 6, 7, 8]])
pred = gbdt.predict(X_test)

for i in range(pred.shape[0]):
    print(pred[i])

Let's first look at the GradientBoostingRegressor() class.

The GradientBoostingRegressor class

A quick overview of this class:

We find that this class only implements three prediction-related interfaces and has no fit() method of its own, which means fit() comes entirely from the parent class.

Let's briefly go through these methods.

Initialization: __init__()

The initializer of GradientBoostingRegressor() looks like this:

class GradientBoostingRegressor(BaseGradientBoosting, RegressorMixin):
    def __init__(self, loss='ls', learning_rate=0.1, n_estimators=100,
                 subsample=1.0, criterion='friedman_mse', min_samples_split=2,
                 min_samples_leaf=1, min_weight_fraction_leaf=0.,
                 max_depth=3, min_impurity_decrease=0.,
                 min_impurity_split=None, init=None, random_state=None,
                 max_features=None, alpha=0.9, verbose=0, max_leaf_nodes=None,
                 warm_start=False, presort='auto'):

        super(GradientBoostingRegressor, self).__init__(
            loss=loss, learning_rate=learning_rate, n_estimators=n_estimators,
            criterion=criterion, min_samples_split=min_samples_split,
            min_samples_leaf=min_samples_leaf,
            min_weight_fraction_leaf=min_weight_fraction_leaf,
            max_depth=max_depth, init=init, subsample=subsample,
            max_features=max_features,
            min_impurity_decrease=min_impurity_decrease,
            min_impurity_split=min_impurity_split,
            random_state=random_state, alpha=alpha, verbose=verbose,
            max_leaf_nodes=max_leaf_nodes, warm_start=warm_start,
            presort=presort)

As we can see, GradientBoostingRegressor() inherits from BaseGradientBoosting(), and its initializer does nothing but call the parent class BaseGradientBoosting()'s initializer.

We will look at BaseGradientBoosting() in detail in the next section.

Prediction: predict()

The prediction function is as follows:

def predict(self, X):
    """Predict regression target for X.

    Parameters
    ----------
    X : array-like or sparse matrix, shape = [n_samples, n_features]
        The input samples. Internally, it will be converted to
        ``dtype=np.float32`` and if a sparse matrix is provided
        to a sparse ``csr_matrix``.

    Returns
    -------
    y : array of shape = [n_samples]
        The predicted values.
    """
    X = check_array(X, dtype=DTYPE, order="C", accept_sparse='csr')
    return self._decision_function(X).ravel()

As we can see, the prediction function simply calls the parent class's _decision_function(X).
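Conceptually, that decision function just accumulates, on top of the initial estimator's prediction, the learning-rate-scaled output of every stage's tree. Here is a minimal, purely illustrative sketch of that aggregation (the real code delegates to a Cython helper; `gbdt` is assumed to be an already fitted GradientBoostingRegressor):

import numpy as np

def decision_function_sketch(gbdt, X):
    # start from the initial estimator (by default it predicts the training mean)
    score = gbdt.init_.predict(X).astype(np.float64).ravel()
    # estimators_ has shape (n_estimators, K); K == 1 for regression
    for stage in gbdt.estimators_:
        for tree in stage:
            score += gbdt.learning_rate * tree.predict(X)
    return score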

From this quick look, the main logic clearly lives in the parent class BaseGradientBoosting, so let's examine BaseGradientBoosting in detail next.

The BaseGradientBoosting class

A quick glance at this class:

There are quite a few methods. We will focus on fit.

Training: fit()

Straight to the code:

def fit(self, X, y, sample_weight=None, monitor=None):
    """Fit the gradient boosting model."""
    # ========== Validate the input =================
    # Check input
    X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'], dtype=DTYPE)
    n_samples, self.n_features_ = X.shape
    if sample_weight is None:
        sample_weight = np.ones(n_samples, dtype=np.float32)
    else:
        sample_weight = column_or_1d(sample_weight, warn=True)

    check_consistent_length(X, y, sample_weight)

    y = self._validate_y(y)

    random_state = check_random_state(self.random_state)
    self._check_params()
    # ============ Initialization ================
    if not self._is_initialized():
        # init state
        self._init_state()

        # fit initial model - FIXME make sample_weight optional
        self.init_.fit(X, y, sample_weight)

        # init predictions
        y_pred = self.init_.predict(X)
        begin_at_stage = 0
    else:
        # add more estimators to fitted model
        # invariant: warm_start = True
        if self.n_estimators < self.estimators_.shape[0]:
            raise ValueError('n_estimators=%d must be larger or equal to '
                             'estimators_.shape[0]=%d when '
                             'warm_start==True'
                             % (self.n_estimators,
                                self.estimators_.shape[0]))
        begin_at_stage = self.estimators_.shape[0]
        y_pred = self._decision_function(X)
        self._resize_state()
    # ============= Presort the features ============
    X_idx_sorted = None
    presort = self.presort
    # Allow presort to be 'auto', which means True if the dataset is dense,
    # otherwise it will be False.
    if presort == 'auto' and issparse(X):
        presort = False
    elif presort == 'auto':
        presort = True

    if presort == True:
        if issparse(X):
            raise ValueError("Presorting is not supported for sparse matrices.")
        else:
            X_idx_sorted = np.asfortranarray(np.argsort(X, axis=0),
                                             dtype=np.int32)
    # ============= Training ========================
    # fit the boosting stages
    n_stages = self._fit_stages(X, y, y_pred, sample_weight, random_state,
                                begin_at_stage, monitor, X_idx_sorted)
    # change shape of arrays after fit (early-stopping or additional ests)
    if n_stages != self.estimators_.shape[0]:
        self.estimators_ = self.estimators_[:n_stages]
        self.train_score_ = self.train_score_[:n_stages]
        if hasattr(self, 'oob_improvement_'):
            self.oob_improvement_ = self.oob_improvement_[:n_stages]

    return self

As we can see, the code has four main parts:

  • Input validation
  • Initialization
  • Feature presorting
  • Training

We won't go into the first three parts; let's focus on training. Training actually delegates to the _fit_stages() method, so let's look at it.

Training: _fit_stages()

The code:

def _fit_stages(self, X, y, y_pred, sample_weight, random_state,
                begin_at_stage=0, monitor=None, X_idx_sorted=None):
    """Iteratively fits the stages.

    For each stage it computes the progress (OOB, train score)
    and delegates to ``_fit_stage``.
    Returns the number of stages fit; might differ from ``n_estimators``
    due to early stopping.
    """
    # ============== Initialization =============================
    n_samples = X.shape[0]
    do_oob = self.subsample < 1.0
    sample_mask = np.ones((n_samples, ), dtype=np.bool)
    n_inbag = max(1, int(self.subsample * n_samples))
    loss_ = self.loss_

    # Set min_weight_leaf from min_weight_fraction_leaf
    if self.min_weight_fraction_leaf != 0. and sample_weight is not None:
        min_weight_leaf = (self.min_weight_fraction_leaf *
                           np.sum(sample_weight))
    else:
        min_weight_leaf = 0.

    if self.verbose:
        verbose_reporter = VerboseReporter(self.verbose)
        verbose_reporter.init(self, begin_at_stage)

    X_csc = csc_matrix(X) if issparse(X) else None
    X_csr = csr_matrix(X) if issparse(X) else None
    # =========== Iterate over the trees! =====================
    # perform boosting iterations
    i = begin_at_stage
    for i in range(begin_at_stage, self.n_estimators):
        # subsampling
        if do_oob:
            sample_mask = _random_sample_mask(n_samples, n_inbag,
                                              random_state)
            # OOB score before adding this stage
            old_oob_score = loss_(y[~sample_mask],
                                  y_pred[~sample_mask],
                                  sample_weight[~sample_mask])

        # fit next stage of trees
        y_pred = self._fit_stage(i, X, y, y_pred, sample_weight,
                                 sample_mask, random_state, X_idx_sorted,
                                 X_csc, X_csr)

        # track deviance (= loss)
        if do_oob:
            self.train_score_[i] = loss_(y[sample_mask],
                                         y_pred[sample_mask],
                                         sample_weight[sample_mask])
            self.oob_improvement_[i] = (
                old_oob_score - loss_(y[~sample_mask],
                                      y_pred[~sample_mask],
                                      sample_weight[~sample_mask]))
        else:
            # no need to fancy index w/ no subsampling
            self.train_score_[i] = loss_(y, y_pred, sample_weight)

        if self.verbose > 0:
            verbose_reporter.update(i, self)

        if monitor is not None:
            early_stopping = monitor(i, self, locals())
            if early_stopping:
                break
    return i + 1

To make it easier to read, here is the core of the iteration extracted:

for i in range(begin_at_stage, self.n_estimators):

    # fit next stage of trees
    y_pred = self._fit_stage(i, X, y, y_pred, sample_weight,
                             sample_mask, random_state, X_idx_sorted,
                             X_csc, X_csr)

    self.train_score_[i] = loss_(y, y_pred, sample_weight)

We can see that every round of the loop produces a prediction y_pred, which is fed back in on the next round. The core work happens in _fit_stage(), so let's look at that next.
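To see that feedback loop concretely, here is a tiny self-contained toy version of the same idea for squared-error loss (this is not sklearn's code, just a sketch): each round fits a small tree to the residual of the current y_pred and folds the shrunken prediction back into y_pred.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def toy_gbdt_fit(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    # initial prediction: the training mean (what init_ does by default)
    y_pred = np.full(len(y), np.mean(y), dtype=np.float64)
    trees = []
    for _ in range(n_estimators):
        residual = y - y_pred                      # negative gradient of 0.5*(y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)                      # fit the new tree to the residual
        y_pred += learning_rate * tree.predict(X)  # feed the prediction back in
        trees.append(tree)
    return trees, y_pred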

Training: _fit_stage()

def _fit_stage(self, i, X, y, y_pred, sample_weight, sample_mask,
               random_state, X_idx_sorted, X_csc=None, X_csr=None):
    """Fit another stage of ``n_classes_`` trees to the boosting model. """

    assert sample_mask.dtype == np.bool
    loss = self.loss_
    original_y = y
    # K is the number of classes (K = 1 for regression)
    for k in range(loss.K):
        if loss.is_multi_class:
            y = np.array(original_y == k, dtype=np.float64)

        residual = loss.negative_gradient(y, y_pred, k=k,
                                          sample_weight=sample_weight)

        # induce regression tree on residuals
        tree = DecisionTreeRegressor(
            criterion=self.criterion,
            splitter='best',
            max_depth=self.max_depth,
            min_samples_split=self.min_samples_split,
            min_samples_leaf=self.min_samples_leaf,
            min_weight_fraction_leaf=self.min_weight_fraction_leaf,
            min_impurity_decrease=self.min_impurity_decrease,
            min_impurity_split=self.min_impurity_split,
            max_features=self.max_features,
            max_leaf_nodes=self.max_leaf_nodes,
            random_state=random_state,
            presort=self.presort)

        if self.subsample < 1.0:
            # no inplace multiplication!
            sample_weight = sample_weight * sample_mask.astype(np.float64)

        if X_csc is not None:
            tree.fit(X_csc, residual, sample_weight=sample_weight,
                     check_input=False, X_idx_sorted=X_idx_sorted)
        else:
            tree.fit(X, residual, sample_weight=sample_weight,
                     check_input=False, X_idx_sorted=X_idx_sorted)

        # update tree leaves
        if X_csr is not None:
            loss.update_terminal_regions(tree.tree_, X_csr, y, residual, y_pred,
                                         sample_weight, sample_mask,
                                         self.learning_rate, k=k)
        else:
            loss.update_terminal_regions(tree.tree_, X, y, residual, y_pred,
                                         sample_weight, sample_mask,
                                         self.learning_rate, k=k)

        # add tree to ensemble
        self.estimators_[i, k] = tree

    return y_pred

For convenience, here are the important lines extracted:

def _fit_stage(self, i, X, y, y_pred, sample_weight, sample_mask,
               random_state, X_idx_sorted, X_csc=None, X_csr=None):
    """Fit another stage of ``n_classes_`` trees to the boosting model. """

    assert sample_mask.dtype == np.bool
    loss = self.loss_
    original_y = y
    for k in range(loss.K):
        # ======== compute the negative gradient of the loss (the residual)
        residual = loss.negative_gradient(y, y_pred, k=k,
                                          sample_weight=sample_weight)
        # ======== fit one tree this round
        # induce regression tree on residuals
        tree = DecisionTreeRegressor(
            criterion=self.criterion,
            splitter='best',
            max_depth=self.max_depth,
            min_samples_split=self.min_samples_split,
            min_samples_leaf=self.min_samples_leaf,
            min_weight_fraction_leaf=self.min_weight_fraction_leaf,
            min_impurity_decrease=self.min_impurity_decrease,
            min_impurity_split=self.min_impurity_split,
            max_features=self.max_features,
            max_leaf_nodes=self.max_leaf_nodes,
            random_state=random_state,
            presort=self.presort)

        # === note: the fitting target here is the negative gradient, i.e. residual!
        tree.fit(X, residual, sample_weight=sample_weight,
                 check_input=False, X_idx_sorted=X_idx_sorted)

        loss.update_terminal_regions(tree.tree_, X, y, residual, y_pred,
                                     sample_weight, sample_mask,
                                     self.learning_rate, k=k)
        # add tree to ensemble
        self.estimators_[i, k] = tree
    return y_pred

As you can see, this is the textbook "gradient boosting" procedure:

  • Compute the residual of the previous round, i.e. the negative gradient of the loss
  • Fit a new tree whose fitting target y is exactly that residual
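For the default squared-error loss ('ls'), for instance, the negative gradient is exactly the ordinary residual:

\[ L\big(y, F(x)\big) = \frac{1}{2}\big(y - F(x)\big)^2, \qquad -\frac{\partial L}{\partial F(x)} = y - F(x) \]

so negative_gradient() simply returns y - y_pred, and the new regression tree is fit to that difference.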

Now the overall process is clear. The remaining question is how a single tree is built, so let's look at the DecisionTreeRegressor class!

The DecisionTreeRegressor class

A quick glance at this class:

It has only two methods, and both just call the parent class's interfaces directly, so our focus shifts to the parent class BaseDecisionTree!

The BaseDecisionTree class

This class implements how a single tree is trained.

A quick glance:

We focus on fit. The fit method is fairly long; in brief, it boils down to:

def fit(self, X, y, sample_weight=None, check_input=True,
        X_idx_sorted=None):

    # .... many lines omitted ....

    # Build tree

    # criterion - the impurity criterion
    criterion = CRITERIA_REG[self.criterion](self.n_outputs_,
                                             n_samples)
    # SPLITTERS - the splitter
    SPLITTERS = SPARSE_SPLITTERS if issparse(X) else DENSE_SPLITTERS

    splitter = SPLITTERS[self.splitter](criterion,
                                        self.max_features_,
                                        min_samples_leaf,
                                        min_weight_leaf,
                                        random_state,
                                        self.presort)
    # the tree object
    self.tree_ = Tree(self.n_features_, self.n_classes_, self.n_outputs_)

    # If max_leaf_nodes is not given, build the tree depth-first;
    # otherwise build it with the best-first strategy.
    # Use BestFirst if max_leaf_nodes given; use DepthFirst otherwise
    if max_leaf_nodes < 0:
        builder = DepthFirstTreeBuilder(splitter, min_samples_split,
                                        min_samples_leaf,
                                        min_weight_leaf,
                                        max_depth,
                                        self.min_impurity_decrease,
                                        min_impurity_split)
    else:
        builder = BestFirstTreeBuilder(splitter, min_samples_split,
                                       min_samples_leaf,
                                       min_weight_leaf,
                                       max_depth,
                                       max_leaf_nodes,
                                       self.min_impurity_decrease,
                                       min_impurity_split)

    builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)

    self.n_classes_ = self.n_classes_[0]
    self.classes_ = self.classes_[0]

    return self

In other words, once the various parameters (impurity criterion, splitter, growth strategy, and so on) are fixed, the tree is grown either depth-first (DF) or best-first (BF). We will focus on depth-first building.

The DepthFirstTreeBuilder class

A quick scan:

We know the code above calls the build() function.

Building the tree

So let's take a closer look at build()... wait, the Python-side body is empty, just pass?! That won't do. No giving up; let's go hunt for it online.

It turns out this part is written in Cython, here: _tree.pyx

The code is quite long. First, recall the non-recursive way to write a depth-first traversal: use a stack, drill down the left side, and push the right side for later. This code follows exactly that idea. The overall structure:

# push root node onto stack
rc = stack.push(0, n_node_samples, 0, _TREE_UNDEFINED, 0, INFINITY, 0)
while not stack.is_empty():
    # pop the record on top of the stack
    stack.pop(&stack_record)

    # [use stack_record to decide whether the current node should keep splitting]

    # if it does split, push its two children

    if not is_leaf:
        # Push right child on stack
        rc = stack.push(split.pos, end, depth + 1, node_id, 0,
                        split.impurity_right, n_constant_features)
        # Push left child on stack
        rc = stack.push(start, split.pos, depth + 1, node_id, 1,
                        split.impurity_left, n_constant_features)
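As a plain-Python illustration of the same control flow (not the actual Cython builder; split_node here is a hypothetical callback standing in for the splitter and leaf checks):

def build_depth_first(n_node_samples, split_node, max_depth):
    """Stack-driven, depth-first node construction.
    Each stack record is (start, end, depth) over the node's sample range."""
    stack = [(0, n_node_samples, 0)]
    nodes = []
    while stack:
        start, end, depth = stack.pop()            # pop the top record
        pos, is_leaf = split_node(start, end, depth, max_depth)
        nodes.append((start, end, depth, is_leaf))
        if not is_leaf:
            stack.append((pos, end, depth + 1))    # push the right child first...
            stack.append((start, pos, depth + 1))  # ...so the left child is popped next (go deep left)
    return nodes

The two things this sketch leaves abstract, deciding whether a node should stop splitting and how the split position is chosen, are exactly the questions below.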

So there are two key questions:

  1. How do we decide whether the current node should keep splitting?
  2. What does a split produce?

Let's tackle them one at a time.

Splitting conditions

Let's look at the splitting conditions:

n_node_samples = end - start
splitter.node_reset(start, end, &weighted_n_node_samples)

is_leaf = (depth >= max_depth or
           n_node_samples < min_samples_split or
           n_node_samples < 2 * min_samples_leaf or
           weighted_n_node_samples < 2 * min_weight_leaf)

If the current node becomes a leaf here, it satisfies at least one of the following conditions:

  • The depth limit is reached: depth >= max_depth
  • Too few samples to split: n_node_samples < min_samples_split
  • Not enough samples to form two leaves: n_node_samples < 2 * min_samples_leaf
  • The sample weight sum is below the threshold: weighted_n_node_samples < 2 * min_weight_leaf

The splitting process

if not is_leaf:
    splitter.node_split(impurity, &split, &n_constant_features)
    is_leaf = (is_leaf or split.pos >= end or
               (split.improvement + EPSILON <
                min_impurity_decrease))

The code above says the node is also marked as a leaf if at least one of the following holds:

  • The split position is out of range: split.pos >= end
  • The impurity decrease brought by this split is too small: split.improvement + EPSILON < min_impurity_decrease

So the key is how split computes the improvement. Going back to the fit method of BaseDecisionTree, the splitter is chosen by SPLITTERS = SPARSE_SPLITTERS if issparse(X) else DENSE_SPLITTERS; that is, for dense (non-sparse) data the splitter comes from DENSE_SPLITTERS. So let's look at DENSE_SPLITTERS.

DENSE_SPLITTERS = {"best": _splitter.BestSplitter,
                   "random": _splitter.RandomSplitter}

We'll start with BestSplitter.

The BestSplitter class

Source: _splitter.pyx

We mainly look at the node_split() function, which searches for the best split point within samples[start:end]. I have simplified the function below. First, the feature array is partitioned into the following index intervals:

  • [:n_drawn_constants]: constant features that have already been drawn (a feature is "constant" if it takes a single value on this node's samples).
  • [n_drawn_constants:n_known_constants]: known constant features that have not been drawn yet.
  • [n_known_constants:n_total_constants]: newly discovered constant features.
  • [n_total_constants:f_i]: non-constant features that have not been drawn yet.
  • [f_i:n_features]: non-constant features that have already been drawn.
cdef int node_split(self, double impurity, SplitRecord* split,
                    SIZE_t* n_constant_features) nogil except -1:
    """Find the best split on node samples[start:end]"""
    # ... initialization ...
    f_i = n_features  # number of features still to consider
    while (f_i > n_total_constants and  # stop when too few features remain
           (n_visited_features < max_features or
            n_visited_features <= n_found_constants + n_drawn_constants)):  # at least one non-constant feature
        # draw a feature at random
        f_j = rand_int(n_drawn_constants, f_i - n_found_constants,
                       random_state)
        if f_j < n_known_constants:
            # f_j lies in [n_drawn_constants, n_known_constants), i.e. it is a known
            # constant feature that has not been drawn yet;
            # swap it to the end of the [:n_drawn_constants] interval
            tmp = features[f_j]
            features[f_j] = features[n_drawn_constants]
            features[n_drawn_constants] = tmp
            n_drawn_constants += 1

        else:
            # f_j lies in [n_known_constants, f_i - n_found_constants), i.e. a feature
            # not yet known to be constant; shift it into the [n_total_constants, f_i) interval
            f_j += n_found_constants
            current.feature = features[f_j]
            feature_offset = self.X_feature_stride * current.feature

            # ... sort the feature values and cache the ordering for reuse (code omitted) ...

            # search for the best split position
            p = start

            while p < end:
                p += 1
                if p < end:
                    current.pos = p
                    # Reject if min_samples_leaf is not guaranteed
                    # (either child would get too few samples)
                    if (((current.pos - start) < min_samples_leaf) or
                            ((end - current.pos) < min_samples_leaf)):
                        continue
                    # evaluate the criterion at this split position
                    self.criterion.update(current.pos)

                    # Reject if min_weight_leaf is not satisfied
                    if ((self.criterion.weighted_n_left < min_weight_leaf) or
                            (self.criterion.weighted_n_right < min_weight_leaf)):
                        continue
                    current_proxy_improvement = self.criterion.proxy_impurity_improvement()
                    # keep the split with the best improvement so far
                    if current_proxy_improvement > best_proxy_improvement:
                        best_proxy_improvement = current_proxy_improvement
                        # sum of halves is used to avoid infinite value
                        current.threshold = Xf[p - 1] / 2.0 + Xf[p] / 2.0

                        if ((current.threshold == Xf[p]) or
                                (current.threshold == INFINITY) or
                                (current.threshold == -INFINITY)):
                            current.threshold = Xf[p - 1]

                        best = current  # copy

The steps above are now clear: iterate over every candidate split point of every feature and keep the best one!
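As a plain-Python illustration of that search (ignoring the constant-feature bookkeeping, sample weights, and sort caching, and using variance reduction directly instead of the criterion machinery):

import numpy as np

def best_split(X, residual, min_samples_leaf=1):
    """Exhaustively try every midpoint of every feature's sorted values and
    keep the split with the largest variance (SSE) reduction."""
    n_samples, n_features = X.shape
    best = {"improvement": -np.inf, "feature": None, "threshold": None}
    for j in range(n_features):
        order = np.argsort(X[:, j])
        Xf, yf = X[order, j], residual[order]
        for pos in range(min_samples_leaf, n_samples - min_samples_leaf + 1):
            if Xf[pos - 1] == Xf[pos]:          # cannot split between equal values
                continue
            left, right = yf[:pos], yf[pos:]
            improvement = (len(yf) * np.var(yf)
                           - len(left) * np.var(left)
                           - len(right) * np.var(right))
            if improvement > best["improvement"]:
                best = {"improvement": improvement, "feature": j,
                        "threshold": (Xf[pos - 1] + Xf[pos]) / 2.0}
    return best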

Next, let's see how the quality of a split point is evaluated.

The Criterion class

As described in GBDT源码分析之二:决策树, this class defines and implements the splitting strategy of the tree structure.

_criterion.pyx mainly defines and implements the various impurity-computation classes, including (the table below is adapted from 机器学习搭便车指南–决策树):

The Criterion interface

It is an abstract class. It mainly defines the following interfaces:

  • update(new_pos): move the split point from the current pos to new_pos; effectively, the samples in [pos:new_pos] are moved from the right child to the left child.
  • node_impurity(): compute the impurity of the current node.
  • children_impurity(left, right): compute the impurity of the two child nodes.

And it implements the following methods:

  • impurity_improvement(): compute the impurity decrease of the current node, defined as \[\frac{N_t}{N}\left(Impurity - \frac{N_{tR}}{N_t}\times Impurity_{right} - \frac{N_{tL}}{N_t}\times Impurity_{left}\right)\] where N is the total number of samples, N_t the number of samples at the current node, and N_tR and N_tL the numbers of samples in the right and left child nodes.
  • proxy_impurity_improvement(): since the current node's own impurity is a constant while comparing candidate splits, there is a cheaper way to rank impurity decreases. This is the method used throughout the splitting loop; impurity_improvement() is only called when the absolute impurity decrease is needed. It is simply the simplification of the formula above: \[- N_{tR}\times Impurity_{right} - N_{tL}\times Impurity_{left}\]

A note on how Criterion represents the split point and the two children: the class keeps three internal indices, start, pos and end. Given the (sorted) sample index array samples, samples[start:pos] represents the left child after the split and samples[pos:end] the right child. A Criterion object is essentially empty at construction time and is only used while the tree is being built.
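A small numeric sketch of the two quantities (taking variance as the impurity and unit sample weights, so the N's are plain counts; this mirrors the formulas above rather than the Cython code):

import numpy as np

def impurity_improvement(y, start, pos, end):
    # full weighted impurity decrease for splitting samples[start:end] at pos
    node, left, right = y[start:end], y[start:pos], y[pos:end]
    N, N_t, N_tL, N_tR = len(y), len(node), len(left), len(right)
    return (N_t / N) * (np.var(node)
                        - (N_tL / N_t) * np.var(left)
                        - (N_tR / N_t) * np.var(right))

def proxy_impurity_improvement(y, start, pos, end):
    # drops every term that is constant for a fixed parent node, so it ranks
    # candidate split positions in exactly the same order as the full formula
    left, right = y[start:pos], y[pos:end]
    return -len(left) * np.var(left) - len(right) * np.var(right)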

The RegressionCriterion interface

This is the Criterion interface for regression trees. It mainly implements the following method:

  • update(): essentially maintains the \(N_{tR}\) and \(N_{tL}\) in the formula above, i.e. the sample counts on each side, as well as the sums of the target values y on each side.

The MSE class

Defines impurity as the mean squared error. It mainly implements/overrides the following methods:

  • node_impurity(): compute the impurity of the current node.
  • proxy_impurity_improvement()
  • children_impurity()
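A sketch of the MSE impurity itself, written the way the class effectively computes it from running sums (weighted second moment minus the squared weighted mean); the function names are mine, not the Cython ones:

import numpy as np

def mse_node_impurity(y, w):
    # weighted variance: sum(w*y^2)/sum(w) - (sum(w*y)/sum(w))^2
    mean = np.sum(w * y) / np.sum(w)
    return np.sum(w * y ** 2) / np.sum(w) - mean ** 2

def mse_children_impurity(y, w, pos):
    # impurities of the left (samples[:pos]) and right (samples[pos:]) children;
    # update() only needs to maintain the left-side sums, the right side being
    # the node totals minus the left totals
    return mse_node_impurity(y[:pos], w[:pos]), mse_node_impurity(y[pos:], w[pos:])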

Criterion summary

| Main class | Description |
| --- | --- |
| abstract Criterion | Defines the set of interfaces listed above |
| class ClassificationCriterion | Inherits Criterion; overrides node_value for classification trees |
| class Entropy | Inherits ClassificationCriterion; overrides the impurity methods per the entropy definition |
| class Gini | Inherits ClassificationCriterion; overrides the impurity methods per the Gini definition |
| class RegressionCriterion | Inherits Criterion; overrides node_value for regression trees |
| class MSE | Inherits RegressionCriterion; implements the regression-tree impurity computation described above |

Summary

Boosting: each round computes the negative gradient (residual) of the current predictions, fits a regression tree to it, and adds the tree's learning-rate-shrunk output back into the prediction.

Single tree: grown greedily (depth-first by default), repeatedly picking at each node the feature and threshold with the largest impurity decrease, subject to the depth, sample-count and weight constraints.