Tree structure
- class cellhint.pct.PredictiveClusteringTree(*, max_depth: int | None = None, min_samples_split: int | float = 20, min_samples_leaf: int | float = 10, min_weight_fraction_leaf: float = 0.0, random_state: int | None = None, max_leaf_nodes: int | None = None, F_test_prune: bool = True, p_thres: float = 0.05)[source]
Bases: DecisionTreeRegressor
Class that uses predictive clustering trees (PCT) for multi-output prediction. Note that this is a specialized PCT whose prototype is the mean vector and whose distance measure is the (summed) intra-cluster variance. Such a PCT is equivalent to a CART regression tree (with specialized parameters and post-pruning).
- Parameters:
max_depth – Maximum possible depth of the tree, counted from the root node, which has a depth of 0. Defaults to no limit.
min_samples_split – The minimum sample size (as an absolute number or a fraction) a node must have to be considered for splitting. (Default: 20)
min_samples_leaf – The minimum sample size (as an absolute number or a fraction) of a possible leaf. (Default: 10)
min_weight_fraction_leaf – The minimum fraction out of total sample weights for a possible leaf. (Default: 0.0)
random_state – Random seed for column (feature) shuffling before selecting the best feature and threshold.
max_leaf_nodes – The maximum number of leaves, achieved by keeping high-quality (i.e., high impurity reduction) nodes. Defaults to no limit.
F_test_prune – Whether to use an F-test to prune the tree by removing unnecessary splits. (Default: True)
p_thres – p-value threshold for pruning nodes after F-test. (Default: 0.05)
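The int-or-float convention for min_samples_split and min_samples_leaf can be illustrated with a minimal sketch. It assumes the usual scikit-learn reading, in which a float is a fraction of the training set rounded up; the helper name resolve_min_samples is hypothetical and not part of cellhint.

```python
import math


def resolve_min_samples(value, n_samples):
    """Hypothetical helper: turn an int (absolute count) or a float
    (fraction of the training set) into a sample count."""
    if isinstance(value, int):
        # An integer is taken as an absolute number of samples.
        return value
    # Assumption: a float is a fraction of n_samples, rounded up.
    return math.ceil(value * n_samples)


# An absolute count is returned unchanged; a fraction scales with the data.
print(resolve_min_samples(20, 500))
print(resolve_min_samples(0.25, 200))
```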
- n_features_in_
Number of features.
- n_outputs_
Number of outputs.
- tree_
A Tree object structured by parallel arrays.
- p_value
F-test-based p-value for each node or leaf in the tree.
- F_test() None[source]
F-test for each internal node. For each node, the test statistic (q) is node_impurity * n_sample / (n_sample - 1) divided by (left_child_impurity * left_n_sample + right_child_impurity * right_n_sample) / (n_sample - 2). The corresponding F distribution has n_output * (n_sample - 1) and n_output * (n_sample - 2) degrees of freedom.
- Returns:
Modified tree with F-test p-values. Leaves are always assigned a p-value of 1.
- Return type:
None
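As a worked illustration of the statistic described for F_test(), the sketch below computes q and the two degrees of freedom from plain numbers. The function name f_statistic and the example values are hypothetical; this is not the library's implementation.

```python
def f_statistic(node_impurity, n_sample,
                left_child_impurity, left_n_sample,
                right_child_impurity, right_n_sample,
                n_output):
    """Return (q, dfn, dfd) following the formula in the F_test() description."""
    # Parent impurity rescaled by n / (n - 1).
    numerator = node_impurity * n_sample / (n_sample - 1)
    # Sample-weighted sum of child impurities rescaled by 1 / (n - 2).
    denominator = (left_child_impurity * left_n_sample
                   + right_child_impurity * right_n_sample) / (n_sample - 2)
    q = numerator / denominator
    dfn = n_output * (n_sample - 1)
    dfd = n_output * (n_sample - 2)
    return q, dfn, dfd


# Made-up node: 10 samples, 2 outputs, split into two children of 5 samples.
q, dfn, dfd = f_statistic(4.0, 10, 1.0, 5, 1.0, 5, n_output=2)
print(q, dfn, dfd)
```

One way to turn q into a p-value would be the upper-tail probability of that F distribution, for example scipy.stats.f.sf(q, dfn, dfd); whether the library computes it exactly this way is not stated here.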
- fit(X, y, sample_weight=None) None[source]
Fit a PCT with the training dataset.
- Parameters:
X – Sample-by-feature array-like matrix.
y – Sample-by-output array-like matrix.
sample_weight – Sample weights. Defaults to equal sample weights.
- Returns:
Fitted and (possibly) pruned tree.
- Return type:
None
- is_node(index: int) bool[source]
Check whether a given index is an internal node.
- Parameters:
index – Index of the node/leaf in the arrays of the tree structure.
- Returns:
True or False indicating whether the given index is an internal node.
- Return type:
bool
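To make the "parallel arrays" idea concrete, here is a minimal sketch of an internal-node check. It assumes the scikit-learn convention in which a leaf has -1 as its child index; the arrays and the function name is_internal are made up for illustration and do not reproduce the library's code.

```python
TREE_LEAF = -1  # scikit-learn marks "no child" with -1

# Hypothetical 5-node tree: node 0 splits into 1 and 2, node 2 into 3 and 4.
children_left = [1, TREE_LEAF, 3, TREE_LEAF, TREE_LEAF]
children_right = [2, TREE_LEAF, 4, TREE_LEAF, TREE_LEAF]


def is_internal(index):
    """True if the node at `index` has children, i.e. is not a leaf."""
    return children_left[index] != TREE_LEAF


for i in range(len(children_left)):
    print(i, is_internal(i))
```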
- prune_node(index: int) None[source]
Prune all descendants of a given node. This node will become a leaf.
- Parameters:
index – Index of the node/leaf in the tree structure.
- Returns:
Modified tree with all descendants of a given node pruned.
- Return type:
None
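A rough sketch of what pruning a node can look like on parallel child arrays follows. It assumes leaves are marked with -1, as in scikit-learn trees; the function name prune_descendants and the example arrays are hypothetical, and the library may handle the arrays differently.

```python
TREE_LEAF = -1  # assumed leaf marker, as in scikit-learn trees


def prune_descendants(children_left, children_right, index):
    """Detach everything below `index`, turning that node into a leaf.

    The arrays are modified in place; detached nodes are marked as leaves
    too, so nothing below `index` remains reachable from the root.
    """
    stack = [index]
    while stack:
        node = stack.pop()
        left, right = children_left[node], children_right[node]
        if left != TREE_LEAF:
            # Visit both children before cutting the links to them.
            stack.extend([left, right])
        children_left[node] = TREE_LEAF
        children_right[node] = TREE_LEAF


# Hypothetical tree: node 0 -> (1, 2), node 2 -> (3, 4).
left = [1, -1, 3, -1, -1]
right = [2, -1, 4, -1, -1]
prune_descendants(left, right, 2)
print(left, right)
```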
- prune_tree(p_thres: float = 0.05) None[source]
Prune a tree based on F-test p-values.
- Parameters:
p_thres – p-value threshold to prune nodes. (Default: 0.05)
- Returns:
Modified tree with unnecessary splits removed.
- Return type:
None
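The pruning pass can be sketched as a loop over nodes that collapses every split judged unnecessary. The sketch assumes that "unnecessary" means a p-value above p_thres and that leaves are marked with -1; both are assumptions, and the names collapse and prune_by_p_value are hypothetical.

```python
TREE_LEAF = -1  # assumed leaf marker, as in scikit-learn trees


def collapse(children_left, children_right, index):
    """Turn `index` into a leaf by detaching its whole subtree (in place)."""
    stack = [index]
    while stack:
        node = stack.pop()
        if children_left[node] != TREE_LEAF:
            stack.extend([children_left[node], children_right[node]])
        children_left[node] = TREE_LEAF
        children_right[node] = TREE_LEAF


def prune_by_p_value(children_left, children_right, p_value, p_thres=0.05):
    """Collapse each internal node whose split is not significant.

    Assumption: a split is kept only when its p-value is at most p_thres.
    """
    for index in range(len(p_value)):
        is_internal = children_left[index] != TREE_LEAF
        if is_internal and p_value[index] > p_thres:
            collapse(children_left, children_right, index)


# Hypothetical tree: the root split is significant, the split at node 2 is not.
left = [1, -1, 3, -1, -1]
right = [2, -1, 4, -1, -1]
p_value = [0.001, 1.0, 0.4, 1.0, 1.0]  # leaves carry a p-value of 1
prune_by_p_value(left, right, p_value)
print(left, right)
```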
- score(X, y, sample_weight=None) float[source]
Calculate the coefficient of determination between the predictions and the truth. Unlike the usual multi-output setting, where R2 is computed separately for each output and then averaged across outputs, the score here treats each sample's output vector as a single ‘real’ sample.
- Parameters:
X – Sample-by-feature query matrix.
y – Sample-by-output truth matrix.
sample_weight – Sample weights applied to squared distance of each sample.
- Returns:
Coefficient of determination.
- Return type:
float
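One plausible reading of this vector-wise score is sketched below: residual and total sums of squares are built from squared Euclidean distances between whole output vectors, weighted per sample. The function name vector_r2, the use of a weighted mean vector, and the toy data are assumptions for illustration, not the library's code.

```python
def vector_r2(y_true, y_pred, sample_weight=None):
    """R2 in which each sample's output vector counts as one observation."""
    n_samples = len(y_true)
    n_outputs = len(y_true[0])
    if sample_weight is None:
        sample_weight = [1.0] * n_samples

    # Weighted mean vector of the truth (assumed baseline prediction).
    total_weight = sum(sample_weight)
    mean = [
        sum(w * row[j] for w, row in zip(sample_weight, y_true)) / total_weight
        for j in range(n_outputs)
    ]

    def sq_dist(a, b):
        # Squared Euclidean distance between two output vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    residual = sum(w * sq_dist(t, p)
                   for w, t, p in zip(sample_weight, y_true, y_pred))
    total = sum(w * sq_dist(t, mean)
                for w, t in zip(sample_weight, y_true))
    return 1.0 - residual / total


y_true = [[0.0, 0.0], [2.0, 2.0]]
print(vector_r2(y_true, y_true))                    # perfect prediction
print(vector_r2(y_true, [[1.0, 1.0], [1.0, 1.0]]))  # predicting the mean
```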