Tree structure
- class cellhint.pct.PredictiveClusteringTree(*, max_depth: int | None = None, min_samples_split: int | float = 20, min_samples_leaf: int | float = 10, min_weight_fraction_leaf: float = 0.0, random_state: int | None = None, max_leaf_nodes: int | None = None, F_test_prune: bool = True, p_thres: float = 0.05)[source]
Bases: DecisionTreeRegressor
Class that uses predictive clustering trees (PCT) for multi-output prediction. Note that this is a specialized PCT with the mean vector as the prototype and the (sum of) intra-cluster variance as the distance measure. Such a PCT is equivalent to a CART regression tree (with specialized parameters and post-pruning).
- Parameters:
max_depth – Maximum possible depth of the tree, starting from the root node, which has a depth of 0. Defaults to no limit.
min_samples_split – The minimum sample size (in absolute number or fraction) of a possible node. (Default: 20)
min_samples_leaf – The minimum sample size (in absolute number or fraction) of a possible leaf. (Default: 10)
min_weight_fraction_leaf – The minimum fraction out of total sample weights for a possible leaf. (Default: 0.0)
random_state – Random seed for column (feature) shuffling before selecting the best feature and threshold.
max_leaf_nodes – The maximum number of leaves, achieved by keeping high-quality (i.e., high impurity reduction) nodes. Defaults to no limit.
F_test_prune – Whether to use an F-test to prune the tree by removing unnecessary splits. (Default: True)
p_thres – p-value threshold for pruning nodes after F-test. (Default: 0.05)
- n_features_in_
Number of features.
- n_outputs_
Number of outputs.
- tree_
A Tree object structured by parallel arrays.
- p_value
F-test-based p-value for each node or leaf in the tree.
- F_test() None [source]
F-test for each internal node. For each node, the test statistic (q) is node_impurity * n_sample / (n_sample - 1) divided by (left_child_impurity * left_n_sample + right_child_impurity * right_n_sample) / (n_sample - 2), and the corresponding F distribution has n_output * (n_sample - 1) and n_output * (n_sample - 2) degrees of freedom.
- Returns:
Modified tree with F-test p-values. Leaves are always assigned a p-value of 1.
- Return type:
None
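The statistic described above can be sketched in plain Python. This is only an illustration of the stated formula, not the library's implementation; the function name f_statistic and its argument names are hypothetical.

```python
def f_statistic(node_impurity, n_sample,
                left_impurity, left_n_sample,
                right_impurity, right_n_sample,
                n_output):
    """Return (q, dfn, dfd) for one split, following the formula in the text.

    All names here are illustrative assumptions, not part of the cellhint API.
    """
    # Variance estimate before the split (parent node).
    numerator = node_impurity * n_sample / (n_sample - 1)
    # Pooled variance estimate after the split (two children).
    denominator = (left_impurity * left_n_sample
                   + right_impurity * right_n_sample) / (n_sample - 2)
    dfn = n_output * (n_sample - 1)  # numerator degrees of freedom
    dfd = n_output * (n_sample - 2)  # denominator degrees of freedom
    return numerator / denominator, dfn, dfd


# Toy numbers: a parent with 10 samples split into two children of 5.
q, dfn, dfd = f_statistic(4.0, 10, 1.0, 5, 1.0, 5, n_output=2)
```

If SciPy is available, scipy.stats.f.sf(q, dfn, dfd) gives an upper-tail p-value for such a statistic; that the class uses the upper tail is an assumption here, since the text does not state it.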
- fit(X, y, sample_weight=None) None [source]
Fit a PCT with the training dataset.
- Parameters:
X – Sample-by-feature array-like matrix.
y – Sample-by-output array-like matrix.
sample_weight – Sample weights. Defaults to equal sample weights.
- Returns:
Fitted and (possibly) pruned tree.
- Return type:
None
- is_node(index: int) bool [source]
Check whether a given index is an internal node (as opposed to a leaf).
- Parameters:
index – Index of the node/leaf in the arrays of the tree structure.
- Returns:
True or False indicating whether the given index is an internal node.
- Return type:
bool
- prune_node(index: int) None [source]
Prune all descendants of a given node. This node will become a leaf.
- Parameters:
index – Index of the node/leaf in the tree structure.
- Returns:
Modified tree with all descendants of a given node pruned.
- Return type:
None
- prune_tree(p_thres: float = 0.05) None [source]
Prune a tree based on F-test p-values.
- Parameters:
p_thres – p-value threshold to prune nodes. (Default: 0.05)
- Returns:
Modified tree with unnecessary splits removed.
- Return type:
None
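To make the node checking and pruning operations concrete, here is a minimal sketch on a toy tree stored as parallel arrays. It is not the library's code: the -1 leaf sentinel, the top-down traversal order, and the "p-value greater than threshold" rule are assumptions made for illustration, and the free functions merely mirror the method names.

```python
TREE_LEAF = -1  # sentinel meaning "no child" (assumption for this toy example)


def is_node(children_left, index):
    """Return True if `index` is an internal node, i.e. it still has a child."""
    return children_left[index] != TREE_LEAF


def prune_node(children_left, children_right, index):
    """Turn `index` into a leaf by detaching all of its descendants."""
    stack = [index]
    while stack:
        i = stack.pop()
        if children_left[i] != TREE_LEAF:
            # Visit both children before cutting the links to them.
            stack.extend([children_left[i], children_right[i]])
            children_left[i] = TREE_LEAF
            children_right[i] = TREE_LEAF


def prune_tree(children_left, children_right, p_value, p_thres=0.05):
    """Collapse every internal node whose p-value exceeds `p_thres`."""
    for i in range(len(children_left)):
        if is_node(children_left, i) and p_value[i] > p_thres:
            prune_node(children_left, children_right, i)


# Toy tree: node 0 is the root, node 1 is an internal node, 2-4 are leaves.
children_left = [1, 3, TREE_LEAF, TREE_LEAF, TREE_LEAF]
children_right = [2, 4, TREE_LEAF, TREE_LEAF, TREE_LEAF]
p_value = [0.001, 0.4, 1.0, 1.0, 1.0]  # leaves carry a p-value of 1

prune_tree(children_left, children_right, p_value, p_thres=0.05)
```

In this toy run the split at node 1 is not significant at the 0.05 level, so node 1 becomes a leaf while the root split is kept.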
- score(X, y, sample_weight=None) float [source]
Calculate the coefficient of determination between the prediction and the truth. Unlike the usual multi-output setting, where R2 is calculated separately for each output and then averaged across outputs, the score here is defined by treating each sample vector as a single ‘real’ sample.
- Parameters:
X – Sample-by-feature query matrix.
y – Sample-by-output truth matrix.
sample_weight – Sample weights applied to squared distance of each sample.
- Returns:
Coefficient of determination.
- Return type:
float
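One plausible reading of this definition is an R2 built from squared Euclidean distances between whole sample vectors. The sketch below follows that reading; it is an assumption about the formula, not a copy of the library's implementation, and vector_r2 is a hypothetical name.

```python
def vector_r2(y_true, y_pred, sample_weight=None):
    """R2 that treats each row (sample vector) as one sample.

    Computes 1 - sum_i w_i * ||y_i - yhat_i||^2 / sum_i w_i * ||y_i - ybar||^2,
    where ybar is the weighted mean vector. Illustrative assumption only.
    """
    n = len(y_true)
    n_out = len(y_true[0])
    w = [1.0] * n if sample_weight is None else list(sample_weight)
    total_w = sum(w)

    # Weighted mean vector of the truth matrix.
    mean = [sum(w[i] * y_true[i][j] for i in range(n)) / total_w
            for j in range(n_out)]

    def sq_dist(a, b):
        # Squared Euclidean distance between two vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    ss_res = sum(w[i] * sq_dist(y_true[i], y_pred[i]) for i in range(n))
    ss_tot = sum(w[i] * sq_dist(y_true[i], mean) for i in range(n))
    return 1.0 - ss_res / ss_tot


y = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
perfect = vector_r2(y, y)                      # predictions equal the truth
baseline = vector_r2(y, [[3.0, 5.0]] * 3)      # always predict the mean vector
```

Under this definition, outputs with larger variance contribute more to the score, whereas averaging per-output R2 values would weight every output equally.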
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PredictiveClusteringTree
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_predict_request(*, check_input: bool | None | str = '$UNCHANGED$') PredictiveClusteringTree
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') PredictiveClusteringTree
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.