Self-match function

cellhint.selfmatch(X: AnnData | DataFrame, columns: list | tuple | ndarray | Series | Index, calculate_distance: bool = False, use_rep: str | None = None, metric: str | None = None, normalize: bool = True, Gaussian_kernel: bool = False, minimum_unique_percents: list | tuple | ndarray | Series | Index | float = (0.4, 0.5, 0.6, 0.7, 0.8), minimum_divide_percents: list | tuple | ndarray | Series | Index | float = (0.1, 0.15, 0.2), reannotate: bool = True, prefix: str = '') → DistanceAlignment

Match different versions of cell type annotations (e.g., different resolutions of clustering) for cells from a single dataset.
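A minimal sketch of the kind of input this function works on: a pandas DataFrame of cell metadata in which each column is one version of the annotation for the same cells. The column names `leiden_low` and `leiden_high` and all values are hypothetical. The row-normalized cross-tabulation only illustrates how the cells of one annotation spread across another; it is not how CellHint builds its harmonization table.

```python
import pandas as pd

# Toy cell metadata: two clusterings (hypothetical names) of the same six cells.
obs = pd.DataFrame(
    {
        "leiden_low": ["A", "A", "A", "B", "B", "B"],
        "leiden_high": ["A1", "A1", "A2", "B1", "B1", "B1"],
    },
    index=[f"cell{i}" for i in range(6)],
)

# For each low-resolution cluster, the fraction of its cells that fall into
# each high-resolution cluster (each row sums to 1).
frac = pd.crosstab(obs["leiden_low"], obs["leiden_high"], normalize="index")
print(frac.round(2))
```

The same DataFrame would be passed as X, with columns = ['leiden_low', 'leiden_high'].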

Parameters:
  • X – An AnnData or DataFrame object containing different cell type annotations as multiple columns of cell metadata.

  • columns – Column names (keys) of cell metadata representing cell type annotations or clusterings.

  • calculate_distance – Whether to calculate the cell-by-cell-type distance matrix. This is usually unnecessary because all annotations belong to the same dataset and are already in place. (Default: False)

  • use_rep – Representation used to calculate distances. This can be ‘X’ or any representation stored in .obsm. This argument will be ignored when calculate_distance = False (the default). Defaults to the PCA coordinates if present; otherwise, the expression matrix X is used.

  • metric – Metric used to calculate the distance between each cell and each cell type. Can be ‘euclidean’, ‘cosine’, ‘manhattan’ or any metric accepted by sklearn.metrics.pairwise_distances(). This argument will be ignored when calculate_distance = False (the default). Defaults to ‘euclidean’ if a latent representation is used to calculate distances, and to ‘correlation’ if the expression matrix is used.

  • normalize – Whether to normalize the distance matrix. This argument will be ignored when calculate_distance = False (the default). (Default: True)

  • Gaussian_kernel – Whether to apply the Gaussian kernel to the distance matrix. This argument will be ignored when calculate_distance = False (the default). (Default: False)

  • minimum_unique_percents – The minimum cell assignment fraction(s) required to claim two cell types as uniquely matched. By default, five values (0.4, 0.5, 0.6, 0.7, 0.8) are tried, and the one that produces the fewest alignments is selected in each harmonization iteration.

  • minimum_divide_percents – The minimum cell assignment fraction(s) required to claim a cell type as divisible into two or more cell types. By default, three values (0.1, 0.15, 0.2) are tried, and the one that produces the fewest alignments is selected in each harmonization iteration.

  • reannotate – Whether to reannotate cells into harmonized cell types. (Default: True)

  • prefix – Column prefix for the reannotation data frame.
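The roles of minimum_unique_percents and minimum_divide_percents can be illustrated with a deliberately simplified rule. The helper `classify_relation` is hypothetical and one-directional: a target cell type counts as a genuine part of the source if it receives at least `min_divide` of the source's cells, and a single such part counts as a unique match only if it also reaches `min_unique`. This is not CellHint's actual harmonization logic, only a sketch of how the two thresholds interact.

```python
import pandas as pd


def classify_relation(frac_row: pd.Series, min_unique: float, min_divide: float) -> str:
    """Hypothetical, one-directional rule; not CellHint's actual harmonization logic.

    frac_row holds the fraction of one source cell type's cells assigned to
    each target cell type.
    """
    # Targets receiving at least `min_divide` of the cells count as genuine parts.
    parts = frac_row[frac_row >= min_divide]
    if len(parts) == 1 and parts.iloc[0] >= min_unique:
        return "unique match: " + str(parts.index[0])
    if len(parts) >= 2:
        return "divided into: " + ", ".join(map(str, parts.index))
    return "unresolved"


# Toy fractions for one source cell type (hypothetical numbers).
row = pd.Series({"T1": 0.65, "T2": 0.09, "T3": 0.09, "T4": 0.09, "T5": 0.08})

# Sweep the default minimum_unique_percents values with min_divide fixed at 0.1.
for u in (0.4, 0.5, 0.6, 0.7, 0.8):
    print(u, classify_relation(row, min_unique=u, min_divide=0.1))
```

Raising the uniqueness threshold from 0.6 to 0.7 turns this toy row from a unique match into an unresolved case, which illustrates why trying several threshold values can change the outcome.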

Returns:

A DistanceAlignment object. Four important attributes of this class are: 1) base_distance, the within-dataset distances between all cells and all cell types; 2) relation, the harmonization table; 3) groups, high-hierarchy cell types that categorize the rows of the harmonization table; 4) reannotation, the reannotated cell types and cell type groups.

Return type:

DistanceAlignment
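The base_distance attribute is described as within-dataset distances between all cells and all cell types. A minimal sketch of such a cells × cell types matrix follows, under stated assumptions: each cell type is summarized by its centroid in the chosen representation, distances are Euclidean, and normalize is approximated by a global min-max scaling. The helper name `cell_by_type_distance` and these choices are hypothetical and may not match CellHint's internals; the Gaussian kernel option is not shown.

```python
import numpy as np


def cell_by_type_distance(rep: np.ndarray, labels, normalize: bool = True):
    """Hypothetical sketch: Euclidean distance from every cell to every cell-type centroid."""
    labels = np.asarray(labels)
    types = np.unique(labels)
    # Assumption: each cell type is represented by the mean of its cells.
    centroids = np.stack([rep[labels == t].mean(axis=0) for t in types])
    # Pairwise Euclidean distances, shape (n_cells, n_types).
    dist = np.linalg.norm(rep[:, None, :] - centroids[None, :, :], axis=2)
    if normalize:
        # Assumption: global min-max scaling to [0, 1] stands in for `normalize`.
        dist = (dist - dist.min()) / (dist.max() - dist.min())
    return types, dist


# Toy 2-D representation (e.g. two PCs) for four cells in two cell types.
rep = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = ["A", "A", "B", "B"]

types, dist = cell_by_type_distance(rep, labels)
# Nearest cell type for each cell, read off the distance matrix.
print(types[dist.argmin(axis=1)])
```

Each row of the matrix corresponds to a cell and each column to a cell type, so the nearest cell type of a cell is simply the column with the smallest value in its row.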