Integration function

cellhint.integrate(adata: AnnData, batch: str, cell_type: str | None = None, use_rep: str | None = None, n_latent: int = 50, n_neighbors: int | None = None, n_meta_neighbors: int = 3, approx: bool = True, metric: str | function | DistanceMetric = 'euclidean', use_annoy: bool = True, annoy_n_trees: int = 10, pynndescent_n_neighbors: int = 30, pynndescent_random_state: int = 0, use_faiss: bool = True, set_op_mix_ratio: float = 1.0, local_connectivity: int = 1, trim: int | None = None, neighbor_random_state: int = 0, copy: bool = False) → AnnData | None

Cell-type-controlled k-nearest neighbors. This is a variant of BBKNN that searches for neighbors across matched cell groups in different batches. For a given cell belonging to cell type ‘c’, first determine the batches that contain ‘c’ and its neighboring cell types, and then, in each of these batches, search for nearest neighbors among the cells of those types.
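
The search described above can be illustrated with a small, pure-Python sketch. It is not the library's implementation: the toy data layout, the function names (candidate_pools, controlled_knn), the precomputed meta_neighbors mapping, and the exclusion of the query cell from its own neighbor list are all assumptions made for illustration.

```python
import math

def candidate_pools(cells, query_type, meta_neighbors):
    """Group candidate cell indices by batch for a query cell type.

    cells: list of (batch, cell_type, coords) tuples (hypothetical toy layout).
    meta_neighbors: dict mapping a cell type to the set of cell types treated
        as its neighbors, itself included (assumed to be precomputed).
    """
    allowed = meta_neighbors[query_type]
    pools = {}
    for idx, (batch, cell_type, _) in enumerate(cells):
        if cell_type in allowed:
            pools.setdefault(batch, []).append(idx)
    return pools

def controlled_knn(cells, query_idx, meta_neighbors, k_per_batch):
    """For each qualifying batch, return the nearest candidate cells to the query."""
    _, query_type, query_xy = cells[query_idx]
    pools = candidate_pools(cells, query_type, meta_neighbors)
    result = {}
    for batch, members in pools.items():
        # Rank candidates in this batch by Euclidean distance to the query cell.
        ranked = sorted(
            (i for i in members if i != query_idx),
            key=lambda i: math.dist(query_xy, cells[i][2]),
        )
        result[batch] = ranked[:k_per_batch]
    return result

cells = [
    ("b1", "T cell", (0.0, 0.0)),    # index 0: the query cell
    ("b1", "T cell", (0.1, 0.0)),    # index 1
    ("b1", "B cell", (0.05, 0.0)),   # index 2: closest in space, but not an allowed type
    ("b2", "T cell", (0.2, 0.1)),    # index 3
    ("b2", "NK cell", (0.3, 0.0)),   # index 4: allowed as a neighboring cell type
    ("b3", "B cell", (0.0, 0.1)),    # index 5: batch b3 holds no allowed cell type
]
meta_neighbors = {"T cell": {"T cell", "NK cell"}}
neighbors = controlled_knn(cells, 0, meta_neighbors, k_per_batch=2)
print(neighbors)
```

In this toy data set, the B cell at index 2 is skipped even though it lies closest to the query, and batch b3 contributes nothing because it contains neither the query's cell type nor a neighboring one.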

Parameters:

  • adata – An AnnData object containing batch and cell type information in .obs, as well as a latent space (e.g., ‘X_pca’) in .obsm.

  • batch – Column name (key) of cell metadata specifying batch information.

  • cell_type – Column name (key) of cell metadata specifying cell type information. Defaults to None, meaning no cell type information is provided (i.e., nearest neighbors are searched for across the entire batch space).

  • use_rep – Representation used to calculate distances. This can be any representation stored in .obsm. Defaults to the PCA coordinates (‘X_pca’) if present.

  • n_latent – Number of latent dimensions of the chosen representation to use. Defaults to min(50, number of available latent dimensions).

  • n_neighbors – Total number of nearest neighbors for each cell. These neighbors are drawn in equal numbers from the batches that qualify. Defaults to max(15, n), where n is three times the number of batches, so that each qualifying batch provides at least 3 neighbors. For example, if a cell type exists in only one batch, that batch needs to provide all 15 neighbors.

  • n_meta_neighbors – Total number of nearest meta neighbors for each cell type in each batch (calculated from cell centroids). The final nearest meta neighbors are the union across the batches that contain the given cell type. The smaller this value, the stronger the bonding of the same cell type. Setting it to 1 makes each cell search for nearest neighbors only within the cell type it belongs to (i.e., forcibly clustering cells of the same type). (Default: 3)

  • approx – Whether to use fast approximate neighbor finding (annoy or pyNNDescent). (Default: True)

  • metric – Distance metric to use. (Default: ‘euclidean’)

  • use_annoy – Whether to use annoy for neighbor finding when approx = True. Setting use_annoy = False will use pyNNDescent instead. (Default: True)

  • annoy_n_trees – Number of trees to construct in the annoy forest when approx = True and use_annoy = True. (Default: 10)

  • pynndescent_n_neighbors – Number of neighbors to include in the approximate neighbor graph when approx = True and use_annoy = False. (Default: 30)

  • pynndescent_random_state – Random seed to use in pyNNDescent when approx = True and use_annoy = False. (Default: 0)

  • use_faiss – Whether to use the faiss package (if installed) to compute nearest neighbors when approx = False and metric = ‘euclidean’. (Default: True)

  • set_op_mix_ratio – Float between 0 and 1 controlling the blend between a connectivity matrix formed exclusively from mutual nearest neighbor pairs (0) and a union of all observed neighbor relationships with the mutual pairs emphasized (1). (Default: 1.0)

  • local_connectivity – UMAP connectivity computation parameter controlling how many nearest neighbors of each cell are assumed to be fully connected (with a connectivity value of 1). (Default: 1)

  • trim – Keep only the top trim connectivities for each cell. May help with population independence and improve the tidiness of clustering. Defaults to n_neighbors * 10. Set to 0 to skip trimming.

  • neighbor_random_state – Random seed used when assigning remainder neighbors to batches. For example, splitting 10 nearest neighbors across 3 batches leaves one remainder neighbor, which is randomly assigned to one of the three batches. (Default: 0)

  • copy – Whether to copy the adata or modify in-place. (Default: False)
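
The default values and the remainder assignment described in the parameter list can be worked through with a short sketch. The helper names below are hypothetical and are not part of cellhint. Using Python's random module, and giving each leftover neighbor to a distinct batch, are assumptions for illustration; the library may draw the random assignment differently.

```python
import random

def default_n_latent(n_available):
    # Documented default: min(50, number of available latent representations).
    return min(50, n_available)

def default_n_neighbors(n_batches):
    # Documented default: max(15, n), with n = 3 * number of batches.
    return max(15, 3 * n_batches)

def default_trim(n_neighbors):
    # Documented default: n_neighbors * 10.
    return n_neighbors * 10

def split_neighbors(n_neighbors, batches, seed=0):
    """Split n_neighbors evenly across batches (hypothetical helper).

    Each leftover neighbor goes to a distinct, randomly chosen batch.
    `batches` must be a list so that random sampling is well defined.
    """
    base, remainder = divmod(n_neighbors, len(batches))
    quota = {batch: base for batch in batches}
    rng = random.Random(seed)
    for batch in rng.sample(batches, remainder):
        quota[batch] += 1
    return quota

print(default_n_neighbors(2))   # 2 batches: 3 * 2 = 6 is below the floor of 15
print(default_n_neighbors(8))   # 8 batches: 3 * 8 = 24 exceeds the floor
quota = split_neighbors(10, ["b1", "b2", "b3"])
print(quota)                    # three batches get 3 each; one of them gets the extra
```

With 10 neighbors and 3 batches, every batch contributes 3 neighbors and the single remainder neighbor lands on one batch chosen by the seeded generator, which is the role the documentation gives to neighbor_random_state.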


Returns:

Depending on copy, either updates adata in place or returns a copied AnnData object, with the neighborhood graph included in both cases.

Return type:

Union[AnnData, None]