Distance structure
- class cellhint.distance.Distance(dist_mat: ndarray, cell: DataFrame, cell_type: DataFrame)
Bases: object
Class that deals with the cross-dataset cell-by-cell-type distance matrix.
- Parameters:
dist_mat – Cell-by-cell-type distance matrix.
cell – Cell meta-information including at least ‘dataset’, ‘ID’ and ‘cell_type’.
cell_type – Cell type meta-information including at least ‘dataset’ and ‘cell_type’.
- dist_mat
A cell-by-cell-type distance matrix.
- cell
Cell meta-information including ‘dataset’, ‘ID’ and ‘cell_type’.
- cell_type
Cell type meta-information including ‘dataset’ and ‘cell_type’.
- n_cell
Number of cells involved.
- n_cell_type
Number of cell types involved.
- shape
Tuple of number of cells and cell types.
- assignment
Assignment of each cell to the most similar cell type in each dataset (obtained through the assign method).
- assign() → None
Assign each cell to its most similar cell type in each dataset.
- Returns:
Modified object with the result of cell assignment added as .assignment.
- Return type:
None
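As an illustration of what assignment means here, the following minimal sketch picks, for each cell, the closest cell type within each dataset's block of columns. It uses plain NumPy rather than cellhint itself; the toy matrix, the labels and the helper name assign_per_dataset are assumptions made for this example, not the library's implementation.

```python
import numpy as np

# Toy cell-by-cell-type distance matrix: 3 cells (rows) x 4 cell types (columns).
dist_mat = np.array([
    [0.1, 0.9, 0.8, 0.2],
    [0.7, 0.3, 0.4, 0.6],
    [0.5, 0.2, 0.1, 0.9],
])
# Column metadata: the dataset and name of each cell type (column).
col_dataset = np.array(["D1", "D1", "D2", "D2"])
col_cell_type = np.array(["T", "B", "T", "B"])


def assign_per_dataset(dist_mat, col_dataset, col_cell_type):
    """Return {dataset: nearest cell type per cell} (hypothetical helper)."""
    assignment = {}
    for dataset in np.unique(col_dataset):
        cols = np.flatnonzero(col_dataset == dataset)
        # Smallest distance within this dataset's columns = most similar cell type.
        nearest = cols[np.argmin(dist_mat[:, cols], axis=1)]
        assignment[str(dataset)] = col_cell_type[nearest].tolist()
    return assignment


assignment = assign_per_dataset(dist_mat, col_dataset, col_cell_type)
print(assignment)
```

Each cell ends up with one label per dataset, which is the shape of result the .assignment attribute is described as holding.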
- concatenate(*distances, by: str = 'cell', check: bool = False)
Concatenate by either cells (rows) or cell types (columns).
- Parameters:
distances – A Distance object or a list of such objects.
by – The direction of concatenation, joining either cells (‘cell’, rows) or cell types (‘cell_type’, columns). (Default: ‘cell’)
check – Check whether the concatenation is feasible. (Default: False)
- Returns:
A Distance object concatenated along cells (by = ‘cell’) or cell types (by = ‘cell_type’).
- Return type:
Distance
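The two directions of concatenation can be pictured on bare matrices. The sketch below is a simplified NumPy stand-in, not cellhint code: concatenate_blocks is a hypothetical helper, and its feasibility check only compares shapes, whereas a real check would presumably also compare the cell and cell type metadata.

```python
import numpy as np


def concatenate_blocks(blocks, by="cell", check=False):
    """Stack distance blocks by rows ('cell') or columns ('cell_type') (hypothetical helper)."""
    if by not in ("cell", "cell_type"):
        raise ValueError("by must be 'cell' or 'cell_type'")
    axis = 0 if by == "cell" else 1
    if check:
        # Joining cells needs equal column counts; joining cell types needs equal row counts.
        other = 1 - axis
        if len({block.shape[other] for block in blocks}) != 1:
            raise ValueError("blocks are not compatible for concatenation")
    return np.concatenate(blocks, axis=axis)


a = np.zeros((2, 3))  # 2 cells x 3 cell types
b = np.ones((4, 3))   # 4 more cells against the same 3 cell types
c = np.ones((2, 5))   # the same 2 cells against 5 more cell types

by_cell = concatenate_blocks([a, b], by="cell", check=True)
by_cell_type = concatenate_blocks([a, c], by="cell_type", check=True)
print(by_cell.shape, by_cell_type.shape)
```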
- filter_cells(check_symmetry: bool = True) → None
Filter out cells whose gene expression profiles do not correlate most with the eigen cell they belong to (i.e., correlate most with other cell types).
- Parameters:
check_symmetry – Whether to check the symmetry of the distance matrix in terms of datasets and cell types. (Default: True)
- Returns:
A Distance object with undesirable cells filtered out.
- Return type:
None
- static from_adata(adata: AnnData, dataset: str, cell_type: str, use_rep: str | None = None, metric: str | None = None, n_jobs: int | None = None, check_params: bool = True, **kwargs)
Generate a Distance object from the AnnData given.
- Parameters:
adata – An AnnData object containing different datasets/batches and cell types. In most scenarios, the format of the expression .X in the AnnData is flexible (normalized, log-normalized, z-scaled, etc.). However, when use_rep is specified as ‘X’ (or X_pca is not detected in .obsm and no other latent representations are provided), .X should be log-normalized (to a constant total count per cell).
dataset – Column name (key) of cell metadata specifying dataset information.
cell_type – Column name (key) of cell metadata specifying cell type information.
use_rep – Representation used to calculate distances. This can be ‘X’ or any representation stored in .obsm. Defaults to the PCA coordinates if present (if not, the expression matrix X is used).
metric – Metric used to calculate the distance between each cell and each cell type. Can be ‘euclidean’, ‘cosine’, ‘manhattan’ or any metric applicable to sklearn.metrics.pairwise_distances(). Defaults to ‘euclidean’ if latent representations are used for calculating distances, and to ‘correlation’ if the expression matrix is used.
n_jobs – Number of CPUs used. Defaults to one CPU. -1 means all CPUs are used.
check_params – Whether to check (or set the default) for dataset, cell_type, use_rep and metric. (Default: True)
**kwargs – Other keyword arguments passed to sklearn.metrics.pairwise_distances().
- Returns:
A Distance object representing the cross-dataset cell-by-cell-type distance matrix.
- Return type:
Distance
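To make the shape of the result concrete, here is a minimal NumPy sketch of a cell-by-cell-type distance matrix. It assumes each cell type is summarized by the mean profile of its cells within a dataset and uses Euclidean distance on a toy latent representation; both choices, and all variable names, are illustrative assumptions rather than a description of how from_adata works internally.

```python
import numpy as np

# Toy latent representation: 4 cells x 2 dimensions (a stand-in for e.g. PCA coordinates).
X = np.array([
    [0.0, 0.0],
    [0.0, 2.0],
    [4.0, 0.0],
    [4.0, 2.0],
])
dataset = np.array(["D1", "D1", "D2", "D2"])
cell_type = np.array(["T", "T", "B", "B"])

# One column per (dataset, cell type) pair, in order of first appearance.
pairs = list(dict.fromkeys(zip(dataset.tolist(), cell_type.tolist())))

# Assumption: each cell type is summarized by the mean profile of its cells.
centroids = np.array([
    X[(dataset == d) & (cell_type == t)].mean(axis=0) for d, t in pairs
])

# Euclidean distance from every cell to every centroid (cells x cell types).
dist_mat = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
print(pairs)
print(dist_mat)
```

Every cell gets a distance to every cell type, including cell types from datasets other than its own, which is what makes the matrix cross-dataset.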
- normalize(Gaussian_kernel: bool = False, rank: bool = True, normalize: bool = True) → None
Normalize the distance matrix, optionally applying a Gaussian kernel, a rank transformation and maximum normalization.
- Parameters:
Gaussian_kernel – Whether to apply the Gaussian kernel to the distance matrix. (Default: False)
rank – Whether to turn the matrix into a rank matrix. (Default: True)
normalize – Whether to maximum-normalize the distance matrix. (Default: True)
- Returns:
The Distance object modified with a normalized distance matrix.
- Return type:
None
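The rank and maximum-normalization options can be sketched on a plain matrix as follows. This is one plausible reading only: it assumes both steps operate row by row (per cell), and rank_and_max_normalize is a hypothetical helper. The Gaussian kernel is left out because its exact form is not specified in this reference.

```python
import numpy as np


def rank_and_max_normalize(dist_mat, rank=True, normalize=True):
    """Row-wise rank transform, then division by the row maximum (hypothetical helper)."""
    out = np.asarray(dist_mat, dtype=float)
    if rank:
        # Double argsort gives 0-based ranks per row; add 1 so the closest cell type has rank 1.
        # Ties, if any, are broken by column order.
        out = np.argsort(np.argsort(out, axis=1), axis=1) + 1.0
    if normalize:
        # Scale every row so that its largest entry becomes 1.
        out = out / out.max(axis=1, keepdims=True)
    return out


dist_mat = np.array([
    [0.2, 5.0, 1.0, 3.0],
    [9.0, 0.5, 4.0, 2.0],
])
normalized = rank_and_max_normalize(dist_mat)
print(normalized)
```

After both steps every row lies in (0, 1], so cells measured on different scales become comparable.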
- symmetric() → bool
Check whether the distance matrix is symmetric in terms of datasets and cell types.
- Returns:
True or False indicating whether all datasets and cell types are included in the object (thus symmetric).
- Return type:
bool
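One way to read this symmetry condition is that the cells (rows) and the cell types (columns) cover the same set of dataset and cell type combinations. The sketch below encodes that interpretation with plain Python sets; is_symmetric and the toy labels are hypothetical, and the library's actual check may differ.

```python
def is_symmetric(cell_pairs, cell_type_pairs):
    """Check that rows and columns cover the same (dataset, cell type) pairs (hypothetical helper)."""
    return set(cell_pairs) == set(cell_type_pairs)


# (dataset, cell type) label of each cell (rows); repeated labels are expected.
cell_pairs = [("D1", "T"), ("D1", "B"), ("D2", "T"), ("D1", "T")]
# (dataset, cell type) label of each cell type (columns).
full_columns = [("D1", "T"), ("D1", "B"), ("D2", "T")]
partial_columns = [("D1", "T"), ("D1", "B")]

print(is_symmetric(cell_pairs, full_columns))
print(is_symmetric(cell_pairs, partial_columns))
```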
- to_binary(check_symmetry: bool = True)
Turn the distance matrix into a binary matrix representing the estimated cell type membership across datasets.
- to_confusion(D1: str, D2: str, check: bool = True) → tuple
This function is deprecated. Use to_pairwise_confusion and to_multi_confusion instead. Extract the dataset1-by-dataset2 and dataset2-by-dataset1 confusion matrices. Note this function is expected to be applied to a binary membership matrix.
- Parameters:
D1 – Name of the first dataset.
D2 – Name of the second dataset.
check – Whether to check that the names of the two datasets are present. (Default: True)
- Returns:
The dataset1-by-dataset2 and dataset2-by-dataset1 confusion matrices.
- Return type:
tuple
- to_meta(check_symmetry: bool = True, turn_binary: bool = False, return_symmetry: bool = True) → DataFrame
Meta-analysis of cross-dataset cell type dissimilarity or membership.
- Parameters:
check_symmetry – Whether to check the symmetry of the distance matrix in terms of datasets and cell types. (Default: True)
turn_binary – Whether to turn the distance matrix into a cell type membership matrix before meta analysis. (Default: False)
return_symmetry – Whether to return a symmetric dissimilarity matrix by averaging with its transposed form. (Default: True)
- Returns:
A DataFrame object representing the cell-type-level dissimilarity matrix (turn_binary = False) or membership matrix (turn_binary = True).
- Return type:
DataFrame
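The return_symmetry option describes averaging a matrix with its transpose. The NumPy sketch below shows that step on a toy cell-type-level matrix. How cells are summarized into cell types is an assumption here (a plain mean over each cell type's rows), and the variable names are hypothetical.

```python
import numpy as np

# Cell-by-cell-type distances: 4 cells x 2 cell types (columns ordered as `labels`).
dist_mat = np.array([
    [0.0, 1.0],
    [0.2, 0.8],
    [0.6, 0.1],
    [0.8, 0.3],
])
labels = ["D1: T", "D2: B"]
# Index into `labels` giving the cell type each cell (row) belongs to.
row_label = np.array([0, 0, 1, 1])

# Assumption: summarize each cell type's rows by their mean distance profile.
meta = np.vstack([dist_mat[row_label == i].mean(axis=0) for i in range(len(labels))])

# Symmetrize by averaging with the transposed form.
meta_sym = (meta + meta.T) / 2
print(meta_sym)
```

Without the symmetrization, the entry for (D1: T, D2: B) and the entry for (D2: B, D1: T) generally differ, because each is computed from a different group of cells.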
- to_multi_confusion(relation: DataFrame, D: str, check: bool = True) → tuple
Extract the confusion matrices between previously defined meta-cell-types and cell types from a new dataset.
- Parameters:
relation – A DataFrame object representing the cell type harmonization result across multiple datasets.
D – Name of the new dataset to be aligned.
check – Whether to check that the names of the datasets are present. (Default: True)
- Returns:
The confusion matrices between previously defined meta-cell-types and cell types from the new dataset.
- Return type:
tuple
- to_pairwise_confusion(D1: str, D2: str, check: bool = True) → tuple
Extract the dataset1-by-dataset2 and dataset2-by-dataset1 confusion matrices.
- Parameters:
D1 – Name of the first dataset.
D2 – Name of the second dataset.
check – Whether to check that the names of the two datasets are present. (Default: True)
- Returns:
The dataset1-by-dataset2 and dataset2-by-dataset1 confusion matrices.
- Return type:
tuple
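A pair of confusion matrices of this kind can be pictured as two cross-tabulations: cells of the first dataset counted by their own cell type against the cell type they are assigned to in the second dataset, and the reverse. The sketch below uses only the standard library; the records and the confusion helper are hypothetical and do not reproduce the library's output format.

```python
from collections import Counter

# Each record: (dataset of the cell, its own cell type, cell type assigned in the other dataset).
cells = [
    ("D1", "T", "T cell"),
    ("D1", "T", "T cell"),
    ("D1", "B", "B cell"),
    ("D1", "B", "T cell"),
    ("D2", "T cell", "T"),
    ("D2", "B cell", "B"),
]


def confusion(cells, dataset):
    """Count (own cell type, assigned cell type) pairs for cells of one dataset (hypothetical helper)."""
    return Counter((own, assigned) for d, own, assigned in cells if d == dataset)


d1_by_d2 = confusion(cells, "D1")  # rows: D1 cell types, columns: D2 cell types
d2_by_d1 = confusion(cells, "D2")  # rows: D2 cell types, columns: D1 cell types
print(d1_by_d2)
print(d2_by_d1)
```

The two tables are not transposes of each other, since each one is built from a different dataset's cells.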