Tools

scpca.utils.data.state_diff(adata, model_key, states, factor, sign=1.0, variable='W', highest=10, lowest=0, ascending=False, threshold=1.96)

Compute the difference between two state loadings (logits) of a specified factor.

Parameters
  • adata (AnnData) – Annotated data matrix.

  • model_key (str) – Key to access the model in the adata object.

  • states (Union[List[str], Tuple[str, str], str]) – The two states to compare. If a single str is provided, the base state is assumed to be ‘Intercept’.

  • factor (int) – Factor index to consider for the diff calculation.

  • sign (Union[int, float]) – Sign to adjust the difference, either -1.0 or 1.0, by default 1.0.

  • variable (str) – Vector key to access in the model, by default “W”.

  • highest (int) – Number of highest diff genes to retrieve, by default 10.

  • lowest (int) – Number of lowest diff genes to retrieve, by default 0.

  • ascending (bool) – Whether to sort the results in ascending order, by default False.

  • threshold (float) – Threshold for significance, by default 1.96.

Returns

DataFrame containing differential genes, their magnitudes, differences, types, states, factors, indices, and significance.

Return type

pd.DataFrame

Notes

This function computes the differential genes between two states based on a given model. It first validates the sign, retrieves the model design, and computes the difference between the two states for a given factor. The function then retrieves the gene indices based on the highest and lowest differences and constructs a DataFrame with the results.
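The steps described in the Notes can be sketched in plain Python. This is an illustration, not scpca's implementation: the helper name diff_sketch, the dict-of-lists layout of the loadings, the direction of the difference (second state minus first), and the significance rule (absolute z-score of the difference above threshold) are all assumptions. The real function returns a pd.DataFrame; the sketch returns a list of dicts to stay dependency-free.

```python
from statistics import mean, pstdev


def diff_sketch(loadings, genes, states, sign=1.0, highest=10, lowest=0,
                threshold=1.96):
    """Hypothetical helper: rank genes by the difference of two state loadings.

    loadings maps a state name to a list of per-gene loadings for one factor
    (an assumed layout, not the structure scpca stores in adata).
    """
    # Validate the sign first, as the Notes describe.
    if sign not in (-1.0, 1.0):
        raise ValueError("sign must be -1.0 or 1.0")

    base, other = states
    # Assumed direction: second state minus first state.
    diff = [sign * (b - a) for a, b in zip(loadings[base], loadings[other])]

    # Assumed significance rule: |z-score of the difference| > threshold.
    mu, sd = mean(diff), pstdev(diff)

    # Gene indices ordered from largest to smallest difference.
    order = sorted(range(len(diff)), key=lambda i: diff[i], reverse=True)
    top = order[:highest]
    bottom = order[len(order) - lowest:] if lowest > 0 else []

    rows = []
    for kind, indices in (("highest", top), ("lowest", bottom)):
        for i in indices:
            z = (diff[i] - mu) / sd if sd > 0 else 0.0
            rows.append({
                "gene": genes[i],
                "difference": diff[i],
                "magnitude": abs(diff[i]),
                "type": kind,
                "index": i,
                "significant": abs(z) > threshold,
            })
    return rows


# Toy data: five genes, one factor, two states.
genes = ["g0", "g1", "g2", "g3", "g4"]
loadings = {
    "Intercept": [0.0, 0.5, 1.0, -0.2, 0.3],
    "treated": [2.0, 0.5, 0.0, -0.1, 0.3],
}
rows = diff_sketch(loadings, genes, ("Intercept", "treated"),
                   highest=2, lowest=1)
for row in rows:
    print(row["gene"], row["type"], round(row["difference"], 3))
```

With highest=2 and lowest=1, the sketch keeps the two genes whose loading increases most from the base state and the one gene whose loading decreases most.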

scpca.utils.data.state_loadings(adata, model_key, state, factor, sign=1.0, variable='W', highest=10, lowest=0, ascending=False)

Extract and order genes from an AnnData object based on their loading weights.

This function retrieves genes based on their loading weights for a specific state and factor of a given model. It can select the genes with the highest and lowest loading weights, and returns them in a sorted DataFrame.

Parameters
  • adata (AnnData) – The annotated data matrix containing gene expression data.

  • model_key (str) – The key corresponding to the model in the AnnData object.

  • state (Union[List[str], Tuple[str, str], str]) – The state from which to extract gene information.

  • factor (int) – The index of the factor based on which genes are ordered.

  • sign (Union[int, float]) – Multiplier to adjust the direction of factor values.

  • variable (str) – The type of vector from which factor values are extracted.

  • highest (int) – The number of top genes with the highest factor values to retrieve.

  • lowest (int) – The number of genes with the lowest factor values to retrieve.

  • ascending (bool) – Whether to sort the genes in ascending order of factor values.

Returns

A DataFrame containing genes ordered by their factor values. Columns include gene name, magnitude of association, weight, type (highest/lowest), state, factor index, and gene index.

Return type

pd.DataFrame

Raises

ValueError – If the provided model key or state is not present in the AnnData object.
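The selection and ordering behaviour described above can be sketched without the library. This is a minimal illustration under stated assumptions, not the scpca code: the helper name loadings_sketch and the weights[state][factor] layout are hypothetical, and the result is a list of dicts rather than a pd.DataFrame.

```python
def loadings_sketch(weights, genes, state, factor, sign=1.0,
                    highest=10, lowest=0, ascending=False):
    """Hypothetical helper: pick the top and bottom genes of one factor.

    weights[state][factor] is assumed to be a list of per-gene loadings.
    """
    # Mirror the documented error for an unknown state.
    if state not in weights:
        raise ValueError(f"state {state!r} not found")

    # Apply the sign multiplier to flip the factor direction if requested.
    w = [sign * v for v in weights[state][factor]]

    # Gene indices ordered from smallest to largest loading.
    order = sorted(range(len(w)), key=lambda i: w[i])
    low = order[:lowest]
    high = order[max(len(order) - highest, 0):] if highest > 0 else []

    rows = [
        {"gene": genes[i], "magnitude": abs(w[i]), "weight": w[i],
         "type": kind, "state": state, "factor": factor, "index": i}
        for kind, indices in (("highest", high), ("lowest", low))
        for i in indices
    ]
    # Final ordering of the returned rows follows the ascending flag.
    rows.sort(key=lambda r: r["weight"], reverse=not ascending)
    return rows


# Toy data: one state, one factor, four genes.
genes = ["a", "b", "c", "d"]
weights = {"Intercept": [[0.9, -0.4, 0.1, -1.2]]}

selected = loadings_sketch(weights, genes, "Intercept", factor=0,
                           highest=2, lowest=1)
for row in selected:
    print(row["gene"], row["type"], row["weight"])
```

Passing sign=-1.0 flips the direction of the factor, so the genes previously reported as lowest become the highest.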

scpca.utils.data.umap(adata, model_key, neighbors_kwargs={}, umap_kwargs={})

Performs UMAP dimensionality reduction on an AnnData object. Uses scanpy’s UMAP function but stores the nearest-neighbors graph and UMAP coordinates in the AnnData object under keys prefixed with model_key.

Parameters
  • adata (AnnData) – The AnnData object containing the data to be processed.

  • model_key (str) – The basis to use for the UMAP calculation.

  • neighbors_kwargs (Dict[str, Any]) – Additional keyword arguments to be passed to sc.pp.neighbors function. Default is an empty dictionary.

  • umap_kwargs (Dict[str, Any]) – Additional keyword arguments to be passed to sc.tl.umap function. Default is an empty dictionary.

Returns

Return type

None

Notes

This function performs UMAP dimensionality reduction on the input adata object using the specified embedding of the specified model. It first computes the neighbors graph using the sc.pp.neighbors function, with the option to provide additional keyword arguments via neighbors_kwargs. Then, it applies the UMAP algorithm using the sc.tl.umap function, with the option to provide additional keyword arguments via umap_kwargs. Finally, it stores the UMAP coordinates in the obsm attribute of the adata object under the key “{model_key}_umap” or “X_{model_key}_umap” respectively.

Example

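>>> from anndata import AnnData
>>> from scpca.utils.data import umap
>>> adata = AnnData(X)  # X: expression matrix, defined elsewhere
>>> umap(adata, model_key="pca", neighbors_kwargs={"n_neighbors": 10}, umap_kwargs={"min_dist": 0.5})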