Models
scPCA
- class scpca.pca.scPCA(adata, num_factors, layers_key=None, loadings_formula='1', intercept_formula='1', size_factor=None, subsampling=4096, device=None, seed=None, model_kwargs={}, training_kwargs={'loss': <class 'pyro.infer.trace_mean_field_elbo.TraceMeanField_ELBO'>, 'loss_kwargs': {'num_particles': 1}, 'num_epochs': 5000, 'optimizer': <function ClippedAdam>, 'optimizer_kwargs': {'betas': (0.95, 0.999), 'lr': 0.01}, 'scheduler': None})
Single-cell Principal Component Analysis (scPCA) model.
This class provides an interface to perform scPCA on single-cell data. It allows for the extraction of principal components while accounting for batch effects and other covariates.
- Parameters
adata (AnnData) – AnnData object containing the single-cell data.
num_factors (int) – Number of factors to fit.
layers_key (Optional[str]) – Key of the single-cell count matrix in adata.layers. If None, scPCA tries to extract the count matrix from adata.X. Default is None.
loadings_formula (str) – R-style formula used to construct the loadings design matrix from adata.obs. If None, scPCA fits a normal PCA. Default is “1”.
intercept_formula (str) – R-style formula used to construct the intercept design matrix from adata.obs. Default is “1”, which fits a single mean offset for each gene across all cells.
size_factor (Union[str, ndarray[Any, dtype[float32]], None]) – Optional size factor information for cells. Default is None, in which case scPCA computes the log sum of counts for each cell.
subsampling (int) – Number of cells to subsample for training. Default is 4096.
device (Optional[Literal['cuda', 'cpu']]) – Device to run the model on. A GPU is recommended. Default is None, which selects the GPU if available, else the CPU.
seed (Optional[int]) – Random seed for reproducibility. Default is None.
model_kwargs (Dict[str, Any]) – Additional keyword arguments for the model. Default values are provided.
training_kwargs (Dict[str, Any]) – Additional keyword arguments for training. Default is DEFAULT, whose value is shown in the class signature above.
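The default described for size_factor (the log sum of counts for each cell) can be illustrated with a small standalone sketch. It uses plain Python rather than the library's own code; the helper name log_total_counts and the toy matrix are assumptions made for illustration only.

```python
import math


def log_total_counts(counts):
    """Return log(total counts) for each cell (row) of a cells x genes matrix.

    Hypothetical helper, not part of scpca. A cell with zero total counts
    would raise a ValueError here, since log(0) is undefined.
    """
    return [math.log(sum(row)) for row in counts]


# Toy count matrix: 3 cells x 4 genes (made-up values).
counts = [
    [1, 0, 3, 6],
    [0, 2, 2, 0],
    [5, 5, 5, 5],
]

size_factors = log_total_counts(counts)
```

Each entry of size_factors is a per-cell offset on the log scale, which is why cells with more total counts receive a larger value.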
- mean_to_anndata(model_key, num_samples=25, num_split=2048, variables=['W', 'Z'])
Store the posterior mean estimates in the AnnData object for specified variables.
This method retrieves the posterior mean estimates for the given variables and stores them in the AnnData object. The variables can include weights (“W”), loadings (“V”), means (“μ”), latent factors (“Z”), among others.
- Parameters
model_key (str) – Key under which the model results are stored in the AnnData object.
num_samples (int) – Number of samples to draw from the posterior. Default is 25.
num_split (int) – Number of splits for the data. Default is 2048.
variables (Sequence[str]) – List of variables for which the posterior mean estimates should be stored. Possible values include “W”, “V”, “μ”, “Z”, “α”, “σ”, and “offset”. Default is [“W”, “Z”].
- Return type
None
- Returns
The results are stored in the provided AnnData object.
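As a rough illustration of what a posterior mean estimate is, the sketch below averages a handful of posterior draws element-wise. It is a conceptual example in plain Python, not the library's implementation; the function posterior_mean and the sample values are hypothetical.

```python
def posterior_mean(samples):
    """Average a list of equally shaped 2-D samples element-wise.

    Hypothetical helper: `samples` has shape (num_samples, rows, cols),
    stored as nested lists.
    """
    n = len(samples)
    rows, cols = len(samples[0]), len(samples[0][0])
    return [
        [sum(s[i][j] for s in samples) / n for j in range(cols)]
        for i in range(rows)
    ]


# Two made-up posterior draws of a 2 x 2 variable such as "W".
samples_W = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[3.0, 2.0], [1.0, 0.0]],
]

mean_W = posterior_mean(samples_W)
```

With num_samples=25, the same averaging would run over 25 draws instead of two; more draws give a less noisy estimate at the cost of more computation.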
- posterior_to_anndata(model_key, num_samples=25, variables=['W', 'Z'])
Store the posterior samples in the AnnData object.
- Parameters
model_key (str) – Key under which the model results are stored in the AnnData object.
num_samples (int) – Number of samples to draw from the posterior. Default is 25.
variables (Sequence[str]) – List of variables for which the posterior samples should be stored. Possible values include “W”, “V”, “μ”, “Z”, “α”, “σ”, and “offset”. Default is [“W”, “Z”].
- Return type
None
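A minimal sketch of how the constructor and mean_to_anndata documented above might be combined. The layer name "counts", the adata.obs columns "condition" and "batch", the model key "scpca", and both helper functions are assumptions for illustration. The training step is not documented in this section, so it is left as a placeholder comment rather than guessed.

```python
def scpca_kwargs(num_factors=10):
    """Collect constructor arguments named in the scPCA signature.

    Hypothetical helper; all values are illustrative.
    """
    return {
        "num_factors": num_factors,
        "layers_key": "counts",           # assumed layer holding raw counts
        "loadings_formula": "condition",  # assumed column in adata.obs
        "intercept_formula": "batch",     # assumed column in adata.obs
        "subsampling": 4096,
        "seed": 0,
    }


def run_scpca(adata, model_key="scpca"):
    """Build the model and store posterior means (sketch, not executed here)."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from scpca.pca import scPCA

    model = scPCA(adata, **scpca_kwargs())
    # The training step belongs here; it is not covered in this section.
    model.mean_to_anndata(model_key, num_samples=25, variables=["W", "Z"])
    return adata
```

Keeping the arguments in a dictionary makes it easy to reuse the same settings across runs that differ only in num_factors or seed.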
dPCA
- class scpca.pca.dPCA(adata, num_factors, layers_key=None, loadings_formula='1', intercept_formula='1', subsampling=4096, device=None, seed=None, model_kwargs={'z_sd': 1.0}, training_kwargs={'loss': <class 'pyro.infer.trace_mean_field_elbo.TraceMeanField_ELBO'>, 'loss_kwargs': {'num_particles': 1}, 'num_epochs': 5000, 'optimizer': <function ClippedAdam>, 'optimizer_kwargs': {'betas': (0.95, 0.999), 'lr': 0.01}, 'scheduler': None})
Design Principal Component Analysis (dPCA) model.
- Parameters
adata (AnnData) – AnnData object containing the data to analyse.
num_factors (int) – Number of factors to fit.
layers_key (Optional[str]) – Key of the data matrix in adata.layers. If None, dPCA tries to extract the matrix from adata.X. Default is None.
loadings_formula (str) – R-style formula used to construct the loadings design matrix from adata.obs. If None, dPCA fits a normal PCA/factor model. Default is ‘1’.
intercept_formula (str) – R-style formula used to construct the intercept design matrix from adata.obs. If None, dPCA assumes a single batch. Default is ‘1’.
subsampling (int) – Number of observations to subsample for training. Default is 4096.
device (Optional[Literal['cuda', 'cpu']]) – Device to run the model on. A GPU is recommended. Default is None, which selects the GPU if available, else the CPU.
seed (Optional[int]) – Random seed for reproducibility. Default is None.
model_kwargs (Dict[str, Any]) – Additional keyword arguments for the model. Default values are provided.
training_kwargs (Dict[str, Any]) – Additional keyword arguments for training. Default is SUBSAMPLE, whose value is shown in the class signature above.
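To make the role of loadings_formula and intercept_formula more concrete, the sketch below builds the kind of design matrix an R-style formula typically yields. It assumes R's default treatment coding (intercept plus indicator columns, with the first level as reference); the formula backend used by the library is not stated in this section, so the exact column layout may differ. Function names and the batch labels are hypothetical.

```python
def intercept_design(n_obs):
    """Design matrix for the formula '1': a single column of ones."""
    return [[1.0] for _ in range(n_obs)]


def categorical_design(labels):
    """Intercept plus treatment-coded indicators for one categorical covariate.

    The first level (in sorted order) is the reference and is absorbed
    by the intercept column.
    """
    levels = sorted(set(labels))
    return [
        [1.0] + [1.0 if label == level else 0.0 for level in levels[1:]]
        for label in labels
    ]


# Made-up batch labels for four observations.
batch = ["a", "b", "a", "c"]

design_single = intercept_design(len(batch))  # corresponds to formula '1'
design_batch = categorical_design(batch)      # corresponds to a formula such as 'batch'
```

With the formula '1', every observation shares one row pattern, which matches the description of a single offset across all observations; a categorical formula adds one column per non-reference level.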
- mean_to_anndata(model_key, num_samples=25, num_split=2048, variables=['W', 'Z'])
Store the posterior mean estimates in the AnnData object for specified variables.
This method retrieves the posterior mean estimates for the given variables and stores them in the AnnData object. The variables can include weights (“W”), loadings (“V”), means (“μ”), latent factors (“Z”), among others.
- Parameters
model_key (str) – Key under which the model results are stored in the AnnData object.
num_samples (int) – Number of samples to draw from the posterior. Default is 25.
num_split (int) – Number of splits for the data. Default is 2048.
variables (Sequence[str]) – List of variables for which the posterior mean estimates should be stored. Possible values include “W”, “V”, “μ”, “Z”, “α”, “σ”, and “offset”. Default is [“W”, “Z”].
- Return type
None
- Returns
The results are stored in the provided AnnData object.
- posterior_to_anndata(model_key, num_samples=25, variables=['W', 'Z'])
Store the posterior samples in the AnnData object.
- Parameters
model_key (str) – Key under which the model results are stored in the AnnData object.
num_samples (int) – Number of samples to draw from the posterior. Default is 25.
variables (Sequence[str]) – List of variables for which the posterior samples should be stored. Possible values include “W”, “V”, “μ”, “Z”, “α”, “σ”, and “offset”. Default is [“W”, “Z”].
- Return type
None