The preprocessing (pp) accessor

The preprocessing accessor provides several methods to subset and process image data.

class spatialproteomics.pp.preprocessing.PreprocessingAccessor(xarray_obj)

The preprocessing accessor enables fast indexing and preprocessing of the spatialproteomics object.

add_channel(channels: str | list, array: ndarray) → Dataset

Adds channel(s) to an existing image container.

Parameters:
  • channels (Union[str, list]) – The name of the channel or a list of channel names to be added.

  • array (np.ndarray) – The numpy array representing the channel(s) to be added.

Returns:

The updated image container with added channel(s).

Return type:

xr.Dataset

add_feature(feature_name: str, feature_values: list | ndarray)

Adds a feature to the image container.

Parameters:
  • feature_name (str) – The name of the feature to be added.

  • feature_values (Union[list, np.ndarray]) – The values of the feature to be added.

Returns:

The updated image container with the added feature.

Return type:

xr.Dataset

add_layer(array: ndarray, key_added: str = '_mask') → Dataset

Adds a layer (such as a mask highlighting artifacts) to the xarray dataset.

Parameters:
  • array (np.ndarray) – The array representing the layer to be added. Can be either 2D or 3D (in the latter case, the first dimension should be the number of channels).

  • key_added (str, optional) – The name of the added layer in the xarray dataset. Default is ‘_mask’.

Returns:

The updated dataset with the added layer.

Return type:

xr.Dataset

Raises:

AssertionError – If the array is not 2-dimensional or its shape does not match the image shape.

Notes

This method adds a layer with the same spatial shape as the image field to the xarray dataset. The array (typically a 2-dimensional numpy array) represents the segmentation mask or other layer to be added. It is wrapped in a DataArray that uses the same coordinates and dimensions as the image field and is stored under the name given by the key_added parameter. The original dataset is merged with the new layer, and the amended dataset is returned.

add_layer_from_dataframe(df: DataFrame, key_added: str = '_la_layers') → Dataset

Adds a dataframe as a layer to the xarray object. This is similar to add_obs_from_dataframe, except that it can be used to add any kind of data to the xarray object, which makes it useful for string-based labels or other metadata.

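Parameters:
  • df (pd.DataFrame) – A dataframe with the observation values.

  • key_added (str, optional) – The key under which the layer is added to the xarray object. Default is ‘_la_layers’.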

Returns:

The amended image container.

Return type:

xr.Dataset

add_obs_from_dataframe(df: DataFrame) → Dataset

Adds an observation table to the image container. Columns of the dataframe have to match the feature coordinates of the image container, and the index of the dataframe has to match the cell coordinates of the image container.

Parameters:

df (pd.DataFrame) – A dataframe with the observation values.

Returns:

The amended image container.

Return type:

xr.Dataset

add_observations(properties: str | list | tuple = ('label', 'centroid'), layer_key: str = '_segmentation', return_xarray: bool = False) → Dataset

Adds properties derived from the segmentation mask to the image container.

Parameters:
  • properties (Union[str, list, tuple]) – The property or properties to be added to the image container. See skimage.measure.regionprops_table for a list of available properties. Default is (‘label’, ‘centroid’).

  • layer_key (str) – The key of the layer that contains the segmentation mask. Default is ‘_segmentation’.

  • return_xarray (bool) – If True, the function returns an xarray.DataArray with the properties instead of adding them to the image container. Default is False.

Returns:

The amended image container, or an xarray.DataArray with the properties if return_xarray is True.

Return type:

xr.Dataset or xr.DataArray
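To make the default properties concrete, the following minimal NumPy-only sketch shows what the ‘label’ and ‘centroid’ properties contain for a 2D segmentation mask in which 0 marks the background. The helper label_centroids is hypothetical and is not part of spatialproteomics; the documentation above points to skimage.measure.regionprops_table for the authoritative property definitions.

```python
import numpy as np


def label_centroids(segmentation: np.ndarray) -> dict:
    """Map every non-zero label to its (row, col) centroid.

    Hypothetical helper illustrating the 'label' and 'centroid' properties;
    it is not spatialproteomics code. Label 0 is treated as background.
    """
    rows, cols = np.indices(segmentation.shape)
    flat = segmentation.ravel()
    counts = np.bincount(flat)                          # pixels per label
    row_sums = np.bincount(flat, weights=rows.ravel())  # summed row indices per label
    col_sums = np.bincount(flat, weights=cols.ravel())  # summed column indices per label
    return {
        int(lab): (float(row_sums[lab] / counts[lab]), float(col_sums[lab] / counts[lab]))
        for lab in np.unique(flat)
        if lab != 0
    }


seg = np.array([[1, 1, 0],
                [1, 1, 0],
                [0, 0, 2]])
print(label_centroids(seg))  # {1: (0.5, 0.5), 2: (2.0, 2.0)}
```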

add_quantification(func: str | Callable = 'intensity_mean', key_added: str = '_intensity', layer_key: str = '_image', return_xarray=False) → Dataset

Quantify channel intensities over the segmentation mask.

Parameters:
  • func (Callable or str, optional) – The function used for quantification. Can either be a string to specify a function from skimage.measure.regionprops_table or a custom function. Default is ‘intensity_mean’.

  • key_added (str, optional) – The key under which the quantification data will be stored in the image container. Default is ‘_intensity’.

  • layer_key (str, optional) – The key of the layer to be quantified. Default is ‘_image’.

  • return_xarray (bool, optional) – If True, the function returns an xarray.DataArray with the quantification data instead of adding it to the image container.

Returns:

The updated image container with added quantification data or the quantification data as a separate xarray.DataArray.

Return type:

xr.Dataset or xr.DataArray
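With the default func=‘intensity_mean’, each cell receives the mean of every channel over the pixels it covers. The sketch below reproduces that computation with plain NumPy so that the shape of the result is clear. The function name mean_intensity_per_cell and the (channels, y, x) image layout are assumptions for illustration, not the library's implementation.

```python
import numpy as np


def mean_intensity_per_cell(image: np.ndarray, segmentation: np.ndarray):
    """Per-cell, per-channel mean intensity (hypothetical helper).

    Assumes image has shape (channels, y, x) and segmentation has shape (y, x)
    with 0 as background. Returns (labels, matrix), where matrix has one row per
    cell label and one column per channel.
    """
    labels = np.unique(segmentation)
    labels = labels[labels != 0]
    flat_seg = segmentation.ravel()
    counts = np.bincount(flat_seg)  # number of pixels in each cell
    matrix = np.zeros((labels.size, image.shape[0]))
    for c, channel in enumerate(image):
        # sum of this channel's intensities inside each cell
        sums = np.bincount(flat_seg, weights=channel.ravel().astype(float))
        matrix[:, c] = sums[labels] / counts[labels]
    return labels, matrix


image = np.array([[[1, 3], [5, 7]],      # channel 0
                  [[10, 10], [0, 4]]])   # channel 1
seg = np.array([[1, 1],
                [0, 2]])
labels, matrix = mean_intensity_per_cell(image, seg)
print(labels.tolist())  # [1, 2]
print(matrix.tolist())  # [[2.0, 10.0], [7.0, 4.0]]
```

Each row of the result corresponds to one cell and each column to one channel, which matches the cells × channels layout of an expression matrix.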

add_quantification_from_dataframe(df: DataFrame, key_added: str = '_intensity') → Dataset

Adds a quantification table to the image container. The columns of the dataframe have to match the channel coordinates of the image container, and the index of the dataframe has to match the cell coordinates of the image container.

Parameters:
  • df (pd.DataFrame) – A dataframe with the quantification values.

  • key_added (str, optional) – The key under which the quantification data will be added to the image container. Default is ‘_intensity’.

Returns:

The amended image container.

Return type:

xr.Dataset

add_segmentation(segmentation: str | ndarray | None = None, reindex: bool = True, keep_labels: bool = True, add_obs: bool = True) → Dataset

Adds a segmentation mask field to the xarray dataset. This will be stored in the ‘_segmentation’ layer.

Parameters:
  • segmentation (str or np.ndarray) – Either a segmentation mask, i.e., a np.ndarray of shape (x, y) that indicates the location of each cell, or the key of an existing layer.

  • reindex (bool) – If True, the segmentation mask is relabeled to consecutive integers from 1 to n. Default is True.

  • keep_labels (bool) – When using cellpose on multiple channels, you may already get some initial celltype annotations from those. If you want to keep those annotations, set this to True. Default is True.

  • add_obs (bool) – If True, centroids are added to the xarray. Default is True.

Returns:

The amended xarray dataset.

Return type:

xr.Dataset
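The effect of reindex=True can be illustrated with a short NumPy sketch: arbitrary, possibly non-contiguous labels are mapped onto 1 to n while the background stays 0. The helper reindex_labels is hypothetical, assumes non-negative integer labels, and is not taken from the spatialproteomics source.

```python
import numpy as np


def reindex_labels(segmentation: np.ndarray) -> np.ndarray:
    """Relabel non-zero cell labels to 1..n, keeping 0 as background (hypothetical helper)."""
    unique, inverse = np.unique(segmentation, return_inverse=True)
    # np.unique sorts its output, so the background label 0 (if present) maps to index 0
    relabeled = inverse.reshape(segmentation.shape)
    if unique[0] != 0:
        relabeled = relabeled + 1  # no background present: shift so labels start at 1
    return relabeled


seg = np.array([[0, 5],
                [5, 9]])
print(reindex_labels(seg).tolist())  # [[0, 1], [1, 2]]
```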

apply(func: Callable, key: str = '_image', key_added: str = '_image', **kwargs)

Apply a function to each channel independently.

Parameters:
  • func (Callable) – The function to apply to the layer.

  • key (str) – The key of the layer to apply the function to. Default is ‘_image’.

  • key_added (str) – The key under which the updated layer will be stored. Default is ‘_image’ (i.e., the original image is overwritten).

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the function.

Returns:

The updated image container with the applied function.

Return type:

xr.Dataset
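Applying a function to each channel independently means that func sees one 2D channel at a time, with any **kwargs forwarded to it. The following minimal NumPy sketch shows that contract, assuming a (channels, y, x) layout; apply_per_channel and scale are hypothetical names, not part of the library.

```python
import numpy as np


def apply_per_channel(image: np.ndarray, func, **kwargs) -> np.ndarray:
    """Call func on each 2D channel of a (channels, y, x) array and restack (hypothetical helper)."""
    return np.stack([func(channel, **kwargs) for channel in image])


def scale(channel, factor=1.0):
    # toy per-channel function; it receives one 2D channel at a time
    return channel * factor


image = np.arange(8).reshape(2, 2, 2)
result = apply_per_channel(image, scale, factor=2.0)
print(result.shape)  # (2, 2, 2)
```

Because each call only sees one channel, statistics computed inside func (for example a per-channel maximum) never mix information across channels.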

convert_to_8bit(key: str = '_image', key_added: str = '_image')

Convert the image to 8-bit.

Parameters:
  • key (str) – The key of the image layer in the object. Default is ‘_image’.

  • key_added (str) – The key to assign to the 8-bit image in the object. Default is ‘_image’, which overwrites the original image.

Returns:

The object with the image converted to 8-bit.

Return type:

xr.Dataset

downsample(rate: int)

Downsamples the entire dataset by selecting every rate-th element along the x and y dimensions.

Parameters:

rate (int) – The downsampling rate. Only every rate-th pixel (or coordinate) is kept.

Returns:

The downsampled dataset with updated x and y coordinates.

Return type:

xr.Dataset
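Selecting every rate-th element is plain strided slicing, and the x and y coordinates have to be sliced the same way so that they stay aligned with the pixels. The sketch assumes a (channels, y, x) array with separate coordinate vectors; downsample here is a hypothetical NumPy helper, not the accessor method itself.

```python
import numpy as np


def downsample(image: np.ndarray, y_coords: np.ndarray, x_coords: np.ndarray, rate: int):
    """Keep every rate-th pixel along y and x (hypothetical stand-alone helper).

    Assumes the last two axes of image are (y, x); leading channel axes are untouched.
    """
    return image[..., ::rate, ::rate], y_coords[::rate], x_coords[::rate]


image = np.zeros((2, 5, 6))
small, ys, xs = downsample(image, np.arange(5), np.arange(6), rate=2)
print(small.shape, ys.tolist(), xs.tolist())  # (2, 3, 3) [0, 2, 4] [0, 2, 4]
```

With rate=2, roughly a quarter of the pixels are kept, since each spatial axis shrinks by about half.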

drop_layers(layers: str | list | None = None, keep: str | list | None = None, drop_obs: bool = True, suppress_warnings: bool = False) → Dataset

Drops layers from the image container. Either drops all layers specified in layers, or drops all layers except those specified in keep.

Parameters:
  • layers (Union[str, list]) – The name of the layer or a list of layer names to be dropped.

  • keep (Union[str, list]) – The name of the layer or a list of layer names to be kept.

  • drop_obs (bool) – If True, the observations are removed when the label or neighborhood properties are dropped. Default is True.

  • suppress_warnings (bool) – If True, warnings are suppressed. Default is False.

Returns:

The updated image container with dropped layers.

Return type:

xr.Dataset

filter_by_obs(col: str, func: Callable, segmentation_key: str = '_segmentation')

Filter the object by observations based on a given feature and filtering function.

Parameters:
  • col (str) – The name of the feature to filter by.

  • func (Callable) – A filtering function that takes in the values of the feature and returns a boolean array.

  • segmentation_key (str) – The key of the segmentation mask in the object. Default is ‘_segmentation’ (Layers.SEGMENTATION).

Returns:

The filtered object with the selected cells and updated segmentation mask.

Return type:

xr.Dataset

Raises:

AssertionError – If the feature does not exist in the object’s observations.

Notes

  • This method filters the object by selecting only the cells that satisfy the filtering condition.

  • It also updates the segmentation mask to remove cells that are not selected and relabels the remaining cells.

Example

To filter the object by the feature “area” and keep only the cells with an area greater than 70 px:

    obj = obj.pp.add_observations('area').pp.filter_by_obs('area', lambda x: x > 70)
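The two steps listed in the notes (selecting cells, then removing the others from the mask and relabeling the rest) can be sketched in NumPy for the area example. Here the area is simply the pixel count per label; filter_cells_by_area is a hypothetical helper and not the library's implementation.

```python
import numpy as np


def filter_cells_by_area(segmentation: np.ndarray, func) -> np.ndarray:
    """Keep cells whose pixel area satisfies func, then relabel them to 1..n (hypothetical helper)."""
    labels = np.unique(segmentation)
    labels = labels[labels != 0]                          # ignore background
    areas = np.bincount(segmentation.ravel())[labels]     # pixels per cell
    keep = labels[func(areas)]                            # boolean selection of cells
    filtered = np.where(np.isin(segmentation, keep), segmentation, 0)
    # lookup table: old label -> new consecutive label (0 stays 0)
    lookup = np.zeros(segmentation.max() + 1, dtype=int)
    lookup[keep] = np.arange(1, keep.size + 1)
    return lookup[filtered]


seg = np.array([[1, 1, 2],
                [1, 3, 3],
                [0, 0, 3]])
print(filter_cells_by_area(seg, lambda x: x > 2).tolist())  # [[1, 1, 0], [1, 2, 2], [0, 0, 2]]
```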

get_bbox(x_slice: slice, y_slice: slice) → Dataset

Returns the part of the image container that lies within the given bounding box.

Parameters:
  • x_slice (slice) – The slice representing the x-coordinates for the bounding box.

  • y_slice (slice) – The slice representing the y-coordinates for the bounding box.

Returns:

The image container cropped to the bounding box.

Return type:

xr.Dataset

get_channels(channels: List[str] | str) → Dataset

Retrieve the specified channels from the dataset.

Parameters:

channels (Union[List[str], str]) – The channels to retrieve. Can be a single channel name or a list of channel names.

Returns:

The dataset containing the specified channels.

Return type:

xr.Dataset

get_disconnected_cell() → int

Returns the first disconnected cell from the segmentation layer.

Returns:

The label of the first disconnected cell in the segmentation layer.

Return type:

int

get_layer_as_df(layer: str = '_obs', celltypes_to_str: bool = True, neighborhoods_to_str: bool = True, idx_to_str: bool = False) → DataFrame

Returns the specified layer as a pandas DataFrame.

Parameters:
  • layer (str) – The name of the layer to retrieve. Defaults to ‘_obs’ (Layers.OBS).

  • celltypes_to_str (bool) – Whether to convert celltype labels to strings. Defaults to True.

  • neighborhoods_to_str (bool) – Whether to convert neighborhood labels to strings. Defaults to True.

  • idx_to_str (bool) – Whether to convert the index to strings. Defaults to False.

Returns:

The layer data as a DataFrame.

Return type:

pandas.DataFrame

grow_cells(iterations: int = 2, suppress_warning: bool = False) → Dataset

Grows the segmentation masks by expanding the labels in the object.

Parameters:
  • iterations (int) – The number of iterations to grow the segmentation masks. Default is 2.

  • suppress_warning (bool) – Whether to suppress the warning about recalculating the observations. Used internally, default is False.

Raises:

ValueError – If the object does not contain a segmentation mask.

Returns:

The object with the grown segmentation masks and updated observations.

Return type:

xr.Dataset
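Growing a segmentation mask means assigning background pixels at a cell's border to that cell, one ring per iteration, without overwriting other cells. The sketch below uses a simple 4-connected expansion in NumPy and breaks ties with a fixed neighbor order. That tie-breaking rule and the name grow_labels are assumptions for illustration; skimage.segmentation.expand_labels, for example, instead assigns each pixel to the nearest label by distance.

```python
import numpy as np


def grow_labels(segmentation: np.ndarray, iterations: int = 2) -> np.ndarray:
    """Expand each non-zero label into adjacent background pixels (hypothetical helper)."""
    grown = segmentation.copy()
    for _ in range(iterations):
        snapshot = np.pad(grown, 1)  # zero border; labels as of the previous iteration
        # labels of the up, down, left and right neighbors of every pixel
        neighbors = (snapshot[:-2, 1:-1], snapshot[2:, 1:-1],
                     snapshot[1:-1, :-2], snapshot[1:-1, 2:])
        for nb in neighbors:
            # only background pixels are filled, so existing cells are never overwritten
            fill = (grown == 0) & (nb != 0)
            grown[fill] = nb[fill]
    return grown


seg = np.array([[1, 0, 0, 0, 2]])
print(grow_labels(seg, iterations=1).tolist())  # [[1, 1, 0, 2, 2]]
print(grow_labels(seg, iterations=2).tolist())  # [[1, 1, 1, 2, 2]]
```

Because the expansion reads from a snapshot taken at the start of each iteration, every cell grows by at most one pixel per iteration.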

mask_cells(mask_key: str = '_mask', segmentation_key='_segmentation') → Dataset

Mask cells in the segmentation mask.

Parameters:
  • mask_key (str) – The key of the mask to use for masking. Default is ‘_mask’.

  • segmentation_key (str) – The key of the segmentation mask in the object. Default is ‘_segmentation’ (Layers.SEGMENTATION).

Returns:

The object with the masked cells in the segmentation mask.

Return type:

xr.Dataset

mask_region(key: str = '_mask', image_key='_image', key_added='_image') → Dataset

Mask a region in the image.

Parameters:
  • key (str) – The key of the region to mask. Default is ‘_mask’.

  • image_key (str) – The key of the image layer in the object. Default is ‘_image’ (Layers.IMAGE).

  • key_added (str) – The key to assign to the masked image in the object. Default is ‘_image’ (Layers.IMAGE), which overwrites the original image.

Returns:

The object with the masked region in the image.

Return type:

xr.Dataset

merge_segmentation(layer_key: str, key_added: str = '_merged_segmentation', labels: List[str] | None = None, threshold: float = 0.8)

Merges segmentation masks. This can be done in two ways: by merging a multi-dimensional array stored in the object, or by adding a numpy array. The masks can either be merged among themselves, or merged into an existing single-channel mask (e.g., a precomputed DAPI segmentation).

Parameters:
  • layer_key (Union[str, List[str]]) – The key(s) of the segmentation mask(s) to merge. Can be a single key (must be 3D) or a list of keys (each 2D).

  • key_added (str) – The name of the new segmentation mask to be added to the xarray object. Default is “_merged_segmentation”.

  • labels (Optional[List[str]]) – Optional. Labels corresponding to each segmentation mask. If provided, they must match the number of arrays.

  • threshold (float) – Optional. Threshold for merging cells. Default is 0.8.

Returns:

The xarray object with the merged segmentation mask.

Return type:

xr.Dataset

Raises:

AssertionError – If specified keys are not found or other input inconsistencies exist.

Notes

  • If the input array is 2D, it will be expanded to 3D.

  • If labels are provided, they need to match the number of arrays.

  • The merging process starts with the largest cells and then proceeds to smaller ones.

  • Disconnected cells in the input are handled based on the specified method.

normalize()

Performs a percentile normalization on each channel using the 3rd and 99.8th percentiles. The resulting values lie in the range 0 to 1.

Returns:

The image container with the normalized image stored in ‘_plot’.

Return type:

xr.Dataset
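Per-channel percentile normalization rescales each channel so that its 3rd percentile maps to 0 and its 99.8th percentile maps to 1; clipping then keeps values beyond those percentiles inside the range 0 to 1. The following NumPy sketch illustrates the idea, assuming a (channels, y, x) array and assuming clipping is how values are kept in range. The name percentile_normalize is hypothetical and this is not the method's source.

```python
import numpy as np


def percentile_normalize(image: np.ndarray, low: float = 3.0, high: float = 99.8) -> np.ndarray:
    """Rescale each channel to [0, 1] using its low/high percentiles (hypothetical helper)."""
    out = np.zeros(image.shape, dtype=float)
    for c, channel in enumerate(image):
        lo, hi = np.percentile(channel, [low, high])
        if hi > lo:  # a constant channel stays all zeros
            out[c] = np.clip((channel - lo) / (hi - lo), 0.0, 1.0)
    return out


rng = np.random.default_rng(0)
image = rng.integers(0, 4096, size=(3, 64, 64))
normalized = percentile_normalize(image)
print(normalized.min(), normalized.max())  # 0.0 1.0
```

Using percentiles instead of the raw minimum and maximum keeps a handful of very bright or very dark pixels from compressing the rest of the channel into a narrow range.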

remove_outlying_cells(dilation_size: int = 25, threshold: int = 5, segmentation_key: str = '_segmentation')

Removes outlying cells from the image container. It does so by dilating the segmentation mask and removing cells that belong to a connected component containing fewer than threshold cells.

Parameters:
  • dilation_size (int) – The size of the dilation kernel. Default is 25.

  • threshold (int) – The minimum number of cells in a connected component required for the cells to be kept. Default is 5.

  • segmentation_key (str) – The key of the segmentation mask in the object. Default is ‘_segmentation’.

Returns:

The updated image container with the outlying cells removed.

Return type:

xr.Dataset

Raises:

ValueError – If the object does not contain a segmentation mask.
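The dilate-then-count procedure can be sketched without any imaging library. The version below assumes a square dilation kernel of side dilation_size and 4-connected components, and it assigns each cell to the component found at one of its pixels. These choices and all helper names are assumptions for illustration rather than details taken from spatialproteomics.

```python
from collections import Counter

import numpy as np


def _dilate_square(mask: np.ndarray, size: int) -> np.ndarray:
    """Binary dilation with a square kernel (even sizes are rounded up to the next odd size)."""
    r = size // 2
    h, w = mask.shape
    padded = np.pad(mask, r)  # pads with False
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def _connected_components(mask: np.ndarray) -> np.ndarray:
    """Label 4-connected foreground regions with 1..k using an explicit stack."""
    labels = np.zeros(mask.shape, dtype=int)
    h, w = mask.shape
    current = 0
    for y, x in zip(*np.nonzero(mask)):
        if labels[y, x]:
            continue
        current += 1
        labels[y, x] = current
        stack = [(y, x)]
        while stack:
            cy, cx = stack.pop()
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = current
                    stack.append((ny, nx))
    return labels


def remove_outlying_cells(segmentation, dilation_size=25, threshold=5):
    """Zero out cells whose dilated component holds fewer than threshold cells (hypothetical helper)."""
    components = _connected_components(_dilate_square(segmentation > 0, dilation_size))
    cells = np.unique(segmentation)
    cells = cells[cells != 0]
    # component id of each cell, read at the cell's first pixel
    cell_component = {int(c): int(components[segmentation == c][0]) for c in cells}
    cells_per_component = Counter(cell_component.values())
    outliers = [c for c, comp in cell_component.items() if cells_per_component[comp] < threshold]
    return np.where(np.isin(segmentation, outliers), 0, segmentation)


seg = np.zeros((7, 7), dtype=int)
seg[1, 1], seg[1, 3], seg[5, 6] = 1, 2, 3
cleaned = remove_outlying_cells(seg, dilation_size=3, threshold=2)
print(np.unique(cleaned).tolist())  # [0, 1, 2]
```

Dilation merges nearby cells into a shared component, so only isolated cells (here cell 3) end up in components that are too small and get removed.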

rescale(scale: int)

Rescales the image and segmentation mask in the object by a given scale.

Parameters:

scale (int) – The scale factor by which to rescale the image and segmentation mask.

Returns:

The rescaled object containing the updated image and segmentation mask.

Return type:

xr.Dataset

Raises:
  • AssertionError – If no image layer is found in the object.

  • AssertionError – If no segmentation mask is found in the object.

threshold(quantile: float | list | None = None, intensity: int | list | None = None, key_added: str | None = None, channels: str | list | None = None, shift: bool = True, **kwargs)

Apply thresholding to the image layer of the object. By default, shift is set to True, which means that the threshold value is subtracted from the image and all negative values are set to 0. If you instead want to set all values below the threshold to 0 while retaining the rest of the image at its original values, set shift to False.

Parameters:
  • quantile (Union[float, list], optional) – The quantile value(s) used for thresholding. If provided, the pixels below this quantile will be set to 0.

  • intensity (Union[int, list], optional) – The absolute intensity value(s) used for thresholding. If provided, the pixels below this intensity will be set to 0.

  • key_added (Optional[str]) – The name of the new image layer after thresholding. If not provided, the original image layer will be replaced.

  • channels (Optional[Union[str, list]]) – The channels to apply the thresholding to. If None, the thresholding will be applied to all channels.

  • shift (bool) – If True, the thresholded image will be shifted so that values do not start at an arbitrary value. Default is True.

Returns:

The object with the thresholding applied to the image layer.

Return type:

xr.Dataset

Raises:

ValueError – If both quantile and intensity are None or if both quantile and intensity are provided.
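The difference between shift=True and shift=False is easiest to see on a single channel. The sketch below works on one channel at a time and takes either a quantile or an absolute intensity as the cutoff. The helper threshold_channel is hypothetical, and the conversion to float is a choice made here to avoid unsigned-integer wraparound when subtracting, not necessarily what the library does.

```python
import numpy as np


def threshold_channel(channel, quantile=None, intensity=None, shift=True):
    """Threshold one channel by quantile or absolute intensity (hypothetical helper)."""
    if (quantile is None) == (intensity is None):
        raise ValueError("Provide exactly one of quantile or intensity.")
    channel = np.asarray(channel, dtype=float)  # avoid uint wraparound when subtracting
    cutoff = np.quantile(channel, quantile) if quantile is not None else intensity
    if shift:
        # subtract the cutoff and set all negative values to 0
        return np.clip(channel - cutoff, 0, None)
    # keep values at or above the cutoff unchanged, zero out the rest
    return np.where(channel < cutoff, 0.0, channel)


channel = np.array([1, 2, 3, 4, 5])
print(threshold_channel(channel, intensity=3).tolist())               # [0.0, 0.0, 0.0, 1.0, 2.0]
print(threshold_channel(channel, intensity=3, shift=False).tolist())  # [0.0, 0.0, 3.0, 4.0, 5.0]
```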

transform_expression_matrix(method: str = 'arcsinh', key: str = '_intensity', key_added: str = '_intensity', cofactor: float = 5.0, min_percentile: float = 1.0, max_percentile: float = 99.0, **kwargs)

Transforms the expression matrix using the specified method.

Parameters:
  • method (str) – The transformation method. Available options are “arcsinh”, “zscore”, “minmax”, “double_zscore”, and “clip”.

  • key (str) – The key of the expression matrix in the object. Default is ‘_intensity’.

  • key_added (str) – The key to assign to the transformed matrix in the object. Default is ‘_intensity’.

  • cofactor (float) – The cofactor to use for the “arcsinh” transformation. Default is 5.0.

  • min_percentile (float) – The minimum percentile value to use for the “clip” transformation. Default is 1.0.

  • max_percentile (float) – The maximum percentile value to use for the “clip” transformation. Default is 99.0.

Returns:

The object with the transformed matrix added.

Return type:

xr.Dataset

Raises:
  • ValueError – If an unknown transformation method is specified.

  • AssertionError – If no expression matrix is found at the specified layer.
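Four of the listed methods can be sketched on a cells × channels matrix as shown below. The column-wise (per-channel) application, the exact z-score and min-max formulas, and clipping each channel to its own percentiles are assumptions for illustration; “arcsinh” uses the common arcsinh(x / cofactor) form. “double_zscore” is left out because its definition is not given here, and transform_expression is a hypothetical helper rather than the library's code.

```python
import numpy as np


def transform_expression(X, method="arcsinh", cofactor=5.0, min_percentile=1.0, max_percentile=99.0):
    """Transform a cells x channels matrix column by column (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if method == "arcsinh":
        return np.arcsinh(X / cofactor)
    if method == "zscore":
        # note: a constant channel has zero standard deviation and yields NaN here
        return (X - X.mean(axis=0)) / X.std(axis=0)
    if method == "minmax":
        col_min, col_max = X.min(axis=0), X.max(axis=0)
        return (X - col_min) / (col_max - col_min)
    if method == "clip":
        low = np.percentile(X, min_percentile, axis=0)
        high = np.percentile(X, max_percentile, axis=0)
        return np.clip(X, low, high)
    raise ValueError(f"Unknown transformation method: {method}")


X = np.array([[0.0, 10.0],
              [5.0, 20.0],
              [50.0, 30.0]])
print(transform_expression(X, "arcsinh")[1, 0])          # arcsinh(5 / 5) = arcsinh(1)
print(transform_expression(X, "minmax")[:, 1].tolist())  # [0.0, 0.5, 1.0]
```

A larger cofactor makes the arcsinh transform stay linear over a wider range of low intensities before it becomes logarithmic.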

spatialproteomics.pp.preprocessing.add_observations(sdata, properties: str | list | tuple = ('label', 'centroid'), segmentation_key: str = 'segmentation', table_key: str = 'table', copy: bool = False, **kwargs)

This function computes the observations for each region in the segmentation masks. It extracts the segmentation masks from the spatialdata object, computes the region properties, and adds the observations to the AnnData object stored in the tables attribute of the spatialdata object. The observations are computed using the regionprops_table function from skimage.measure. The properties to be computed can be specified as a string or a list/tuple of strings. The default properties are “label” and “centroid”, but other properties can be added as well.

Parameters:
  • sdata (spatialdata.SpatialData) – The spatialdata object containing the segmentation masks.

  • properties (Union[str, list, tuple], optional) – The properties to be computed for each region. Defaults to (“label”, “centroid”).

  • segmentation_key (str, optional) – The key for the segmentation masks in the spatialdata object. Defaults to ‘segmentation’.

  • table_key (str, optional) – The key under which the AnnData object is stored in the tables attribute of the spatialdata object. Defaults to ‘table’.

  • copy (bool, optional) – Whether to create a copy of the spatialdata object. Defaults to False.

spatialproteomics.pp.preprocessing.add_quantification(sdata, func: str | Callable = 'intensity_mean', key_added: str = 'table', image_key: str = 'image', segmentation_key: str = 'segmentation', layer_key: str | None = None, data_key: str | None = None, copy: bool = False, **kwargs)

This function computes the quantification of the image data based on the provided segmentation masks. It extracts the image data and segmentation masks from the spatialdata object, applies the quantification function, and adds the quantification results to the spatialdata object. The quantification results are stored in an AnnData object, which is added to the tables attribute of the spatialdata object. The quantification function can be specified as a string or a callable function.

Parameters:
  • sdata (spatialdata.SpatialData) – The spatialdata object containing the image data and segmentation masks.

  • func (Union[str, Callable], optional) – The quantification function to be applied. Defaults to “intensity_mean”. Can be a string or a callable function.

  • key_added (str, optional) – The key under which the quantification results will be stored in the tables attribute of the spatialdata object. Defaults to ‘table’.

  • image_key (str, optional) – The key for the image data in the spatialdata object. Defaults to ‘image’.

  • segmentation_key (str, optional) – The key for the segmentation masks in the spatialdata object. Defaults to ‘segmentation’.

  • layer_key (Optional[str], optional) – The key for the quantification results in the AnnData object. If None, a new layer will be created. Defaults to None.

  • data_key (Optional[str], optional) – The key for the image data in the spatialdata object. If None, the image_key will be used. Defaults to None.

  • copy (bool, optional) – Whether to create a copy of the spatialdata object. Defaults to False.

spatialproteomics.pp.preprocessing.apply(sdata, func: Callable, key_added: str = 'image', image_key: str = 'image', data_key: str | None = None, copy: bool = False, **kwargs)

This function applies a given function to the image data in the spatialdata object. It extracts the image data from the spatialdata object, applies the function, and adds the processed image back to the spatialdata object. The processed image is stored in the images attribute of the spatialdata object. The function can be any callable function that takes an image as input and returns a processed image.

Parameters:
  • sdata (spatialdata.SpatialData) – The spatialdata object containing the image data.

  • func (Callable) – The function to be applied to the image data. It should take an image as input and return a processed image.

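  • key_added (str, optional) – The key under which the processed image will be stored in the images attribute of the spatialdata object. Defaults to ‘image’.

  • image_key (str, optional) – The key for the image data in the spatialdata object. Defaults to ‘image’.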

  • data_key (Optional[str], optional) – The key for the image data in the spatialdata object. If None, the image_key will be used. Defaults to None.

  • copy (bool, optional) – Whether to create a copy of the spatialdata object. Defaults to False.

  • **kwargs – Additional keyword arguments to be passed to the function.

spatialproteomics.pp.preprocessing.filter_by_obs(sdata, col: str, func: Callable, segmentation_key: str = 'segmentation', table_key: str = 'table', copy: bool = False)

Filter the object by observations based on a given feature and filtering function.

Parameters:
  • sdata (spatialdata.SpatialData) – The spatialdata object to filter.

  • col (str) – The name of the feature to filter by.

  • func (Callable) – A filtering function that takes in the values of the feature and returns a boolean array.

  • segmentation_key (str) – The key of the segmentation mask in the object. Default is ‘segmentation’ (SDLayers.SEGMENTATION).

  • table_key (str) – The key of the table in the object. Default is ‘table’ (SDLayers.TABLE).

  • copy (bool) – If True, a copy of the object is returned. Default is False.

spatialproteomics.pp.preprocessing.grow_cells(sdata, iterations: int = 2, segmentation_key: str = 'segmentation', table_key: str = 'table', suppress_warning: bool = False, copy: bool = False) → Dataset

Grows the segmentation masks by expanding the labels in the object.

Parameters:
  • sdata (spatialdata.SpatialData) – The spatialdata object containing the segmentation masks.

  • iterations (int) – The number of iterations to grow the segmentation masks. Default is 2.

  • segmentation_key (str) – The key of the segmentation mask in the object. Default is ‘segmentation’.

  • table_key (str) – The key of the table in the object. Default is ‘table’.

  • suppress_warning (bool) – Whether to suppress the warning about recalculating the observations. Used internally, default is False.

  • copy (bool) – If True, a copy of the object is returned. Default is False.

Raises:

ValueError – If the object does not contain a segmentation mask.

Returns:

The object with the grown segmentation masks and updated observations.

Return type:

xr.Dataset

spatialproteomics.pp.preprocessing.threshold(sdata, image_key: str = 'image', quantile: float | list | None = None, intensity: int | list | None = None, key_added: str = 'image', channels: str | list | None = None, shift: bool = True, copy: bool = False, **kwargs)

This function applies a threshold to the image data in the spatialdata object. It extracts the image data from the spatialdata object, applies the thresholding function, and adds the processed image back to the spatialdata object. The processed image is stored in the images attribute of the spatialdata object. The thresholding function can be specified using the quantile or intensity parameters.

Parameters:
  • sdata (spatialdata.SpatialData) – The spatialdata object containing the image data.

  • image_key (str, optional) – The key for the image data in the spatialdata object. Defaults to ‘image’.

  • quantile (Union[float, list], optional) – The quantile value(s) to be used for thresholding. If None, the intensity parameter will be used. Defaults to None.

  • intensity (Union[int, list], optional) – The intensity value(s) to be used for thresholding. If None, the quantile parameter will be used. Defaults to None.

  • key_added (str, optional) – The key under which the processed image will be stored in the images attribute of the spatialdata object. Defaults to ‘image’.

  • channels (Optional[Union[str, list]], optional) – The channel(s) to be used for thresholding. If None, all channels will be used. Defaults to None.

  • shift (bool, optional) – Whether to shift the intensities towards 0 after thresholding. Defaults to True.

  • copy (bool, optional) – Whether to create a copy of the spatialdata object. Defaults to False.

  • **kwargs – Additional keyword arguments to be passed to the thresholding function.

spatialproteomics.pp.preprocessing.transform_expression_matrix(sdata, method: str = 'arcsinh', table_key: str = 'table', cofactor: float = 5.0, min_percentile: float = 1.0, max_percentile: float = 99.0, copy: bool = False, **kwargs)

This function applies a transformation to the expression matrix in the spatialdata object. It extracts the expression matrix from the spatialdata object, applies the transformation function, and adds the transformed expression matrix back to the spatialdata object. The transformed expression matrix is stored in the tables attribute of the spatialdata object.

Parameters:
  • sdata (spatialdata.SpatialData) – The spatialdata object containing the expression matrix.

  • method (str, optional) – The transformation method to be applied. Defaults to “arcsinh”.

  • table_key (str, optional) – The key under which the expression matrix is stored in the tables attribute of the spatialdata object. Defaults to “table”.

  • cofactor (float, optional) – The cofactor to be used for the transformation. Defaults to 5.0.

  • min_percentile (float, optional) – The minimum percentile to be used for the transformation. Defaults to 1.0.

  • max_percentile (float, optional) – The maximum percentile to be used for the transformation. Defaults to 99.0.

  • copy (bool, optional) – Whether to create a copy of the spatialdata object. Defaults to False.

  • **kwargs – Additional keyword arguments to be passed to the transformation function.