FAQ

Here are some common questions and answers. If you can’t find what you’re looking for here, please file an issue on our GitHub page.

How can I optimize my memory usage?

Running out of memory is a common problem when dealing with large images. Here are a few things you can consider to make your workflow more memory-efficient.

  • Using zarr: if you already have your objects stored in zarr format, it is more memory-efficient to open them with xr.open_zarr(file) than to read them with xr.load_dataset(file), since open_zarr loads the data lazily instead of pulling everything into memory at once.
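
For example, a minimal sketch (the zarr path below is just a placeholder):

import xarray as xr

# open_zarr opens the store lazily; data is only read when it is actually accessed
ds = xr.open_zarr("data.zarr")  # placeholder path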

  • Subsetting: if you already know that you will only require a subset of your data, e.g. certain channels, it is advisable to subset as early as possible. This can be done with ds.pp[channels].
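
For example (the channel names below are just placeholders):

channels = ["CD3", "CD8"]  # placeholder channel names
ds = ds.pp[channels]  # subset to these channels as early as possible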

  • Deleting objects that are no longer required: spatialproteomics deliberately does not perform in-place operations, but instead copies the existing object and returns a new one. This can be heavy on memory if you keep intermediate variables around after you no longer need them. You could consider removing them with del like this:

ds_new = ds.pp.your_workflow()
del ds  # free the original object once the processed copy exists
  • Downsampling: when looking at large images, you can downsample the image before plotting using ds.pp.downsample(rate=8). When zooming into a specific area, you can then omit the downsampling step.
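
For example, a short sketch (assuming the downsampled object can be piped directly into the plotting accessor, as in the chaining example further below):

ds.pp.downsample(rate=8).pl.show()  # plot a downsampled version of the image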

  • Garbage collection: this is especially relevant if you perform operations in a for loop. If you store the result in the same variable, Python’s garbage collector might have trouble freeing memory due to cyclical references (for more information, please refer to this post). In this case, calling the garbage collector manually can help alleviate some of those issues:

import gc
for ds in [...]:
    ds = ds.pp.your_workflow()
    gc.collect()  # manually calling the garbage collector after each iteration

When do I call methods directly on the object, and when do I use sp.method()?

Spatialproteomics has two distinct backends: an xarray backend and a spatialdata backend. These follow slightly different philosophies.

The xarray backend is based on a functional programming design. This means that you can call methods directly on your object and pipe the data from one step to the next. For example, this could look like my_data.pp.segment().la.predict_cell_types().pl.show(). Internally, spatialproteomics takes care of synchronizing shared dimensions across your data.
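
Written out over several lines, that piped example looks like this:

(
    my_data
    .pp.segment()              # segmentation
    .la.predict_cell_types()   # cell type prediction
    .pl.show()                 # plotting
)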

The spatialdata backend is for when you want to work with spatialdata objects from the start. In this case, the syntax is slightly different, but closer to what you might be used to from scverse packages (such as scanpy or squidpy). Here, your code would look like this:

import spatialproteomics as sp

# operations are called as functions that take the object as their first argument
sp.pp.segment(my_data)
sp.pp.predict_cell_types(my_data)

These operations modify your object in place, unless you pass copy=True to the method.
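
For example, using the copy argument described above:

sp.pp.segment(my_data)  # modifies my_data in place

segmented = sp.pp.segment(my_data, copy=True)  # returns a new object, leaving my_data untouched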