FAQ
Here are some common questions and answers. If you can’t find what you’re looking for here, please file an issue on our GitHub page.
How can I optimize my memory usage?
Running out of memory is a common problem when dealing with large images. Here are a few things you can do to make your workflow more memory-efficient.
Using zarr: if your objects are already stored in zarr format, it is more memory-efficient to load them using xr.open_zarr(file) than xr.load_dataset(file), since open_zarr reads the data lazily while load_dataset loads everything into memory up front.
Subsetting: if you already know that you will need only a subset of your data, e.g. certain channels, subset as early as possible. This can be done with ds.pp[channels].
Deleting objects that are no longer required: spatialproteomics deliberately does not perform in-place operations, but rather copies the existing object and returns a new one. This can be heavy on memory if you do not remove intermediate variables once you no longer need them. Consider removing them with del like this:
ds_new = ds.pp.your_workflow()
del ds
Downsampling: when looking at large images, you can downsample the image before plotting using ds.pp.downsample(rate=8). When zooming into a specific area, you can skip the downsampling.
Garbage collection: this is especially relevant if you perform operations in a for loop. If you repeatedly store the dataset in the same variable, Python’s garbage collector may have trouble freeing memory due to cyclic references (for more information, please refer to this post). In this case, calling the garbage collector manually can help alleviate some of these issues:
import gc
for ds in [...]:
    ds = ds.pp.your_workflow()
    gc.collect()  # manually call the garbage collector after each iteration
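The effect of calling the garbage collector in a loop can be illustrated with a small, self-contained sketch. It uses only the Python standard library; the Result class and the run_loop function are hypothetical stand-ins for a large intermediate object and a processing loop, not part of spatialproteomics. Automatic collection is switched off inside the function purely to make the behaviour deterministic for the demonstration.

```python
import gc
import weakref


class Result:
    """Hypothetical stand-in for a large intermediate object."""

    def __init__(self, index):
        self.index = index
        self.owner = self  # self-reference creates a reference cycle


def run_loop(n_iterations, collect):
    """Return how many loop results are still alive after the loop."""
    probes = []
    gc.disable()  # automatic collection off, so the outcome is deterministic
    try:
        for i in range(n_iterations):
            result = Result(i)
            probes.append(weakref.ref(result))  # weak refs do not keep objects alive
            del result  # the cycle keeps the object alive despite del
            if collect:
                gc.collect()  # the cycle detector frees the object
        return sum(probe() is not None for probe in probes)
    finally:
        gc.enable()


still_alive_without_collect = run_loop(5, collect=False)
still_alive_with_collect = run_loop(5, collect=True)
```

In this sketch, del alone does not release objects that are part of a reference cycle, whereas an explicit gc.collect() in each iteration does. In a normal session the automatic collector is enabled and will eventually reclaim such objects, but with large datasets the delay can be enough to exhaust memory.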
When do I apply a method directly on the object, and when do I use sp.method()?
Spatialproteomics has two distinct backends: an xarray backend and a spatialdata backend. These follow slightly different philosophies.
The xarray backend is based on a functional programming design. This means that you can call methods directly on your object, allowing you to pipe data from one step to the next. For example, this could look like my_data.pp.segment().la.predict_cell_types().pl.show(). Internally, spatialproteomics takes care of synchronizing shared dimensions across your data.
The spatialdata backend is for when you want to use spatialdata objects from the start. In this case, the syntax is slightly different, but closer to the syntax you might know from scverse packages (such as scanpy or squidpy). Here, your code would look like this:
import spatialproteomics as sp
sp.pp.segment(my_data)
sp.la.predict_cell_types(my_data)
These operations modify your object in place, unless you pass copy=True in the function call.
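To make the in-place versus copy behaviour concrete, here is a minimal sketch using a plain dictionary. The add_label function is hypothetical and is not part of spatialproteomics. Returning None for in-place calls mirrors a convention common in scverse packages and is an assumption here, not confirmed behaviour of this library.

```python
import copy as copy_module


def add_label(data, label, copy=False):
    """Hypothetical function that works in place unless copy=True."""
    target = copy_module.deepcopy(data) if copy else data
    target.setdefault("labels", []).append(label)
    # Assumed convention: return the object only when a copy was made.
    return target if copy else None


data = {"labels": []}
in_place_result = add_label(data, "T cell")      # modifies data, returns None
new_data = add_label(data, "B cell", copy=True)  # data is left untouched
```

With copy=True the original object stays as it was, at the cost of holding two objects in memory, which is worth keeping in mind when working with large images.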