# Interoperability

This notebook shows some of the ways you can import data into and export data from `spatialproteomics`.

In [1]:
%reload_ext autoreload
%autoreload 2

import spatialproteomics as sp
import pandas as pd
import xarray as xr
import os
import shutil
import anndata

## Exporting Data

Once you are happy with your analysis, you will likely want to export the results. The easiest way to do this is by using the `zarr` format, but `csv`, `anndata`, and `spatialdata` are also supported.

In [2]:
# loading a test file which we will export later
# notice how easy it is to load the file from a zarr using xarray
ds = xr.open_zarr("../../tests/test_files/ds_neighborhoods.zarr")
ds

Shape,Data type,Bytes,Dask graph
"(56, 56)",int64,24.50 kiB,1 chunks in 2 graph layers
"(5, 101, 101)",uint8,49.81 kiB,1 chunks in 2 graph layers
"(56, 5)",float64,2.19 kiB,1 chunks in 2 graph layers
"(56, 2)",object,896 B,1 chunks in 2 graph layers
"(4, 2)",object,64 B,1 chunks in 2 graph layers
"(56, 4)",float64,1.75 kiB,1 chunks in 2 graph layers
"(5, 2)",,560 B,1 chunks in 2 graph layers
"(56, 6)",float64,2.62 kiB,1 chunks in 2 graph layers
"(56, 5)",float64,2.19 kiB,1 chunks in 2 graph layers
"(101, 101)",int64,79.70 kiB,1 chunks in 2 graph layers


## Exporting to Zarr
Zarr is the easiest file format to work with: it lets you store and load xarray objects with a single line of code. Calling `drop_encoding()` before exporting to zarr is highly recommended, because several open xarray issues are linked to encoding problems and dropping the encoding is the simplest way to circumvent them. For details, see [issue 1](https://github.com/pydata/xarray/issues/3476) and [issue 2](https://github.com/pydata/xarray/issues/9037).

In [3]:
zarr_path = "tmp.zarr"

# removing the zarr if it exists
if os.path.exists(zarr_path):
    shutil.rmtree(zarr_path)

# exporting as zarr
ds.drop_encoding().to_zarr(zarr_path)

<xarray.backends.zarr.ZarrStore at 0x7f1c5e212e40>
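To illustrate what `drop_encoding()` does, here is a minimal sketch on a toy dataset. The dataset `ds_demo`, its variable `a`, and the encoding values are made up for illustration and are not part of the test file used above. The call returns a new object whose variables carry no encoding, while the original object is left unchanged.

```python
import numpy as np
import xarray as xr

# toy dataset (hypothetical, only for illustration)
ds_demo = xr.Dataset({"a": (("x",), np.arange(4))})

# simulate encoding that would normally be attached when reading from disk
ds_demo["a"].encoding.update({"dtype": "int32", "chunks": (2,)})

# drop_encoding returns a new dataset without any encoding information
cleaned = ds_demo.drop_encoding()

print("original encoding:", ds_demo["a"].encoding)
print("cleaned encoding: ", cleaned["a"].encoding)
```

Because the original object keeps its encoding, the export has to be called on the returned object, as in the cell above.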

## Exporting Tables to CSV
Suppose you want to export some tables as CSV files. You can do this by first converting the layer to a pandas data frame.

In [5]:
df = ds.pp.get_layer_as_df("_obs")
df.head()

Unnamed: 0,CD4_binarized,CD8_binarized,_labels,_neighborhoods,centroid-0,centroid-1
1,0.0,0.0,B,Neighborhood 1,2103.768519,1607.277778
2,0.0,1.0,T_tox,Neighborhood 1,2103.857143,1630.741071
3,1.0,1.0,T_h,Neighborhood 3,2104.837037,1668.733333
4,0.0,1.0,T_tox,Neighborhood 3,2101.75,1677.0
5,0.0,1.0,B,Neighborhood 3,2104.416058,1685.627737


In [6]:
# exporting as csv
df.to_csv("tmp.csv")
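When reading such a file back, the index ends up as the first column of the CSV. The following sketch shows one way to restore it with `index_col=0`. It uses a small hand-made data frame and an in-memory buffer instead of `tmp.csv`; the index name `cells` is an assumption made for this example.

```python
import io

import pandas as pd

# small stand-in for the exported table (values copied from the preview above)
obs = pd.DataFrame(
    {
        "_labels": ["B", "T_tox", "T_h"],
        "centroid-0": [2103.768519, 2103.857143, 2104.837037],
    },
    index=pd.Index([1, 2, 3], name="cells"),  # index name is hypothetical
)

# write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
obs.to_csv(buffer)
buffer.seek(0)

# index_col=0 turns the first CSV column back into the index;
# float_precision="round_trip" asks the parser to restore floats exactly
restored = pd.read_csv(buffer, index_col=0, float_precision="round_trip")
print(restored)
```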

## Exporting to AnnData
AnnData is the format used by scanpy, which is useful for plotting and downstream analysis. You can therefore export the xarray object as an AnnData object. Note that this object stores only the tabular data, not the image or the segmentation layer.

In [9]:
# putting the expression matrix into an anndata object
adata = ds.tl.convert_to_anndata(
    expression_matrix_key="_intensity",
    additional_layers={"percentage_positive": "_percentage_positive"},
    additional_uns={"label_colors": "_la_properties"},
)
adata

AnnData object with n_obs × n_vars = 56 × 5
    obs: 'CD4_binarized', 'CD8_binarized', '_labels', '_neighborhoods', 'centroid-0', 'centroid-1'
    uns: '_labels_colors', 'label_colors'
    obsm: 'spatial'
    layers: 'percentage_positive'

In [10]:
# writing to disk as hdf5
adata.write("tmp.h5ad")

... storing '_labels' as categorical
... storing '_neighborhoods' as categorical


## Exporting to SpatialData
SpatialData is a data format commonly used for spatial omics analysis. It combines zarr storage for images and segmentations with AnnData tables. You can export to this format as well.

In [11]:
spatialdata_object = ds.tl.convert_to_spatialdata(expression_matrix_key="_intensity")
spatialdata_object



INFO     Transposing `data` of type: <class 'dask.array.core.Array'> to ('c', 'y', 'x').
INFO     Transposing `data` of type: <class 'dask.array.core.Array'> to ('y', 'x').


SpatialData object
├── Images
│     └── 'image': DataArray[cyx] (5, 101, 101)
├── Labels
│     └── 'segmentation': DataArray[yx] (101, 101)
└── Tables
      └── 'table': AnnData (56, 5)
with coordinate systems:
    ▸ 'global', with elements:
        image (Images), segmentation (Labels)

In [12]:
# storing as zarr file
spatialdata_object.write("tmp.zarr")

INFO     The Zarr backing store has been changed from None the new file path: tmp.zarr
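Both the xarray export and the SpatialData export in this notebook write to `tmp.zarr`, and a write may fail if the target path already exists. A small cleanup helper along the following lines can be run before each export. The function name `remove_if_exists` is hypothetical, and the example works on a temporary directory so that it does not touch any real output.

```python
import os
import shutil
import tempfile


def remove_if_exists(path):
    """Delete the file or directory tree at `path`; return True if something was removed."""
    if os.path.isdir(path):
        # zarr stores are directories, so the whole tree has to be removed
        shutil.rmtree(path)
        return True
    if os.path.exists(path):
        # plain files such as tmp.csv or tmp.h5ad
        os.remove(path)
        return True
    return False


with tempfile.TemporaryDirectory() as workdir:
    # fake store: a directory with one nested group
    store = os.path.join(workdir, "tmp.zarr")
    os.makedirs(os.path.join(store, "group"))

    removed_first = remove_if_exists(store)   # the store exists and is deleted
    removed_second = remove_if_exists(store)  # nothing left to delete
    print(removed_first, removed_second)
```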


## Importing from SpatialData

In the example workflow, you have already seen how to read data from a tiff file. If your data is already in `spatialdata` format, you can read it from there directly. Doing so converts the data from `spatialdata` format to `xarray` format, so that you can use the `xarray` backend of `spatialproteomics`.

In [13]:
ds = sp.read_from_spatialdata("../../data/spatialdata_example.zarr", image_key="raccoon")
ds

root_attr: multiscales
root_attr: omero
datasets [{'coordinateTransformations': [{'scale': [1.0, 1.0, 1.0], 'type': 'scale'}], 'path': '0'}]
  compressor, fill_value = _kwargs_compat(compressor, fill_value, kwargs)
resolution: 0
 - shape ('c', 'y', 'x') = (3, 768, 1024)
 - chunks =  ['3', '768', '1024']
 - dtype = uint8
root_attr: multiscales
root_attr: omero
Unsupported transform Identity , resetting coordinates for the spatialproteomics object.


Shape,Data type,Bytes,Dask graph
"(3, 768, 1024)",uint8,2.25 MiB,1 chunks in 2 graph layers
