{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Interoperability\n", "\n", "This notebook shows some way that you can import and export data from `spatialproteomics`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "\n", "import spatialproteomics as sp\n", "import pandas as pd\n", "import xarray as xr\n", "import os\n", "import shutil\n", "import anndata" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exporting Data\n", "\n", "Once you are happy with your analysis, you will likely want to export the results. The easiest way to do this is by using the `zarr` format, but `csv`, `anndata`, and `spatialdata` are also supported." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset> Size: 171kB\n",
       "Dimensions:               (cells: 56, cells_2: 56, channels: 5, y: 101, x: 101,\n",
       "                           la_features: 2, labels: 4, la_props: 2,\n",
       "                           neighborhoods: 5, nh_props: 2, features: 6)\n",
       "Coordinates:\n",
       "  * cells                 (cells) int64 448B 1 2 3 4 5 6 7 ... 51 52 53 54 55 56\n",
       "  * cells_2               (cells_2) int64 448B 1 2 3 4 5 6 ... 51 52 53 54 55 56\n",
       "  * channels              (channels) <U11 220B 'DAPI' 'PAX5' 'CD3' 'CD4' 'CD8'\n",
       "  * features              (features) <U14 336B 'CD4_binarized' ... 'centroid-1'\n",
       "  * la_features           (la_features) object 16B 'labels_0' 'labels_1'\n",
       "  * la_props              (la_props) <U6 48B '_color' '_name'\n",
       "  * labels                (labels) int64 32B 1 2 3 4\n",
       "  * neighborhoods         (neighborhoods) int64 40B 1 2 3 4 5\n",
       "  * nh_props              (nh_props) <U6 48B '_color' '_name'\n",
       "  * x                     (x) int64 808B 1600 1601 1602 1603 ... 1698 1699 1700\n",
       "  * y                     (y) int64 808B 2100 2101 2102 2103 ... 2198 2199 2200\n",
       "Data variables:\n",
       "    _adjacency_matrix     (cells, cells_2) int64 25kB dask.array<chunksize=(56, 56), meta=np.ndarray>\n",
       "    _image                (channels, y, x) uint8 51kB dask.array<chunksize=(5, 101, 101), meta=np.ndarray>\n",
       "    _intensity            (cells, channels) float64 2kB dask.array<chunksize=(56, 5), meta=np.ndarray>\n",
       "    _la_layers            (cells, la_features) object 896B dask.array<chunksize=(56, 2), meta=np.ndarray>\n",
       "    _la_properties        (labels, la_props) object 64B dask.array<chunksize=(4, 2), meta=np.ndarray>\n",
       "    _neighborhoods        (cells, labels) float64 2kB dask.array<chunksize=(56, 4), meta=np.ndarray>\n",
       "    _nh_properties        (neighborhoods, nh_props) <U14 560B dask.array<chunksize=(5, 2), meta=np.ndarray>\n",
       "    _obs                  (cells, features) float64 3kB dask.array<chunksize=(56, 6), meta=np.ndarray>\n",
       "    _percentage_positive  (cells, channels) float64 2kB dask.array<chunksize=(56, 5), meta=np.ndarray>\n",
       "    _segmentation         (y, x) int64 82kB dask.array<chunksize=(101, 101), meta=np.ndarray>
" ], "text/plain": [ " Size: 171kB\n", "Dimensions: (cells: 56, cells_2: 56, channels: 5, y: 101, x: 101,\n", " la_features: 2, labels: 4, la_props: 2,\n", " neighborhoods: 5, nh_props: 2, features: 6)\n", "Coordinates:\n", " * cells (cells) int64 448B 1 2 3 4 5 6 7 ... 51 52 53 54 55 56\n", " * cells_2 (cells_2) int64 448B 1 2 3 4 5 6 ... 51 52 53 54 55 56\n", " * channels (channels) \n", " _image (channels, y, x) uint8 51kB dask.array\n", " _intensity (cells, channels) float64 2kB dask.array\n", " _la_layers (cells, la_features) object 896B dask.array\n", " _la_properties (labels, la_props) object 64B dask.array\n", " _neighborhoods (cells, labels) float64 2kB dask.array\n", " _nh_properties (neighborhoods, nh_props) \n", " _obs (cells, features) float64 3kB dask.array\n", " _percentage_positive (cells, channels) float64 2kB dask.array\n", " _segmentation (y, x) int64 82kB dask.array" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# loading a test file which we will export later\n", "# notice how easy it is to load the file from a zarr using xarray\n", "ds = xr.open_zarr(\"../../tests/test_files/ds_neighborhoods.zarr\")\n", "ds" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exporting to Zarr\n", "This is the easiest file format to work with. It allows you to store and load the xarray objects with a single line of code. It is highly recommended to call `drop_encoding()` before exporting to zarr. There are several open issues linked to encoding problems, and this is the easiest way to circumvent them. For more references, refer to these issues: [issue 1](https://github.com/pydata/xarray/issues/3476), [issue 2](https://github.com/pydata/xarray/issues/9037)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "zarr_path = \"tmp.zarr\"\n", "\n", "# removing the zarr if it exists\n", "if os.path.exists(zarr_path):\n", " shutil.rmtree(zarr_path)\n", "\n", "# exporting as zarr\n", "ds.drop_encoding().to_zarr(\"tmp.zarr\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exporting Tables to CSV\n", "Let's say you want to export some tables as csvs. This can be done with pandas." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CD4_binarizedCD8_binarized_labels_neighborhoodscentroid-0centroid-1
10.00.0BNeighborhood 12103.7685191607.277778
20.01.0T_toxNeighborhood 12103.8571431630.741071
31.01.0T_hNeighborhood 32104.8370371668.733333
40.01.0T_toxNeighborhood 32101.7500001677.000000
50.01.0BNeighborhood 32104.4160581685.627737
\n", "
" ], "text/plain": [ " CD4_binarized CD8_binarized _labels _neighborhoods centroid-0 \\\n", "1 0.0 0.0 B Neighborhood 1 2103.768519 \n", "2 0.0 1.0 T_tox Neighborhood 1 2103.857143 \n", "3 1.0 1.0 T_h Neighborhood 3 2104.837037 \n", "4 0.0 1.0 T_tox Neighborhood 3 2101.750000 \n", "5 0.0 1.0 B Neighborhood 3 2104.416058 \n", "\n", " centroid-1 \n", "1 1607.277778 \n", "2 1630.741071 \n", "3 1668.733333 \n", "4 1677.000000 \n", "5 1685.627737 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = ds.pp.get_layer_as_df(\"_obs\")\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [], "source": [ "# exporting as csv\n", "df.to_csv(\"tmp.csv\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exporting to AnnData\n", "AnnData is a format used by scanpy, which can be useful to create interesting plots and downstream analyses. For this reason, you can export the xarray object as an AnnData object. Note that this object will only store the tabular data, but not the image or the segmentation layer." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 56 × 5\n", " obs: 'CD4_binarized', 'CD8_binarized', '_labels', '_neighborhoods', 'centroid-0', 'centroid-1'\n", " uns: '_labels_colors', 'label_colors'\n", " obsm: 'spatial'\n", " layers: 'percentage_positive'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# putting the expression matrix into an anndata object\n", "adata = ds.tl.convert_to_anndata(\n", " expression_matrix_key=\"_intensity\",\n", " additional_layers={\"percentage_positive\": \"_percentage_positive\"},\n", " additional_uns={\"label_colors\": \"_la_properties\"},\n", ")\n", "adata" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "... storing '_labels' as categorical\n", "... storing '_neighborhoods' as categorical\n" ] } ], "source": [ "# writing to disk as hdf5\n", "adata.write(\"tmp.h5ad\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exporting to SpatialData\n", "SpatialData is a data format which is commonly used for spatial omics analysis and combines the power of zarr with anndata. You can export to this data format as well." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/meyerben/meyerben/.conda/envs/tmp_env_3/lib/python3.10/site-packages/dask/dataframe/__init__.py:31: FutureWarning: The legacy Dask DataFrame implementation is deprecated and will be removed in a future version. Set the configuration option `dataframe.query-planning` to `True` or None to enable the new Dask Dataframe implementation and silence this warning.\n", " warnings.warn(\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mINFO \u001b[0m Transposing `data` of type: \u001b[1m<\u001b[0m\u001b[1;95mclass\u001b[0m\u001b[39m \u001b[0m\u001b[32m'dask.array.core.Array'\u001b[0m\u001b[1m>\u001b[0m to \u001b[1m(\u001b[0m\u001b[32m'c'\u001b[0m, \u001b[32m'y'\u001b[0m, \u001b[32m'x'\u001b[0m\u001b[1m)\u001b[0m. 
\n", "\u001b[34mINFO \u001b[0m Transposing `data` of type: \u001b[1m<\u001b[0m\u001b[1;95mclass\u001b[0m\u001b[39m \u001b[0m\u001b[32m'dask.array.core.Array'\u001b[0m\u001b[1m>\u001b[0m to \u001b[1m(\u001b[0m\u001b[32m'y'\u001b[0m, \u001b[32m'x'\u001b[0m\u001b[1m)\u001b[0m. \n" ] }, { "data": { "text/plain": [ "SpatialData object\n", "├── Images\n", "│ └── 'image': DataArray[cyx] (5, 101, 101)\n", "├── Labels\n", "│ └── 'segmentation': DataArray[yx] (101, 101)\n", "└── Tables\n", " └── 'table': AnnData (56, 5)\n", "with coordinate systems:\n", " ▸ 'global', with elements:\n", " image (Images), segmentation (Labels)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spatialdata_object = ds.tl.convert_to_spatialdata(expression_matrix_key=\"_intensity\")\n", "spatialdata_object" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[34mINFO \u001b[0m The Zarr backing store has been changed from \u001b[3;35mNone\u001b[0m the new file path: tmp.zarr \n" ] } ], "source": [ "# storing as zarr file\n", "spatialdata_object.write(\"tmp.zarr\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing from Spatialdata\n", "\n", "In the example workflow, you have already seen how to read data from a tiff file. If you already have your data in `spatialdata` format, you can also read it in from there. Reading in the data like this will convert the data from `spatialdata` format to `xarray` format, so that you can use the `xarray` backend of `spatialproteomics`." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "root_attr: multiscales\n", "root_attr: omero\n", "datasets [{'coordinateTransformations': [{'scale': [1.0, 1.0, 1.0], 'type': 'scale'}], 'path': '0'}]\n", "/home/meyerben/meyerben/.conda/envs/tmp_env_3/lib/python3.10/site-packages/zarr/creation.py:614: UserWarning: ignoring keyword argument 'read_only'\n", " compressor, fill_value = _kwargs_compat(compressor, fill_value, kwargs)\n", "resolution: 0\n", " - shape ('c', 'y', 'x') = (3, 768, 1024)\n", " - chunks = ['3', '768', '1024']\n", " - dtype = uint8\n", "root_attr: multiscales\n", "root_attr: omero\n", "Unsupported transform Identity , resetting coordinates for the spatialproteomics object.\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<xarray.Dataset> Size: 9MB\n",
       "Dimensions:        (channels: 3, y: 768, x: 1024, cells: 70, features: 2)\n",
       "Coordinates:\n",
       "  * channels       (channels) int64 24B 0 1 2\n",
       "  * y              (y) int64 6kB 0 1 2 3 4 5 6 7 ... 761 762 763 764 765 766 767\n",
       "  * x              (x) int64 8kB 0 1 2 3 4 5 6 ... 1018 1019 1020 1021 1022 1023\n",
       "  * cells          (cells) int64 560B 1 2 3 4 5 6 7 8 ... 64 65 66 67 68 69 70\n",
       "  * features       (features) <U10 80B 'centroid-0' 'centroid-1'\n",
       "Data variables:\n",
       "    _image         (channels, y, x) uint8 2MB dask.array<chunksize=(3, 768, 1024), meta=np.ndarray>\n",
       "    _segmentation  (y, x) int64 6MB 0 0 0 0 0 0 0 0 ... 69 69 69 69 69 69 69 69\n",
       "    _obs           (cells, features) float64 1kB 44.79 402.5 ... 736.5 890.5
" ], "text/plain": [ " Size: 9MB\n", "Dimensions: (channels: 3, y: 768, x: 1024, cells: 70, features: 2)\n", "Coordinates:\n", " * channels (channels) int64 24B 0 1 2\n", " * y (y) int64 6kB 0 1 2 3 4 5 6 7 ... 761 762 763 764 765 766 767\n", " * x (x) int64 8kB 0 1 2 3 4 5 6 ... 1018 1019 1020 1021 1022 1023\n", " * cells (cells) int64 560B 1 2 3 4 5 6 7 8 ... 64 65 66 67 68 69 70\n", " * features (features) \n", " _segmentation (y, x) int64 6MB 0 0 0 0 0 0 0 0 ... 69 69 69 69 69 69 69 69\n", " _obs (cells, features) float64 1kB 44.79 402.5 ... 736.5 890.5" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = sp.read_from_spatialdata(\"../../data/spatialdata_example.zarr\", image_key=\"raccoon\")\n", "ds" ] } ], "metadata": { "kernelspec": { "display_name": "tmp_env_3", "language": "python", "name": "tmp_env_3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 4 }