Add a baseline method
A baseline method (or control method) is used to test the relative performance of all other methods, and also serves as a quality control for the pipeline as a whole. A baseline method can be either a positive control or a negative control. The positive and negative control methods set a maximum and minimum threshold for performance, so any new method should perform better than the negative control methods and worse than the positive control methods.
This guide shows you how to create a new Viash component, with examples for both Python and R. Note that the Label Projection task is used throughout the guide, so make sure to replace any occurrences of "label_projection" with your task of interest.
Make sure you have followed the “Getting started” guide.
Step 1: Create a new component
Use the create_component component to start creating a new baseline method.
viash run src/common/create_component/config.vsh.yaml -- \
  --task label_projection \
  --type control_method \
  --name my_method_py \
  --language python
This creates a new folder at src/label_projection/control_methods/my_method_py containing a Viash config and a script.
tree src/label_projection/control_methods/my_method_py
├── script.py Script for running the method.
├── config.vsh.yaml Config file for method.
└── ... Optional additional resources.
viash run src/common/create_component/config.vsh.yaml -- \
--task label_projection \
--type control_method \
--name my_method_r \
--language r
This creates a new folder at src/label_projection/control_methods/my_method_r containing a Viash config and a script.
tree src/label_projection/control_methods/my_method_r
├── script.R Script for running the method.
├── config.vsh.yaml Config file for method.
└── ... Optional additional resources.
- A config file contains metadata of the component and the dependencies required to run it. In steps 2 and 3 we will fill in the required information.
- A script contains the code to run the method. In step 4 we will edit the script.
Use the command viash run src/common/create_component/config.vsh.yaml -- --help to get information on all of the parameters of the create_component component.
Step 2: Fill in metadata
The Viash config contains metadata of your method, which script is used to run it, and the required dependencies.
Generated config file
This is what the config.vsh.yaml generated by the create_component component looks like:
Contents of config.vsh.yaml
# The API specifies which type of component this is.
# It contains specifications for:
#   - The input/output files
#   - Common parameters
#   - A unit test
__merge__: ../../api/comp_control_method.yaml

functionality:
  name: my_method_py

  # Metadata for your component (required)
  info:
    pretty_name: My Method Py
    summary: 'FILL IN: A one sentence summary of this method.'
    description: 'FILL IN: A (multiline) description of how this method works.'
    preferred_normalization: log_cpm

  # Component-specific parameters (optional)
  # arguments:
  #   - name: "--n_neighbors"
  #     type: "integer"
  #     default: 5
  #     description: Number of neighbors to use.

  # Resources required to run the component
  resources:
    # The script of your component
    - type: python_script
      path: script.py

platforms:
  - type: docker
    image: python:3.10
    # Add custom dependencies here
    setup:
      - type: python
        pypi: anndata~=0.8.0
  - type: nextflow
    directives:
      label: [midmem, midcpu]
Contents of config.vsh.yaml
# The API specifies which type of component this is.
# It contains specifications for:
#   - The input/output files
#   - Common parameters
#   - A unit test
__merge__: ../../api/comp_control_method.yaml

functionality:
  name: my_method_r

  # Metadata for your component (required)
  info:
    pretty_name: My Method R
    summary: 'FILL IN: A one sentence summary of this method.'
    description: 'FILL IN: A (multiline) description of how this method works.'
    preferred_normalization: log_cpm

  # Component-specific parameters (optional)
  # arguments:
  #   - name: "--n_neighbors"
  #     type: "integer"
  #     default: 5
  #     description: Number of neighbors to use.

  # Resources required to run the component
  resources:
    # The script of your component
    - type: r_script
      path: script.R

platforms:
  - type: docker
    image: eddelbuettel/r2u:22.04
    # Add custom dependencies here
    setup:
      - type: apt
        packages:
          - libhdf5-dev
          - libgeos-dev
          - python3
          - python3-pip
          - python3-dev
          - python-is-python3
      - type: python
        pypi: anndata~=0.8.0
      - type: r
        cran: anndata
  - type: nextflow
    directives:
      label: [midmem, midcpu]
Required metadata fields
Please edit the functionality.info section in the config file to fill in the necessary metadata.
functionality.name
A unique identifier for the method. Must be written in snake case. Example: my_new_method.
functionality.info.pretty_name
A label for the method used for visualisations and documentation. Example: "My new method".
functionality.info.subtype
Whether the method is a "positive_control" or a "negative_control".
functionality.info.summary
A one sentence summary of the method. Used for creating short overviews of the components in a task.
functionality.info.description
An explanation of how the method works. Used for creating reference documentation of a task.
functionality.info.preferred_normalization
Which normalization method a component prefers. Possible values are l1_sqrt, log_cpm, log_scran_pooling, sqrt_cpm. Each value corresponds to a normalization component in the directory src/datasets/normalization.
__merge__
The file specified in this field contains information regarding the input and output arguments of the component, as well as a unit test to ensure that the component is functioning properly. Normally you don’t need to change this if you gave the right arguments to the create_component component.
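The field rules above can be checked mechanically before running the unit tests. The following is a minimal sketch, assuming the functionality section has already been parsed into a Python dict; check_metadata is a hypothetical helper written for this guide, not part of Viash or of the project's test suite, and the allowed values are copied from the field descriptions above.

```python
import re

# Allowed values copied from the field descriptions above.
ALLOWED_SUBTYPES = {"positive_control", "negative_control"}
ALLOWED_NORMALIZATIONS = {"l1_sqrt", "log_cpm", "log_scran_pooling", "sqrt_cpm"}
# Lowercase words separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def check_metadata(functionality):
    """Return a list of problems found in a parsed `functionality` mapping.

    Hypothetical helper for illustration only.
    """
    problems = []
    name = functionality.get("name", "")
    if not SNAKE_CASE.match(name):
        problems.append(f"name {name!r} is not snake case")
    info = functionality.get("info", {})
    for field in ("pretty_name", "summary", "description"):
        value = info.get(field, "")
        if not value or value.startswith("FILL IN"):
            problems.append(f"info.{field} is missing or still a placeholder")
    if info.get("subtype") not in ALLOWED_SUBTYPES:
        problems.append("info.subtype must be positive_control or negative_control")
    if info.get("preferred_normalization") not in ALLOWED_NORMALIZATIONS:
        problems.append("info.preferred_normalization has an unknown value")
    return problems


# The freshly generated config: placeholders are still present and subtype is absent.
generated = {
    "name": "my_method_py",
    "info": {
        "pretty_name": "My Method Py",
        "summary": "FILL IN: A one sentence summary of this method.",
        "description": "FILL IN: A (multiline) description of how this method works.",
        "preferred_normalization": "log_cpm",
    },
}
for problem in check_metadata(generated):
    print(problem)
```

Run against the generated config, the sketch flags the two placeholder fields and the absent subtype, which are the entries that still need to be filled in by hand.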
Step 3: Add dependencies
Each component has its own set of dependencies, because different components might have conflicting dependencies.
In the platforms section of the config file, update the setup definition, which describes the packages that need to be installed in the Docker image for your method to run. Note that both anndata~=0.8.0 and pyyaml are necessary Python package dependencies.
Please check out this guide for more information on how to add extra package dependencies.
Tip: After making changes to the component's dependencies, you will need to rebuild the Docker container as follows:
viash run src/label_projection/control_methods/my_method_py/config.vsh.yaml -- \
---setup cachedbuild
Output
[notice] Building container 'ghcr.io/openproblems-bio/label_projection/control_methods/my_method_py:dev' with Dockerfile
Step 4: Edit script
A component’s script typically has five sections:
- Imports and libraries
- Argument values
- Read input data
- Generate results
- Write output data to file
Generated script
This is what the script generated by the create_component component looks like:
Contents of script.py
import anndata as ad

## VIASH START
par = {
  'input_train': 'resources_test/label_projection/pancreas/train.h5ad',
  'input_test': 'resources_test/label_projection/pancreas/test.h5ad',
  'input_solution': 'resources_test/label_projection/pancreas/solution.h5ad',
  'output': 'output.h5ad'
}
meta = {
  'functionality_name': 'my_method_py'
}
## VIASH END

print('Reading input files', flush=True)
input_train = ad.read_h5ad(par['input_train'])
input_test = ad.read_h5ad(par['input_test'])
input_solution = ad.read_h5ad(par['input_solution'])

print('Preprocess data', flush=True)
# ... preprocessing ...

print('Train model', flush=True)
# ... train model ...

print('Generate predictions', flush=True)
# ... generate predictions ...

print("Write output AnnData to file", flush=True)
output = ad.AnnData(
  obs={
    'label_pred': obs_label_pred
  },
  uns={
    'dataset_id': input_train.uns['dataset_id'],
    'normalization_id': input_train.uns['normalization_id'],
    'method_id': meta['functionality_name']
  }
)
output.write_h5ad(par['output'], compression='gzip')
Contents of script.R
library(anndata)

## VIASH START
par <- list(
  input_train = "resources_test/label_projection/pancreas/train.h5ad",
  input_test = "resources_test/label_projection/pancreas/test.h5ad",
  input_solution = "resources_test/label_projection/pancreas/solution.h5ad",
  output = "output.h5ad"
)
meta <- list(
  functionality_name = "my_method_r"
)
## VIASH END

cat("Reading input files\n")
input_train <- anndata::read_h5ad(par[["input_train"]])
input_test <- anndata::read_h5ad(par[["input_test"]])
input_solution <- anndata::read_h5ad(par[["input_solution"]])

cat("Preprocess data\n")
# ... preprocessing ...

cat("Train model\n")
# ... train model ...

cat("Generate predictions\n")
# ... generate predictions ...

cat("Write output AnnData to file\n")
output <- anndata::AnnData(
  obs = list(
    label_pred = obs_label_pred
  ),
  uns = list(
    dataset_id = input_train$uns[["dataset_id"]],
    normalization_id = input_train$uns[["normalization_id"]],
    method_id = meta[["functionality_name"]]
  )
)
output$write_h5ad(par[["output"]], compression = "gzip")
Required sections
Imports and libraries
In the top section of the script you can define which packages/libraries the method needs. If you add a new or different package, also add the dependency to the setup field in config.vsh.yaml (see above).
Argument block
The Viash code block is designed to facilitate prototyping by enabling you to execute the script directly with python script.py (or Rscript script.R for R users). Note that anything between “VIASH START” and “VIASH END” will be removed and replaced with a CLI argument parser when the components are built by Viash.
Here, the par dictionary contains all the arguments defined in the config.vsh.yaml file (including those from the file specified in __merge__). When adding an argument to the par dict, also add it to the arguments section of config.vsh.yaml.
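One way to picture the replacement is that the hard-coded par values give way to values parsed from the command line. The sketch below uses Python's standard argparse module as a stand-in; it is not the code Viash generates, parse_par is a hypothetical name, and --n_neighbors is the optional example argument from the commented-out section of the generated config.

```python
import argparse


def parse_par(argv):
    """Build a `par` dict from CLI flags.

    Illustrative stand-in only; Viash generates its own parser at build time.
    """
    parser = argparse.ArgumentParser()
    # Arguments that the generated script reads from `par`.
    parser.add_argument("--input_train")
    parser.add_argument("--input_test")
    parser.add_argument("--input_solution")
    parser.add_argument("--output")
    # A component-specific argument; it would also need an entry in the
    # `arguments` section of config.vsh.yaml.
    parser.add_argument("--n_neighbors", type=int, default=5)
    return vars(parser.parse_args(argv))


par = parse_par([
    "--input_train", "train.h5ad",
    "--input_test", "test.h5ad",
    "--input_solution", "solution.h5ad",
    "--output", "output.h5ad",
])
print(par["output"], par["n_neighbors"])
```

Because the rest of the script only ever reads from par, it runs unchanged whether the values come from the prototyping block or from the command line.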
Read input data
This section reads any input AnnData files passed to the component.
Generate results
This is the most important section of your script, as it defines the core functionality provided by the component. It processes the input data to create results for the particular task at hand.
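For a control method, this section can be very small. As a rough sketch, the functions below produce predictions in the style of a negative control (always predict the most frequent training label) and a positive control (copy the ground-truth labels from the solution). Plain Python lists stand in for the label columns of the AnnData objects, and the function names and toy labels are assumptions made for illustration, not the project's actual implementations.

```python
from collections import Counter


def majority_vote(train_labels, n_test):
    """Negative-control style: predict the most frequent training label everywhere."""
    most_common_label, _count = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_test


def true_labels(solution_labels):
    """Positive-control style: copy the held-out ground truth."""
    return list(solution_labels)


def accuracy(predicted, solution):
    """Fraction of predictions that match the solution."""
    matches = sum(p == s for p, s in zip(predicted, solution))
    return matches / len(solution)


# Toy labels, invented for this example.
train = ["alpha", "alpha", "beta", "alpha", "delta"]
solution = ["alpha", "beta", "alpha"]

negative = majority_vote(train, len(solution))
positive = true_labels(solution)
print(accuracy(negative, solution), accuracy(positive, solution))
```

In the generated script, a list like these would be assigned to obs_label_pred before the output object is created. The two accuracies bracket the range in which a real method is expected to land.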
Write output data to file
The output is stored in an AnnData object and then written to an .h5ad file. The format is defined by the API file specified in the __merge__ field of the config file.
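Before writing the file, it can help to confirm that the expected fields are present. The sketch below assumes the required fields are the ones set in the generated script (label_pred in obs; dataset_id, normalization_id and method_id in uns); the authoritative list lives in the API file, and missing_output_fields is a hypothetical helper.

```python
# Field names taken from the generated script; the authoritative list
# lives in the API file referenced by __merge__.
REQUIRED_OBS = {"label_pred"}
REQUIRED_UNS = {"dataset_id", "normalization_id", "method_id"}


def missing_output_fields(obs_columns, uns_keys):
    """Return the sorted names of expected fields that are absent (hypothetical helper)."""
    return {
        "obs": sorted(REQUIRED_OBS - set(obs_columns)),
        "uns": sorted(REQUIRED_UNS - set(uns_keys)),
    }


# Example: an output where method_id was forgotten.
report = missing_output_fields(
    obs_columns=["batch", "label_pred"],
    uns_keys=["dataset_id", "normalization_id"],
)
print(report)
```

With a real AnnData object, the same check could be called with output.obs.columns and output.uns.keys().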
Step 5: Try component
Your component’s API file contains the necessary unit tests to check whether your component works and the output is in the correct format.
You can test your component by using the following command:
viash test src/label_projection/control_methods/my_method_py/config.vsh.yaml
Output
Running tests in temporary directory: '/tmp/viash_test_majority_vote6584456763614973106'
====================================================================
+/tmp/viash_test_majority_vote6584456763614973106/build_executable/majority_vote ---verbosity 6 ---setup cachedbuild
[notice] Building container 'ghcr.io/openproblems-bio/label_projection/control_methods/majority_vote:test' with Dockerfile
[info] Running 'docker build -t ghcr.io/openproblems-bio/label_projection/control_methods/majority_vote:test /tmp/viash_test_majority_vote6584456763614973106/build_executable -f /tmp/viash_test_majority_vote6584456763614973106/build_executable/tmp/dockerbuild-majority_vote-9ToMtz/Dockerfile'
Sending build context to Docker daemon 41.98kB
Step 1/7 : FROM python:3.10
---> fc98d03e6037
Step 2/7 : RUN pip install --upgrade pip && pip install --upgrade --no-cache-dir "anndata~=0.8.0" "pyyaml"
---> Running in 340965c0af9d
Requirement already satisfied: pip in /usr/local/lib/python3.10/site-packages (23.0.1)
Collecting pip
Downloading pip-23.1.2-py3-none-any.whl (2.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 25.3 MB/s eta 0:00:00
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 23.0.1
Uninstalling pip-23.0.1:
Successfully uninstalled pip-23.0.1
Successfully installed pip-23.1.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Collecting anndata~=0.8.0
Downloading anndata-0.8.0-py3-none-any.whl (96 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.1/96.1 kB 5.1 MB/s eta 0:00:00
Collecting pyyaml
Downloading PyYAML-6.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (682 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 682.2/682.2 kB 23.8 MB/s eta 0:00:00
Collecting pandas>=1.1.1 (from anndata~=0.8.0)
Downloading pandas-2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.3/12.3 MB 199.1 MB/s eta 0:00:00
Collecting numpy>=1.16.5 (from anndata~=0.8.0)
Downloading numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 227.1 MB/s eta 0:00:00
Collecting scipy>1.4 (from anndata~=0.8.0)
Downloading scipy-1.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34.4/34.4 MB 228.2 MB/s eta 0:00:00
Collecting h5py>=3 (from anndata~=0.8.0)
Downloading h5py-3.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.6/4.6 MB 208.5 MB/s eta 0:00:00
Collecting natsort (from anndata~=0.8.0)
Downloading natsort-8.3.1-py3-none-any.whl (38 kB)
Collecting packaging>=20 (from anndata~=0.8.0)
Downloading packaging-23.1-py3-none-any.whl (48 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.9/48.9 kB 189.3 MB/s eta 0:00:00
Collecting python-dateutil>=2.8.2 (from pandas>=1.1.1->anndata~=0.8.0)
Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 kB 291.5 MB/s eta 0:00:00
Collecting pytz>=2020.1 (from pandas>=1.1.1->anndata~=0.8.0)
Downloading pytz-2023.3-py2.py3-none-any.whl (502 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 502.3/502.3 kB 309.6 MB/s eta 0:00:00
Collecting tzdata>=2022.1 (from pandas>=1.1.1->anndata~=0.8.0)
Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 341.8/341.8 kB 305.2 MB/s eta 0:00:00
Collecting six>=1.5 (from python-dateutil>=2.8.2->pandas>=1.1.1->anndata~=0.8.0)
Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pytz, tzdata, six, pyyaml, packaging, numpy, natsort, scipy, python-dateutil, h5py, pandas, anndata
Successfully installed anndata-0.8.0 h5py-3.8.0 natsort-8.3.1 numpy-1.24.3 packaging-23.1 pandas-2.0.1 python-dateutil-2.8.2 pytz-2023.3 pyyaml-6.0 scipy-1.10.1 six-1.16.0 tzdata-2023.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Removing intermediate container 340965c0af9d
---> 4e5a35173a10
Step 3/7 : LABEL org.opencontainers.image.description="Companion container for running component label_projection/control_methods majority_vote"
---> Running in f64dd24cb407
Removing intermediate container f64dd24cb407
---> ab2d077ee88e
Step 4/7 : LABEL org.opencontainers.image.created="2023-05-06T00:07:29Z"
---> Running in d14053f986a0
Removing intermediate container d14053f986a0
---> 5386e2356d45
Step 5/7 : LABEL org.opencontainers.image.source="https://github.com/openproblems-bio/openproblems-v2"
---> Running in 8dd70ff5fa24
Removing intermediate container 8dd70ff5fa24
---> 37f45798010e
Step 6/7 : LABEL org.opencontainers.image.revision="9438b8ad0cdd9cd2ed3ba6a01d0b4f075c059d64"
---> Running in 44c8eb828f1f
Removing intermediate container 44c8eb828f1f
---> ad7a3108d0b8
Step 7/7 : LABEL org.opencontainers.image.version="test"
---> Running in 9947336b8897
Removing intermediate container 9947336b8897
---> 223d4aff497b
Successfully built 223d4aff497b
Successfully tagged ghcr.io/openproblems-bio/label_projection/control_methods/majority_vote:test
====================================================================
+/tmp/viash_test_majority_vote6584456763614973106/test_check_method_config/test_executable
Load config data
Check general fields
Check info fields
All checks succeeded!
====================================================================
+/tmp/viash_test_majority_vote6584456763614973106/test_run_and_check_adata/test_executable
>> Checking whether input files exist
>> Running script as test
Load data
Compute majority vote
Create prediction object
Write output to file
>> Checking whether output file exists
>> Reading h5ad files and checking formats
Reading and checking input_train
AnnData object with n_obs × n_vars = 346 × 419
obs: 'label', 'batch'
var: 'hvg', 'hvg_score'
uns: 'dataset_id', 'normalization_id'
obsm: 'X_pca'
layers: 'counts', 'normalized'
Reading and checking input_test
AnnData object with n_obs × n_vars = 154 × 419
obs: 'batch'
var: 'hvg', 'hvg_score'
uns: 'dataset_id', 'normalization_id'
obsm: 'X_pca'
layers: 'counts', 'normalized'
Reading and checking input_solution
AnnData object with n_obs × n_vars = 154 × 419
obs: 'label', 'batch'
var: 'hvg', 'hvg_score'
uns: 'dataset_id', 'normalization_id'
obsm: 'X_pca'
layers: 'counts', 'normalized'
Reading and checking output
AnnData object with n_obs × n_vars = 154 × 419
obs: 'batch', 'label_pred'
var: 'hvg', 'hvg_score'
uns: 'dataset_id', 'method_id', 'normalization_id'
obsm: 'X_pca'
layers: 'counts', 'normalized'
All checks succeeded!
====================================================================
SUCCESS! All 2 out of 2 test scripts succeeded!
Cleaning up temporary directory
Visit “Run tests” for more information on running unit tests and how to interpret common error messages.
You can also run your component on local files using the viash run command. For example:
viash run src/label_projection/control_methods/my_method_py/config.vsh.yaml -- \
--input_train resources_test/label_projection/pancreas/train.h5ad \
--input_test resources_test/label_projection/pancreas/test.h5ad \
--output output.h5ad
Next steps
If your component works, please create a pull request.