Getting started

To start contributing to OpenProblems, fork the OpenProblems repository and clone your fork to your local machine.

Step 1: Create a fork

Go to the OpenProblems repository at https://github.com/openproblems-bio/openproblems-v2 and click on the “Fork” button in the top right corner of the page.

This will create a copy of the repository under your GitHub account.

Tip

While you’re at it, why not click the “Star” button as well?

Step 2: Clone the repository

To clone your fork to your local machine, open the fork’s page on GitHub, click the green “Code” button, select HTTPS or SSH, and copy the URL that is shown.

In your terminal or command prompt, navigate to the directory where you want to clone the repository and enter the following command:

git clone <forked repository URL> openproblems-v2
cd openproblems-v2

This will download a copy of the repository to your local machine. You can now make changes to the code, add new functionality, and commit your changes.
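
As a rough guide to what <forked repository URL> looks like over HTTPS, the sketch below builds it from a GitHub username. It assumes the fork kept the default repository name openproblems-v2; the fork_url helper and the username your-username are placeholders for illustration, not part of the repository.

```shell
# Build the HTTPS clone URL of a fork from a GitHub username.
# Assumption: the fork kept the default repository name "openproblems-v2".
fork_url() {
  printf 'https://github.com/%s/openproblems-v2.git\n' "$1"
}

# "your-username" is a placeholder; replace it with your own GitHub username.
fork_url "your-username"

# To clone with the generated URL:
#   git clone "$(fork_url your-username)" openproblems-v2
```

If you selected SSH instead, copy the URL from the “Code” button rather than relying on this helper.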

Step 3: Download test resources

Next, download the test resources. These are needed for testing the existing components and can be used for developing new unit tests. From the repository root, run:

viash run src/common/sync_test_resources/config.vsh.yaml
Output
Completed 1.6 KiB/10.8 MiB (6.7 KiB/s) with 23 file(s) remaining
download: s3://openproblems-data/resources_test/common/task_metadata/method_info.json to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/common/task_metadata/method_info.json
Completed 1.6 KiB/10.8 MiB (6.7 KiB/s) with 22 file(s) remaining
Completed 2.2 KiB/10.8 MiB (9.0 KiB/s) with 22 file(s) remaining
download: s3://openproblems-data/resources_test/denoising/pancreas/scores.tsv to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/denoising/pancreas/scores.tsv
Completed 2.2 KiB/10.8 MiB (9.0 KiB/s) with 21 file(s) remaining
Completed 5.1 KiB/10.8 MiB (20.0 KiB/s) with 21 file(s) remaining
download: s3://openproblems-data/resources_test/common/task_metadata/input_git_sha.json to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/common/task_metadata/input_git_sha.json
Completed 5.1 KiB/10.8 MiB (20.0 KiB/s) with 20 file(s) remaining
Completed 56.6 KiB/10.8 MiB (189.0 KiB/s) with 20 file(s) remaining
download: s3://openproblems-data/resources_test/denoising/pancreas/magic_poisson.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/denoising/pancreas/magic_poisson.h5ad
Completed 56.6 KiB/10.8 MiB (189.0 KiB/s) with 19 file(s) remaining
Completed 312.6 KiB/10.8 MiB (1019.2 KiB/s) with 19 file(s) remaining
Completed 568.6 KiB/10.8 MiB (1.6 MiB/s) with 19 file(s) remaining   
Completed 610.6 KiB/10.8 MiB (1.7 MiB/s) with 19 file(s) remaining   
download: s3://openproblems-data/resources_test/dimensionality_reduction/pancreas/reduced.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/dimensionality_reduction/pancreas/reduced.h5ad
Completed 610.6 KiB/10.8 MiB (1.7 MiB/s) with 18 file(s) remaining
Completed 632.7 KiB/10.8 MiB (1.7 MiB/s) with 18 file(s) remaining
download: s3://openproblems-data/resources_test/dimensionality_reduction/pancreas/score.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/dimensionality_reduction/pancreas/score.h5ad
Completed 632.7 KiB/10.8 MiB (1.7 MiB/s) with 17 file(s) remaining
Completed 888.7 KiB/10.8 MiB (2.3 MiB/s) with 17 file(s) remaining
Completed 1.1 MiB/10.8 MiB (3.0 MiB/s) with 17 file(s) remaining  
Completed 1.3 MiB/10.8 MiB (3.5 MiB/s) with 17 file(s) remaining  
Completed 1.5 MiB/10.8 MiB (4.1 MiB/s) with 17 file(s) remaining  
download: s3://openproblems-data/resources_test/batch_integration/pancreas/bbknn.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/batch_integration/pancreas/bbknn.h5ad
Completed 1.5 MiB/10.8 MiB (4.1 MiB/s) with 16 file(s) remaining
Completed 1.8 MiB/10.8 MiB (4.8 MiB/s) with 16 file(s) remaining
Completed 2.0 MiB/10.8 MiB (5.3 MiB/s) with 16 file(s) remaining
download: s3://openproblems-data/resources_test/denoising/pancreas/test.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/denoising/pancreas/test.h5ad
Completed 2.0 MiB/10.8 MiB (5.3 MiB/s) with 15 file(s) remaining
Completed 2.3 MiB/10.8 MiB (5.7 MiB/s) with 15 file(s) remaining
Completed 2.4 MiB/10.8 MiB (5.8 MiB/s) with 15 file(s) remaining
Completed 2.4 MiB/10.8 MiB (5.8 MiB/s) with 15 file(s) remaining
download: s3://openproblems-data/resources_test/dimensionality_reduction/pancreas/scores.tsv to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/dimensionality_reduction/pancreas/scores.tsv
Completed 2.4 MiB/10.8 MiB (5.8 MiB/s) with 14 file(s) remaining
download: s3://openproblems-data/resources_test/batch_integration/pancreas/scvi.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/batch_integration/pancreas/scvi.h5ad
Completed 2.4 MiB/10.8 MiB (5.8 MiB/s) with 13 file(s) remaining
Completed 2.6 MiB/10.8 MiB (6.4 MiB/s) with 13 file(s) remaining
Completed 2.9 MiB/10.8 MiB (6.9 MiB/s) with 13 file(s) remaining
Completed 3.1 MiB/10.8 MiB (7.4 MiB/s) with 13 file(s) remaining
Completed 3.4 MiB/10.8 MiB (8.0 MiB/s) with 13 file(s) remaining
Completed 3.6 MiB/10.8 MiB (8.5 MiB/s) with 13 file(s) remaining
Completed 3.6 MiB/10.8 MiB (8.5 MiB/s) with 13 file(s) remaining
download: s3://openproblems-data/resources_test/batch_integration/pancreas/unintegrated.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/batch_integration/pancreas/unintegrated.h5ad
Completed 3.6 MiB/10.8 MiB (8.5 MiB/s) with 12 file(s) remaining
Completed 3.9 MiB/10.8 MiB (9.1 MiB/s) with 12 file(s) remaining
Completed 4.0 MiB/10.8 MiB (9.4 MiB/s) with 12 file(s) remaining
download: s3://openproblems-data/resources_test/label_projection/pancreas/knn.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/label_projection/pancreas/knn.h5ad
Completed 4.0 MiB/10.8 MiB (9.4 MiB/s) with 11 file(s) remaining
Completed 4.3 MiB/10.8 MiB (9.8 MiB/s) with 11 file(s) remaining
Completed 4.5 MiB/10.8 MiB (10.3 MiB/s) with 11 file(s) remaining
Completed 4.8 MiB/10.8 MiB (10.9 MiB/s) with 11 file(s) remaining
Completed 5.0 MiB/10.8 MiB (11.0 MiB/s) with 11 file(s) remaining
download: s3://openproblems-data/resources_test/dimensionality_reduction/pancreas/train.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/dimensionality_reduction/pancreas/train.h5ad
Completed 5.0 MiB/10.8 MiB (11.0 MiB/s) with 10 file(s) remaining
Completed 5.0 MiB/10.8 MiB (10.9 MiB/s) with 10 file(s) remaining
Completed 5.2 MiB/10.8 MiB (11.5 MiB/s) with 10 file(s) remaining
Completed 5.5 MiB/10.8 MiB (12.0 MiB/s) with 10 file(s) remaining
download: s3://openproblems-data/resources_test/label_projection/pancreas/scores.tsv to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/label_projection/pancreas/scores.tsv
Completed 5.5 MiB/10.8 MiB (12.0 MiB/s) with 9 file(s) remaining
Completed 5.7 MiB/10.8 MiB (12.5 MiB/s) with 9 file(s) remaining
Completed 6.0 MiB/10.8 MiB (12.7 MiB/s) with 9 file(s) remaining
Completed 6.2 MiB/10.8 MiB (13.2 MiB/s) with 9 file(s) remaining
Completed 6.4 MiB/10.8 MiB (13.5 MiB/s) with 9 file(s) remaining
download: s3://openproblems-data/resources_test/common/pancreas/dataset.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/common/pancreas/dataset.h5ad
Completed 6.4 MiB/10.8 MiB (13.5 MiB/s) with 8 file(s) remaining
Completed 6.7 MiB/10.8 MiB (14.0 MiB/s) with 8 file(s) remaining
Completed 6.9 MiB/10.8 MiB (14.4 MiB/s) with 8 file(s) remaining
Completed 7.2 MiB/10.8 MiB (14.9 MiB/s) with 8 file(s) remaining
Completed 7.2 MiB/10.8 MiB (14.9 MiB/s) with 8 file(s) remaining
Completed 7.2 MiB/10.8 MiB (15.0 MiB/s) with 8 file(s) remaining
download: s3://openproblems-data/resources_test/label_projection/pancreas/solution.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/label_projection/pancreas/solution.h5ad
Completed 7.2 MiB/10.8 MiB (15.0 MiB/s) with 7 file(s) remaining
download: s3://openproblems-data/resources_test/denoising/pancreas/train.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/denoising/pancreas/train.h5ad
Completed 7.2 MiB/10.8 MiB (15.0 MiB/s) with 6 file(s) remaining
Completed 7.5 MiB/10.8 MiB (15.2 MiB/s) with 6 file(s) remaining
Completed 7.7 MiB/10.8 MiB (15.5 MiB/s) with 6 file(s) remaining
Completed 7.9 MiB/10.8 MiB (15.9 MiB/s) with 6 file(s) remaining
download: s3://openproblems-data/resources_test/denoising/pancreas/magic.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/denoising/pancreas/magic.h5ad
Completed 7.9 MiB/10.8 MiB (15.9 MiB/s) with 5 file(s) remaining
Completed 8.1 MiB/10.8 MiB (16.2 MiB/s) with 5 file(s) remaining
Completed 8.4 MiB/10.8 MiB (16.6 MiB/s) with 5 file(s) remaining
Completed 8.6 MiB/10.8 MiB (17.1 MiB/s) with 5 file(s) remaining
Completed 8.8 MiB/10.8 MiB (17.4 MiB/s) with 5 file(s) remaining
download: s3://openproblems-data/resources_test/label_projection/pancreas/knn_accuracy.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/label_projection/pancreas/knn_accuracy.h5ad
Completed 8.8 MiB/10.8 MiB (17.4 MiB/s) with 4 file(s) remaining
Completed 8.9 MiB/10.8 MiB (17.5 MiB/s) with 4 file(s) remaining
download: s3://openproblems-data/resources_test/label_projection/pancreas/train.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/label_projection/pancreas/train.h5ad
Completed 8.9 MiB/10.8 MiB (17.5 MiB/s) with 3 file(s) remaining
Completed 9.1 MiB/10.8 MiB (17.6 MiB/s) with 3 file(s) remaining
Completed 9.2 MiB/10.8 MiB (17.7 MiB/s) with 3 file(s) remaining
download: s3://openproblems-data/resources_test/label_projection/pancreas/test.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/label_projection/pancreas/test.h5ad
Completed 9.2 MiB/10.8 MiB (17.7 MiB/s) with 2 file(s) remaining
Completed 9.4 MiB/10.8 MiB (17.7 MiB/s) with 2 file(s) remaining
Completed 9.6 MiB/10.8 MiB (18.1 MiB/s) with 2 file(s) remaining
download: s3://openproblems-data/resources_test/dimensionality_reduction/pancreas/test.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/dimensionality_reduction/pancreas/test.h5ad
Completed 9.6 MiB/10.8 MiB (18.1 MiB/s) with 1 file(s) remaining
Completed 9.9 MiB/10.8 MiB (14.1 MiB/s) with 1 file(s) remaining
Completed 10.1 MiB/10.8 MiB (11.8 MiB/s) with 1 file(s) remaining
Completed 10.4 MiB/10.8 MiB (10.2 MiB/s) with 1 file(s) remaining
Completed 10.6 MiB/10.8 MiB (9.4 MiB/s) with 1 file(s) remaining 
Completed 10.8 MiB/10.8 MiB (8.9 MiB/s) with 1 file(s) remaining 
download: s3://openproblems-data/resources_test/batch_integration/pancreas/combat.h5ad to ../viash_automount/home/runner/work/website/website/documentation/_openproblems-v2/resources_test/batch_integration/pancreas/combat.h5ad

The test resources are stored in the resources_test directory.
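
To confirm that the download finished, a quick check along these lines may help. It is a minimal sketch: check_resources is a hypothetical helper, and the two paths are simply the label projection files used later on this page, not a complete list of resources.

```shell
# Report which of the given files are missing (or empty) under a directory.
# Usage: check_resources <dir> <relative/path>...
# Returns success only if every listed file exists and is non-empty.
check_resources() {
  dir=$1
  shift
  missing=0
  for rel in "$@"; do
    if [ ! -s "$dir/$rel" ]; then
      echo "missing or empty: $dir/$rel" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Run from the repository root.
if check_resources resources_test \
    label_projection/pancreas/train.h5ad \
    label_projection/pancreas/test.h5ad; then
  echo "test resources look complete"
else
  echo "some test resources are missing; re-run the sync command"
fi
```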

Ready, set, go!

That’s it! Now you should be able to test whether the existing components work as expected and then start adding functionality to the pipeline. Running an existing component is as simple as running a command in your terminal. Using test data as input, you can try this out immediately.

Run a component

Use the viash run command to run a Viash component. Everything after the -- separator is passed to the component itself as its arguments. In this case, the knn component has --input_train and --input_test arguments, to which the test resources are passed, and an --output argument for the result file.

viash run src/label_projection/methods/knn/config.vsh.yaml -- \
  --input_train resources_test/label_projection/pancreas/train.h5ad \
  --input_test resources_test/label_projection/pancreas/test.h5ad \
  --output output.h5ad
Output
[notice] Checking if Docker image is available at 'ghcr.io/openproblems-bio/label_projection/methods/knn:dev'
[warning] Could not pull from 'ghcr.io/openproblems-bio/label_projection/methods/knn:dev'. Docker image doesn't exist or is not accessible.
[notice] Building container 'ghcr.io/openproblems-bio/label_projection/methods/knn:dev' with Dockerfile
Load input data
Fit to train data
Predict on test data
Write output to file
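
The role of the -- separator in the command above can be illustrated with a small shell function that splits an argument list at the first --. This is only an illustration of the convention; split_args is a hypothetical helper, not Viash’s own parsing code.

```shell
# Split an argument list at the first "--": arguments before it are for the
# runner, arguments after it are for the component.
# Illustration only; this is not how Viash itself is implemented.
split_args() {
  before=""
  after=""
  seen=0
  for arg in "$@"; do
    if [ "$seen" -eq 0 ] && [ "$arg" = "--" ]; then
      seen=1
    elif [ "$seen" -eq 0 ]; then
      before="$before $arg"
    else
      after="$after $arg"
    fi
  done
  echo "runner:${before}"
  echo "component:${after}"
}

split_args src/label_projection/methods/knn/config.vsh.yaml -- \
  --input_train resources_test/label_projection/pancreas/train.h5ad \
  --output output.h5ad
```

The first line printed lists the config file, and the second lists the --input_train and --output arguments that the component receives.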

Testing components

Testing components is an important part of the development process. Each task comes with pre-defined unit tests that can be run using the viash test command.

viash test src/label_projection/methods/knn/config.vsh.yaml
Output
Running tests in temporary directory: '/tmp/viash_test_knn16356259405642263058'
====================================================================
+/tmp/viash_test_knn16356259405642263058/build_executable/knn ---verbosity 6 ---setup cachedbuild
[notice] Building container 'ghcr.io/openproblems-bio/label_projection/methods/knn:test' with Dockerfile
[info] Running 'docker build -t ghcr.io/openproblems-bio/label_projection/methods/knn:test /tmp/viash_test_knn16356259405642263058/build_executable -f /tmp/viash_test_knn16356259405642263058/build_executable/tmp/dockerbuild-knn-be4xrF/Dockerfile'
Sending build context to Docker daemon  39.94kB

Step 1/7 : FROM python:3.10
 ---> fc98d03e6037
Step 2/7 : RUN pip install --upgrade pip &&   pip install --upgrade --no-cache-dir "scikit-learn" "pyyaml" "anndata~=0.8.0"
 ---> Using cache
 ---> 1d35b64eb218
Step 3/7 : LABEL org.opencontainers.image.description="Companion container for running component label_projection/methods knn"
 ---> Using cache
 ---> f9833a51c1bc
Step 4/7 : LABEL org.opencontainers.image.created="2023-05-06T00:04:17Z"
 ---> Running in d9f843ad40bd
Removing intermediate container d9f843ad40bd
 ---> b047c55d46cf
Step 5/7 : LABEL org.opencontainers.image.source="https://github.com/openproblems-bio/openproblems-v2"
 ---> Running in c31e8d72c245
Removing intermediate container c31e8d72c245
 ---> 418d40122b17
Step 6/7 : LABEL org.opencontainers.image.revision="9438b8ad0cdd9cd2ed3ba6a01d0b4f075c059d64"
 ---> Running in 13df3cda6465
Removing intermediate container 13df3cda6465
 ---> 06f6ccf0e431
Step 7/7 : LABEL org.opencontainers.image.version="test"
 ---> Running in 990f0999147d
Removing intermediate container 990f0999147d
 ---> 5fb76e36ee43
Successfully built 5fb76e36ee43
Successfully tagged ghcr.io/openproblems-bio/label_projection/methods/knn:test
====================================================================
+/tmp/viash_test_knn16356259405642263058/test_check_method_config/test_executable
Load config data
Check general fields
Check info fields
All checks succeeded!
====================================================================
+/tmp/viash_test_knn16356259405642263058/test_run_and_check_adata/test_executable
>> Checking whether input files exist
>> Running script as test
Load input data
Fit to train data
Predict on test data
Write output to file
>> Checking whether output file exists
>> Reading h5ad files and checking formats
Reading and checking input_train
  AnnData object with n_obs × n_vars = 346 × 419
    obs: 'label', 'batch'
    var: 'hvg', 'hvg_score'
    uns: 'dataset_id', 'normalization_id'
    obsm: 'X_pca'
    layers: 'counts', 'normalized'
Reading and checking input_test
  AnnData object with n_obs × n_vars = 154 × 419
    obs: 'batch'
    var: 'hvg', 'hvg_score'
    uns: 'dataset_id', 'normalization_id'
    obsm: 'X_pca'
    layers: 'counts', 'normalized'
Reading and checking output
  AnnData object with n_obs × n_vars = 154 × 419
    obs: 'batch', 'label_pred'
    var: 'hvg', 'hvg_score'
    uns: 'dataset_id', 'method_id', 'normalization_id'
    obsm: 'X_pca'
    layers: 'counts', 'normalized'
All checks succeeded!
====================================================================
SUCCESS! All 2 out of 2 test scripts succeeded!
Cleaning up temporary directory

If you want to run the unit tests for all of the components of a task, you can use the viash ns test command.

viash ns test --query label_projection --parallel --platform docker
Output
             namespace        functionality             platform            test_name exit_code duration               result
label_projection/methods  logistic_regression               docker                start                                        
label_projection/methods               scanvi               docker                start                                        
label_projection/methods                  knn               docker                start                                        
label_projection/methods                  mlp               docker                start                                        
label_projection/metrics             accuracy               docker                start                                        
label_projection/metrics                   f1               docker                start
label_projection/methods  logistic_regression               docker     build_executable         0        4              SUCCESS
label_projection/methods  logistic_regression               docker      generic_test.py         0        9              SUCCESS
label_projection/metrics                   f1               docker     build_executable         0        7              SUCCESS
label_projection/metrics                   f1               docker      format_check.py         0        8              SUCCESS
label_projection/metrics             accuracy               docker     build_executable         0        8              SUCCESS
label_projection/metrics             accuracy               docker      format_check.py         0        7              SUCCESS
...
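
With many components, the table above can get long. The sketch below counts rows per result status from saved output. It assumes the seven-column layout shown above, where result rows have exactly seven whitespace-separated fields; summarize_results is a hypothetical helper, and the sample rows are copied from the output above.

```shell
# Count test results per status from `viash ns test` table output.
# Assumption: result rows have exactly seven whitespace-separated columns,
# with the status in the last one. "start" rows have fewer columns and are
# skipped, as is the header row.
# Reads the files given as arguments, or standard input if none are given.
summarize_results() {
  awk 'NF == 7 && $1 != "namespace" { count[$7]++ }
       END { for (status in count) print status, count[status] }' "$@"
}

summarize_results <<'EOF'
label_projection/methods                  knn               docker                start
label_projection/methods  logistic_regression               docker     build_executable         0        4              SUCCESS
label_projection/metrics                   f1               docker      format_check.py         0        8              SUCCESS
EOF
```

For the three sample rows this prints SUCCESS 2. When several statuses are present, the order of the printed lines is not fixed, so pipe the result through sort if a stable order matters.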