Getting started
Before writing any code, it's essential to carefully plan and document your task to ensure its relevance, feasibility, and compatibility with the OpenProblems framework.
Step 1: Check whether a similar task already exists
Please check the OpenProblems organization and task issue list to see whether a similar task has already been created. Tasks are created as separate repositories within the OpenProblems organization; you can search for them using the repository search bar and look for repositories whose name starts with task_.
Step 2: Reach out to the OpenProblems team
If no similar task exists, reach out to the OpenProblems team by creating a new task proposal. This can be done by creating a new issue in the OpenProblems repository using the new task proposal template. You can also reach out through Discord.
This collaborative process will help ensure that your task is well-defined, relevant, and compatible with the OpenProblems framework. Additionally, it informs others of your ongoing work in this area and establishes a foundation for potential collaboration. Check out some examples by filtering on the 'task' label on GitHub.
The task proposal template contains the following sections:
Task motivation
Explain the motivation behind your proposed task. Describe the biological or computational problem you aim to address and why it's important. Discuss the current state of research in this area and any gaps or challenges that your task could help address. This section should convince readers of the significance and relevance of your task.
Task description
Provide a clear and concise description of your task, detailing the specific problem it aims to solve. Outline the input data types, the expected output, and any assumptions or constraints. Be sure to explain any terminology or concepts that are essential for understanding the task.
Proposed ground-truth in datasets
Describe the datasets you plan to use for your task. OpenProblems offers a standard set of datasets (see "Common datasets") which you can browse. Explain how these datasets will provide the ground truth for evaluating the methods implemented in your task. If possible, include references or links to the datasets to facilitate reproducibility.
Initial set of methods to implement
List the initial set of methods you plan to implement for your task. Briefly describe each method's core ideas and algorithms, and explain why you think they are suitable for your task. Consider including both established and cutting-edge methods to provide a comprehensive benchmarking of the state-of-the-art.
Proposed control methods
Outline the control methods you propose for your task. These methods serve as a starting point to test the relative accuracy of new methods in the task and as quality control for the defined metrics. Include both positive controls, which are methods with known outcomes resulting in the best possible metric values, and negative controls, which are simple, naive, or random methods that do not rely on sophisticated techniques or domain knowledge. Explain the rationale for your chosen controls.
Proposed metrics
Describe the metrics you propose for evaluating the performance of methods in your task. Explain the rationale for selecting these metrics and how they will accurately assess the methods' success in addressing the task's challenges. Consider including multiple metrics to capture different aspects of method performance.
The next steps are only applicable if you have access to the OpenProblems organization.
Step 3: Create task repository
Upon approval of the task proposal, a new task repository within the OpenProblems-bio organization will be created using the OpenProblems template. If it is not yet created and you have access to the organization, you can create it yourself using the instructions below.
- Click the "Use this template" button on the top right of the repository.
- Use the Owner dropdown menu to select the openproblems-bio account.
- Type a name for your repository (task_...) and a description.
- Set the repository visibility to public.
- Click "Create repository from template".
Step 4: Clone the task repository
To clone the newly created repository to your local machine, copy its URL by clicking the green "Code" button and selecting HTTPS or SSH.
In your terminal or command prompt, navigate to the directory where you want to clone the repository and enter the following command:
git clone --recursive <forked repository URL>
cd <repository name>
If no files are visible in the common submodule after cloning with the above command, update the submodule manually:
git submodule update --init --recursive
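Putting the step together, cloning a hypothetical task repository over SSH might look like this (the repository name is a placeholder for your own task):
# Hypothetical example: clone a task repository together with the common submodule.
git clone --recursive git@github.com:openproblems-bio/task_my_new_task.git
cd task_my_new_task
# Verify that the common submodule contains files.
ls common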
Step 5: Testing the repository
The task_template repository is a working task repository that contains all the necessary components to get started with developing a new task. Start by testing the existing components to ensure that everything is working as expected.
You will need to download the test resources, which are required for testing the existing components and can also be used when developing new unit tests. From the repository root, run:
scripts/sync_resources.sh
The test resources are stored in the resources_test directory.
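To see what was downloaded, you can list the directory; the exact layout depends on the task:
# List the downloaded test resources (contents vary per task).
ls -R resources_test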
That's it! Now you should be able to test whether the existing components work as expected and then start adding functionality to the pipeline. Running an existing component is as simple as running a command in your terminal. Using test data as input, you can try this out immediately.
Run a component
Use the viash run command to run a Viash component. Everything after the -- separator counts as the arguments of the component itself. In this case, the logistic_regression component has --input_train and --input_test arguments to which the test resources are passed.
viash run src/methods/logistic_regression/config.vsh.yaml -- \
--input_train resources_test/task_template/cxg_mouse_pancreas_atlas/train.h5ad \
--input_test resources_test/task_template/cxg_mouse_pancreas_atlas/test.h5ad \
--output output.h5ad
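To take a quick look at the result, you can open the output file with the anndata Python package, assuming it is installed in your environment:
# Print a summary of the AnnData object written by the component.
python -c "import anndata; print(anndata.read_h5ad('output.h5ad'))"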
Testing components
Testing components is an important part of the development process.
Each task comes with pre-defined unit tests that can be run using the viash test command.
viash test src/methods/logistic_regression/config.vsh.yaml
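If a test fails, it can be useful to keep the temporary working directory around for inspection. A hedged example, assuming your Viash version supports the --keep flag:
# Keep the temporary test directory for debugging a failing test.
viash test src/methods/logistic_regression/config.vsh.yaml --keep true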
If you want to run the unit tests for all of the components of a task, you can use the viash ns test command.
viash ns test --parallel --engine docker
Output
namespace name runner engine test_name exit_code duration result
control_methods true_labels executable docker start
metrics accuracy executable docker start
data_processors process_dataset executable docker start
methods logistic_regression executable docker start
data_processors process_dataset executable docker build_executable 0 2 SUCCESS
data_processors process_dataset executable docker run_and_check_output.py 0 4 SUCCESS
control_methods true_labels executable docker build_executable 0 2 SUCCESS
control_methods true_labels executable docker run_and_check_output.py 0 4 SUCCESS
control_methods true_labels executable docker check_config.py 0 3 SUCCESS
metrics accuracy executable docker build_executable 0 2 SUCCESS
metrics accuracy executable docker run_and_check_output.py 0 5 SUCCESS
metrics accuracy executable docker check_config.py 0 2 SUCCESS
methods logistic_regression executable docker build_executable 0 2 SUCCESS
methods logistic_regression executable docker run_and_check_output.py 0 5 SUCCESS
methods logistic_regression executable docker check_config.py 0 3 SUCCESS
All 11 configs built and tested successfully
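To restrict testing to a subset of components, the viash ns commands accept query filters. For example, to test only the components in the methods namespace (a sketch, assuming a recent Viash version that supports --query_namespace):
# Run the unit tests only for components in the "methods" namespace.
viash ns test --parallel --engine docker --query_namespace methods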
Step 6: Update the task config file
Now update the _viash.yaml file in the root of the task repository with the following information:
- Update the name field to the name of the task in snake_case; the name should start with task_.
- Add a keyword to the keywords field that describes the task.
- Update task_template to the name of the task from step 1.
- Update the label, summary, description and reference fields.
- Replace task_template with the name of the task (this will not yet be of relevance at the start of a task).
- Update the authors field with the authors of the task.
- Remove all of the comments of the steps you completed.
- High five yourself!
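After editing _viash.yaml, a quick sanity check is to let Viash parse the project configuration again; if the YAML is malformed, the command below will report an error:
# Listing the components forces Viash to parse _viash.yaml and all component configs.
viash ns list > /dev/null && echo "Project configuration parses OK"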
Ready, set, go!
You are now well-equipped to begin designing the API for the new benchmarking task.