Getting started
Before writing any code, it's essential to carefully plan and document your task to ensure its relevance, feasibility, and compatibility with the OpenProblems framework.
Step 1: Check whether a similar task already exists
Please check the OpenProblems organization and task issue list to see whether a similar task has already been created. Tasks are created as separate repositories within the OpenProblems organization; you can search for them using the repository search bar and look for repositories whose name starts with task_.
Step 2: Reach out to the OpenProblems team
If no similar task exists, reach out to the OpenProblems team by creating a new task proposal. This can be done by creating a new issue in the OpenProblems repository using the new task proposal template. You can also reach out through Discord.
This collaborative process will help ensure that your task is well-defined, relevant, and compatible with the OpenProblems framework. Additionally, it informs others of your ongoing work in this area and establishes a foundation for potential collaboration. Check out some examples by filtering on the 'task' label on GitHub.
The task proposal template contains the following sections:
Task motivation
Explain the motivation behind your proposed task. Describe the biological or computational problem you aim to address and why it's important. Discuss the current state of research in this area and any gaps or challenges that your task could help address. This section should convince readers of the significance and relevance of your task.
Task description
Provide a clear and concise description of your task, detailing the specific problem it aims to solve. Outline the input data types, the expected output, and any assumptions or constraints. Be sure to explain any terminology or concepts that are essential for understanding the task.
Proposed ground-truth in datasets
Describe the datasets you plan to use for your task. OpenProblems offers a standard set of datasets (see "Common datasets") which you can browse. Explain how these datasets will provide the ground truth for evaluating the methods implemented in your task. If possible, include references or links to the datasets to facilitate reproducibility.
Initial set of methods to implement
List the initial set of methods you plan to implement for your task. Briefly describe each method's core ideas and algorithms, and explain why you think they are suitable for your task. Consider including both established and cutting-edge methods to provide a comprehensive benchmarking of the state-of-the-art.
Proposed control methods
Outline the control methods you propose for your task. These methods serve as a starting point to test the relative accuracy of new methods in the task and as quality control for the defined metrics. Include both positive controls, which are methods with known outcomes resulting in the best possible metric values, and negative controls, which are simple, naive, or random methods that do not rely on sophisticated techniques or domain knowledge. Explain the rationale for your chosen controls.
Proposed metrics
Describe the metrics you propose for evaluating the performance of methods in your task. Explain the rationale for selecting these metrics and how they will accurately assess the methods' success in addressing the task's challenges. Consider including multiple metrics to capture different aspects of method performance.
The next steps are only applicable if you have access to the OpenProblems organization.
Step 3: Create task repository
Upon approval of the task proposal, a new task repository within the OpenProblems-bio organization will be created using the OpenProblems template. If it is not yet created and you have access to the organization, you can create it yourself using the instructions below.
- Click the "Use this template" button on the top right of the repository.
- Use the Owner dropdown menu to select the openproblems-bio account.
- Type a name for your repository (task_...) and a description.
- Set the repository visibility to public.
- Click "Create repository from template".
Step 4: Clone the task repository
To clone the newly created repository to your local machine, copy its URL by clicking the green "Code" button and selecting HTTPS or SSH.
In your terminal or command prompt, navigate to the directory where you want to clone the repository and enter the following command:
git clone --recursive <forked repository URL>
cd <repository name>
If no files are visible in the common submodule after cloning with the above command, update the submodule manually:
git submodule update --init --recursive
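Putting the step together, cloning a hypothetical task repository over SSH might look like this (the repository name is a placeholder for your own task):
# Hypothetical example: clone a task repository together with the common submodule.
git clone --recursive git@github.com:openproblems-bio/task_my_new_task.git
cd task_my_new_task
# Verify that the common submodule contains files.
ls common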
Step 5: Testing the repository
The task_template repository is a working task repository that contains all the necessary components to get started with developing a new task. Start by testing the existing components to ensure that everything is working as expected.
You will need to download the test resources, which are required for testing the existing components and can also be used when developing new unit tests. From the repository root, run:
scripts/sync_resources.sh
The test resources are stored in the resources_test directory.
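To see what was downloaded, you can list the directory; the exact layout depends on the task:
# List the downloaded test resources (contents vary per task).
ls -R resources_test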
That's it! Now you should be able to test whether the existing components work as expected and then start adding functionality to the pipeline. Running an existing component is as simple as running a command in your terminal. Using test data as input, you can try this out immediately.
Run a component
Use the viash run command to run a Viash component. Everything after the -- separator counts as the arguments of the component itself. In this case, the logistic_regression component has --input_train and --input_test arguments to which the test resources are passed.
viash run src/methods/logistic_regression/config.vsh.yaml -- \
--input_train resources_test/task_template/cxg_mouse_pancreas_atlas/train.h5ad \
--input_test resources_test/task_template/cxg_mouse_pancreas_atlas/test.h5ad \
--output output.h5ad
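To take a quick look at the result, you can open the output file with the anndata Python package, assuming it is installed in your environment:
# Print a summary of the AnnData object written by the component.
python -c "import anndata; print(anndata.read_h5ad('output.h5ad'))"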
Testing components
Testing components is an important part of the development process.
Each task comes with pre-defined unit tests that can be run using the viash test command.
viash test src/methods/logistic_regression/config.vsh.yaml
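If a test fails, it can be useful to keep the temporary working directory around for inspection. A hedged example, assuming your Viash version supports the --keep flag:
# Keep the temporary test directory for debugging a failing test.
viash test src/methods/logistic_regression/config.vsh.yaml --keep true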
If you want to run the unit tests for all of the components of a task, you can use the viash ns test command.
viash ns test --parallel --engine docker
Output
namespace name runner engine test_name exit_code duration result
control_methods true_labels executable docker start
metrics accuracy executable docker start
data_processors process_dataset executable docker start
methods logistic_regression executable docker start
data_processors process_dataset executable docker build_executable 0 2 SUCCESS
data_processors process_dataset executable docker run_and_check_output.py 0 4 SUCCESS
control_methods true_labels executable docker build_executable 0 2 SUCCESS
control_methods true_labels executable docker run_and_check_output.py 0 4 SUCCESS
control_methods true_labels executable docker check_config.py 0 3 SUCCESS
metrics accuracy executable docker build_executable 0 2 SUCCESS
metrics accuracy executable docker run_and_check_output.py 0 5 SUCCESS
metrics accuracy executable docker check_config.py 0 2 SUCCESS
methods logistic_regression executable docker build_executable 0 2 SUCCESS
methods logistic_regression executable docker run_and_check_output.py 0 5 SUCCESS
methods logistic_regression executable docker check_config.py 0 3 SUCCESS
All 11 configs built and tested successfully
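To restrict testing to a subset of components, the viash ns commands accept query filters. For example, to test only the components in the methods namespace (a sketch, assuming a recent Viash version that supports --query_namespace):
# Run the unit tests only for components in the "methods" namespace.
viash ns test --parallel --engine docker --query_namespace methods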
Step 6: Update the task config file
Now update the _viash.yaml file in the root of the task repository with the following information:
- Update the name field to the name of the task in snake_case; the name should start with task_.
- Add a keyword to the keywords field that describes the task.
- Update task_template to the name of the task from step 1.
- Update the label, summary, description and reference fields.
- Replace task_template with the name of the task (this will not yet be of relevance at the start of a task).
- Update the authors field with the authors of the task.
- Remove all of the comments of the steps you completed.
- High five yourself!
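After editing _viash.yaml, a quick sanity check is to let Viash parse the project configuration again; if the YAML is malformed, the command below will report an error:
# Listing the components forces Viash to parse _viash.yaml and all component configs.
viash ns list > /dev/null && echo "Project configuration parses OK"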
Ready, set, go!
You are now well-equipped to begin designing the API for the new benchmarking task.