```mermaid
graph LR
  classDef component fill:#decbe4,stroke:#333
  classDef anndata fill:#d9d9d9,stroke:#333
  loader[/Dataset<br/>loader/]:::component
  dataset[Dataset]:::anndata
  method[/Method/]:::component
  output[Output]:::anndata
  metric[/Metric/]:::component
  score[Score]:::anndata
  loader --> dataset --- method --> output --- metric --> score
```

Figure 1: Structure of a benchmarking pipeline: a dataset loader produces a dataset, a method generates an output from it, and a metric scores that output.
Philosophy
OpenProblems is a living benchmarking platform designed to address and measure progress towards open challenges in single-cell genomics. It follows the Common Task Framework (CTF), which has driven innovation in machine learning research by providing clear definitions and quantifications of progress (Donoho 2017).
The platform combines an open GitHub repository with community-defined tasks, an automated benchmarking workflow, and a website for exploring the results. Each task consists of datasets, methods, and metrics (Figure 1). Datasets define the input and ground truth for a task, methods attempt to solve the task, and metrics evaluate the success of a method on a given dataset.
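To make this structure concrete, the toy sketch below expresses a task's pieces as plain Python functions. The function names and the exact-match metric are illustrative assumptions rather than actual OpenProblems code; real components are separate programs that exchange AnnData files, as described below.

```python
# Illustrative sketch only: a task's dataset, method, and metric expressed as
# plain Python functions. Real OpenProblems components are standalone programs
# that exchange AnnData (.h5ad) files rather than in-memory objects.
def load_dataset():
    """Dataset loader: provides the task's input and ground truth."""
    return {"input": [1, 2, 3, 4], "ground_truth": [2, 4, 6, 8]}

def method(task_input):
    """Method: attempts to solve the task from the input alone."""
    return [2 * x for x in task_input]

def metric(output, ground_truth):
    """Metric: scores how well the output matches the ground truth."""
    matches = sum(o == g for o, g in zip(output, ground_truth))
    return matches / len(ground_truth)

dataset = load_dataset()
output = method(dataset["input"])
score = metric(output, dataset["ground_truth"])  # 1.0 for this toy task
```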
A benchmarking pipeline in OpenProblems consists of AnnData datasets (Virshup et al. 2021) and Viash components (Cannoodt et al. 2021), both of which contribute to consistency and interoperability by promoting a standardized data format and a modular component structure.
AnnData, short for “Annotated Data”, is a file format designed for handling annotated, high-dimensional biological data. In OpenProblems, AnnData serves as the standard data format for both input and output files of components, ensuring a consistent and seamless exchange of data between different components of the benchmarking pipelines.
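As a rough sketch, the snippet below builds a small AnnData object and round-trips it through an .h5ad file, which is how intermediate results are passed between components. The slot names used here (a `label` column in `.obs`, a `dataset_id` entry in `.uns`) are illustrative assumptions, not the official OpenProblems file specification.

```python
import anndata as ad
import numpy as np

n_cells, n_genes = 100, 50

# A toy dataset: a count matrix plus per-cell ground-truth labels.
adata = ad.AnnData(
    X=np.random.poisson(1.0, size=(n_cells, n_genes)).astype(np.float32)
)
adata.obs["label"] = np.random.choice(["B cell", "T cell"], size=n_cells)
adata.uns["dataset_id"] = "example_dataset"  # illustrative metadata field

# Components exchange data as .h5ad files, so any component that reads this
# file sees exactly the structure the previous component wrote.
adata.write_h5ad("dataset.h5ad")
roundtrip = ad.read_h5ad("dataset.h5ad")
print(roundtrip)
```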
Viash is a tool that facilitates the creation of modular pipeline components by allowing developers to combine a code block or script with a small amount of metadata. Viash components are used in OpenProblems for dataset loaders, dataset processors, methods, and metrics, enabling developers to focus on the core functionality of their components without worrying about the chosen pipeline framework.
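The sketch below shows what the script half of such a component might look like for a hypothetical method that adds a predicted label to the toy dataset above. The block between `## VIASH START` and `## VIASH END` holds default parameter values for local testing, which Viash replaces with the component's actual arguments at build time. The accompanying Viash config file, which declares the component's metadata and arguments, is not shown, and the parameter names here are assumptions rather than the actual OpenProblems interface.

```python
import anndata as ad

## VIASH START
# Defaults used when running the script directly; Viash overrides this block
# with the arguments declared in the component's config file.
par = {
    "input": "dataset.h5ad",
    "output": "output.h5ad",
}
## VIASH END

adata = ad.read_h5ad(par["input"])

# Placeholder "method": predict the most frequent label for every cell.
most_common = adata.obs["label"].value_counts().idxmax()
adata.obs["label_pred"] = most_common

adata.write_h5ad(par["output"])
```

Because Viash handles argument parsing, containerization, and pipeline integration from the config file, the script itself stays focused on the method's logic, which is the point made above.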
To facilitate seamless community involvement, OpenProblems has designed its infrastructure to take advantage of automated workflows through GitHub Actions, Nextflow, and AWS Batch, as well as the integration of Viash components. When a community member adds a task, dataset, method, or metric, the new contributions are automatically tested in the cloud. Once all tests pass and the new contribution is merged into the main repository, the results from the new contribution are automatically submitted to the OpenProblems website.
OpenProblems aims to raise the standards for method selection and evaluation in single-cell data science by offering a platform that quantitatively defines open challenges, identifies current state-of-the-art solutions, promotes method development, and monitors progress towards these goals. By leveraging AnnData and Viash, the platform ensures consistency, modularity, and interoperability, making it a valuable resource for data analysts, method developers, and the single-cell genomics community at large.